From: AAAI Technical Report WS-93-05. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved. Experiencesof using Verification Tools for Maintenanceof Rule-Based Systems. Mark A. Dahl Keith Williamson The Boeing Company P.O. Box 24346, MS73-43 Seattle, WA98124-0346 Abstract Thispaperreportson the use of tools that statically analyze rule-basesin order to detect logical errors, suchas redundancy, inconsistency, and incompleteness. The analysis uses a variant of the KB-Reduceralgorithm (Ginsberg1991).In contrast to accountsof similar tools, whichseldomdescribetools as usedin the field, the experience describedin this paperis derivedfromusing tools to analyzea large suite of knowledge bases in production use at Boeing. The tools serve to augmentthe ongoing process of knowledgebase maintenance.Common causes of errors detected are reported. This paper showsthat makingsuchtools practical is non-trivialbut rewarding. Introduction As observed in Preece and Shinghal, 1992, tools for finding anomalies in knowledgebases (KBs) have shown sufficient potential in KBverification and validation, to motivate researchers to build a series of tools over the past decade. Although Preece’s paper provides some data on the practical aspects of using such tools on KBs intended for productionuse, it also notes the general lack of such data. The goal of this paper is to provide additional experiential data for the benefit of potential tool builders and users. This paper describes experiences encountered using an anomaly detecting tool called KBR3,in maintaining a suite of knowledgebases that comprise a production application called Engineering Standards Distribution System (ESDS). ESDScurrently has 50 KBsin production which accumulatively contain tens of thousands of rules. Results of 7 KBsare covered. An overviewof the ESDSproject can be found in Dalai, 1993. An overview of the motivation for adopting a maintenance strategy which includes the KBR3tool is described in Dahl and Williamson, 1992. The rest of this paper is organized as follows. An overview of KBR3’sbasic functionality is given, followed by a description of error types detected by KBR3. Then the causes underlying someof these errors are reviewed. This is followed by issues involved in reporting errors, limitations of KBR3,and plans for future enhancement. 107 KBR3 Tool Overview The KBR3tool is based on the KB-Reducer algorithm originally developed by Ginsberg (see Ginsberg, 1991). For KBR3,the basic algorithm was enhanced to handle rule bases containing equations. This enhanced algorithm is formally described in Williamson and Dahl, 1993. Figure 1 provides a basic overview of the KBR3 analysis tools. A KBto be analyzed is first translated from its original source languageinto a application-neutral language tailored for KBR3.Then the "label" for each rule conclusion is computed. Each label is simply the set of input states (called "environments")that allow the conclusion to be deduced. These labels describe the KBfunctionality in terms of input/output pairs. Computingthem essentially "compiles out" all the deductive paths of the original KB(often referred to as the "extension" of the KB). The resulting labels are used by the "perform checks" modulein detecting the errors types enumerated in the next section. Finally, the "Report Interface" moduleallows tool users to integrate the results in the context of the KBsoriginal source language. ~ i i I KBR3h._l SourceKBs ’ lTranslat°rs ~1 r I Peff°rml I Checks I Errors~] Compute I Labels I ~Labels I" Report Interface I Figure 1 - KBR3tool architecture Context of tool usage To understand the observations madein this paper, it is important to knowthat the KBR3tools were used during KBmaintenance and usually after some amount of conventional KBtesting had been done. This conventional testing included syntax checking, checks for obvious constraint violations, and some manual testing of the maintenance changes. In some cases, more thorough testing wasdone, suchas regressiontesting or codeinspection. This paper focuses on KBmaintenancebecause the major use of the KBR3 tools have thus far been by KB maintainers. Besidestesting maintenancechanges, the tools havebeenhelpful in verifying a maintainer’sunderstandingof KBsoriginally developedby others (see Dalai and Williamson1992). The reason whysomeconventional testing was done before employingthe KBR3 tools wasdue to the large amountof time it takes to run the KBR3 analysis (see "Efficiency Limitations"). Thusthe KBR3 analysis was reservedfor findingthe moreelusive errors, suchas order dependenciesamongrules. Anincremental version of KBR3mayreduce these limitations (see "Future Directions"). Error Types There are eleven types of errors detected by KBR3.Of these, inconsistencyand redundancy are discussedin detail since they are the mostgeneral problemindicators and the mostchallengingfor users to deal with. Since the other error typesare relatively intuitive, this paper doesnot dwellon them.Formaldefinitions of all the error types can be foundin WilliamsonandDahl, 1993. Inconsistent rules Tworules are consideredinconsistentif they assert logically contradictory conclusions under the sameinput states. Rulesrl andr2 are inconsistentin the following example: section. In PreeceandShinghal,1992,it is notedthat the distinction betweenthese anomaliesandtheir causes"has often beenblurred in the literature". This blurring has promotedsomeinappropriate ideas on howto interpret errors andhowto automatetheir correction. For example,onepopularstrategy for error correction is to automatically removeall redundantrules. While this strategy preserves KBfunctionality, it could cover up any numberof problems. Suppose during maintenance, a under-qualified rule is introduced. This new rule mayunintentionallysubsume other valid rules, causing themto be redundant and candidates for removal! Thereverse strategy of removingthe mostgeneral rule can also backftre if the morespecific cases weremistakenly addedto a valid KBalready containing the general rule (of course, the general rule was never redundant anyway). Anotherexampleof this blurring is makingthe assumptionthat a logical inconsistencyfoundin a KBis due to an inconsistent mentalmodelon the part of the knowledgeengineer. For instance, in the followingexample, one strategy of automaticerror correction suggests the mostgeneralrule, r2, is at fault andshouldbe specialized.However, it is possiblethat the error is just a typo and rl’s antecedent should have been: NOTP AND Q. rl: IF P AND Q THEN R r2: IF P THEN NOT R Thenext two sections surveythe causes of inconsistencies and redundanciesuncoveredwith the KBR3 analysis tool. Causesof Inconsistency rl: IF p THEN q r2: IF s THEN NOT q r3: IF p THEN s Redundantrules Arule is consideredredundantif it can be removedfrom the knowledge base without effecting its functionality. This notion of redundancyis very general and has as special cases deadendrules, un-firable rules andsubsumedrules. Other error types Theother types of errors detected by KBR3 are: incompleteness,invalid conclusions,un-firablerules, rule subsumption,unusedvariables, un-computed variables, dead endrules, overlapping rules, andcircular reasoning. Errorsvs. Their Causes It is importantto remember that there are manypossible causesfor each of the error types discussedin the last 108 This section describesthe sevencausesof inconsistency that were uncoveredby using the KBR3tools on ESDS KBs:order-dependentrules, inconsistent KBspecifications, inputs correctly assumed to be unrealistic, inputs improperlyassumedunrealistic, under-qualified rules, overly complexrules with overlapping conditions, and typosor cut andpaste errors. Order-dependentrules Oneof the often cited benefits of rule-basedsystemis the modularity of rules. However,this modularity often breaksdownin practice due to unforeseenrule inter-actions. Onetype of rule inter-action that hinders maintenance occurs whenrules deducingvalues for the same attribute are written in an order-dependentfashion. As reported in Dahland Williamson,1992, order dependent rules are maintenancetraps since they leave knowledge implicit (e.g., it is not stated whichrules are dependent on whichothers and why). Unfortunately, it is difficult in practiceto preventthe introduction of order-dependentrules. This is because the inference engine that processes the roles has a fixed strategy which is usually understood and (often unwittingly) exploited when writing rules. Order dependent rules maynot be detected by running KBtest cases since these rules maynot cause the KBto produce erroneous answers. Fortunately, the inconsistency detection of the KBR3 analysis tool can uncover order-dependent rules. Whetherthis is done dependson an option set whenrunning the KBto KBR3language translators shownin figure 1. To suppress detection of order dependencyin the example below, the translator conjoins the negation of the antecedentof rule rl to the antecedentof rule r2, resulting in a new antecedent for rule r2 of: NOT(x 1) ANDy = 1. Without this option set, KBR3would find rule r2 inconsistent withrule rl. rl: IF x=l AND y=l THEN a=10 r2: IF y=l THEN a=20 r3: DEFAULT VALUE OF a=30 Inconsistent KBspecifications The most obvious cause of a logical inconsistency where the specifications on which the KBis based are themselves logically inconsistent (by "specification", we mean either formal documentation of mental models of the domainexpert or knowledgeengineer). Surprisingly, n0ngof the inconsistencies uncoveredso far by the KBR3tool are attributable to inconsistent specifications. This is probably because KBR3has thus far only been used for checking maintenance changes that have already been tested in other ways(e.g, regression testing, manualtesting). Of course, this is not to say that such inconsistencies will never be uncovered. This finding does, however, emphasizethe importance of distinguishing betweenerror types, as defined by their syntactical form, and the underlying causes for their introduction into a KB. Inputs correctly assumed to be unrealistic A logical inconsistency detected by KBR3can be caused when a set of inputs is allowed by the KBthat would never in practice be entered by any KBuser. A simple example might be: As suggested by the syntax in the exampleabove, rule r3 specifies default information and can never be considered inconsistent with other rules. The KBR3language provides a convenientway for explicitly representing default mechanismsvia a default rule that namesall other rules (in this case rl and r2) whoseconditions must false for the defaultrule to fire. With this arrangement, explicit order dependenciesare ignored while implicit ones are optionally uncovered.If, as sometimesis the case, an order dependencydetected by KBR3is caused by a rule acting to represent default information that can not be explicitly identified as such in the source language, then the resulting inconsistency error can be markedto be ignored by KBR3 in the future. For instance, rule r2 in the previous examplemayact as a more specific default which should be considered before r3. Onemayobserve that notations for specifying defaults are logically unnecessary since defaults can always be expressed as the negation of the disjunction of all the other cases. This, however, is a maintenance problem, since the conditions for defaults must be kept in sync with the non-default cases and amountsto an error-prone form of redundant information. In fact, the existence of this maintenanceproblem is probably the best indicator of whether a rule should be considered as representing default information. Using KBR3to detect rule order dependencies has turned out to be a very valuable tool for uncovering knowledgethat was left implicit. It is a unique test for KBquality and maintainability which is not addressed by other conventional testing methods. A majority of the inconsistencies found thus far have beenof this type. rl: IF airplane-model = "747" THEN travel-range = "long" r2: IF number-of-engines = 1 THEN travel-range = "short" KBR3wouldtreat rules rl and r2 as inconsistent, unless the KBexplicitly disallowed consideration of a single engine 747. This could be done with the explicit constraint: NOT (airplane-model = "747" AND number-of-engines <> 4). Apparentlythis type of inconsistency was prevalent in the KBsused to test the initial version of the KBReducer algorithm that was used at Bell Labs. In our experience of using KBR3on ESDSKBs, we have found relatively few inconsistencies that were due to such under-constrained combinationsof inputs. Inputs improperly assumed unrealistic Another more serious cause of inconsistencies found by KBR3is like the last, except that the KBbuilders incorrectly assumed an input set would be viewed as "unrealistic" by users. The KBR3tool found such an inconsistency in the first real KBit analyzed. A simplification of the rules involved was: rl: IF part-is-machined THEN NOT use-transparent-material r2: IF part-must-be-seen-through use-transparent-material THEN The problem was actually in the definition of machined part. The domainexpert did not expect that users would 109 r2: IF (Q OR (P AND NOT (vl IN-LIST L))) AND -65 <= X-max <= 400 AND -65 <= X-min <= 400 AND Y = yl AND Z = zl THEN Answer = answer-2 rl: IF ((P AND NOT (vl IN-LIST AND NOT (v2 IN-LIST L)) OR Q) AND -50 <= X-max <= 350 AND -50 <= X-min <= 350 AND Y = yl AND (Z IN-LIST (zl, z2) (Z = zl AND NOT R)) THEN Answer = answer-i Figure 2 - Overlapping ComplexRule Conditions consider windows as machined parts because of the modest cutting and drilling involved. The root of the problem was an imprecise definition of what constituted a machinedpart. Although we have so far only found a few inconsistencies of this type, the ability to detect themis a valuable contribution of the KBR3 tool. Causes All but two of the causes of inconsistencies covered in the last section can in theory also cause redundancy.The cause called "overly complex rules with overlapping conditions" does not apply because rules with overlapping conditions and identical actions are not technically redundant(there is a separate test for this). Obviously, the other cause that can not apply to redundancy is "Inconsistent KBSpecifications". From the KBsanalyzed, the causes of inconsistency that were actually found to also cause redundancywere: "Order-dependent rules", "Under qualified rules", and "Typos and cut and paste errors". The two causes involving unrealistic inputs were not seen, although there is no reason to think they never will. There were always less redundanciesreported than inconsistencies. While inconsistency always involves rules that conclude different values for the same input, redundancyinvolves more than just rules concluding the same values for the sameinput. Since a rule is redundantif it can be removedwithout affecting KBbehavior, redundancy also involves dead code in the form of un-firable rules or dead-endrules (i.e. rules not contributing to KBoutputs). For this reason, typos in attribute values were a common cause of redundancy, since they often produce un-firable rules. Under qualified rules Another cause of inconsistencies found by KBR3is the familiar problemof under-qualified rule conditions. One such error had the form: rl: IF i00 < x THEN r2: IF 125 < x THEN < y < y of Redundancy 500 AND P = 20 500 AND P AND Q = 50 The problem was that rl left out the condition: NOTQ. Thus while the rules seemedto cover overlapping conditions, the conditions should have been mutually exclusive. In the case alluded to above, the error mayhave not been found with conventional testing since the error’s only effect was to sometimesprovide overly safe recommendations. Overly complex rules with overlapping conditions Someknowledge engineers (and even some domain experts) "enjoy" writing complex boolean expressions. Unfortunately they rarely enjoy reading them. Such expression also tend to contain errors, especially when taken in combination. KBR3has uncovered such errors, usually as inconsistencies amongrules. Figure 2 shows the form of one case encountered. Other causes found for redundancyinclude: Intentional mirroring of redundant knowledge sources There were several cases where a KBintentionally had rules subsumedby others because the rules directly represented knowledgeobtained from different sources and direct traceability to those sources was an importantvalidation goal. Typos and cut and paste errors It is possible to imagineall kinds of logical errors that can be caused by editing mistakes. In practice, not many errors found by KBR3were attributed to editing mistakes. Mostthat were so attributed werenot identified as inconsistencies, but as redundancies. ii0 Intentional dead code SomeKBshad a lot of dead code (un-firable or dead-end rules) which was intentionally "dead", either because it was part of an in-work section of the KBor was temporarily disabled. rl: IF x=l THEN y=10 r2: IF x=l THEN y=20 r3: IF x=2 THEN y=20 r4: IF y=10 THEN z=100 r5: IF y=20 THEN z=200 r6: IF y=20 AND x <> 1 THEN z=300 Figure3 - Example of Error Propagation Unintentional Dead Code This formof deadcodewasoften created in the process of restructuring a KB.Sometimessuch code was initially left to fall backon in case the newcodedid not workout. It is helpful to identify this codesince it can be a stumblingblock for future maintenance. Invalid assumptions by KBR3 Onecause of redundancyreported by KBR3 had nothing to do with the KBsanalyzedbut rather wasan invalid assumptionmadeabout the inference engine that the KB waswritten for. TheKBR3 analysis mistakenlyassumed string comparisonsshouldbe case sensitive. This cause of "redundancy"is worthnoting since knowledge representation languagesand inference schemesused by real KBsare often rich and not fully documented. Issues in Error Reporting Twoproblemsin error reporting were experienced,both involvesingle errors whichgeneratemultipleerror messages. Error propagation Undersome circumstances, errors can be propagated fromthe rules generatingthemto other rules that depend on the formerrules. Takefor example,the set of rules shownin figure 3. In figure 3, the inconsistencybetweenrl and r2 would be propagated to rules r4 andr5. Notethat the error will not propagateto rules r4 andr6 becausethe condition"x <>1" preventsr2 contributingto r6. Topartially alleviate the problemhavingusers wade throughpropagatederrors, the errors are sorted by rule level, so that propagatederrors are alwaysreportedafter their primaryerrors. In practice, this approachis usually adequatebut incurs the risk of missingprimaryerrors that exist at the samelevel as a sea of propagated errors. Weare lookingfor moreeffective approachesthat would let us detect andremovepropagatederrors. It is possible that the approachtaken in Zlatareva, 1992,can handle this problem. Numberof actual vs. reported errors Beyondthe extra errors reported due to error propagation, the KBR3 algorithmwill report a single logical inconsistency manytimes, once for each assignment of 111 values to non-inputattributes that cause the inconsistencyto manifestitself. rl: IF A=B THEN R r2: IF A=B THEN -R In the aboveexample,if attributes AandB wereboth not inputs and could each take on 10 values, then KBR3 wouldreport 100 errors (one for each pair of A and values). Fortunately, these individual error reports are groupedtogether, allowingthe user to skip over mostof them. This practice does have a slight drawbacksince the user maynot discoverthat somerules havemorethan one inconsistencybetweenthem(e.g., supposerl and r2 both had an antecedent of: A=BORC=D) KBR3 Limitations and Work-Arounds TheKBR3 approachhas a couple of inherent limitations for whichwehavefoundpractical work-arounds. Problemsrelated to undecidability BecauseKBR3 analyzes KBscontaining algebraic equations, it cannotbe guaranteedto find all errors, andin fact mayat times report erroneouserrors. For example, since the present version of KBR3does not have any knowledge of trigonometricidentities, it won’trecognize that rules rl andr2 are inconsistent.Likewiseit will erroneouslyreport rules r3 andr4 as inconsistent. rl: r2: r3: r4: IF IF IF IF sin(90 + x) > 0 THEN cos(x) > 0 THEN x = 1 THEN y=sin(90+x) x = 1 THEN y=cos(x) Theapproachwehave taken to this problemis to add a limited suite of algebraicrules whichare expanded as the needarises. Fortunately, the KBsweare nowanalyzing only contain very simple algebraic expressions and the current set of rules haveprovenadequate. Efficiency limitations Wehave foundit computationalyprohibitive to run the KBR3analysis on manyof our large KBs. Wehave also foundthat size of a KBaloneis not a goodindicator of howlong the analysis maytake. For example,two small KBswere analyzed, both with roughly 150 rules. One took only 5 minutes, while the other took 120 hours! Whatappears moreimportantthan KBsize is the amount of rule inter-connectivity, length of inference chains, and the number of non-input attributes whieh have a large numberof explicitly stated values and which are also referenced in rule conclusions. The costlier KBscan still be partially analyzed. One approach we employis to use tools that derive a subset of a KBthat supports KBR3analysis of a requested set of attributes or rules. Whilean analysis is in progress, it is also possible to request obviously costly rules (and their dependants)to be skipped. found by running the KB, including hidden dependencies on rule ordering and under-constrained inputs. Interpreting the results of the analysis can sometimesbe challenging since there are manypotential causes for the errors detected, especially in the case of the general errors of inconsistency and redundancy. Due to the computational cost and time required by KBR3,it makes sense to first screen out obvious errors with less expensive tools or methods. An incremental version of KBR3 could reduce the need for this pre-screening. Future Directions References. There are manypotential enhancements which could be madeto the KBR3tool suite. Enhancementswe are considering include: Dahl, M. 1993. ESDS: Materials Technology KnowledgeBases Supporting Design of Boeing Jetliners. To appear in the proceedings of the 1993 conference of Innovative Applications of Artificial Intelligence (IAAI93). Incremental analysis The efficiency limitations mentionedin the last section motivate ongoingresearch into an incremental version of the KBR3analysis. Incremental analysis will make it unnecessaryto re-analyze portions of a KBunaffected by KBmaintenance changes. This should allow the KBR3 tool to be used morefrequently. Error interpretation/correction support The variety of causes we encounteredfor errors detected by KBR3tools have shownthat strategies for fixing errors must dependon the underlying causes of errors and not just the syntactical form of the errors. Therefore, we are not currently trying to automate error correction. Instead, we are working to provide information relevant to error interpretation via a rich user interface that incorporates hyper-text links. Test case generation Weare looking into using the results of KBR3analysis to augmentregression test suites with tests that cover newly added KBinput-output behavior. Integration/Redundancy Dahl, M. and Williamson. K. 1992. A Verification Strategy for Long-Term Maintenance of Large RuleBased Systems. In Workshop notes for the AAAI-92 Workshop on Verification and Validation of Expert Systems. Ginsberg, A. 1991. Knowledge-BaseReduction: A New Approach to Checking Knowledge Bases For Inconsistency & Redundancy. In Validating and Verifying Knowledge-Based Systems. IEEE Computer Science Press. Preece, A and Shinghal, R. 1992. Analysis of Verification Methods for Expert Systems. In Workshop notes for the AAAI-92 Workshop on Verification and Validation of Expert Systems. Stackowicz, R. 1991. Validation of Knowledge Based Systems. Lockheed Missile & Space Co. Technical Report #RL-TR-91-361. Williamson, K. and Dahl, M. 1993. Knowledge-Based Reduction for Rule Bases Containing Equations. To appear in Workshop notes for the AAAI-93 Workshopon Verification and Validation of Expert Systems. management tools Weare looking into tools based on KBR3analysis which automatically detect redundancies across KBsas well as general frameworks to manage replicated knowledge. Suchtools will probably not be practical without an incremental version of the KBR3analysis. Conclusions Verification tools such as KBR3are proving to be effective in detecting otherwise hard-to-find anomalies in knowledgebases, despite the computational cost of performing the analysis. Moreover, the analysis can uncover potential maintenance traps that are usually not 112 Zlatareva, N. 1992. Knowledge Refinement via KnowledgeValidation. In Workshopnotes for the AAAI. 92 Workshopon Verification and Validation of Expert Systems.