From: AAAI Technical Report WS-94-03. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

EXCEPTION DAGS AS KNOWLEDGE STRUCTURES

Brian R. Gaines
Knowledge Science Institute
University of Calgary
Calgary, Alberta, Canada T2N 1N4
gaines@cpsc.ucalgary.ca

Abstract: The problem of transforming the knowledge bases of performance systems using induced rules or decision trees into comprehensible knowledge structures is addressed. A knowledge structure is developed that generalizes and subsumes production rules, decision trees, and rules with exceptions. It gives rise to a natural complexity measure that allows them to be understood, analyzed and compared on a uniform basis. This structure is a rooted directed acyclic graph with the semantics that nodes are concepts, some of which have attached conclusions, and the arcs are 'isa' inheritance links with disjunctive multiple inheritance. A detailed example is given of the generation of a range of such structures of equivalent performance for a simple problem, and the complexity measure of a particular structure is shown to relate to its perceived complexity. The simplest structures are generated by an algorithm that factors common concepts from the premises of rules. A more complex example of a chess dataset is used to show the value of this technique in generating comprehensible knowledge structures.

INTRODUCTION

This paper presents the continuation of research reported at previous KDD workshops on the development of knowledge structures through the inductive modeling of databases (Gaines, 1991c; Gaines, 1991b). The issue addressed is that of the comprehensibility of the model to people, that is, whether the contents of a model that performs well can be transformed into a communicable knowledge structure that provides insights into the basis of the performance.
The Induct methodology (Gaines, 1989) for deriving rules directly from a database through statistical analysis has proven to be effective in modeling large databases such as the 47,000 cases of the Garvan thyroid medical records over ten years (Gaines and Compton, 1993). The simplicity of the computations underlying Induct enables it to operate at very high speed, on the one hand supporting interactive development of rules, and on the other enabling longitudinal data analysis studies to be undertaken, such as the incremental modeling of the Garvan data over a ten year period to determine its stationarity. The performance of the derived models compares favorably with those developed by human experts and by other empirical induction methodologies. However, the models produced from significant databases typically have such large numbers of rules that they are not comprehensible to people as meaningful knowledge from which they can gain insights into the basis of decision making. This problem is common to both the manually and inductively derived rule sets, and seems intrinsic to the use of production rules as the basis of performance systems (Li, 1991). Similar problems occur with equivalent decision trees, where the basis of decision making is not apparent when there are large numbers of nodes.

It is not obvious that an excellent performance system necessarily implies the existence of a comprehensible knowledge structure to be discovered. Human practical reasoning is in major part not knowledge-based (Gaines, 1993b), and in the knowledge acquisition community 'expertise transfer' paradigms have been replaced in recent years by 'knowledge modeling' paradigms (Schreiber, Wielinga and Breuker, 1993) that impute the resultant overt knowledge to the modeling process, not to some hypothetical knowledge base within the expert.
Clancey, who has played a major role in promoting this paradigm shift (Clancey, 1989; 1993), has done so in part based on his experience in developing overt knowledge structures from MYCIN rules to support GUIDON (Clancey, 1987a) as a teaching system based on MYCIN. In critiquing MYCIN's rules as not being a comprehensible knowledge structure, Clancey remarks:

KDD-94 AAAI-94 Workshop on Knowledge Discovery in Databases Page 13

"Despite the now apparent shortcomings of MYCIN's rule formalism, we must remember that the program was influential because it worked well. The uniformity of representation, so much the cause of the 'missing knowledge' and 'disguised reasoning' described here, was an important asset. With knowledge so easy to encode, it was perhaps the simple parametrization of the problem that made MYCIN a success. The problem could be built and tested quickly at a time when little was known about the structure and methods of human reasoning... We can treat the knowledge base as a reservoir of expertise, something never before captured in quite this way, and use it to suggest better representations." (Clancey, 1987b)

The development of excellent performance systems will remain a major practical objective regardless of the comprehensibility of the basis of their performance. However, the challenge of increasing human knowledge through developing understanding of that basis remains a significant research issue in its own right. Can one take a complex knowledge base that is difficult to understand and derive from it a better and more comprehensible representation as Clancey suggests? If so, to what extent can the derivation process be automated? This paper presents techniques for restructuring production rules and decision trees to generate more comprehensible knowledge structures. In particular, a knowledge structure is presented that generalizes and subsumes production rules, decision trees, and rules with exceptions.
It gives rise to natural complexity measures that allow them to be understood, analyzed and compared on a uniform basis. This structure is a rooted directed acyclic graph with the semantics that nodes are concepts, some of which have attached conclusions, and the arcs are 'isa' inheritance links with disjunctive multiple inheritance.

KNOWLEDGE STRUCTURES

In attempting to improve the comprehensibility of knowledge structures it would be beneficial to have an operational and psychologically well-founded measure of comprehensibility. However, there are in general no such measures. This is not to say that one has to fall back on subjective judgment alone. There are general considerations that smaller structures tend to be more comprehensible, coherent structures more meaningful, those using familiar concepts more understandable, and so on. The internal analog weights and connections of neural networks, with graded relations of varying significance, are at one extreme of incomprehensibility. Compact sets of production rules are better, but not very much so if there are no clear relations between premises or obvious bases of interaction between the rules. Taxonomic structures with inheritance relations between concepts, and concept definitions based on meaningful properties, are probably most assimilable by people, and tend to be the basis of the systematization of knowledge in much of the scientific literature.

Ultimately, human judgment determines what is knowledge, but it is not a suitable criterion as a starting point for discovery since the most interesting discoveries are the ones that are surprising. The initial human judgment of what becomes accepted as an excellent knowledge structure may be negative. The process of assimilation and acceptance takes time. One feature of knowledge structures that is highly significant to discovery systems is that unicity, the existence of one optimal or preferred structure, is the exception rather than the rule.
There will be many possible structures with equivalent performance as models, and it is inappropriate to attempt to achieve unicity by an arbitrary technical criterion such as minimal size on some measure. The relative evaluation of different knowledge structures of equivalent performance involves complex criteria which are likely to vary from case to case. For example, in some situations there may be preferred concepts that are deemed more appropriate in being usual, familiar, theoretically more interpretable or coherent, and so on. A structure based on these concepts may be preferred over a smaller one that uses less acceptable concepts. Thus, a discovery system that is able to present alternatives and accept diverse criteria for discrimination between them may be preferred over one that attempts to achieve unicity through purely combinatorial or statistical criteria. On the other hand, it is a significant research objective in its own right to attempt to discover technical criteria that have a close correspondence to human judgment.

These have been the considerations underlying the research presented in this paper: to generate knowledge structures that have a form similar to those preferred by people; to explore a variety of possible structures and accept as wide a range as possible of external criteria for assessing them automatically or manually; and to develop principled statistical criteria that correspond to such assessments to the extent that this is possible.

EXCEPTION DIRECTED ACYCLIC GRAPHS

The starting point for the research has been empirical induction tools that generate production rules or decision trees. One early conclusion was that a hybrid structure that had some of the characteristics of both rules and trees was preferable to either alone.
This structure can be viewed either as a generalization of rules allowing exceptions, or as a generalization of trees such that the criteria at a node do not have to be based on a single property, or be mutually exclusive, and the trees are generalized to partial orders allowing branches to come together again. These generalizations allow substantially smaller knowledge structures to be generated than either rules or trees alone, and overcome problems of trees in dealing with disjunctive concepts involving replication of sub-trees.

Figure 1 exemplifies this knowledge structure, termed an exception directed acyclic graph (EDAG). It may be read as a set of rules with exceptions or as a generalized decision tree. Its operational interpretation is that one commences with the root node at concept 0, and places conclusion 0 on a provisional conclusion list. Following the arrow down to the left to concept 1, one evaluates the concept and, if it applies, replaces the provisional conclusion from that path with the specified conclusion. Then one follows the arrows down again to concepts 3 and 4, applying the same logic. Multiple arrows may be conceptualized as being evaluated in parallel; it makes no difference to the results if the graph is followed depth-first or breadth-first.

Figure 1 Knowledge structure: exception directed acyclic graph

Several features of EDAGs may be noted:
• Concepts at a branch do not have to be mutually exclusive, so multiple conclusions can arise. An interesting example is conclusion 5, which is concluded unconditionally once concepts 0 and 2 apply.
• As shown at concept 4, the structure is not binary. There may be more than two nodes at a branch.
• As shown at conclusion 11, the structure is not a tree.
Branches may rejoin, corresponding to rules with a common exception or common conclusion.
• As shown at concept 2, a concept does not have to have a conclusion. It may just be a common factor to all the concepts of its child nodes. As in decision trees, not all concepts in an EDAG are directly premises of rules.
• As shown at conclusion 11, a conclusion does not have to have a concept. It may just be a common conclusion to all the concepts on its parent nodes.
• As shown at the node between concepts 6 and 12, it may be appropriate to insert an empty node to avoid arrows crossing.
• The notion of "concept" used is that of any potentially decidable predicate; that is, one with truth values true, false or unknown. The unknown case is significant because, as illustrated later, conclusions from the EDAG as a whole may be partially or wholly decidable even though some concepts are undecidable in a particular case, usually because some information is missing for the case.

A set of production rules without a default and without exceptions forms a trivial EDAG with one empty initial node having an arrow to each rule. An ID3-style decision tree forms an EDAG that is restricted to be a tree, with the set of concepts fanning out from a node restricted to be tests of all the values of a particular attribute. Rules with exceptions form an EDAG in which every node has a conclusion. Rules with exceptions with their premises factored for common concepts (Gaines, 1991b) form a general EDAG.

The direction of arrows in Figure 1 indicates the decision-making paths. However, if the arrows are reversed they become the 'isa' arrows of a semantic network showing inheritance among concepts, with multiple inheritance read disjunctively. For example, the complete concept leading to conclusion 11 is (concept 0 ∧ concept 1 ∧ concept 4) ∧ (concept 9 ∨ concept 10).
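The traversal just described can be sketched in a few lines of code. This is a minimal illustration, not the paper's implementation; the Node class and the miniature lens-style example used below are hypothetical:

```python
class Node:
    """One EDAG node: an optional concept (a predicate over a case),
    an optional conclusion, and child nodes that act as its exceptions."""
    def __init__(self, concept=None, conclusion=None, children=()):
        self.concept = concept        # callable(case) -> bool, or None for an empty node
        self.conclusion = conclusion  # e.g. "lens = soft", or None
        self.children = list(children)

def evaluate(node, case, inherited=None):
    """Return the set of conclusions the EDAG rooted at `node` yields for `case`.
    A node's conclusion replaces the one inherited along its path, and stands
    unless a child node (an exception) applies and contributes its own."""
    if node.concept is not None and not node.concept(case):
        return set()                      # concept does not apply: path contributes nothing
    provisional = node.conclusion if node.conclusion is not None else inherited
    results, exception_fired = set(), False
    for child in node.children:           # branches may be followed in any order
        sub = evaluate(child, case, provisional)
        if sub:
            exception_fired = True
            results |= sub
    if not exception_fired and provisional is not None:
        results.add(provisional)          # no exception applied: provisional conclusion stands
    return results
```

Because child concepts need not be mutually exclusive, `evaluate` returns a set: several branches may each contribute a conclusion, matching the multiple-conclusion behavior noted above.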
The complete concept for nodes with child nodes involves the negation of the disjunction of the child nodes, but not of their children. For example, the complete concept leading to conclusion 1 is (concept 0 ∧ concept 1 ∧ ¬(concept 3 ∨ concept 4)), with concepts 8 through 10 playing no role.

This last result is important to the comprehensibility of an EDAG. The 'meaning' of a node can be determined in terms of its path(s) back to the root and its child nodes. The other parts of the tree are irrelevant to whether the conclusion at that node will be selected. They remain potentially relevant to the overall conclusion in that they may contribute additional conclusions if the structure is not such that conclusions are mutually exclusive. From a psychological perspective, the interpretation of the EDAG may be seen as each node providing a well-defined 'context' for its conclusion, where it acts as an exception to nodes back to the root above it, and it has as exceptions the nodes immediately below it. This notion of context is that suggested by Compton and Jansen (1990) in their analysis of 'ripple-down rules', the difference being that EDAG contexts are not necessarily mutually exclusive.

GENERATING EDAGS

Induct is used to generate EDAGs through a three-stage process. First, it is used to generate rules with exceptions. Second, the premises of the rules are factored to extract common sub-concepts (which are not necessarily unique, e.g. A ∧ B, B ∧ C and C ∧ A may be factored in pairs by A, B or C). Third, common exception structures are merged. The second and third stages of factoring can be applied to any set of production rules represented as an EDAG, and it is appropriate to do so to C4.5 and PRISM rules when comparing methodologies and illustrating EDAGs as knowledge structures. The familiar contact lens prescription problem (Cendrowska, 1987) will be used to illustrate the EDAG as a knowledge structure.
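A single step of the second, premise-factoring stage can be sketched as follows. This greedy one-clause-at-a-time version is an illustrative simplification, not Induct's actual factoring algorithm, and the rule encoding is assumed:

```python
from collections import Counter

def factor_once(rules):
    """One premise-factoring step: find the premise clause shared by the most
    rules and hoist it out as a common parent concept. Each rule is a
    (frozenset_of_clauses, conclusion) pair. Returns the factored-out clause,
    the rules placed beneath it (with that clause removed), and the rest."""
    counts = Counter(clause for premise, _ in rules for clause in premise)
    if not counts or counts.most_common(1)[0][1] < 2:
        return None, [], rules            # no clause shared by two or more rules
    common = counts.most_common(1)[0][0]
    factored = [(premise - {common}, concl)
                for premise, concl in rules if common in premise]
    others = [(premise, concl)
              for premise, concl in rules if common not in premise]
    return common, factored, others
```

Applied repeatedly, and recursively to the factored group, this yields parent nodes such as a tear production concept common to several lens rules. As noted above, the factoring is not necessarily unique when shared clauses tie.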
Figure 2 shows the ID3 decision tree for the lens problem as an EDAG, and Figure 3 shows the PRISM production rules as an EDAG. The representation in itself adds nothing to the results, but the examples illustrate the subsumption of these two common knowledge structures. The representation of the rules can be improved by factoring the premises to produce the EDAG shown in Figure 4. Figures 2 through 4 are trivial EDAGs in the sense that no exceptions are involved.

More interesting examples can be generated using C4.5's methodology of: reducing the number of production rules by specifying a default conclusion; removing the rules with that conclusion; and making the other rules exceptions to the default. This can be applied to the PRISM rules in Figure 4 by making "none" the default and removing the four rules with "none" as a conclusion on the left. Both PRISM and C4.5 then generate the same set of rules with exceptions, which can be factored into the EDAG shown in Figure 5.

Figure 2 ID3 contact lens decision tree as EDAG
Figure 3 PRISM contact lens production rules as EDAG

Figure 4 PRISM contact lens production rules, factored as EDAG

Figure 5 PRISM/C4.5 contact lens rules with default, factored as EDAG

Induct generates multi-level exceptions, in that an exception may itself have exceptions. Figure 6 shows two EDAGs generated by Induct from the 24 lens cases. On the left of each EDAG the "soft" conclusion is generated by a simple concept that has an exception. The left EDAG is generated by Induct from the 24 possible cases. The right EDAG is generated by Induct when the data is biased to have a lower proportion of the "hard" exceptions (46 cases used: 2 x 24 with the 2 examples of the "hard" exception removed from the second set; logically unchanged, but the statistical test that Induct uses to determine whether to generate an exception is affected, so the exceptions are now statistically more "exceptionable").
Figure 6 Two forms of Induct rules with exceptions, factored as EDAGs

COMPLEXITY MEASURES FOR EDAGS

The results shown in Figures 2 through 6 make it clear that many different EDAGs with the same performance can be derived from existing empirical induction methodologies. It is also apparent that some are substantially simpler than others, and that the simpler ones present a more comprehensible knowledge structure. Figures 5 and 6, in particular, illustrate that simple comprehensible knowledge structures are not unique: each represents a reasonable basis for understanding and explaining the contact lens decision methodology. Figure 6 left may appear marginally better than the alternatives in Figures 5 and 6 right, but one can imagine a debate between experts as to which is best; Figure 5 has only one layer of exceptions, and Figure 6 right is more balanced in its treatment of astigmatism.

It is interesting to derive a complexity measure for an EDAG that corresponds as closely as possible to the psychological factors leading to judgments of the relative complexity of different EDAGs. As usual, the basis for a structural complexity measure is enumerative; that is, we count the components in the representation (Gaines, 1977). The obvious components are the node symbols (boxes), the arc symbols (arrows), and the concept and conclusion clauses. In addition, since the graphs are not trees and branches can rejoin with possible line crossings causing visual confusion, the cyclomatic number (Berge, 1958) is considered. It may be regarded as counting (undirected) cycles or as indicating the excess of arcs over those necessary to join the nodes. Induct also allows the option of a disjunction of values within a clause, and such a disjunction is weighted by the number of values mentioned, e.g. x=5 counts 1, x∈{5, 8} counts 2, and x∈{5, [8, 12]} counts 3 (where [8, 12] specifies the interval from 8 through 12).
The derivation of an overall complexity measure from these component counts has to take into account the required graph reductions, which should lead to complexity measure reductions. The basic requirements are shown in Figure 7, and it is apparent that clause reduction should dominate, cycle reduction needs to be taken into account with lesser weight, and node reduction with still lesser weight. One suitable formula is:

complexity of an EDAG = (clauses x 4 + cycles x 2 + nodes - 1)/5 (1)

where the -1 and scaling by 5 are normalization constants.

  Reduction              Nodes   Arcs    Cycles  Clauses  Complexity
  Condensation           2 -> 1  1 -> 0  0 -> 0  2 -> 2   1.8 -> 1.6
  Premise factoring      3 -> 4  2 -> 3  0 -> 0  8 -> 7   6.8 -> 6.2
  Conclusion factoring   3 -> 4  2 -> 4  0 -> 1  6 -> 5   5.2 -> 5.0
  Cycle reduction        5 -> 6  6 -> 6  2 -> 1  8 -> 8   8.0 -> 7.8

Figure 7 Graph reductions required to correspond to complexity measure reductions

For standard decision trees and production rules there are relations between the numbers in formula (1) that allow one to give the complexity measure more specific interpretations:

complexity of decision tree = number of nodes + 0.8 x number of terminal nodes (2)
complexity of production rules = 0.8 x number of clauses + 0.2 x number of rules (3)

These both seem intuitively reasonable as reflecting natural components of the structural complexity of trees and rule sets. However, they are somewhat arbitrary, and it would be appropriate to conduct empirical psychological experiments with a range of EDAGs and evaluate the correlation between the understanding of knowledge measured in various ways and the complexity measure developed here. Sufficient data would also allow other complexity measures to be evaluated.

Figure 8 shows the component counts and the computed complexity measures for the solutions to the lens problem shown in Figures 2 through 6.
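Formula (1) and its specializations are easy to check computationally. The function names below are illustrative:

```python
def cyclomatic(arcs, nodes):
    """Cyclomatic number X = A - N + 1 of a connected graph: the excess of
    arcs over those necessary to join the nodes (0 for a tree)."""
    return arcs - nodes + 1

def edag_complexity(clauses, cycles, nodes):
    """Complexity measure (1): (clauses * 4 + cycles * 2 + nodes - 1) / 5."""
    return (4 * clauses + 2 * cycles + nodes - 1) / 5

def rule_set_complexity(clauses, rules):
    """Specialization (3): a flat rule set is a trivial EDAG with an empty
    root node, so N = rules + 1, A = rules, X = 0, and formula (1) reduces
    to 0.8 * clauses + 0.2 * rules."""
    return edag_complexity(clauses, 0, rules + 1)
```

For instance, formula (3) agrees with formula (1): a 9-rule set with 34 clauses scores 0.8 x 34 + 0.2 x 9 = 29.0 either way.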
The computed complexity measure does appear to have a reasonable correspondence to the variations in complexity that are subjectively apparent.

  Figure  Example                       Nodes N  Cycles X=A-N+1  Clauses C  Complexity (4C+2X+N-1)/5
  2       ID3                           15       0               23         21.2
  3       PRISM                         10       0               34         29.0
  4       PRISM, factored               14       4               19         19.4
  5       PRISM/C4.5 default, factored  10       2               11         11.4
  6       Induct, factored              8        1               11         10.6
  6       Induct variant, factored      7        0               13         11.6

Figure 8 Comparison of complexity of contact lens EDAGs

A CHESS EXAMPLE

The contact lens example is useful for illustration but too simple for the significance of the complexity reduction to be evaluated. This section presents a somewhat more complex example based on one of Quinlan's (1979) chess datasets that Cendrowska (1987) also analyzed with PRISM. The data consists of 647 cases of a chess end game described in terms of four 3-valued and three 2-valued attributes leading to one of two conclusions. All the solutions described are 100% correct.

Figure 9 shows the 30 node decision tree produced by ID3, and Figure 10 shows the 15 rules produced by C4.5 for the chess data. Cendrowska (1987) reports a substantially larger tree and the same number of rules with some additional clauses. Figure 11 left and center shows the rules produced by C4.5 factored in two ways. The EDAG at the center introduces an additional (empty) node to make it clear that the possible exceptions are the same on both branches. Figure 11 right shows an EDAG produced by Induct. All three EDAGs are interesting in not being trees and in involving different simple presentations of the same knowledge.
line = 2: safe (324.0)
line = 1:
    r>>k = 2: safe (162.0)
    r>>k = 1:
        r>>n = 2: safe (81.0)
        r>>n = 1:
            k-n = 1:
                n-wk = 2: safe (9.0)
                n-wk = 3: safe (9.0)
                n-wk = 1:
                    k-r = 1: safe (2.0)
                    k-r = 2: lost (3.0)
                    k-r = 3: lost (3.0)
            k-n = 2:
                n-wk = 2: safe (9.0)
                n-wk = 3: safe (9.0)
                n-wk = 1:
                    k-r = 2: lost (3.0)
                    k-r = 3: lost (3.0)
                    k-r = 1:
                        r-wk = 1: lost (1.0)
                        r-wk = 2: safe (1.0)
                        r-wk = 3: safe (1.0)
            k-n = 3:
                k-r = 2: lost (9.0)
                k-r = 3: lost (9.0)
                k-r = 1:
                    r-wk = 1: lost (3.0)
                    r-wk = 2: safe (3.0)
                    r-wk = 3: safe (3.0)

Figure 9 ID3 chess decision tree

Figure 10 C4.5 chess rules (15 rules, nine concluding safe and six concluding lost)

Figure 11 Chess rules with exceptions, factored as alternative EDAGs

  Figure  Example                      Nodes N  Cycles X=A-N+1  Clauses C  Complexity (4C+2X+N-1)/5
  -       Cendrowska tree              79       0               130        119.6
  10      PRISM production rules       16       0               73         61.4
  10      C4.5 production rules        16       0               70         59.0
  9       ID3 decision tree            31       0               50         46.0
  11      PRISM default, factored      9        3               12         12.4
  11      C4.5 default, factored       7        3               10         10.4
  11      C4.5 default+, factored      8        2               10         10.2
  11      Induct exceptions, factored  7        1               11         10.4

Figure 12 Comparison of complexity of chess EDAGs

Figure 12 shows the complexities of various EDAGs solving the chess problem. The decision tree published by Cendrowska (1987) is more complex than that produced by ID3. PRISM generates the same 15 production rules as does C4.5 except that two of the rules have redundant clauses. This has a small effect on the relative complexity of the production rules, but a larger effect on that of the factored rules, since those from PRISM do not factor as well because of the redundant clauses. The three solutions shown in Figure 11 are similar in complexity, as one might expect. They are different, yet equally valid, ways of representing the solution.

The reduction between the tree in Figure 9 or the rules in Figure 10 and the EDAGs in Figure 11 does seem to go some way to meeting Michie's objections to trees or rule sets as knowledge structures that Quinlan (1991) cites in the preface of the book from KDD-89. It is more plausible to imagine a chess expert using the decision procedures of Figure 11 than those of Figures 9 or 10.

INFERENCE WITH EDAGS--UNKNOWN VALUES

Inference with EDAGs when all concepts are decidable is simple and has already been described. However, inference when some concepts are unknown as to their truth values involves some subtleties that are significant. Consider the EDAG of Figure 13, where Concept-1 is unknown but Concept-2 is true and the attached conclusion is the same as the default. One can then infer Conclusion = A regardless of what the truth value of Concept-1 might be.
Figure 13 Inference with unknown values: Concept-2 = true, so infer Conclusion = A even though Concept-1 is unknown

The required reasoning is trivial to implement and is incorporated in the EDAG-inference procedure of KRS (Gaines, 1991a; Gaines, 1993a), a KL-ONE-like knowledge representation server. The evaluation of a concept as true, false or unknown is common in such systems. It is simple to mark the EDAG such that a node is disregarded if: it has an unknown concept but has a child node that is definitely true; or it has the same conclusion as another node whose concept is true.

The importance of pruning the list of possible conclusions is that it is the basis of acquiring further information, for example by asking the user of the system. It is important not to request information that may be expensive to obtain but is irrelevant to the conclusion. Cendrowska (1987) makes this point strongly in comparing modular rules with decision trees: that, for example, in contact lens prescription the measurement of tear production is time-consuming and unpleasant, and should be avoided if possible. She notes that it is not required in the 3 rules produced by PRISM shown at the lower left of Figure 3, but is the first attribute tested in the ID3 decision tree of Figure 2. She argues that the tree requires the testing of this attribute unnecessarily in certain cases.

However, the three decidable cases with unknown values of tear production (corresponding to the premises of the three rules on the lower left of Figure 3) are all correctly evaluated by the EDAG of Figure 2 using the reasoning defined above, as they are by all the other EDAGs of Figures 3 through 6. The problem Cendrowska raises is a question of properly using the tree as a basis for inference, rather than a distinction between trees and production rules. Since the complexity figures in Figure 8 indicate that the PRISM rules are, on one reasonable measure, more complex than the ID3 tree, it would seem that the arguments for rules being better than trees are not justified. Certainly the highly restrictive standard decision tree can be improved in comprehensibility by the use of exceptions and factoring, but whether the resultant structure is a more general tree, or a set of rules with exceptions, is a matter of perspective.

The simple pruning procedure described above is sufficient to deal with the tear production problem raised by Cendrowska. However, it is inadequate to properly account for all the nine decidable cases with unknown values (corresponding to the premises of the rules in Figure 3). For example, someone whose tear production is normal, astigmatism is astigmatic and age is young, but whose prescription is unknown, should be inferred as lens = hard. However, this inference cannot be made with the ID3 tree of Figure 2 using the procedure described above alone. KRS copes with this situation by keeping track of the relation between child nodes that are such that one of them must be true. In the case just defined it infers that the conclusion is lens = hard because the two nodes with lens = hard conclusions at the lower right of Figure 2 are both possible, one of them must be true, and both have the same conclusion.

KRS also disregards a node if the conclusion is already true of the entity being evaluated, again to prevent an unnecessary attempt to acquire further information. The four strategies described are such as to reduce the list of possible conclusions to the logical minimum, and form a complete inference strategy for EDAGs. It should be noted that the completeness of inference is dependent on the possibility of keeping track of relations between child nodes. This is simple for the classification of single individuals based on attribute-value data. It is far more complex for EDAG-based inference with arbitrary KL-ONE knowledge structures involving related individuals, where it is difficult to keep track of all the relations between partially open inferences.
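The effect of this reasoning with unknowns can be sketched by brute force: enumerate every way of filling in the unknown attributes, evaluate each completion, and intersect the results. This is an illustrative sketch only (KRS performs the equivalent pruning incrementally rather than by enumeration), and the flat rules-with-default encoding and the c1/c2 example names are assumptions:

```python
from itertools import product

def rule_conclusions(rules, default, case):
    """Evaluate a flat rules-with-default set on a fully specified case:
    the conclusions of all rules whose premises hold, else the default."""
    fired = {concl for premise, concl in rules
             if all(case.get(attr) == value for attr, value in premise)}
    return fired or {default}

def infer_with_unknowns(rules, default, case, domains):
    """Attributes absent from `case` are unknown. Evaluate every completion
    of the case over their domains: conclusions common to all completions
    are definite despite the unknowns; the union is merely possible."""
    missing = [a for a in domains if a not in case]
    results = [rule_conclusions(rules, default,
                                {**case, **dict(zip(missing, values))})
               for values in product(*(domains[a] for a in missing))]
    definite = set.intersection(*results)
    possible = set.union(*results)
    return definite, possible
```

With a default A, a rule c1 and c2 -> A, and a rule c1 and not-c2 -> B, a case with c2 true and c1 unknown is definitely A, reproducing the Figure 13 inference; with c2 false nothing is definite, so c1 is worth asking for. Enumerating completions this way is exponential in the number of unknowns, which is why an incremental pruning strategy matters in practice.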
This corresponds to managing the set of possible extensions of the knowledge base generated by resolving the unknown concepts, and is inherently intractable.

CONCLUSIONS

The problem of transforming performance systems based on induced rules or decision trees into comprehensible knowledge structures has been addressed. A knowledge structure, the EDAG, has been developed that generalizes and subsumes production rules, decision trees, and rules with exceptions. It gives rise to a natural complexity measure that allows them to be understood, analyzed and compared on a uniform basis. An EDAG is a rooted directed acyclic graph with the semantics that nodes are concepts, some of which have attached conclusions, and the arcs are 'isa' inheritance links with disjunctive multiple inheritance. A detailed example has been given of the generation of a range of such structures of equivalent performance for a simple problem, and the complexity measure of a particular structure has been shown to relate to its perceived complexity. The simplest structures are generated by an algorithm that factors common concepts from the premises of rules. A second example of a chess dataset was used to show the value of this technique in generating comprehensible knowledge structures.

It is suggested that the techniques described in this paper will be useful in allowing knowledge structures developed by empirical induction that have good performance but are not humanly comprehensible to be transformed into ones that may be more comprehensible.

A referee raised the question of the relative effectiveness of C4.5 and Induct, and it is appropriate to comment briefly on this. There were two motivations underlying the development of Induct. First, to develop a high speed induction engine for use in interactive knowledge acquisition with large datasets. Second, to generate knowledge structures that reflected human conceptual frameworks.
" Thefirst objective was addressedby extendingthe techniques of PRISM to generate rules directly from the data rather than by post-processing:Ofa decision tree, and by Usingdata structures (bit maps)that speededthe computationsrequired. Onlarge datasets InduCtis generallyover !00 times faster than C4.5 ¯ ’i : in generatingrules with equivalentperformance.However,the complexityanalysis of the algorithmsis ¯ similar, and the twocouldbe madecomparablein speedby applyingthebit mapdata structures in a CA.5 implementation. KDD-94 AAAI-94 Workshop on Knowledge Discovery in Databases Page 23 Thesecond objective wasaddressedby: deVelopingrules with exceptionsusing a statistical methodology to determinewhetheran over-general:rule ihavingerror cases .is to be specialized by addingadditional r premiseclauses, or left as it is Withthe exceptioneases beingcoveredby additionalrules. Thestatistical . test used is to Calculate whethera rule is successful by chance (Gaines, 1989), and to comparethis probabilityfor the rule witherrors with. those for therulesobtained byaddingclausesto it. What.this paper showsis that, for somedatasets, the secondobjective can also be achievedby postprocessingthe rule Sets developedusing algorithms:Such as those of C4,5to generate EDAGs that are ~mbstanfiallysimpler andhavea hierarchical structure similarto that of humanconceptualframeworks~ Experimentsare underwayto evaluate the post.processing:Whenapplied rules derived fromdata having m~ydecision outcomesWheremulti’level exceptions seemto generate the most compact,and meaningful knowledge structures. ACKNO~~ This workwasfundedin part bythe Natural Sciencesand EngineeringResearchCouncilof Canada.I am grateful to Paul Compton and Ross Quinlanfor access to their research and for discussions that have influencedthis work.I amalso grateful to the anonymous referees for helpful comments. REFERENCES Berge, C. (1958). The Th~ryof Graphs. London,Methuen. Cendrowska,J. 
(1987). An algorithm for inducing modular rules. International Journal of Man-Machine Studies 27(4) 349-370.
Clancey, W.J. (1987a). From GUIDON to NEOMYCIN and HERACLES in twenty short lessons. Lamsweerde, A. and Dufour, P., Eds. Current Issues in Expert Systems. pp.79-123. Academic Press.
Clancey, W.J. (1987b). Knowledge-Based Tutoring: The GUIDON Program. MIT Press.
Clancey, W.J. (1989). Viewing knowledge bases as qualitative models. IEEE Expert 4(2) 9-23.
Clancey, W.J. (1993). The knowledge level reinterpreted: modeling socio-technical systems. International Journal of Intelligent Systems 8(1) 33-49.
Compton, P. and Jansen, R. (1990). A philosophical basis for knowledge acquisition. Knowledge Acquisition 2(3) 241-258.
Gaines, B.R. (1977). System identification, approximation and complexity. International Journal of General Systems 2(3) 241-258.
Gaines, B.R. (1989). An ounce of knowledge is worth a ton of data: quantitative studies of the trade-off between expertise and data based on statistically well-founded empirical induction. Proceedings of the Sixth International Workshop on Machine Learning. pp.156-159. Morgan Kaufmann.
Gaines, B.R. (1991a). Empirical investigations of knowledge representation servers: Design issues and applications experience with KRS. ACM SIGART Bulletin 2(3) 45-56.
Gaines, B.R. (1991b). Refining induction into knowledge. Proceedings of the AAAI Workshop on Knowledge Discovery in Databases. pp.1-10. AAAI.
Gaines, B.R. (1991c). The trade-off between knowledge and data in data acquisition. Piatetsky-Shapiro, G. and Frawley, W., Eds. Knowledge Discovery in Databases. pp.491-505. MIT Press.
Gaines, B.R. (1993a). A class library implementation of a principled open architecture knowledge representation server with plug-in data types. IJCAI-93: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann.
Gaines, B.R. (1993b). Modeling practical reasoning. International Journal of Intelligent Systems 8(1) 51-70.
Gaines, B.R. and Compton, P. (1993).
Induction of meta-knowledge about knowledge discovery. IEEE Transactions on Knowledge and Data Engineering 5(6) 990-992.
Li, X. (1991). What's so bad about rule-based programming? IEEE Software 103-105.
Quinlan, J.R. (1979). Discovering rules by induction from large collections of examples. Michie, D., Ed. Expert Systems in the Micro Electronic Age. pp.168-201. Edinburgh: Edinburgh University Press.
Quinlan, J.R. (1991). Foreword. Piatetsky-Shapiro, G. and Frawley, W., Eds. Knowledge Discovery in Databases. pp.ix-xii. Cambridge, Massachusetts: MIT Press.
Schreiber, A.Th., Wielinga, B.J. and Breuker, J.A., Eds. (1993). KADS: A Principled Approach to Knowledge-Based System Development. London: Academic Press.