EXCEPTION DAGS AS KNOWLEDGE STRUCTURES

From: AAAI Technical Report WS-94-03. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

Brian R. Gaines
Knowledge Science Institute
University of Calgary
Calgary, Alberta, Canada T2N 1N4
gaines@cpsc.ucalgary.ca
Abstract: The problem of transforming the knowledge bases of performance systems using induced rules or decision trees into comprehensible knowledge structures is addressed. A knowledge structure is developed that generalizes and subsumes production rules, decision trees, and rules with exceptions. It gives rise to a natural complexity measure that allows them to be understood, analyzed and compared on a uniform basis. This structure is a rooted directed acyclic graph with the semantics that nodes are concepts, some of which have attached conclusions, and the arcs are 'isa' inheritance links with disjunctive multiple inheritance. A detailed example is given of the generation of a range of such structures of equivalent performance for a simple problem, and the complexity measure of a particular structure is shown to relate to its perceived complexity. The simplest structures are generated by an algorithm that factors common concepts from the premises of rules. A more complex example of a chess dataset is used to show the value of this technique in generating comprehensible knowledge structures.
INTRODUCTION
This paper presents the continuation of research reported at previous KDD workshops on the development of knowledge structures through the inductive modeling of databases (Gaines, 1991c; Gaines, 1991b). The issue addressed is that of the comprehensibility of the model to people, that is, whether the contents of a model that performs well can be transformed into a communicable knowledge structure that provides insights into the basis of the performance.
The Induct methodology (Gaines, 1989) for deriving rules directly from a database through statistical analysis has proven to be effective in modeling large databases such as the 47,000 cases of the Garvan thyroid medical records over ten years (Gaines and Compton, 1993). The simplicity of the computations underlying Induct enables it to operate at very high speed, on the one hand supporting interactive development of rules, and on the other enabling longitudinal data analysis studies to be undertaken such as the incremental modeling of the Garvan data over a ten year period to determine its stationarity. The performance of the derived models compares favorably with those developed by human experts and by other empirical induction methodologies.
However, the models produced of significant databases typically have such large numbers of rules that they are not comprehensible to people as meaningful knowledge from which they can gain insights into the basis of decision making. This problem is common to both the manually and inductively derived rule sets, and seems intrinsic to the use of production rules as the basis of performance systems (Li, 1991). Similar problems occur with equivalent decision trees where the basis of decision making is not apparent when there are large numbers of nodes.
It is not obvious that an excellent performance system necessarily implies the existence of a comprehensible knowledge structure to be discovered. Human practical reasoning is in major part not knowledge-based (Gaines, 1993b), and in the knowledge acquisition community 'expertise transfer' paradigms have been replaced in recent years by 'knowledge modeling' paradigms (Schreiber, Wielinga and Breuker, 1993) that impute the resultant overt knowledge to the modeling process, not to some hypothetical knowledge base within the expert. Clancey, who has played a major role in promoting this paradigm shift (Clancey, 1989; 1993), has done so in part based on his experience in developing overt knowledge structures from MYCIN rules to support GUIDON (Clancey, 1987a) as a teaching system based on MYCIN. In critiquing MYCIN's rules as not being a comprehensible knowledge structure, Clancey remarks:
KDD-94: AAAI-94 Workshop on Knowledge Discovery in Databases, Page 13
"Despite the now apparent shortcomings of MYCIN's rule formalism, we must remember that the program was influential because it worked well. The uniformity of representation, so much the cause of the 'missing knowledge' and 'disguised reasoning' described here, was an important asset. With knowledge so easy to encode, it was perhaps the simple parametrization of the problem that made MYCIN a success. The problem could be built and tested quickly at a time when little was known about the structure and methods of human reasoning. ... We can treat the knowledge base as a reservoir of expertise, something never before captured in quite this way, and use it to suggest better representations." (Clancey, 1987b)
The development of excellent performance systems will remain a major practical objective regardless of the comprehensibility of the basis of their performance. However, the challenge of increasing human knowledge through developing understanding of that basis remains a significant research issue in its own right. Can one take a complex knowledge base that is difficult to understand and derive from it a better and more comprehensible representation as Clancey suggests? If so, to what extent can the derivation process be automated?
This paper presents techniques for restructuring production rules and decision trees to generate more comprehensible knowledge structures. In particular, a knowledge structure is presented that generalizes and subsumes production rules, decision trees, and rules with exceptions. It gives rise to natural complexity measures that allow them to be understood, analyzed and compared on a uniform basis. This structure is a rooted directed acyclic graph with the semantics that nodes are concepts, some of which have attached conclusions, and the arcs are 'isa' inheritance links with disjunctive multiple inheritance.
KNOWLEDGE STRUCTURES
In attempting to improve the comprehensibility of knowledge structures it would be beneficial to have an operational and psychologically well-founded measure of comprehensibility. However, there are in general no such measures. This is not to say that one has to fall back on subjective judgment alone. There are general considerations that smaller structures tend to be more comprehensible, coherent structures more meaningful, those using familiar concepts more understandable, and so on. The internal analog weights and connections of neural networks with graded relations of varying significance are at one extreme of incomprehensibility. Compact sets of production rules are better, but not very much so if there are no clear relations between premises or obvious bases of interaction between the rules. Taxonomic structures with inheritance relations between concepts, and concept definitions based on meaningful properties, are probably most assimilable by people, and tend to be the basis of the systematization of knowledge in much of the scientific literature. Ultimately, human judgment determines what is knowledge, but it is not a suitable criterion as a starting point for discovery since the most interesting discoveries are the ones that are surprising. The initial human judgment of what becomes accepted as an excellent knowledge structure may be negative. The process of assimilation and acceptance takes time.
One feature of knowledge structures that is highly significant to discovery systems is that unicity, the existence of one optimal or preferred structure, is the exception rather than the rule. There will be many possible structures with equivalent performance as models, and it is inappropriate to attempt to achieve unicity by an arbitrary technical criterion such as minimal size on some measure. The relative evaluation of different knowledge structures of equivalent performance involves complex criteria which are likely to vary from case to case. For example, in some situations there may be preferred concepts that are deemed more appropriate in being usual, familiar, theoretically more interpretable or coherent, and so on. A structure based on these concepts may be preferred over a smaller one that uses less acceptable concepts. Thus, a discovery system that is able to present alternatives and accept diverse criteria for discrimination between them may be preferred over one that attempts to achieve unicity through purely combinatorial or statistical criteria. On the other hand, it is a significant research objective in its own right to attempt to discover technical criteria that have a close correspondence to human judgment.
These have been the considerations underlying the research presented in this paper: to generate knowledge structures that have a form similar to those preferred by people; to explore a variety of possible structures and accept as wide a range as possible of external criteria for assessing them automatically or manually; and to develop principled statistical criteria that correspond to such assessments to the extent that this is possible.
EXCEPTION DAGS
The starting point for the research has been empirical induction tools that generate production rules or decision trees. One early conclusion was that a hybrid structure that had some of the characteristics of both rules and trees was preferable to either alone. This structure can be viewed either as a generalization of rules allowing exceptions, or as a generalization of trees such that the criteria at a node do not have to be based on a single property, or be mutually exclusive, and the trees are generalized to partial orders allowing branches to come together again. These generalizations allow substantially smaller knowledge structures to be generated than either rules or trees alone, and overcome problems of trees in dealing with disjunctive concepts involving replication of sub-trees.
Figure 1 exemplifies this knowledge structure, termed an exception directed acyclic graph (EDAG). It may be read as a set of rules with exceptions or as a generalized decision tree. Its operational interpretation is that one commences with the root node at concept 0, and places conclusion 0 on a provisional conclusion list. Following the arrow down to the left to concept 1, one evaluates the concept and, if it applies, replaces the provisional conclusion from that path with the specified conclusion. Then one follows the arrows down again to concepts 3 and 4, applying the same logic. Multiple arrows may be conceptualized as being evaluated in parallel; it makes no difference to the results if the graph is followed depth-first or breadth-first.
[Figure 1: Knowledge structure, an exception directed acyclic graph: a rooted graph of nodes carrying concepts 0 through 13 with optional attached conclusions (both concept and conclusion optional at each node); diagram not legibly reproduced.]
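The operational reading above can be sketched in code. This is a minimal sketch, assuming a node is just an optional concept test, an optional conclusion, and a list of child (exception) nodes; the names `Node` and `evaluate` are illustrative, not from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One EDAG node: optional concept test, optional conclusion, exceptions."""
    concept: object = None          # callable case -> bool, or None (always applies)
    conclusion: object = None
    children: list = field(default_factory=list)

def evaluate(node, case, inherited=None):
    """Return the set of conclusions for a fully decidable case.

    The node's conclusion (or the one inherited along this path) is
    provisional: any child whose concept applies replaces it with its own
    sub-evaluation, so branches act as exceptions and need not be exclusive."""
    if node.concept is not None and not node.concept(case):
        return set()                              # this path does not apply
    provisional = node.conclusion if node.conclusion is not None else inherited
    results = set()
    for child in node.children:                   # parallel, order-independent
        results |= evaluate(child, case, provisional)
    if not results and provisional is not None:   # no exception applied
        results.add(provisional)
    return results
```

For instance, a root with a default conclusion, a concept node acting as a common factor, and exception leaves reproduce the replace-on-exception behaviour described above, and non-exclusive branches yield multiple conclusions as a set.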
Several features of EDAGs may be noted:
• Concepts at a branch do not have to be mutually exclusive so multiple conclusions can arise. An interesting example is conclusion 5 which is concluded unconditionally once concepts 0 and 2 apply.
• As shown at concept 4, the structure is not binary. There may be more than two nodes at a branch.
• As shown at conclusion 11, the structure is not a tree. Branches may rejoin, corresponding to rules with a common exception or common conclusion.
• As shown at concept 2, a concept does not have to have a conclusion. It may just be a common factor to all the concepts of its child nodes. As in decision trees, not all concepts in an EDAG are directly premises of rules.
• As shown at conclusion 11, a conclusion does not have to have a concept. It may just be a common conclusion to all the concepts on its parent nodes.
• As shown at the node between concepts 6 and 12, it may be appropriate to insert an empty node to avoid arrows crossing.
• The notion of "concept" used is that of any potentially decidable predicate; that is, one with truth values true, false or unknown. The unknown case is significant because, as illustrated later, conclusions from the EDAG as a whole may be partially or wholly decidable even though some concepts are undecidable in a particular case, usually because some information is missing for the case.
A set of production rules without a default and without exceptions forms a trivial EDAG with one empty initial node having an arrow to each rule. An ID3-style decision tree forms an EDAG that is restricted to be a tree, with the set of concepts fanning out from a node restricted to be tests of all the values of a particular attribute. Rules with exceptions form an EDAG in which every node has a conclusion. Rules with exceptions with their premises factored for common concepts (Gaines, 1991b) form a general EDAG.
The direction of arrows in Figure 1 indicates the decision-making paths. However, if the arrows are reversed they become the 'isa' arrows of a semantic network showing inheritance among concepts, with multiple inheritance read disjunctively. For example, the complete concept leading to conclusion 11 is (concept 0 ∧ concept 1 ∧ concept 4) ∧ (concept 9 ∨ concept 10). The complete concept for nodes with child nodes involves the negation of the disjunction of the child nodes, but not of their children. For example, the complete concept leading to conclusion 1 is (concept 0 ∧ concept 1 ∧ ¬(concept 3 ∨ concept 4)), with concepts 8 through 10 playing no role.
This last result is important to the comprehensibility of an EDAG. The 'meaning' of a node can be determined in terms of its path(s) back to the root and its child nodes. The other parts of the tree are irrelevant to whether the conclusion at that node will be selected. They remain potentially relevant to the overall conclusion in that they may contribute additional conclusions if the structure is not such that conclusions are mutually exclusive.
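The complete-concept reading can be stated directly: a node's conclusion is selected when some root-to-node path of concepts all hold (disjunctive multiple inheritance across paths) and no immediate child concept holds. A small sketch of this predicate; the names are hypothetical:

```python
def node_selected(paths, exceptions, case):
    """Is the node's conclusion selected for a given case?

    paths: list of root-to-node concept lists (one per parent path);
    exceptions: the concepts of the node's immediate children.
    Deeper descendants are deliberately not consulted: they cannot
    affect whether this node's conclusion is selected."""
    return (any(all(c(case) for c in path) for path in paths)
            and not any(c(case) for c in exceptions))

def var(name):
    """A concept that simply reads a boolean from the case."""
    return lambda case: case[name]

# Conclusion 1 of Figure 1: one path {concept 0, concept 1},
# exceptions {concept 3, concept 4}; concepts 8-10 play no role.
selects_1 = lambda case: node_selected([[var("c0"), var("c1")]],
                                       [var("c3"), var("c4")], case)
```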
From a psychological perspective, the interpretation of the EDAG may be seen as each node providing a well-defined 'context' for its conclusion, where it acts as an exception to nodes back to the root above it, and it has as exceptions the nodes immediately below it. This notion of context is that suggested by Compton and Jansen (1990) in their analysis of 'ripple-down rules', the difference being that EDAG contexts are not necessarily mutually exclusive.
EDAG GENERATION
Induct is used to generate EDAGs through a three-stage process. First, it is used to generate rules with exceptions. Second, the premises of the rules are factored to extract common sub-concepts (which are not necessarily unique, e.g. A ∧ B, B ∧ C and C ∧ A may be factored in pairs by A, B or C). Third, common exception structures are merged. The second and third stages of factoring can be applied to any set of production rules represented as an EDAG, and it is appropriate to do so to C4.5 and PRISM rules when comparing methodologies and illustrating EDAGs as knowledge structures.
The familiar contact lens prescription problem (Cendrowska, 1987) will be used to illustrate the EDAG as a knowledge structure. Figure 2 shows the ID3 decision tree for the lens problem as an EDAG, and Figure 3 shows the PRISM production rules as an EDAG. The representation in itself adds nothing to the results, but the examples illustrate the subsumption of these two common knowledge structures. The representation of the rules can be improved by factoring the premises to produce the EDAG shown in Figure 4.
Figures 2 through 4 are trivial EDAGs in the sense that no exceptions are involved. More interesting examples can be generated using C4.5's methodology of: reducing the number of production rules by specifying a default conclusion; removing the rules with that conclusion; and making the other rules exceptions to the default. This can be applied to the PRISM rules in Figure 4 by making "none" the default and removing the four rules with "none" as a conclusion on the left. Both PRISM and C4.5 then generate the same set of rules with exceptions which can be factored into the EDAG shown in Figure 5.
[Figure 2: ID3 contact lens decision tree as EDAG; diagram not legibly reproduced.]
[Figure 3: PRISM contact lens production rules as EDAG; diagram not legibly reproduced.]
[Figure 4: PRISM contact lens production rules, factored as EDAG; diagram not legibly reproduced.]
[Figure 5: PRISM/C4.5 contact lens rules with default, factored as EDAG; diagram not legibly reproduced.]
Induct generates multi-level exceptions in that an exception may itself have exceptions. Figure 6 shows two EDAGs generated by Induct from the 24 lens cases. On the left of each EDAG the "soft" conclusion is generated by a simple concept that has an exception. The left EDAG is generated by Induct from the 24 possible cases. The right EDAG is generated by Induct when the data is biased to have a lower proportion of the "hard" exceptions (46 cases used: 2×24 with the 2 examples of the "hard" exception removed from the second set; logically unchanged, but the statistical test that Induct uses to determine whether to generate an exception is affected; the exceptions are now statistically more "exceptionable").
[Figure 6: Two forms of Induct rules with exceptions, factored as EDAGs; diagrams not legibly reproduced.]
COMPLEXITY MEASURES FOR EDAGs
The results shown in Figures 2 through 6 make it clear that many different EDAGs with the same performance can be derived from existing empirical induction methodologies. It is also apparent that some are substantially simpler than others, and that the simpler ones present a more comprehensible knowledge structure. Figures 5 and 6, in particular, illustrate that simple comprehensible knowledge structures are not unique: each represents a reasonable basis for understanding and explaining the contact lens decision methodology. Figure 6 left may appear marginally better than the alternatives in Figures 5 and 6 right, but one can imagine a debate between experts as to which is best; Figure 5 has only one layer of exceptions, and Figure 6 right is more balanced in its treatment of astigmatism.
It is interesting to derive a complexity measure for an EDAG that corresponds as closely as possible to the psychological factors leading to judgments of the relative complexity of different EDAGs. As usual, the basis for a structural complexity measure is enumerative; that is, we count the components in the representation (Gaines, 1977). The obvious components are the node symbols (boxes), the arc symbols (arrows), and the concept and conclusion clauses. In addition, since the graphs are not trees and branches can rejoin with possible line crossings causing visual confusion, the cyclomatic number (Berge, 1958) is considered. It may be regarded as counting (undirected) cycles or as indicating the excess of arcs over those necessary to join the nodes. Induct also allows the option of a disjunction of values within a clause, and such a disjunction is weighted by the number of values mentioned, e.g. x=5 counts 1, x∈{5, 8} counts 2, and x∈{5, [8, 12]} counts 3 (where [8, 12] specifies the interval from 8 through 12).
The derivation of an overall complexity measure from these component counts has to take into account the required graph reductions which should lead to complexity measure reductions. The basic requirements are shown in Figure 7, and it is apparent that clause reduction should dominate, cycle reduction needs to be taken into account with lesser weight, and node reduction with still lesser weight. One suitable formula is:

    complexity of an EDAG = (clauses × 4 + cycles × 2 + nodes - 1) / 5    (1)

where the -1 and scaling by 5 are normalization constants.
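Formula (1) is easy to mechanize. A small sketch, taking the cyclomatic number as X = A - N + 1 for A arcs and N nodes in a connected rooted graph; the function name is illustrative:

```python
def edag_complexity(clauses, nodes, arcs):
    """Complexity measure (1) for an EDAG.

    clauses counts concept and conclusion clauses (with disjunctions
    weighted by the number of values mentioned); cycles is the
    cyclomatic number X = arcs - nodes + 1."""
    cycles = arcs - nodes + 1
    return (4 * clauses + 2 * cycles + nodes - 1) / 5
```

For a tree the cyclomatic number is zero since arcs = nodes - 1; for example, 23 clauses over 15 nodes and 14 arcs give (92 + 0 + 14)/5 = 21.2.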
Condensation:         Nodes 2 -> 1, Arcs 1 -> 0, Cycles 0 -> 0, Clauses 2 -> 2, Complexity 1.8 -> 1.6
Premise Factoring:    Nodes 3 -> 4, Arcs 2 -> 3, Cycles 0 -> 0, Clauses 8 -> 7, Complexity 6.8 -> 6.2
Conclusion Factoring: Nodes 3 -> 4, Arcs 2 -> 4, Cycles 0 -> 1, Clauses 6 -> 5, Complexity 5.2 -> 5.0
Cycle Reduction:      Nodes 5 -> 6, Arcs 6 -> 6, Cycles 2 -> 1, Clauses 8 -> 8, Complexity 8.0 -> 7.8

Figure 7: Graph reductions required to correspond to complexity measure reductions
For standard decision trees and production rules there are relations between the numbers in formula (1) that allow one to give the complexity measure more specific interpretations:

    complexity of decision tree = number of nodes + 0.8 × number of terminal nodes    (2)
    complexity of production rules = 0.8 × number of clauses + 0.2 × number of rules    (3)

These both seem intuitively reasonable as reflecting natural components of the structural complexity of trees and rule sets. However, they are somewhat arbitrary, and it would be appropriate to conduct empirical psychological experiments with a range of EDAGs and evaluate the correlation between the understanding of knowledge measured in various ways and the complexity measure developed here. Sufficient data would also allow other complexity measures to be evaluated.
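Formula (3) can be checked against (1) directly: a rule set forms a trivial EDAG with an empty root node, one node per rule (N = R + 1), one arc per rule, and hence X = 0, so (1) reduces to (4C + R)/5 = 0.8C + 0.2R. A sketch of the check, with illustrative function names:

```python
def edag_complexity(clauses, nodes, arcs):
    """Measure (1): (4*clauses + 2*cycles + nodes - 1) / 5."""
    cycles = arcs - nodes + 1
    return (4 * clauses + 2 * cycles + nodes - 1) / 5

def rules_complexity(clauses, rules):
    """Measure (3), the rule-set special case of (1)."""
    return 0.8 * clauses + 0.2 * rules
```

For example, 15 rules carrying 70 clauses give 0.8×70 + 0.2×15 = 59, and the same value via (1) with N = 16 nodes and 15 arcs.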
Figure 8 shows the component counts and the computed complexity measures for the solutions to the lens problem shown in Figures 2 through 6. The computed complexity measure does appear to have a reasonable correspondence to the variations in complexity that are subjectively apparent.
Figure  Example                        Nodes N  Cycles X=A-N+1  Clauses C  Complexity (4C+2X+N-1)/5
2       ID3                            15       0               23         21.2
3       PRISM                          10       0               34         29.0
4       PRISM, factored                14       4               19         19.4
5       PRISM/C4.5 default, factored   10       2               11         11.4
6       Induct, factored                8       1               11         10.6
6       Induct variant, factored        7       0               13         11.6

Figure 8: Comparison of complexity of contact lens EDAGs
A CHESS EXAMPLE
The contact lens example is useful for illustration but too simple for the significance of the complexity reduction to be evaluated. This section presents a somewhat more complex example based on one of Quinlan's (1979) chess datasets that Cendrowska (1987) also analyzed with PRISM. The data consists of 647 cases of a chess end game described in terms of four 3-valued and three 2-valued attributes leading to one of two conclusions. All the solutions described are 100% correct.

Figure 9 shows the 30 node decision tree produced by ID3, and Figure 10 shows the 15 rules produced by C4.5 for the chess data. Cendrowska (1987) reports a substantially larger tree and the same number of rules with some additional clauses.

Figure 11 left and center shows the rules produced by C4.5 factored in 2 ways. The EDAG at the center introduces an additional (empty) node to make it clear that the possible exceptions are the same on both branches. Figure 11 right shows an EDAG produced by Induct. All three EDAGs are interesting in not being trees and involving different simple presentations of the same knowledge.
line = 2: safe (324.0)
line = 1:
    r>>k = 2: safe (162.0)
    r>>k = 1:
        r>>n = 2: safe (81.0)
        r>>n = 1:
            k-n = 1:
                n-wk = 2: safe (9.0)
                n-wk = 3: safe (9.0)
                n-wk = 1:
                    k-r = 1: safe (2.0)
                    k-r = 2: lost (3.0)
                    k-r = 3: lost (3.0)
            k-n = 2:
                n-wk = 2: safe (9.0)
                n-wk = 3: safe (9.0)
                n-wk = 1:
                    k-r = 2: lost (3.0)
                    k-r = 3: lost (3.0)
                    k-r = 1:
                        r-wk = 1: lost (1.0)
                        r-wk = 2: safe (1.0)
                        r-wk = 3: safe (1.0)
            k-n = 3:
                k-r = 2: lost (9.0)
                k-r = 3: lost (9.0)
                k-r = 1:
                    r-wk = 1: lost (3.0)
                    r-wk = 2: safe (3.0)
                    r-wk = 3: safe (3.0)

Figure 9: ID3 chess decision tree
[Figure 10: C4.5 chess rules. The 15 production rules over the attributes line, r>>k, r>>n, k-n, k-r, n-wk and r-wk, each concluding safe or lost, are not legibly reproduced.]
[Figure 11: Chess rules with exceptions, factored as alternative EDAGs; diagrams not legibly reproduced.]
Figure  Example                      Nodes N  Cycles X=A-N+1  Clauses C  Complexity (4C+2X+N-1)/5
        Cendrowska tree              79       0               130        119.6
        PRISM production rules       16       0                73         61.4
10      C4.5 production rules        16       0                70         59.0
9       ID3 decision tree            31       0                50         46.0
        PRISM default, factored       9       3                12         12.4
11      C4.5 default, factored        9       2                10         10.4
11      C4.5 default+, factored       8       2                10         10.2
11      Induct exceptions, factored   7       1                11         10.4

Figure 12: Comparison of complexity of chess EDAGs
Figure 12 shows the complexities of various EDAGs solving the chess problem. The decision tree published by Cendrowska (1987) is more complex than that produced by ID3. PRISM generates the same 15 production rules as does C4.5 except that two of the rules have redundant clauses. This has a small effect on the relative complexity of the production rules, but a larger effect on that of the factored rules since those from PRISM do not factor as well because of the redundant clauses. The three solutions shown in Figure 11 are similar in complexity as one might expect. They are different, yet equally valid, ways of representing the solution.
The reduction between the tree in Figure 9 or the rules in Figure 10 and the EDAGs in Figure 11 does seem to go some way to meeting Michie's objections to trees or rule sets as knowledge structures that Quinlan (1991) cites in the preface of the book from KDD-89. It is more plausible to imagine a chess expert using the decision procedures of Figure 11 than those of Figures 9 or 10.
INFERENCE WITH EDAGs--UNKNOWN VALUES
Inference with EDAGs when all concepts are decidable is simple and has already been described. However, inference when some concepts are unknown as to their truth values involves some subtleties that are significant. Consider the EDAG of Figure 13 where Concept-1 is unknown but Concept-2 is true and the attached conclusion is the same as the default. One can then infer Conclusion = A regardless of what the truth value of Concept-1 might be.
[Figure 13: Inference with unknown values. The default Conclusion = A; Concept-1 = unknown; its child Concept-2 = true with the same Conclusion = A; infer Conclusion = A even though Concept-1 is unknown.]
The required reasoning is trivial to implement and is incorporated in the EDAG inference procedure of KRS (Gaines, 1991a; Gaines, 1993a), a KL-ONE-like knowledge representation server. The evaluation of a concept as true, false or unknown is common in such systems. It is simple to mark the EDAG such that a node is disregarded if: it has an unknown concept but has a child node that is definitely true; or it has the same conclusion as another node whose concept is true.
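The semantics behind these marking rules can be sketched by brute force: resolve each unknown concept both ways and compare the outcomes; the inference is decided when every resolution yields the same conclusion set. This is a specification sketch under assumed encodings (a node as a `(concept, conclusion, children)` tuple, concepts as named booleans), not the marking procedure KRS actually uses, and it is feasible only for small numbers of unknowns:

```python
from itertools import product

def conclusions(node, case, inherited=None):
    """Conclusion set for a fully decided case; node = (concept, conclusion, children)."""
    concept, conclusion, children = node
    if concept is not None and not case[concept]:
        return set()
    provisional = conclusion if conclusion is not None else inherited
    results = set()
    for child in children:
        results |= conclusions(child, case, provisional)
    if not results and provisional is not None:
        results.add(provisional)
    return results

def infer(node, partial_case):
    """Conclusions decidable despite unknowns (None values in partial_case).

    Enumerates every resolution of the unknown concepts; returns the common
    conclusion set if all resolutions agree, else None (undecided)."""
    unknown = [k for k, v in partial_case.items() if v is None]
    outcomes = set()
    for bits in product([False, True], repeat=len(unknown)):
        world = dict(partial_case, **dict(zip(unknown, bits)))
        outcomes.add(frozenset(conclusions(node, world)))
    return next(iter(outcomes)) if len(outcomes) == 1 else None
```

In the Figure 13 situation, with a default A, an unknown Concept-1 (given an illustrative conclusion B here), and a true child Concept-2 also concluding A, both resolutions of Concept-1 yield {A}, so A is inferred.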
The importance of pruning the list of possible conclusions is that it is the basis of acquiring further information, for example, by asking the user of the system. It is important not to request information that may be expensive to obtain but is irrelevant to the conclusion. Cendrowska (1987) makes this point strongly in comparing modular rules with decision trees: for example, in contact lens prescription the measurement of tear production is time-consuming and unpleasant and should be avoided if possible. She notes that it is not required in the 3 rules produced by PRISM shown at the lower left of Figure 3 but is the first attribute tested in the ID3 decision tree of Figure 2. She argues that the tree requires the testing of this attribute unnecessarily in certain cases.
However, the three decidable cases with unknown values of tear production (corresponding to the premises of the three rules on the lower left of Figure 3) are all correctly evaluated by the EDAG of Figure 2 using the reasoning defined above, as they are by all the other EDAGs of Figures 3 through 6. The problem Cendrowska raises is a question of properly using the tree as a basis for inference, rather than a distinction between trees and production rules.
Since the complexity figures in Figure 8 indicate that the PRISM rules are, on one reasonable measure, more complex than the ID3 tree, it would seem that the arguments for rules being better than trees are not justified. Certainly the highly restrictive standard decision tree can be improved in comprehensibility by the use of exceptions and factoring, but whether the resultant structure is a more general tree, or a set of rules with exceptions, is a matter of perspective.
The simple pruning procedure described above is sufficient to deal with the tear production problem raised by Cendrowska. However, it is inadequate to properly account for all the nine decidable cases with unknown values (corresponding to the premises of the rules in Figure 3). For example, someone whose tear production is normal, astigmatism is astigmatic and age is young but whose prescription is unknown should be inferred as lens is hard. However, this inference cannot be made with the ID3 tree of Figure 2 using the procedure described above alone. KRS copes with this situation by keeping track of the relation between child nodes that are such that one of them must be true. In the case just defined it infers that the conclusion is lens is hard because the two nodes with lens = hard conclusions at the lower right of Figure 2 are both possible, one of them must be true, and both have the same conclusion.
KRS also disregards a node if the conclusion is already true of the entity being evaluated, again to prevent an unnecessary attempt to acquire further information. The four strategies described are such as to reduce the list of possible conclusions to the logical minimum, and form a complete inference strategy for EDAGs.
It should be noted that the completeness of inference is dependent on the possibility of keeping track of relations between child nodes. This is simple for the classification of single individuals based on attribute-value data. It is far more complex for EDAG-based inference with arbitrary KL-ONE knowledge structures involving related individuals, where it is difficult to keep track of all the relations between partially open inferences. This corresponds to managing the set of possible extensions of the knowledge base generated by resolving the unknown concepts, and is inherently intractable.
CONCLUSIONS
The problem of transforming performance systems based on induced rules or decision trees into
comprehensible knowledge structures has been addressed. A knowledge structure, the EDAG, has been
developed that generalizes and subsumes production rules, decision trees, and rules with exceptions. It
gives rise to a natural complexity measure that allows them to be understood, analyzed and compared on a
uniform basis. An EDAG is a rooted directed acyclic graph with the semantics that nodes are concepts,
some of which have attached conclusions, and the arcs are 'isa' inheritance links with disjunctive multiple
inheritance. A detailed example has been given of the generation of a range of such structures of
equivalent performance for a simple problem, and the complexity measure of a particular structure has
been shown to relate to its perceived complexity. The simplest structures are generated by an algorithm
that factors common concepts from the premises of rules. A second example of a chess dataset was used
to show the value of this technique in generating comprehensible knowledge structures.
It is suggested that the techniques described in this paper will be useful in allowing knowledge structures
developed by empirical induction that have good performance but are not humanly comprehensible to be
transformed into ones that may be more comprehensible.
A referee raised the question of the relative effectiveness of C4.5 and Induct, and it is appropriate to
comment briefly on this. There were two motivations underlying the development of Induct. First, to
develop a high speed induction engine for use in interactive knowledge acquisition with large datasets.
Second, to generate knowledge structures that reflected human conceptual frameworks.
The first objective was addressed by extending the techniques of PRISM to generate rules directly from
the data rather than by post-processing of a decision tree, and by using data structures (bit maps) that
speeded the computations required. On large datasets Induct is generally over 100 times faster than C4.5
in generating rules with equivalent performance. However, the complexity analysis of the algorithms is
similar, and the two could be made comparable in speed by applying the bit map data structures in a C4.5
implementation.
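The bit map representation can be sketched as follows. This is illustrative and assumes nothing about Induct's internals beyond what is stated above: each attribute-value test becomes one bit per case, so the coverage of a conjunctive premise reduces to a bitwise AND and a population count rather than a scan of the dataset. The small dataset is invented for illustration.

```python
# Sketch of bit-map rule evaluation (illustrative, not Induct's actual
# implementation): one bit per case for each attribute-value test.

cases = [
    {"age": "young", "astigmatism": "astigmatic",     "lens": "hard"},
    {"age": "young", "astigmatism": "not astigmatic", "lens": "soft"},
    {"age": "old",   "astigmatism": "astigmatic",     "lens": "none"},
    {"age": "young", "astigmatism": "astigmatic",     "lens": "hard"},
]

def bitmap(attr, value):
    """Bit i is set iff case i satisfies attr = value."""
    bits = 0
    for i, c in enumerate(cases):
        if c[attr] == value:
            bits |= 1 << i
    return bits

# Premise coverage is one AND; accuracy needs one more AND.
premise = bitmap("age", "young") & bitmap("astigmatism", "astigmatic")
covered = bin(premise).count("1")                          # cases matching premise
correct = bin(premise & bitmap("lens", "hard")).count("1") # of those, correct
print(covered, correct)  # 2 2
```

Once the per-test bit maps are built, trying a candidate clause costs only word-level operations, which is the source of the speed advantage claimed above.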
KDD-94
AAAI-94 Workshop on Knowledge Discovery in Databases
Page 23
The second objective was addressed by developing rules with exceptions using a statistical methodology
to determine whether an over-general rule having error cases is to be specialized by adding additional
premise clauses, or left as it is with the exception cases being covered by additional rules. The statistical
test used is to calculate whether a rule is successful by chance (Gaines, 1989), and to compare this
probability for the rule with errors with those for the rules obtained by adding clauses to it.
What this paper shows is that, for some datasets, the second objective can also be achieved by post-processing
the rule sets developed using algorithms such as those of C4.5 to generate EDAGs that are
substantially simpler and have a hierarchical structure similar to that of human conceptual frameworks.
Experiments are underway to evaluate the post-processing when applied to rules derived from data having
many decision outcomes, where multi-level exceptions seem to generate the most compact and meaningful
knowledge structures.
ACKNOWLEDGEMENTS
This work was funded in part by the Natural Sciences and Engineering Research Council of Canada. I am
grateful to Paul Compton and Ross Quinlan for access to their research and for discussions that have
influenced this work. I am also grateful to the anonymous referees for helpful comments.
REFERENCES
Berge, C. (1958). The Theory of Graphs. London, Methuen.
Cendrowska, J. (1987). An algorithm for inducing modular rules. International Journal of Man-Machine Studies 27(4) 349-370.
Clancey, W.J. (1987a). From GUIDON to NEOMYCIN and HERACLES in twenty short lessons. Lamsweerde, A. and Dufour, P., Eds. Current Issues in Expert Systems. pp.79-123. Academic Press.
Clancey, W.J. (1987b). Knowledge-Based Tutoring: The GUIDON Program. MIT Press.
Clancey, W.J. (1989). Viewing knowledge bases as qualitative models. IEEE Expert 4(2) 9-23.
Clancey, W.J. (1993). The knowledge level reinterpreted: modeling socio-technical systems. International Journal of Intelligent Systems 8(1) 33-49.
Compton, P. and Jansen, R. (1990). A philosophical basis for knowledge acquisition. Knowledge Acquisition 2(3) 241-258.
Gaines, B.R. (1977). System identification, approximation and complexity. International Journal of General Systems 2(3) 241-258.
Gaines, B.R. (1989). An ounce of knowledge is worth a ton of data: quantitative studies of the trade-off between expertise and data based on statistically well-founded empirical induction. Proceedings of the Sixth International Workshop on Machine Learning. pp.156-159. Morgan Kaufmann.
Gaines, B.R. (1991a). Empirical investigations of knowledge representation servers: Design issues and applications experience with KRS. ACM SIGART Bulletin 2(3) 45-56.
Gaines, B.R. (1991b). Refining induction into knowledge. Proceedings of the AAAI Workshop on Knowledge Discovery in Databases. pp.1-10. AAAI.
Gaines, B.R. (1991c). The trade-off between knowledge and data in data acquisition. Piatetsky-Shapiro, G. and Frawley, W., Eds. Knowledge Discovery in Databases. pp.491-505. MIT Press.
Gaines, B.R. (1993a). A class library implementation of a principled open architecture knowledge representation server with plug-in data types. IJCAI-93: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann.
Gaines, B.R. (1993b). Modeling practical reasoning. International Journal of Intelligent Systems 8(1) 51-70.
Gaines, B.R. and Compton, P. (1993). Induction of meta-knowledge about knowledge discovery. IEEE Transactions on Knowledge and Data Engineering 5(6) 990-992.
Li, X. (1991). What's so bad about rule-based programming? IEEE Software 103-105.
Quinlan, J.R. (1979). Discovering rules by induction from large collections of examples. Michie, D., Ed. Expert Systems in the Micro-Electronic Age. pp.168-201. Edinburgh, Edinburgh University Press.
Quinlan, J.R. (1991). Foreword. Piatetsky-Shapiro, G. and Frawley, W., Eds. Knowledge Discovery in Databases. pp.ix-xii. Cambridge, Massachusetts, MIT Press.
Schreiber, A.Th., Wielinga, B.J. and Breuker, J.A., Eds. (1993). KADS: A Principled Approach to Knowledge-Based System Development. London, Academic Press.