From: AAAI-87 Proceedings. Copyright ©1987, AAAI (www.aaai.org). All rights reserved.

Improving Inference Through Conceptual Clustering

Douglas Fisher
Department of Information and Computer Science
University of California
Irvine, California 92717

Abstract

Conceptual clustering is an important way to summarize data in an understandable manner. However, the recency of the conceptual clustering paradigm has allowed little exploration of conceptual clustering as a means of improving performance. This paper presents COBWEB, a conceptual clustering system that organizes data so as to maximize inference ability. COBWEB improves on the classifications produced by previous clustering systems by capturing attribute inter-correlations at classification tree nodes and generating inferences as a by-product of classification. Results from the domains of soybean and thyroid disease diagnosis support the success of this approach.

I. Introduction

Machine learning is concerned with improving performance through automated knowledge acquisition and refinement [Dietterich, 1982]. Learning filters and incorporates environmental observations into a knowledge base that is used to facilitate performance at some task. Assumptions surrounding the environment, the knowledge base, and the performance task all have important ramifications for the design of a learning algorithm.

This paper is concerned with conceptual clustering, a task of machine learning that has not traditionally been discussed in the larger context of intelligent processing. Conceptual clustering systems [Michalski and Stepp, 1983; Fisher and Langley, 1985; Cheng and Fu, 1985] accept a number of object descriptions (events, observations, facts) and produce a classification scheme over the observed objects. Importantly, conceptual clustering methods do not require the guidance of a teacher to direct the formation of the classification (as in learning from examples), but use an evaluation function to discover classes with good conceptual descriptions. These evaluation functions generally favor classes that exhibit many differences between objects of different classes, and few differences between objects of the same class.

As with other forms of learning, the context surrounding the conceptual clustering task can have important implications for the design of these systems. Perhaps the most important contextual factor is the performance task that is to benefit from clustering. The generality of classification as a means of guiding inference in problem solving is manifest in recent discussions of classification problem solving [Clancey, 1984]. While most clustering systems do not explicitly address a performance task, exceptions exist; in particular, Cheng and Fu [1985] and Fu and Buchanan [1985] have used clustering techniques to organize expert system knowledge.

Generalizing on their use of conceptual clustering, this paper describes COBWEB, a system whose design was motivated by both environmental and performance concerns. However, this paper is primarily concerned with performance; in particular, with producing classifications that maximize the amount of information that can be inferred from knowledge of class membership.(1) The following section motivates and develops category utility [Gluck and Corter, 1985], the evaluation function used by COBWEB to guide class and concept formation. Section 3 describes the COBWEB algorithm, and the remainder of the paper focuses on the utility of COBWEB-generated classification trees for inference, concentrating particularly on soybean disease diagnosis.

(1) COBWEB is also distinguished from other systems in that it is incremental [Lebowitz, 1982]. Issues surrounding COBWEB's performance as an incremental system are given in [Fisher, 1987].

II. Category utility rewards information-rich categories

COBWEB uses a measure of concept quality called category utility [Gluck and Corter, 1985] to guide the formation of classes and concepts. While our primary interest in category utility is that it favors classes that maximize inference ability, Gluck and Corter originally derived category utility as a means of predicting the basic level effects observed during human classification. These effects stem from a psychological construct called the basic level, and they seem to occur in hierarchical classification schemes where certain inference abilities are maximized.
Category utility can be viewed as a tradeoff between intra-class object similarity and inter-class dissimilarity. Objects are described in terms of (nominal) attribute-value pairs (e.g., Color = red and Size = large). For an attribute-value pair, $A_i = V_{ij}$, and a class, $C_k$, intra-class similarity is measured in terms of the conditional probability $P(A_i = V_{ij} \mid C_k)$; the larger this probability, the greater the proportion of class members sharing the value $V_{ij}$, and thus the more predictable the value is of class members. Inter-class dissimilarity is measured in terms of $P(C_k \mid A_i = V_{ij})$; the fewer the classes whose members share this value, the larger this probability, and thus the more predictive the value is of the class. Attribute-value predictability and predictiveness are combined into a measure of partition quality,

$\sum_k \sum_i \sum_j P(A_i = V_{ij}) \, P(C_k \mid A_i = V_{ij}) \, P(A_i = V_{ij} \mid C_k)$,

summed over all classes (k), attributes (i), and values (j). The probability $P(A_i = V_{ij})$ weights the importance of individual values, in essence saying that it is more important to increase the predictability and predictiveness of frequently occurring values. This function rewards partitions that strike a good tradeoff between predictability and predictiveness. Note that for any i, j, and k, $P(A_i = V_{ij}) P(C_k \mid A_i = V_{ij}) = P(C_k) P(A_i = V_{ij} \mid C_k)$ (Bayes rule), so by substitution the measure equals

$\sum_k P(C_k) \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2$,

where $\sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2$ is the expected number of attribute values that can be correctly guessed for an arbitrary member of class $C_k$.(2)

(2) This assumes a probability-matching guessing strategy, meaning that an attribute value is guessed with a probability equal to its probability of occurring, as opposed to a probability-maximizing strategy, which assumes that the most frequently occurring value is always guessed (see [Fisher, 1987]).

Finally, Gluck and Corter define category utility to be the increase in the expected number of attribute values that can be correctly guessed given knowledge of a partition ($P(C_k) \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2$, summed over classes) over the expected number of correct guesses with no such knowledge ($\sum_i \sum_j P(A_i = V_{ij})^2$). Formally, the category utility of a partition $\{C_1, \ldots, C_n\}$ is

$CU(\{C_1, \ldots, C_n\}) = \frac{\sum_{k=1}^{n} P(C_k) \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2 - \sum_i \sum_j P(A_i = V_{ij})^2}{n}.$

The denominator, n, is the number of categories in the partition; averaging over categories allows comparison of partitions of different sizes.
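To make the measure concrete, the following Python sketch computes category utility from per-class counts. It is not from the original paper: the (size, counts) encoding, the function name, and the toy birds/mammals data are assumptions for illustration. The encoding anticipates the observation below that every probability can be recovered from two integer counts.

```python
from collections import defaultdict

def category_utility(classes):
    """Category utility of a partition.

    Each class C_k is a (size, counts) pair: `size` is the number of
    member objects and `counts` maps (attribute, value) -> how many
    members exhibit that pair, so P(A_i = V_ij | C_k) = count / size.
    """
    n = len(classes)
    total = sum(size for size, _ in classes)

    # Baseline: expected number of values guessed correctly with no
    # class knowledge, sum_i sum_j P(A_i = V_ij)^2, from pooled counts.
    pooled = defaultdict(int)
    for _, counts in classes:
        for av, c in counts.items():
            pooled[av] += c
    baseline = sum((c / total) ** 2 for c in pooled.values())

    # Informed: sum_k P(C_k) sum_i sum_j P(A_i = V_ij | C_k)^2.
    informed = sum(
        (size / total) * sum((c / size) ** 2 for c in counts.values())
        for size, counts in classes)

    return (informed - baseline) / n

# Two perfectly separated classes: every value is fully predictable
# and fully predictive, so CU here is (2 - 1) / 2 = 0.5.
birds   = (2, {("BodyCover", "feathers"): 2, ("Transport", "fly"): 2})
mammals = (2, {("BodyCover", "fur"): 2, ("Transport", "walk"): 2})
assert abs(category_utility([birds, mammals]) - 0.5) < 1e-9
```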
Category utility is thus of generic value, but it can also be viewed as rewarding classes that maximize the inference potential of a classification: partitions score well exactly when knowing an object's class allows many of its attribute values to be correctly inferred.

The classes that category utility evaluates are represented by the probabilities the measure requires: $P(C_k)$, and $P(A_i = V_{ij} \mid C_k)$ for every attribute value observed among class members. Representing a class by attribute-value distributions in this way is termed a probabilistic concept representation [Smith and Medin, 1981], in contrast with the logical (generally conjunctive) concept representations typically used in AI systems. Probabilistic concepts subsume these logical representations; a conjunctive description corresponds to the special case in which the relevant conditional probabilities equal 1.0. This increased generality comes at the cost of storing the probabilities, but each probability can be computed from two integer counts, so storage requirements grow by only a constant factor.

III. The COBWEB algorithm

COBWEB incrementally incorporates objects into a classification hierarchy. Given an initially empty hierarchy and an incrementally presented series of objects, a hierarchical classification tree is formed in which each node is a probabilistic concept representing an object class (e.g., Birds: BodyCover = feathers (1.0) and Transport = fly (0.88) and ...). The incorporation of an object is basically a process of classifying the object by descending the tree along an appropriate path, updating distributional information along the way, and performing one of several possible operators at each level.
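A probabilistic concept node can therefore be kept as raw integer counts, with any required probability derived on demand. The sketch below is illustrative only (the class and method names are assumptions, not the paper's code); the example probabilities echo the Birds concept above.

```python
class Node:
    """A probabilistic concept: a class summarized by integer counts
    over (attribute, value) pairs, plus the class size and sub-classes."""

    def __init__(self):
        self.size = 0        # number of objects classified under this node
        self.counts = {}     # (attribute, value) -> members exhibiting it
        self.children = []   # sub-classes refining this concept

    def update(self, obj):
        """Fold one object (a dict of attribute -> value) into the counts."""
        self.size += 1
        for av in obj.items():
            self.counts[av] = self.counts.get(av, 0) + 1

    def prob(self, attribute, value):
        """P(A_i = V_ij | C_k), recovered from two integers on demand."""
        return self.counts.get((attribute, value), 0) / self.size

# After suitable observations, a 'Birds' node might yield
# node.prob("BodyCover", "feathers") == 1.0
# node.prob("Transport", "fly")      == 0.88
```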
A. Placing an Object in an Existing Class

Perhaps the most natural way of updating the tree is simply to place the object in an existing class. That is, after counts are updated at the root, the object is tentatively placed in each child of the root, and the partition that results from adding the object to a given node is evaluated using category utility. The node for which adding the object yields the best partition is the best existing host for the new object.

B. Creating a New Class

In addition to placing objects in existing classes, there is a way to create new classes. Specifically, the quality of the partition resulting from placing the object in the best existing host is compared with the quality of the partition resulting from creating a new singleton class containing only the object. Class creation is performed if it yields the better partition (by category utility). This operator allows COBWEB to adjust the number of classes in a partition to fit the regularities of the environment; the number of classes is not bounded by a system parameter (e.g., as in CLUSTER/2).

C. Merging and Splitting

While the operators above are effective in many cases, by themselves they are very sensitive to initial input orderings. To guard COBWEB against the effects of initially skewed data, it also includes two operators for node merging and splitting. The function of merging is to take two nodes of a level (of n nodes) and 'combine' them in hopes that the resultant partition (of n-1 nodes) is of better quality. Merging two nodes involves creating a new node and combining the attribute-value distributions of the nodes being merged; the two original nodes are made children of the newly created node. Although merging could be attempted on all possible node pairs every time an object is observed, such a strategy would be unnecessarily redundant and costly. Instead, when an object is incorporated, only merging the two best hosts (as indicated by category utility) is evaluated.

Besides node merging, node splitting may also serve to increase partition quality. A node of a partition (of n nodes) may be deleted and its children promoted, resulting in a partition of n+m-1 nodes, where the deleted node had m children. Splitting is considered only for the children of the best host among the existing categories.

Table 1: COBWEB's control structure.

FUNCTION COBWEB (Object, Root of tree)
1) Update counts of the Root.
2) IF Root is a leaf,
   THEN return the expanded Root (a leaf expanded to accommodate the new Object),
   ELSE find that child of Root that best hosts Object and perform one of the following:
   2a) Create a new class if appropriate,
   2b) Merge nodes if appropriate and call COBWEB (Object, Merged node),
   2c) Split a node if appropriate and call COBWEB (Object, Root),
   2d) IF none of the above (2a, b, or c) apply, THEN call COBWEB (Object, best child of Root).

COBWEB's control structure is summarized in Table 1. As an object is incorporated, at most one operator is applied at each tree level. Compositions of these primitive operators can be viewed as transformations of a single classification tree. Fisher [1987] adopts the view that COBWEB is hill climbing (without backtracking) through the space of possible classification trees. Updates are not restricted to proceeding strictly top-down or bottom-up; the inverse operators of merging and splitting allow COBWEB to move bidirectionally through this space, giving an approximation of backtracking through operator application. This strategy keeps update cost small ($B^2 \log_B n$, where B is the average branching factor of the tree and n is the number of previously classified objects) while maintaining learning robustness. Fisher [1987] addresses the strengths and weaknesses of this approach in more detail.
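Table 1 can be rendered as a short procedure. The sketch below reuses the hypothetical Node class and category_utility function from the earlier blocks; it is a simplified reading of the table, not Fisher's implementation: the merge and split tests are abbreviated to the comparisons the text describes, and the tie-breaking and efficiency details of [Fisher, 1987] are omitted. A predict helper mirrors the inference procedure used in the experiments below, descending to a leaf without modifying the tree.

```python
def partition_cu(nodes):
    """Score a partition of Nodes with the earlier category_utility."""
    return category_utility([(n.size, n.counts) for n in nodes])

def score_children(node, obj):
    """Tentatively place obj in each child, scoring each resulting
    partition; return the best CU and the best and second-best hosts."""
    scored = []
    for k, child in enumerate(node.children):
        child.update(obj)
        scored.append((partition_cu(node.children), k))
        child.size -= 1                      # undo the tentative update
        for av in obj.items():
            child.counts[av] -= 1
    scored.sort(key=lambda s: s[0], reverse=True)
    best_cu, best = scored[0]
    second = scored[1][1] if len(scored) > 1 else None
    return best_cu, best, second

def merge(a, b):
    """Operator 2b: a new node combining the distributions of a and b,
    with the two originals as its children."""
    m = Node()
    m.size = a.size + b.size
    m.counts = dict(a.counts)
    for av, c in b.counts.items():
        m.counts[av] = m.counts.get(av, 0) + c
    m.children = [a, b]
    return m

def new_leaf(obj):
    leaf = Node()
    leaf.update(obj)
    return leaf

def cobweb(obj, root):
    """Incorporate obj (a dict of attribute -> value), per Table 1."""
    node = root
    while True:
        if not node.children:                 # 2) reached a leaf: expand it
            if node.size > 0:                 # so the prior observation and
                old = Node()                  # obj become sibling children
                old.size = node.size
                old.counts = dict(node.counts)
                node.children = [old, new_leaf(obj)]
            node.update(obj)                  # (an empty root just absorbs obj)
            return root
        node.update(obj)                      # 1) update counts along the path
        while True:                           # choose one operator at this level
            best_cu, best, second = score_children(node, obj)
            host = node.children[best]
            # 2a) create a new class if a singleton host scores better
            if partition_cu(node.children + [new_leaf(obj)]) > best_cu:
                node.children.append(new_leaf(obj))
                return root
            # 2b) merge the two best hosts if that improves the partition
            if second is not None:
                merged = merge(host, node.children[second])
                rest = [c for i, c in enumerate(node.children)
                        if i not in (best, second)]
                if partition_cu(rest + [merged]) > best_cu:
                    node.children = rest + [merged]
                    node = merged
                    break                     # descend into the merged node
            # 2c) split the best host: promote its children, re-evaluate
            if host.children:
                rest = [c for c in node.children if c is not host]
                if partition_cu(rest + host.children) > best_cu:
                    node.children = rest + host.children
                    continue
            node = host                       # 2d) descend into the best host
            break

def predict(obj, root, attribute):
    """Infer a value for `attribute` (absent from obj) as a by-product
    of classification: descend to a leaf via the best host at each
    level, without modifying the tree, and read off the leaf's value."""
    node = root
    while node.children:
        _, best, _ = score_children(node, obj)
        node = node.children[best]
    vals = {av: c for av, c in node.counts.items() if av[0] == attribute}
    return max(vals, key=vals.get)[1] if vals else None
```

Feeding observations one at a time (for case in cases: cobweb(case, root)) grows the tree incrementally; predict(case, root, "Diagnostic-condition") then infers a withheld attribute as a by-product of classification, in the manner of the experiment described next.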
IV. The utility of classification trees for inference

The utility of a classification tree for inference depends on regularities in the environment: on the assumption that such regularities, including 'hidden causes', exist and can be extracted, classifications that capture them tend to maximize the amount of information that can be inferred from category membership [Fisher and Langley, 1985; Cheng and Fu, 1985]. COBWEB-generated classification trees were tested as a basis for inference in several domains, including soybean disease diagnosis.

A set of 47 soybean disease cases [Stepp, 1984] was used. Each case (object) was described along 35 attributes (e.g., Precipitation = low). A diagnostic condition was also included in each object description, making a total of 36 attributes; the four disease designations occurring in the data were Diaporthe stem rot, Charcoal rot, Rhizoctonia rot, and Phytophthora rot (e.g., Diagnostic-condition = Charcoal-rot).(3) An experiment was conducted to see whether the resultant classification tree could be used for diagnosis. Disease cases were incrementally presented to COBWEB and organized into a classification tree. After every 5th instance was incorporated, the remaining unseen cases were classified with respect to the classification tree constructed up until that point. Test instances contained no information regarding Diagnostic condition; instead, the value of this attribute was inferred as a by-product of classification. Classification terminated when the test object was matched against a leaf of the classification tree; this leaf represented the previously observed object that best matched the test object. The Diagnostic condition of the test object was guessed to be the corresponding condition of the leaf. The experiment was terminated after one half of the domain (of 47 cases) had been incorporated.

(3) While Diagnostic condition was included in each object description, it was simply treated as another attribute. Diagnostic condition was not treated as a teacher-imposed class designation as in learning from examples.

The graph of Figure 1 gives the results of the experiment.

[Figure 1: Success at inferring 'Diagnostic condition'; percent correct diagnoses versus number of incorporated soybean cases.]

The graph shows that after 5 randomly selected instances, the classification could be used to correctly diagnose disease (over the remaining 42 unseen cases) 88% of the time. After 10 instances, 100% correct diagnosis was achieved and maintained. While these results seem impressive, they follow from the regularity of this domain. In fact, when COBWEB was run on the data with no information of Diagnostic condition at all, the four disease classes were 'rediscovered' as nodes in the resultant tree (Figure 2). This indicates that Diagnostic condition participates in a network of attribute correlations; in organizing classes around these correlations, classes corresponding to the various Diagnostic conditions are generated.

[Figure 2: A partial tree over soybean cases.]

The success at inferring Diagnostic condition implies a relationship between an attribute's dependence on other attributes and the utility of COBWEB classification trees for induction over that attribute. To further characterize this relationship, the induction test conducted for Diagnostic condition was repeated for each of the remaining 35 attributes. The results (including those for Diagnostic condition) were averaged over all attributes and are presented in Figure 3.

[Figure 3: Prediction over all attributes; percent correct predictions versus number of observed objects, for the COBWEB and frequency-based strategies.]

On average, correct induction of attribute values for unseen objects levels off at 88% using the COBWEB-generated classification tree. To put these results into perspective, Figure 3 also graphs the averaged results of a simpler, but reasonable, inferencing strategy. This 'frequency-based' method dictates that one always guess the most frequently occurring value of the unknown attribute. Averaged results using this strategy level off at 72% correct prediction, placing it 16% under the more complicated classification strategy.

While averaged results are informative, the primary interest is in determining a relationship between attribute inter-dependencies and the ability to correctly predict an attribute's value. Dependence of an attribute, $A_M$, on the other attributes, $A_i$, is given as a function of

$\sum_{j_M} [P(A_M = V_{Mj_M} \mid A_i = V_{ij_i})^2 - P(A_M = V_{Mj_M})^2]$,

averaged over all attributes $A_i$ not equal to $A_M$. This measures the average increase in the ability to guess a value of $A_M$ given that one knows the value of a second attribute. If $A_M$ is independent of the other attributes, then $P(A_M = V_{Mj_M} \mid A_i = V_{ij_i}) = P(A_M = V_{Mj_M})$ for all $A_i$, so each squared difference, and thus the dependence, is 0.
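A sketch of this dependence measure, computed over a table of cases, follows. It is an interpretation rather than the paper's exact definition: in particular, averaging the gain over the conditioning attribute's values, weighted by their frequencies, is an assumption, since the extraction does not show how the values $V_{ij_i}$ are aggregated.

```python
from collections import Counter

def dependence(cases, target):
    """Average increase in guessing ability for attribute `target`
    afforded by knowing one other attribute's value.
    `cases` is a list of dicts mapping attribute -> value."""
    n = len(cases)
    base = Counter(c[target] for c in cases)
    base_term = sum((cnt / n) ** 2 for cnt in base.values())

    others = [a for a in cases[0] if a != target]
    total = 0.0
    for attr in others:
        gain = 0.0
        for value, m in Counter(c[attr] for c in cases).items():
            sub = Counter(c[target] for c in cases if c[attr] == value)
            cond_term = sum((cnt / m) ** 2 for cnt in sub.values())
            gain += (m / n) * (cond_term - base_term)  # assumed P(A_i=v) weight
        total += gain
    return total / len(others)  # averaged over attributes A_i != A_M
```

Under this sketch an attribute that is conditionally independent of every other attribute scores 0, matching the argument above.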
In Figure 4, the advantage in attribute prediction afforded by the COBWEB tree over the frequency-based method is shown as a function of attribute dependence.

[Figure 4: Prediction as a function of attribute dependence; percent increase in correct prediction versus attribute dependence, with 'Diagnostic condition' and 'Root condition' among the labeled points.]

Each point represents one of the 36 attributes used to describe soybean cases. The graph indicates a significant positive correlation between an attribute's dependence on other attributes and the degree to which COBWEB trees facilitate correct inference. For example, Diagnostic condition participates in dependencies with many other attributes and is also the most predictable attribute.

V. Concluding Remarks

To summarize, COBWEB captures the important inter-correlations between attributes in the soybean data and summarizes these correlations at classification tree nodes. In doing so, COBWEB promotes inference of attributes in proportion to their participation in attribute inter-correlations. Similar results obtained using thyroid disease data strongly support this account [Fisher, 1987].

The discussion above assumed that classification proceeded all the way to leaves before a missing attribute value was predicted. Further studies have indicated, however, that for the inductive task of predicting properties of previously unseen objects, classification need only proceed to about one eighth the depth of the tree to obtain comparable results. This behavior emerges as a result of using intermediate-node default values to determine when attribute-value prediction is cost effective (cheap, but reasonably accurate). In COBWEB, default values occur at a level where the attribute values are approximately conditionally independent of other attributes; at such a level, knowing the values of other attributes will not aid further classification, and prediction might as well occur there. In this light, COBWEB can be viewed as an incremental, satisficing version of a system by Pearl [1985].

Finally, this work casts conceptual clustering as a useful tool for problem solving by assigning it the generic, but well-defined, performance task of inferring unknown attribute values. The future should find that as conceptual clustering methods utilize more complex representation languages [Stepp, 1984], so too can their behavior be interpreted as improving more sophisticated problem-solving tasks.

Acknowledgements

Dennis Kibler suggested inference as a performance task for conceptual clustering. Thanks also go to Jeff Schlimmer, Pat Langley, Rogers Hall, and anonymous AAAI reviewers for numerous ideas and helpful comments. This work was supported by Hughes Aircraft Company.

References

[Cheeseman, 1985] P. Cheeseman. In defense of probability. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 1002-1009). Los Angeles, CA: Morgan Kaufmann.

[Cheng and Fu, 1985] Y. Cheng & K. Fu. Conceptual clustering in knowledge organization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7, 592-598.

[Clancey, 1984] W. J. Clancey. Classification problem solving. Proceedings of the National Conference on Artificial Intelligence (pp. 49-55). Austin, TX: Morgan Kaufmann.

[Dietterich, 1982] T. Dietterich. Chapter 14: Learning and inductive inference. In P. Cohen & E. Feigenbaum (Eds.), The Handbook of Artificial Intelligence. Los Altos, CA: Morgan Kaufmann.

[Fisher, 1987] D. Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning (in press).

[Fisher and Langley, 1985] D. Fisher & P. Langley. Approaches to conceptual clustering. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 691-697). Los Angeles, CA: Morgan Kaufmann.

[Fu and Buchanan, 1985] L. Fu & B. Buchanan. Learning intermediate concepts in constructing a hierarchical knowledge base. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 659-666). Los Angeles, CA: Morgan Kaufmann.

[Gluck and Corter, 1985] M. Gluck & J. Corter. Information, uncertainty, and the utility of categories. Proceedings of the Seventh Annual Conference of the Cognitive Science Society (pp. 283-287). Irvine, CA: Lawrence Erlbaum Associates.

[Lebowitz, 1982] M. Lebowitz. Correcting erroneous generalizations. Cognition and Brain Theory, 5, 367-381.
[Michalski and Stepp, 1983] R. Michalski & R. Stepp. Automated construction of classifications: Conceptual clustering versus numerical taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5, 396-409.

[Pearl, 1985] J. Pearl. Learning hidden causes from empirical data. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 567-572). Los Angeles, CA: Morgan Kaufmann.

[Smith and Medin, 1981] E. Smith & D. Medin. Categories and Concepts. Cambridge, MA: Harvard University Press.

[Stepp, 1984] R. Stepp. Conjunctive conceptual clustering: A methodology and experimentation (Technical Report UIUCDCS-R-84-1189). Doctoral dissertation, Department of Computer Science, University of Illinois, Urbana, IL.