From: Proceedings of the Eleventh International FLAIRS Conference. Copyright © 1998, AAAI (www.aaai.org). All rights reserved. Concept Formation taking into account Property Relevance UNIFORUniveraidadede Fortalem Av. Washington Seams1321 Centrede Ci~nciasTecnoldgicas-CCT Fortaleza- CEBrazil vaseo@~c.br Abstract Weproposea methodto incromantallycomputeand use, duringthe conceptformation,the relevanceof a concept property.Thisrelevanceis computed throughthe accountof the propertycorrelationwithotheronesandit is usedbythe conceptquality functionin order to improvepredictive accuracy.Theproposedapproachis analyzedcontenting boththe wedictionpowerof the generatedconceptsandthe time and space complexityof the c4mceptformation algorithm. Initial resultsshow that, in the taskof prediction of valuesfor severalattributes, the proposedmethod has improved the wedictionpow~of the generatedconcepts. Introduction Thegeneral aimof conceptformationis to construct, based on entity descriptions (observations), a (usually hierarchical) categorization of entities. Eachcategoryis provided with a definition, called a concept, which summarizes its elements.Further aim is to use conceptsto categorize newentities and to makepredictions concerning unknown values of attributes of these entities. Therefore, quality of a conceptcan be measured in termsof its ability to make good predictions about unknog~ values of attributes (the prediction power). Unlike supervised systems, in whichconcept quality is measuredfrom the capacity in discoveringa value for a single property, in conceptformation,the quality of a conceptis measuredby its capacity in allowingprediction of values for several attributes. An importantaspect that must be considered in concept formation concerns the relevance (sometimes called salience) of particular attributes/values (or properties). Cognitivepsychology(Tversky1978), (Seifert 1989), machinelearning (Gennari, Langley and Fisher 1989), (Stepp and Michalsld 1986), (Decaestecker 1991) researchers havepointed out the necessity to determine howmucha certain propertyis relevant withina concept. In this paper, we propose a method to compute the relevance of a concept property basedon the correlation betweenproperties of the entities coveredby the concept. We describe our approach using a COBWEB-based algorithm, called FORMVIEW, whichcan generate several hierarchies of categories describingdifferent perspectives (Vasco97). In a multi-perspectivecontext, the relevance a propertyis crucial becauseit determinesthe hierarchical organization of categories. Since FORMVIEW uses a probabilistic representation,correlation betweenproperties is computedfromconditional probabilities. However,the proposed approach is generic and can be employedin algorithms,whichmayuse other representations. The prediction power of the category hierarchies generated by the proposed method is computed and compared with those generated by COBWEB.In particular, we analyze the capacity of the categories generatedby these algorithmsin prediction of values for several unknown properties of entities. Weshowthat, in such a situation, property relevance based on the correlation betweenproperties improvesthe prediction power of hierarchies, which are produced by concept formationsystems. Concept Formation Systems Concept formation (CF, for short) or incremental conceptualclustering systems(Stepp andMichalski1986), (Fisher 1987) recognizeregularities amonga set of nonpreclussified entities and inducea concepthierarchy that summarizes these entities. ACFalgorithmis reducedto a search,in the spaceof the all possibleconcepthierarchies, for that onethat coversthe observedentities andoptimizes an evaluation function measuringa quality criterion. In conceptformationentities are treated oneafter anotheras soonas they are observedand the classification of new entities is madeby their adequacy for the existing conceptualcategories. Typicallyconceptrepresentationis probabilistic (Fisher 1987), in whichconceptshavea set of attributes and all possible values for them.Eachconcepthas the probability that an observationis classified into the conceptandeach valueof a conceptattribute has associateda predictability anda predictiveness(Fisher 1987).Thepredictability is the conditionalprobabilitythat an observationx has valuev for an attribute a, giventhat x is a member of a category(2, or P(affivlC). Thepredictivenessis the conditionalprobability that x is memberof C given that x has value v for a or P(Cla--v). FrameI sketchesan algorithmfor conceptformation. |Copyright O 1998 AmericanAssociation for Artificial Intelligence (www.aaai.erg). All rlghts mervcd. Machine Learning225 FUNCTION PRINCIPAL(Root, Observation) I. IncorporateObservationin Root 2. Choosethe best operatorto employon the partition P of the Root’sdirect sub-concepts,among the following: a) IncorporateObservationinto a conceptof P b) Createa newconceptunderRootto receive Observation c) Mergethe 2 best concepts of P in a newconcept that includes Observation d) Split a conceptof P in its children, addingObservationto the best of these 3. If the operation2b, read anotherobservation Else return to 1 with Root = the concept of P in which Observationwasinserted relationship between attribute dependenceand the abifity to correctly infer an attribute’s value using a probabilistic concept hierarchy. Wecan thus determine those attributes that depend on others and, as a consequence, those that influence the prediction of others. Byan influent attribute, we mean that, if we know its value, it allows a good prediction about the value of others. Wehave thus defined the total influence Tinfl of an attribute Akon others A~as the following: FrameI a CF’$structure of control Our claim is that attribute dependencegives a measureto ponder attribute relevance, which, in this context, means howmuchan attribute correlates with others. Concept Formation taking into account Relevance of a Property Mdep(Am, Tinfl(A*)= A(p,)p(p,~C)p(C]p,)- the Computing property relevance for each concept To compute an attribute dependence, we have defined the probability of predicting a property p given another property p’ ; P(p[p ’). Actually, for each conceptC within a hierarchy, we have P(p[p’ and C). The acquisition of this probability is problematic, since it cannot be computed only from the predictability and predictiveness stored by FORMVIEW. Instead of keeping all the 2x2-property correlation, which would take too much space, or of computing such a correlation for each new observation. which would be computationally expensive, we have defined a procedure that implements a tradeoff between time and space requirements. AI A2 A3 A4 Obsl vl v3 v5 v7 Obs2 v2 v4 v6 v8 Obs3 vl v3 v6 v7 Obs4 v2 v3 v6 v8 a2 Iv3 v4 ~ P(C)p(p#) al vl v2 a2 v3 v4 a3 v5 v6 a4 v7 WhereA(pJ = the relevanceof the category property Pi + P(p~)). Computing property relevance To compute property relevance, FORMVIEW uses a strategy that relies on attribute dependence(or attribute correlation) in the way that was defined by Fisher (Fisher 1987). Formally, the dependence of an attribute A,. on other n attributes A; can be definedas : v8 ¢ WhereV#, signifies the jith value of attribute At and Aj Am. In fact, this function measuresthe average increase in the ability to guess a value of A, given the value of a second attribute Ai. Weconsider that this strategy accounts for the 226 Vasco where Ak ~eA,. n Using the relevance of a property in concept formation FORMVIEW’s concept formation process is similar to that illustrated in Section 2. It constructs probabilistic concepts while privileging their prediction power. The category quality function is, like manyof its predecessors, based on the Gluck and Corter’s work on cognitive psychology (Gluck and Cotter 1985), who have defined a function discover, within a hierarchical classification tree, the basic level category. Category quality is defined as the increase in the expected numberof properties that can be correctly predicted given knowledgeof a category over the expected number of correct predictions without such knowledge. FORMVIEW, in addition, takes into account the relevance of the propertics. Weconsider tlmt the increase in the expected number of properties to be predicted from a category depends on the relevance of its properties. Formally,the utility of a category UCais defined as below: UCs (C)= At) .=l 2 1 3 1 1 ............ a3 a4 v$ v6 v7 v8 1 1 2 2 2 1 2 2 1 1 1 -i ...... ~]1 3 __[ 1 2 ............. Frame 2 Example ofu triangular array used by FORMVIEW after 4 observations Our procedure consists of maintaining two triangular arrays which keep the 2x2-property correlation" T-root and T-son. These arrays keep such a correlation for all the observations already seen. T-root is updated once for each new observation. It allows computingthe relevance of the root’s properties. T-son accompaniesside by side the path followed by the observation during its categorization. It is updated at each hierarchical level until the observation arrives at the leaves. Frame2 exemplifiesthe formatof the usedarraysafter four observations. Procedures for computing the relevance of a property FORMVIEW procedures for computing dependence and correlation are shownin Frame3. Theywereinserted into the procedure which does the incorporation of an observation(steps 4 and5 in Frame3). Step four concerns the update of the arrays, which maintain the frequeney of a property correlation. Two proceduresare responsiblefor that activity: Update.Array and RefineTson. In Update.Array, the frequency of a property correlation is computedfor all the concept properties. Theretrieve of rids frequencyfor a specific propertyis donein step 1.1.1. FunctionShift allowsit to access the cell whichstores the correlation betweenthe properties. Let us supposethe exampleshownin Frame3. For the properties (Al--vl) and(A4--v7),Sh/fl returns 1 and shifts 6 columns,whichcorrespondto the samof all the valuesof attributes less than (consideringthe array’s order) atlribute A4. Thus,wecan access cell (1,7) of array. Step five in Frame3 concerns the computationof the relevanceof each conceptattribute of the hierarchy. The procedureComputeRelevance computesthe relevance of a propen’yusingthe conditionalprobabilitythat an entity has a property given that another property is known.These properties are computedfromthe frequencyof a property correlation representedin T-son. The procedure RefineTso, refines T-son each time FORMVIEM descends the concept hierarchy. This refinementconsists of updatingT-sonin orderto let it only with the account of the existing correlation betweenthe properties of observations covered by the current root concept. For each concept, T-son’s actualization is based on the followingstrategy: if the quantity of observations coveredby a conceptis greater than the total of its brothers (children of the concept’sfather), T-sonis updatedfrom those observations which are covered by these later. Otherwise,T-sononly stores rite propertycorrelation from observationscoveredby a concept. O, O=~.g~) Ine~zpozate(C, i. Update C’s conditional 2. Include O in list 3. Update the node the number 4. If C is root probabilities of observations {father(C)=~ 4.3 RefineTsons 4.4 UpdateArray 5. Computerelevance (T-son) O} (Array, by O) (T-son, (T-son, UpdateAzza¥ by C covered nil 4.1 UpdateArray (T-root, 4.2 T-son = T-root Else covered of observations C, C brothers, O) O) 1. For each property Pt of O 1.1 For each property p= of O (z ~ i) 1 1.1.1.Add attribute)+Sub(pi~s attribute)}+ /* Sub(x} attribute to Sub(p=’s (shift(Sub(p=’s value}))) = retrieves the array subscribe of an or a value /* shift(Sub(x)) the Array( ( (Sub{p, valuel}-ll, attributes RefineTson = account with (Array, sub less the number than values for Sub(x) C, FC, O) 1. if IC’s observations[ > [FC’s observations[ 1.1 Subtract in Array the correlations observations covered by FC from Else 1.2 Create Array from observations with account covered by FC of correlations (~I~ teRel evmnae (Array} i. For each i.i For property the p, of a concept other properties p= of this same concept (p=~ p~l i. 1. I. SubAtmin Sub(p,~s ffi 1.1.2.SubValmin having min (SubIp=" s attribute}, attribute)) = Sub of the attribute value SubAtmin l. i. 3. SubAtmax = max (Sub (pf’s attribute), Sub(p,~s attribute) l.l.4.SubValmax ffi Sub of the attribuze value that has SubAtmax = I.i. 5. P(pllp.) Array( ( (shift (SubAtmin) + SubValmin ), (shift ISubAtmax) +SubValmax) ) Array( (shl ft (Sub(p,’s Performance Task In our process of evaluation of FORMVIEW, we pay attention to the prediction powerof a hierarchy generated by it. Thebasic idea is to submita set of ~questions~ (normally, incompleteobservations) to the system, whose answer is based on the generated representation. The quality of the representationis measuredaccordingto its capacity to give ~good~answers(i.e. to infer values for attributes). We have used three test domains: two animal classification domains(ZOOdomains)andthe Pittsburgh’s bridges domain. attribute) ) +Sub(px value) ), (shift(Sub(px’s attribute))+Sub(px~s l.l.6.Pertp, = Pertpt + (P(PlIP%)’ 1.2 Pertpl = Pertpl/Nb propertles-i value))) - p(p=]=} Frame3 Proceduresfor computing a propertyrelevance The ZOO domains The zoo domainwastaken from the UCImachine-learning dataset consisting of 53 observations with 18 mlributes describinganimals. Wehavedivided this dataset into two sets of observations described by 10 and 11 properdes, respectively. In fact, wehaveadaptedthis domainthrough Machine Learning227 the insertion of attributes in order to represent two perspectives:the pet andthe physiologicperspectives. Predictionof several propertiesin this domainwasuseful to showmore clearly the contribution of the use of predictiveinfluenceas a heuristic to compute the relevance of a property. Indeed, hierarchies generated from this heuristic have a better prediction power than those generated by other systems. It is due to the fact that concepts are organizedaroundproperties havinga strong predictive influence. Thus,whenone infers the value of a property, he/she increases the probability to infer values for other properties influencedby the first one. Figure1 andFigure2 illustrate tests donein two ZOO domains. they are informed by the user guiding the process of construction of the artifact. Figure 3 shows the predictive accuracy of FORMVIEW a~ainst that of COBWEB. The results of these tests indicated us the adequacyof FORMVIEW in constructing hierarchiesfor the tasks of design. Actually,specification properties are muchcorrelated and, consequently, very influent. This causesgoodinferenceson properties of the product. 100% Oe,~ ’| :[at :tat 4at Sat I I I 40 50 0,2 [UICOBWEB I I I 80 90 | 100 Complexity 0 10 20 30 40 N. Observations Fig, 1 Predictionof severalattributesin FORMVIEW and COBWEB : ZOOPhysiologic 2at 3at 4at Bat 0,7 LI;)COBWEB .llo., F| I 60 70 Fig, 3 PredictionpowerfromPittsburgh domain 0,6 o. I 10 20 30 0,4 0,3 10 20 30 N. Observations 40 Fig. 2 Predictionof severalattributesin FORMVIEW and COBWEB : ZOOPet The Pittsburgh Bridges domain The prediction of several properties often appears in problemsof conception. Therefore, wedecidedto use the domainof Pittsburgh’s bridges in our tests. Design domainsare characterizedby havinga set of specification propertieswhichdefine the user’s needsanda set of design propertieswldchdescribethe artifact’s clmracteristics.The bridge domaincontains descriptionsof 108bridges built in Pittsburgh since 1818.Eachobservationis describedby 12 properties of whichsevenare specification properties and five are design properties. This domain ~ largely explored by (Reich 1994 and Reich and Fenves 1991), wherethey showthe suitability of COBWEB-like systems to designdomains. Wehaveagain defined attribute relevance as a function of the attribute dependence. Thetask consists of inferring values for all the productproperties. Wesupposethat the specificationpropertiesare less susceptibleof noises since 228 Vasco To examinethe time and space complexity required by FORMVIEW’s approach, we consider: n the numberof categorized entities L the averagebranchfactor of the concepttree The cost of the procedure for computing property relevance is boundto the cost of updatingT-root and 7: son. It should be reminded that, FORMVIEW stores, in these arrays, the 2x2-correlationfor eachconceptproperty. First, we determinethe nocessaty space to stock these correlations. Let cell be the unit wherethe frequencyof correlation between twopropertieswill be kept. In eacharray that maintainsthe frequencyof correlation betweenproperties, for n observations, whichhave in average nbAT at~butes with in average nb V values, we nbAT~nbV need E i i=1 cells, that is to say, o (nb A T x n b V )= cells. Asfor the time complexity,wehavethe cost to updateTroot and T-son. whena newobservation is categorized. Thecost of T-root’s updateis the sameas that required in space. T-son’s updatingis moreexpensivethan that of Troot becauseit mustbe doneto eachlevel of the hierarchy (on average time log~). For each level, only the frequenciesof correlation betweenobservationproperties coveredby tile current nodemustbe represented. Thus,Tson mustbe actualized mtimes (re<n), wheremrepresents the minimum betweenthe nuntber of observations which are not covered by the current node and the numberof observations covered by the current node. In the worst case, we have m = n12, whichwould makethe geometric progression (n/2, n/4, n/8..... 1) for all the depthof the tree. Thecost of T-sonactualizationis thus the order: 0(2""x (nbAT x nbV)2). Finally, it is necessary to mention that thc cost of computing the predictive influence for every conceptalso Decaestecker,C. 1993.Apprentissage et outils statistiques en classification conceptuelle iner~mentale. Revue dTntelligence Artificielle, 7(1). Fisher, D.H. 1987. Knowledge Acquisition via Incremental ConceptualLearning. MachineLearning,2(2). o ,by ),(2"" + los:)) Fisher, D.; paT~mni Mand Langley,P. eds. 1991.Concept Formation: Knowledgeand Experience in Unsupervised Related Work Learning. MorganKaufmann. Gennari, J.H.; Langley, P.; and Fisher, D. eds. 1989. Thedefinition of a propertyrelevancehas beentreated in early incremental concept formation systems. ADECLU Modelsof Incremental Concept Formation. Artificial Intelligence,40. (Decaestecker1993)uses a statistical measureto quantify the correlation betweenthe property of a conceptand the Giuck, M. A. and Cortex, J.E.1985. Information, variable "membership of the concept". It maintainsa 2x2uncertainty, andthe utility of categories. Proceedingsof contingence table for each property of each concept. the 7th Annual Conference of the Cognitive Science However,there is no account of the correlation between Society. Irvine, CA,LawrenceErlhaum. the properties. Hanson,S. and Bauer, M. 1989. ConceptualClustering, In ECOBWEB (Reich and Fenves 1991), property Categorization, and Polymorphy.MachineLearning, 3: relevance is taken into account in the categorization 343-372. process in the same way we have implemented here. Michalski,R.; Carbonnel,J. and Mitchell, T. eds. 1986: However,the informationof whichproperties are relevant MachineLearning, An Intelligence Approach. Vol II. is given by the user. Clnster/CA(Stepp and Michalski MorganKaufmann,CA. 1986) eq, mlly uses the information about property relevance defined by the user in the GDN(Goal Reich, Y. and Fenves, S. 1991. TheFormationand Useof Dependence Network). Early versions of FORMVIEW AbstractConceptsin Design.In (Fisher 91). also follow this sameidea (Vasco,Faucherand Chouraqui Reich, Y. 1994. Macro and Micro Perspectives of 1995), (Vasoo,Faucherand Chouraqui1996). Muiti~,-ategy Learning. In Michalsld andTecuci eds, The non-incremental system WITT(Hanson and Bauer MachineLearning: A Multistrategy Approach.Voi. IV. 1989) computesthe correlation between properties to MorganKauffmann. create categories. It keeps for each concept and each Seifert, C. 1989. A Retrieval ModelUsing Feature propertypair a contingencetable. Sucha table containsthe Selection. Proc. of the Sixth International Workshop on frequencyof simulUmeous occurrence of property pairs. Machine Lemning. Morgan Kanffmann. For A attributes, WrITkeeps for each concept A(A-I)/2 Smith, E. and Medin,D.L. 1981. Categoriesand Concepts. contingencetables. This can be a very tough requirement Library of Congress Cataloging in Publication Data. in termsof space. CognitiveScienceseries 4. Stepp, R. and Michalski,R. 1986.ConceptualClustering: Conclusion and Future Research Inventing goal-oriented classifications of structured objects, In Michalski,R., Carbonnel,J., Mitchell, T. ed~ We have defined a method to compute and use the relevance of a prope~yin concept formationsystems. The MachineLearning, An Intelligence Approach. Vol II. first results obtainedwith the use of propertyrelevanceare MorganKaufmann,CA. encouraging. They shownus that, FORMVIEW’s approach Tversky,A. and Gaff, I. 1978. Studies of similarity. In can provide raprcsentations it generates with better Rosch,E. and Lloyd,B. eds, Cognitionand Categorization, prediction powerthan those generated from systemsthat 79-98, Erlbaum. do not take into account the relevance of properties. Vasco, J.J.P.F., Fancher, C. and Chouraqui, E. 1995. However,additional tests are necessary, mainly with Knowledge Acquisition based on ConceptFormationusing regards to evaluate the performance of FORMVIEW with a Multi-PerspectiveRepresentatiotl Florida AI Research more data. In order to analyze the generality of the Symposium,Melbourne,FL. proposed method, future research consists of the Vasco, J.J.F., Faucher, C. and Chouraqui, E. 1996. A application of this methodto systemswhichuse different representations. KnowledgeAcquisition Tool for Multi-perspective Concept Formation. In Proceedings of European References KnowledgeAcquisition Workshop- EKAW-96. Shadbolt, Shreiber and O’Hara, eds. Springer Verlag. Decaestecker, C. 1991. Description Contrasting in Vasco,J.J.F. 1997.Formationde conceptsclans le contexte Incremental Concept Formation. In proceedings of des langages de schemas. PhD.Digs., Universit~ d’Aix EuropeanWorkingSession Learning. Marseille III, IUSPIM/DIAM. requires timeeffort of theorder o (~bAr x .by )’. The total cost of the procedure for computing the predictiveinfluenceof propertyis: o(2"" ~ (,,bAr ~ ,,bY)’ los~ (. bAr .b xV ’)) - Machine Learning229