From: AAAI-86 Proceedings. Copyright ©1986, AAAI (www.aaai.org). All rights reserved.

CONCEPTUAL CLUSTERING USING RELATIONAL INFORMATION

Bernd Nordhausen
Department of Information and Computer Science
University of California, Irvine, CA 92717
ArpaNet: bernd@ics.uci.edu

Abstract

Work in conceptual clustering has focused on creating classes from objects with a fixed set of features, such as color or size. In this paper we describe a conceptual clustering algorithm which uses relations between the objects being clustered, as well as features of the objects, to define new attributes. Using relational information, the system, called OPUS, is able to find object classifications not possible with conventional conceptual clustering methods.

1. Introduction

Conceptual clustering involves grouping objects into conceptually similar classes and producing a characterization of those classes. Research in the area of conceptual clustering in recent years has produced several systems; for a survey see [2]. All of these systems have focused on feature information, such as color or size, to form classes. Only Stepp & Michalski [7] have used relational information, classifying structured objects by means of the components of an object and the relationships between those components. Relations that hold over the set of objects itself, rather than inside the structural description of a single object, have not been used. We thus extend the definition of conceptual clustering to include relational information over the object set as well as feature information.

This paper describes a system called OPUS, implemented in Prolog, which addresses two deficiencies of previous conceptual clustering systems. The first is the inability to use relations between the objects to be classified. The second is that the attributes used to characterize the objects are fixed and have to be given in advance. In contrast, OPUS is able to generate new attributes. The new attributes are formed as chunks of relations and features, so that classes are defined not only in terms of the features of their members but also in terms of the features of related objects. For example, in the domain of genetics, the system is able to distinguish between hybrids and purebreds among peas, which are characterized not only by their own features but also by the features of their offspring.

In the next section we describe the OPUS system in detail, covering the clustering algorithm and the generation of new attributes. We then describe two applications of the system. We conclude with proposals for extending this work.

2. The OPUS system

The task of the OPUS system can be stated as follows.

Given:
• A set of objects to be classified
• A set of features describing the objects
• A set of relations between the objects, such as eat or parent
• Criteria to evaluate the quality of a classification

Find:
• A hierarchy of classes and a characterization of the classes, each class having a unique description

OPUS consists of two parts, the clustering algorithm and the attribute generator. At first, the system divides the object set into mutually exclusive classes using the given features, such as color or size, and recursively divides these classes until all members of a class have the same value for all current attributes, i.e., until the list of current attributes is exhausted. New attributes are then generated from the relations and features and are used to further refine the previously formed classes. The cycle of generating new attributes and refining classes continues until a final classification is found. The two parts are described in detail in the following sections.

2.1 The OPUS clustering algorithm

The OPUS clustering algorithm is based on the RUMMAGE clustering scheme [1]. The goal of the algorithm is to build a hierarchical tree of mutually exclusive object classes (clusters) for a given object set. Each class has an associated description consisting of attribute/value pairs for a list of attributes.

The hierarchy tree is built in a top down fashion. At each node in the tree, the algorithm selects the attribute which best partitions the object set according to some criterion. After an attribute has been selected, the object set is divided into mutually exclusive classes whose members have the same value for the chosen attribute. An arc in the hierarchy tree is labeled with the value for the chosen attribute and with any other attribute/value pairs common to all members of that class. At that point the classes are recursively divided, once again using the clustering algorithm over the remaining attributes. When OPUS decides that the remaining attributes cannot divide the classes any further, it has determined the final classes for the current attributes and the clustering algorithm terminates.
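The top down construction of the hierarchy described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the original Prolog implementation: the names `build_hierarchy` and `first_candidate` are hypothetical, the selection criterion is a placeholder, and apart from songbirds being medium sized flying animals the sample values are made up for illustration.

```python
from collections import defaultdict


def build_hierarchy(objects, attributes, select_attribute):
    """Top-down partitioning of `objects` into mutually exclusive classes.

    objects: dict mapping object name -> dict of attribute -> value
    attributes: list of attribute names still available
    select_attribute: function(objects, candidates) -> chosen attribute
    """
    # Only attributes on which the members differ can divide the class.
    candidates = [a for a in attributes
                  if len({desc[a] for desc in objects.values()}) > 1]
    if not candidates:
        # The remaining attributes cannot divide this class any further.
        return {"members": sorted(objects), "children": {}}

    best = select_attribute(objects, candidates)
    groups = defaultdict(dict)
    for name, desc in objects.items():
        groups[desc[best]][name] = desc

    # Recurse over the remaining attributes, one subtree per value.
    remaining = [a for a in attributes if a != best]
    children = {value: build_hierarchy(members, remaining, select_attribute)
                for value, members in groups.items()}
    return {"attribute": best, "members": sorted(objects), "children": children}


def first_candidate(objects, candidates):
    # Placeholder criterion (an assumption): take the first usable attribute.
    return candidates[0]


# Hypothetical sample data for illustration only.
animals = {
    "songbirds": {"locomotion": "fly", "size": "medium"},
    "hawks": {"locomotion": "fly", "size": "medium"},
    "insects": {"locomotion": "fly", "size": "small"},
    "snakes": {"locomotion": "crawl", "size": "medium"},
}
tree = build_hierarchy(animals, ["locomotion", "size"], first_candidate)
```

In a fuller version, `select_attribute` would rank the candidate attributes by a quality value instead of taking the first one.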
A good classification a tligh iirlci between her of terms consists measure of a logical of B, elcmc~nts of UPLuttribute of all of the plicity 01 an uttribute and (1711 and [rn]), ‘t‘he computat,ion the complexes for attribute A for values inter systetri are sets.) The ‘I’tle (I) j(Lj > {(I) - lyj v 1x1) A (2) /h( > {(U ‘I’llaL is, if an object implies that has a value criteria, alld t.he itIter j771,7L] jllll J illI)> that The difference, A, it B, and a value corriplexc3 are used to 0 t’ II S uses two of the cluster which ‘l’tir descriptiorl we 110~ discuss. (Values of has possible have. In difference of two a selector element is, an element of attributes of the in t II<>0 [‘IT s srrrlilurity bt:twt*t~rr two selector el- to be 5111~(c,, cz) i::l:::~ ..* rfonirtt~i?rl sirrlilurrty of’ a rc~f’erenco elenlellt e of a sc- cl and ez, is defilletl .S1 to selector value is trIax{ sirri(e, c:k)}, for all ek L .S,. is the avtbrage of Lhe tilaxinIuIr1 t:, all selector S, t:lt:rrlerits similarities Now, thtl degree of srrrlifurzty of’co~~~plcx (,“k to Cl, denoted .Sl~rlk~, is idcrlticat pteX Ck attril)utc LO lrftrr the X is the avt’rdg:” are corriptexc5 Referring irig valuc5 I’,, , w tlcrcl i and j art’ all t he selectors ‘l’htl tfeyrcl: uj ciifleretwe parl>. COlIl~JlcX Finally, of selc>cLor S, to selector of S,. the avcragct ov(lr at I of 111 for attribute of a11 attributts. the simplicity cluster v C. 0 1’II S forms thtlsta cornptexes of all attributes. dt~l~~rt~~inc?the quality clusLc:rirlg - Im] v InJ)} it has a value of 121 fi>r attribute of /?I/or I ~nj t‘or aLtribute for all values (C - 1x1) A (C selector C can cluster We dcfiric involved. of an attrib\rte. Iector B and C are: of the selector are three attribute sim- The that and there that of the is more domain ements, this data, attribute. to be the negative of the second to be an elcmcri t of a selector (;ivtrrl The of’ the complexity is ’ :~, because lm,7~j) by the the number A is - 1:. 
complexes lu/ and lb] over attributes i.e., of the selector. of that is defined of u selec- divided have, is of an attribute complex (I), the second selector has a complexity value of :I I. The value of complexity for attribute A is the av:1 for er-age of 1, I, i, and i which is ii. Thus the silllplicity attribute alrd, C as follows’: could selectors A cotttplex complexity is the average (2) in our example jrnj, values The 161. ‘I‘he complexity (In], descriptions to maxirriize selectors. Il:acti selector for the attribute values two elements, the value of the tiunl- of’ t Ile selector the selector complexity complex class difference ttre possible of domain complexity among of disjointness of an attribute. disjurlctiou. of terms values this degree is a norrrialixed product from linkclti by internal the disjoint- overlap classes. in the complexes a list of elerrierlts were used measures clusler classifi- criterion has simple dt>gree of inter the distance number If the new attributes that diflerence it A second and arbitrary ‘I’he less values attributes, so that classes. if the above tor is t tie nurnl,rAr of trrrns the rluster- 0 P II S decides the trivial occur rnter cluster ‘l‘lre simplicity attribute which of an attribute for the value of an attribute with this At the classes, which value of an attribute. have be further set and a list of attributes, sehbct that I6jj. Suppose cannot the classes. selection The and applies the final classes The a proposed of that, class. attributes. divide 2.1.1 rt~embers An arc in the hier- value for attributes rlew attributes to refine has determined whose thtt classes given defines classes with ttle value for the chosen and any other recursively the objcict, set is di- attribute. might a partitiori- description, and differentiate nt’ss of two complexes. remaining to choostl a sitrlple is used to avoid which clustering is used forms to characterize cation critchria. 
archy is easy criterion wllich At each an attribute set according illg at,tribute criterion pairs for a list of altributes. in a top the object on the Ii/. The goat of the algorithm dc!IlOtXd (,‘i, clrlster 01 dll fjljk[, k ,jIiSt ag$ri /l~f’~~ values, to ttie t~xartlple, various Sirrl~/. ififlkrerlc’e tfeyree of arl attribute k- / 1, wticlre k and f 01‘ill1 Vatut5 of’ Lhe attri for the I ot of coin- bultl X. wc calculate computatiorls Ltle follow- to calculate the dlld set two of the LEARNING / 509 inttlr (‘luster diff’erenct> degree for attribute A: size and the relationship and the feature plex relation size eat eat. size the third the relation relation that Complex Y argument of the object, of the feature. .Y> X eats the first and second are members is a value eat(X to form the com- (X ,Y, Z>, describing and Y is of sixe Z. Note that of a complex Thus (Y, 2) are composed set, while relations will be used as attributes. b’or the selectors of attribute C, we compute iri a similar The value a complex ‘I’IIIIS [JteX IIlLlX{ ; ).I,} t 111clx(0, I} t l,l,X{l,O}X 5 1a 1’12 3 s That is, the set 1’2, isfied for sorrie we (I) c-orrt~Jtex (a) of’ A of (i att,rit)ute of corn- t :)/2 5 ii deyrcr o/ sirrdurrty of’ complex (2) to cotriplex (I) of 1 I I1 I. ‘ I’ her&re the rlqret: u/ ctr#t!rences are :( and 0 re2 SJJN~ ivtlly. ‘I‘t ie inter cluster drflerence deyree for attribute ;irl(l I O),‘:! :,. is defined r-f (X, Y, Z), r f for the object IIl‘lX( ; ,O)1 } t IIIILX{f ,l ,I)} 1 1 2 a degree of deyree oj similurity have lo A is (:: of an attribute relation X is the set that exarripte, domain because Y, eat bound sizecsnakes. t,o mice and medium) is satisfied attribute eat snakes, because insects, has a value snakes eat r f (X, Y,Z,)}. value for Y size(snakes, to snakes. small size medium], is satisfied eat of is sat- of eat is [small, small) and for Y bound size 3 r f (X,Y .Z) the for snake?, in the food chain Given of the attribute {Z, 1 -1 Y of all Z’s, such Y. 
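The simplicity criterion and the weighted quality value can be illustrated with a short Python sketch. Several things here are assumptions made for illustration: the attribute/value table for the objects K, L, M, N, O is hypothetical (the values used in the paper's example are not reproduced), selector elements are treated as atomic values and not as sets, and the function names are invented. The inter cluster difference is passed in as a number because its similarity formula is not given here.

```python
from fractions import Fraction


def complexes(objects, attribute, others):
    """For each value of `attribute`, build one selector per other attribute.

    A selector is the set of values observed in the class for that attribute.
    """
    result = {}
    for desc in objects.values():
        selectors = result.setdefault(desc[attribute], {a: set() for a in others})
        for a in others:
            selectors[a].add(desc[a])
    return result


def simplicity(objects, attribute, domains):
    """Negative of the average selector complexity over all complexes.

    Selector complexity = number of elements / size of the attribute's domain.
    """
    others = [a for a in domains if a != attribute]
    scores = [Fraction(len(selector), len(domains[a]))
              for selectors in complexes(objects, attribute, others).values()
              for a, selector in selectors.items()]
    return -sum(scores) / len(scores)


def quality(simplicity_value, difference_value, u, v):
    """Weighted sum of the two criteria with user specified coefficients."""
    return u * simplicity_value + v * difference_value


# Hypothetical domains and attribute/value table (illustration only).
domains = {"A": {"a", "b"}, "B": {"x", "y"}, "C": {"m", "n", "p"}}
table = {
    "K": {"A": "a", "B": "x", "C": "m"},
    "L": {"A": "a", "B": "y", "C": "n"},
    "M": {"A": "a", "B": "x", "C": "n"},
    "N": {"A": "b", "B": "x", "C": "m"},
    "O": {"A": "b", "B": "x", "C": "m"},
}
simplicity_A = simplicity(table, "A", domains)
```

With this made-up table the four selector complexities are 1, 2/3, 1/2, and 1/3, so the complexity of A is 5/8 and its simplicity is -5/8.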
For as follows. the value Thus, [small. and Y, the medium] medium for sized ani- rr1als. ‘I’llis computation at tribute makes tt~rri, values value (~,6] value l~,I,cl, sm(1u, is further and llowever, .Si911:!, has a higher rneasirre ‘I’versky 181 supports irr the example, thus wcbigli lhe Therefore hierarchy tree, This / II* inter irriport4ir.e cluster At eaclr a quality value cluster for involve only two objects, tween t Ire two objet ts. tributes, .Nc~w attributes is a relation relation. trit,il tc5 are not sufficierlt. of Ltlc sarnr class. while rtbl;lt ioIls arid features. 2/. ‘I‘he user can two criteria. 0 1’US selected at I)e defined to distinguish when betweerr are chunks lt‘or I tris purpost’, we define Lo t)(’ ttrc composition f (Y ,Z>. 5 10 / SCIENCE could at- rrierribers composed t iorr r(X ,Y> and in feature airirrrals current of a co4r&- of a rela- lcor exarr~ple iri the /0od I,(1 tit3,c-ri bed by the feature beat- primitive eaten(X,Y) eden, meaning describes level is defined frorrr a complex posc~l of a level n retatiori and at level to be defined level k / 1 relations t 1 relCltions are corrlposed corrIpt(bx relations us4 corriplcxity. tributes ‘l‘hus, process, Only attributes objects are first ttrct~ based upon If at, any tirrlcs which features in the clustering used to define attributes. level These to define Relations but rather are used to clus- classified attributes relations ilow Each current are defined. with a com- feature. the in Now, one. anti thuh level k t I attributes. arc’ not. directly features, relationship relation and an existing have by Y, are defined attribute tinit> new attributes the eaten two Z. Relations starting 91 or the inverse describes of order, It:& us- A primitive X is being the Y eats relations as a relation two objects. 
to the system eat (X ,Z> levels “link” more involved supplied SOIIK~ Y and i~rcreasirrg between based with upon increasing cannot rcbfine class~hs, the system define terminates at- having reacht~d a final ctassilicatiorl. At each level k, new level k relations ~~frx relutr‘o~i r f (X, Y, 2) r/mirL tlorriairr eat of several re- relations is a direct to define ‘I‘htl rcllatiorl relation of X eats k New at tribut.cs one set of binary primitive a feuel n relution relations relation level and there consisting of t.tlirlL a small These In order We deline ing YLprimitive their Att,ril)utc!s havtk to relations are formed. with cut or parent. as ter objet ts. difference valutt of the attributes is supplied such t>iL(.tIrlotlt’ in the eX~>i~l~dillg tree. 2.2 (kneratirlg systerrr k is increased value is the sum of of these The lations Itbvel k off bc~tween the inter for hoirit’ user spc~citied cot~lfic~ierrls u and rrraxirtlizchs the qualit,y any (2) also satisfies of a class description. is c~orriputed. u 4 sirriplicit,y . . . as . . . as a subject.” of corriplex a t,rade Ievc>l in the expanding slimulus ttiari Sz’rr~,~. tlifft!rc~rlc~c~ arid the sirrlplicity tAac.tI al,tribut,e a11 evidence ( I ), but riot vice versa. value such may seem and tie provides to thrh complexes 0 I’ 11 S rriaxirrii~t3 be as disproperties. will prorrrotc stimulus t,he conditions thtb c~orlditions of corrlplex is, it is from should similarity riieasur-e, once again satisfying That than hurr~arrs “terid to st~1t~c.t.the Itlore salient I<eferririg 0 1’II S sys- with different difference a rc~l’ert~rrl, and t.lit~ less salient object for an ordered. descriptions idea of an asyrrirrietric at first. irl the (b,c,ct[ classes cluster asyrrIrrit~t.ri(~ similarity difference si~rl( [u, 61, 16, c, dl) is less than Class the inter that, partially value therefore to ensure c‘or~rrtcrirrt,uitive that are from 61,[a, 6, cl). 
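The attribute generation just described (inverse relations, composition of relations into higher levels, and attribute values as sets of feature values) can be sketched as follows. The function names are hypothetical and relations are represented as Python sets of pairs, which is an assumption of this sketch. The facts about songbirds come from the food chain example; the remaining facts, including everything about snakes, are invented so that the sample has something to compute.

```python
def inverse(relation):
    """Inverse of a binary relation, e.g. eaten from eat."""
    return {(y, x) for (x, y) in relation}


def compose(r1, r2):
    """Level k+1 relation: X r1 Y and Y r2 Z links X to Z."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}


def attribute_values(relation, feature, objects):
    """Value of the attribute r_f for each X: all Z with r(X, Y) and f(Y, Z)."""
    return {x: frozenset(feature[y] for (x2, y) in relation
                         if x2 == x and y in feature)
            for x in objects}


# Facts for songbirds follow the text; all other facts are hypothetical.
size = {"hawk": "large", "songbirds": "medium", "snakes": "medium",
        "insects": "small", "worms": "small"}
eat = {("songbirds", "worms"), ("songbirds", "insects"), ("hawk", "songbirds"),
       ("snakes", "insects"), ("snakes", "songbirds")}

eaten = inverse(eat)                               # level one, inverse of eat
eat_eat = compose(eat, eat)                        # level two relation
eat_size = attribute_values(eat, size, size)       # level one attribute
eaten_size = attribute_values(eaten, size, size)   # level one attribute
eat_eat_size = attribute_values(eat_eat, size, size)  # level two attribute
```

Here `eat_size["snakes"]` is the set {small, medium}, mirroring the snake example in the text, and objects that eat nothing receive the empty set.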
3. Two Examples

OPUS has applications in any domain where objects are described by a set of features and a set of binary relations. Two examples of such domains are presented in the following sections: the food chain domain and the genetics domain.

3.1 The food chain domain

In the food chain domain, we characterize animals using two features, size and locomotion, and the relation eat. For example, to describe songbirds the following facts are asserted: size(songbirds, medium), locomotion(songbirds, fly), eat(songbirds, worms), eat(songbirds, insects), and eat(hawk, songbirds). All objects are characterized by the same two features and by relational information over the objects.

At first, OPUS uses the features to classify the objects. Locomotion is chosen as the first attribute, because it has the same simplicity as size but a higher inter cluster difference value. After the system has used locomotion and size to divide the objects, there are no attributes left, and new attributes have to be defined. To refine the classes, OPUS forms two level one relations, eat and its inverse eaten, and with the features size and locomotion defines four attributes: eat_size, eat_locomotion, eaten_size, and eaten_locomotion. These four attributes describe the size and locomotion of the animals that an object eats and of the animals by which it is eaten. For example, the class of medium sized flying animals, which contains songbirds, hawks, and owls, is refined using the attribute eat_size: hawks and owls eat medium sized animals, while songbirds eat only small animals.

After all possible level one attributes have been used to refine the existing classes, there are only two classes with more than one object left, the class of frogs and toads, and the class of hawks and owls. Next, level two relations such as eat_eat and eat_eaten are formed and concatenated with the features to define level two attributes. Frogs and toads have different values for one of these attributes, [large, medium] for one and [large, small] for the other. That is, one eats animals which are eaten by large and medium sized animals, while the other eats animals which are eaten by large and small animals. This attribute is used to divide the class. However, hawks and owls have the same values for all new attributes, and therefore that class is not refined. OPUS cannot refine the classes any further, so the system terminates. The resulting hierarchy tree is shown in Figure 1.

Figure 1: Classification tree for the food chain domain.

3.2 The genetics domain

Let us now consider an example from the field of genetics. Gregor Mendel, the founding father of genetics [3], based his work on classifying garden peas not only on their observable features but also on features of their ancestors and descendants. In his experiments he observed the following facts. When he self-fertilized a green pea, it produced only green offspring. When he self-fertilized a yellow pea, some of the yellow peas produced only yellow offspring, while other yellow peas produced both yellow and green offspring. Mendel continued to self-fertilize the resulting offspring and hypothesized the classes of purebreds and hybrids: purebreds consistently produce offspring with the same feature as the parent, while hybrids do not.

In response to this problem domain, OPUS is provided with the color of each pea and with the parent relation over the peas. At first, all peas are correctly classified as either yellow or green using the feature color. After that there are no attributes left, and new attributes are defined from the relation parent, its inverse, and the feature color. The color of the offspring is used as an attribute to divide the class of yellow peas into the class of yellow purebreds, whose offspring are all yellow, and the class of yellow hybrids, which produce both yellow and green offspring. This corresponds to Mendel's characterization. Furthermore, at the next level OPUS continues to refine the classes, distinguishing for example between green peas which have purebreds as parents and green peas which have hybrids as parents.

In a further run, we supplied the system with two different features, color and shape, corresponding to Mendel's experiments with two different traits. OPUS correctly identified all nine classes that Mendel characterized using dominant and recessive traits. For example, one class only contains round green peas which only have round green peas as offspring, while another class has as members round green peas which produce both round green and wrinkled green peas as offspring. Again, OPUS also defines intermediate classes that distinguish peas according to whether they have purebreds or hybrids as parents.
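The refinement of the class of yellow peas by the color of their offspring can be sketched as follows. The pea identifiers, the parent facts, and the function names are hypothetical and chosen only to illustrate the idea. The sketch groups the members of a class by the value of a derived attribute, here the set of colors found among each pea's offspring.

```python
from collections import defaultdict


def offspring_color(pea, parent, color):
    """Set of colors of all offspring of `pea` (parent holds (parent, child))."""
    return frozenset(color[child] for (p, child) in parent if p == pea)


def refine(members, attribute):
    """Split a class into subclasses whose members share the attribute value."""
    groups = defaultdict(list)
    for obj in members:
        groups[attribute(obj)].append(obj)
    return dict(groups)


# Hypothetical peas: p1 and p2 are yellow parents, c1..c4 their offspring.
color = {"p1": "yellow", "p2": "yellow",
         "c1": "yellow", "c2": "yellow", "c3": "yellow", "c4": "green"}
parent = {("p1", "c1"), ("p1", "c2"), ("p2", "c3"), ("p2", "c4")}

yellow_parents = ["p1", "p2"]
subclasses = refine(yellow_parents,
                    lambda pea: offspring_color(pea, parent, color))
```

In this made-up data `p1` has only yellow offspring and lands in the purebred subclass, while `p2` has both yellow and green offspring and lands in the hybrid subclass.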
4. Summary

In this paper, we presented a conceptual clustering system which uses relational information over the object set to define new attributes. Using the relational information, the system is able to find classifications not possible with conventional methods of conceptual clustering. We presented an example from the domain of genetics, where the system is able to form the classes of hybrids and purebreds. Furthermore, we introduced a method to define new attributes used in the classification process.

This work can be extended in two ways. It is unrealistic to assume that all the information describing the objects is available initially. An incremental version of OPUS would build the hierarchy tree using partial information, predicting missing properties of objects as well as missing relations. As new data becomes available, predictions can either be confirmed, in which case the belief is reinforced, or they can be disconfirmed, in which case a revision of the classes occurs.

The present version of the system can only handle binary relations. An extension of OPUS working with n-ary relations would greatly enhance its power. For example, in the domain of chemistry, some compounds are classified as acids, alkalis, and salts depending (among other properties) on their reactive behavior: alkalis react with acids to form salts. Using ternary relations, classes could be formed in a way similar to GLAUBER [4], yet in a more efficient manner. At the moment, we are actively engaged in working in these directions.

Acknowledgements

I would like to thank Pat Langley, Don Rose, and Randy Jones for their help on this work, as well as the numerous people from the machine learning group at UCI who gave valuable comments on drafts of this paper. This work was supported in part by Contract N00014-84-K-0345 from the Information Sciences Division, Office of Naval Research.

References

[1] Fisher, D. A hierarchical conceptual clustering algorithm. Tech. Report 85-21, Dept. of Information and Computer Science, University of California, Irvine, CA, 1984.

[2] Fisher, D. and Langley, P. Approaches to conceptual clustering. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 1985, 691-697.

[3] Iltis, H. The Life of Mendel. Hafner Pub., New York, 1966.

[4] Langley, P., Zytkow, J. M., Simon, H. A., and Bradshaw, G. L. The search for regularity: Four aspects of scientific discovery. In Machine Learning: An Artificial Intelligence Approach, Vol. II, R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Eds., Morgan Kaufmann Publishers, Los Altos, CA, 1986, 425-469.

[5] Michalski, R. S. Knowledge acquisition through conceptual clustering: A theoretical framework and algorithm for partitioning data into conjunctive concepts. International Journal of Policy Analysis and Information Systems, 4 (1980), 219-243.

[6] Michalski, R. S. and Stepp, R. E. Learning from observation: Conceptual clustering. In Machine Learning: An Artificial Intelligence Approach, R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Eds., Tioga Press, Palo Alto, CA, 1983, 331-363.

[7] Stepp, R. E. and Michalski, R. S. Conceptual clustering: Inventing goal oriented classifications of structured objects. In Machine Learning: An Artificial Intelligence Approach, Vol. II, R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Eds., Morgan Kaufmann Publishers, Los Altos, CA, 1986, 471-498.

[8] Tversky, A. Features of similarity. Psychological Review 84(4), 1977, 327-352.