From: KDD-96 Proceedings. Copyright © 1996, AAAI (www.aaai.org). All rights reserved.

A Method for Reasoning with Structured and Continuous Attributes in the INLEN Multistrategy Knowledge Discovery System

Kenneth A. Kaufman and Ryszard S. Michalski*
Machine Learning and Inference Laboratory, George Mason University, Fairfax, Virginia 22030, USA
{kaufman, michalski}@aic.gmu.edu
* Also GMU Departments of Computer Science and Systems Engineering, and the Institute of Computer Science, Polish Academy of Sciences

Abstract

Structured attributes have domains (value sets) that are partially ordered sets, typically hierarchies. Such attributes make it possible for a discovery program to incorporate background knowledge about hierarchical relationships among attribute values. Inductive generalization rules for structured attributes have been developed that take into consideration the type of nodes in the domain hierarchy (anchor or non-anchor) and the type of decision rules to be generated (characteristic, discriminant or minimum complexity). These generalization rules enhance the ability of the knowledge discovery system INLEN-2 to exploit the semantic content of the domain knowledge in the process of generating hypotheses. If the dependent attribute (e.g., a decision attribute) is structured, the system generates a set of hierarchically organized rules representing relationships between the values of this attribute and the independent attributes. Such a situation often occurs in practice when the decision to be assigned to a situation can be at different levels of abstraction (e.g., this is a liver disease, or this is a liver cancer). Continuous attributes (e.g., physical measurements) are quantized into a hierarchy of values (ranges of values arranged into different levels). These methods are illustrated by an example concerning the discovery of patterns in world economics and demographics.

Introduction

Most symbolic learning systems represent information about objects or events in the form of attribute-value vectors. Attributes used in these systems are typically numerical (i.e., have totally ordered value sets) or nominal (i.e., have unordered value sets). In some applications it is useful to use attributes with value sets that are partially ordered, for example, representing a generalization hierarchy of concepts. Theoretically, almost any attribute can be viewed as having a hierarchical domain. For example, the domain of a numeric variable "Age" can be split into a hierarchically organized set of classes: specific numeric values at the lowest level; toddler, child, teen, young adult, middle age, senior and very senior at the second level; and possibly young, adult and mature at the third level. To represent such concepts, a system needs background knowledge that relates the numerical age with the higher-level concepts. In general, the structure of the domain does not have to be fixed; it may change with the context or the problem at hand.

Structuring attributes can prove advantageous for a knowledge discovery system. It allows facts, trends and regularities to be mined both at high and low levels of abstraction, and it allows background knowledge to be stored and generalizations to be made at the appropriate levels. The idea of grouping values of attributes into a structure of classes in order to reflect semantic relationships characteristic of the given application domain has led to the introduction of structured attributes (Michalski 1980).
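As a concrete illustration of the hierarchical "Age" domain mentioned above, the fragment below encodes it as a simple child-to-parent map. This is only an illustrative Python sketch, not INLEN-2 code; the numeric cut-points and the assignment of second-level classes to the third level are assumptions made here for the example.

    # Illustrative encoding of a structured "Age" domain (not INLEN-2's representation).
    AGE_PARENT = {
        "toddler": "young", "child": "young", "teen": "young",        # assumed grouping
        "young adult": "adult", "middle age": "adult",                # assumed grouping
        "senior": "mature", "very senior": "mature",                  # assumed grouping
        "young": "Age", "adult": "Age", "mature": "Age",
    }

    def abstract_age(years: int) -> str:
        """Map a numeric age to a second-level class (cut-points are assumed)."""
        if years < 2:
            return "toddler"
        if years < 13:
            return "child"
        if years < 20:
            return "teen"
        if years < 40:
            return "young adult"
        if years < 65:
            return "middle age"
        if years < 80:
            return "senior"
        return "very senior"

    print(abstract_age(35), "->", AGE_PARENT[abstract_age(35)])   # young adult -> adult

Background knowledge of this kind is what lets a discovery system relate a raw measurement to the higher-level concepts discussed above.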
The domain of a structured attribute is a partially ordered set, typically a hierarchy. Domains of structured variables can be generated by a domain expert (e.g., Karni & Loksh 1996), or by an automatic process using a numerical or conceptual clustering method (e.g., Sokal & Sneath 1973; Michalski & Stepp 1983; Fisher 1987). The structure of the domain can be modified to suit a specific class of problems (e.g., Fisher 1995). Structured attributes can be used as independent (input) variables (Michalski 1980), as well as dependent (output) variables (e.g., Reinke 1984). The roles of structured variables in these two cases differ.

This paper discusses methods for reasoning with structured attributes in the process of data analysis and knowledge discovery. Many of the presented ideas have been adapted for knowledge discovery from the Inferential Theory of Learning, which provides a unifying framework for characterizing learning and discovery processes (Michalski 1994). Among the novel ideas are the use of "anchor nodes" to guide the inductive generalization process, the use of non-hierarchical attribute domains, and the introduction of different kinds of generalization rules for structured attributes. The use of continuous variables in reasoning is closely related to that of structured attributes in that the way their domains are structured is tailored to the particular discovery problem. The presented ideas and methods have been implemented in the INLEN-2 knowledge discovery system, and are illustrated by an application to a knowledge discovery problem in world demographics.

The Extension-Against Operator and Three Types of Descriptions

An inductive generalization rule (or transmutation) takes input information and background knowledge, and hypothesizes more general knowledge (Michalski 1980; 1994). For example, dropping a condition from a decision rule is a generalization transmutation. A powerful inductive generalization rule used in the AQ learning program is the extension-against operator. If rule R1: [x1 = a] & CTX characterizes positive concept examples E+ and rule R2: [x1 = b] & CTX characterizes negative examples E- (where CTX, a context, stands for any additional conditions), then the extension of R1 against R2, written R1 ⊣ R2, produces R3: [x1 ≠ b], which is the maximal consistent generalization of R1 (Michalski & McCormick 1971; Michalski 1983). By repeating the extension-against operator until the resulting rule no longer covers any negative examples, a consistent concept description (one that covers no negative examples) can be generated. Such a process can be applied to generate a description (cover) that is complete and consistent with regard to all the training examples (i.e., it covers all positive examples and no negative examples). Another important concept that needs to be explained before introducing the central ideas of this paper is the type of description, as defined in the AQ rule learning methodology (Michalski et al. 1986).
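Before turning to the types of descriptions, the basic extension-against step over an unstructured nominal attribute can be sketched in a few lines of Python. This is an illustrative sketch of the operator as described above, not the AQ or INLEN-2 implementation.

    # Extension of [x = pos_value] against [x = neg_value] over a nominal domain:
    # the result admits every value except the one seen in the negative example,
    # i.e. the condition [x != neg_value].
    def extend_against(domain: set, pos_value, neg_value) -> set:
        assert pos_value != neg_value, "a value cannot be both positive and negative"
        return domain - {neg_value}

    colors = {"red", "yellow", "green", "white", "blue", "orange"}
    print(extend_against(colors, "red", "blue"))
    # -> {'red', 'yellow', 'green', 'white', 'orange'}, i.e. [Color != blue]

Repeating this step against each negative example and intersecting the results yields a consistent cover in the sense defined above.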
By applying the extension-against operator in different ways, one can generate a range of descriptions with different degrees of generality. Here we distinguish three types of descriptions: 1) discriminant covers (in which the extension-against is used to create maximal generalizations; such descriptions specify minimal conditions needed to discriminate between the concept and non-concept examples); 2) characteristic covers (in which the extension-against is specialized to create maximally specific generalizations; such generalizations specify the maximal number of conditions that characterize the positive examples); and 3) minimal complexity covers (in which the extension-against is used to create the simplest possible generalizations).

To illustrate these concepts, consider the diagram shown in Figure 1, which represents a two-dimensional representation space spanned over the three-valued attribute Shape (Sh) and the six-valued attribute Color (Co). Positive and negative concept examples are marked by + and -, respectively. The characteristic (maximally specific) cover is represented by the shaded area. It states: Color is red, yellow or green, and Shape is square or triangle. A discriminant description (maximal generalization) of the same set of examples states: Color is red, yellow, green or white (or, equivalently, Color is not blue or orange). A minimum complexity description of these examples states: Color is red or yellow or green. The three descriptions are equally consistent with the five input examples.

These three types of descriptions are used in the INLEN-2 system for knowledge discovery in databases. In the following section, we present an extension of these ideas to descriptions with structured attributes (both as independent (input) and dependent (output) variables) and continuous independent attributes.

Figure 1. A characteristic cover of E+ vs. E- (Color is red or yellow or green, and Shape is square or triangle).
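The three covers quoted for Figure 1 can be reproduced with a short script. The exact cell positions of the examples are not recoverable from the figure here, so the example set below is an assumption chosen to be consistent with the descriptions stated in the text; the code is an illustrative Python sketch, not INLEN-2's cover-generation procedure.

    COLORS = {"red", "yellow", "green", "white", "blue", "orange"}

    # Assumed examples (color, shape): three positives and two negatives.
    pos = [("red", "square"), ("yellow", "triangle"), ("green", "square")]
    neg = [("blue", "circle"), ("orange", "circle")]

    # Characteristic cover: maximally specific -- keep every value observed in E+.
    characteristic = {"Color": {c for c, s in pos}, "Shape": {s for c, s in pos}}

    # Discriminant cover: maximally general -- extend Color against the negatives,
    # excluding only the colors that occur in E-.
    discriminant = {"Color": COLORS - {c for c, s in neg}}

    # Minimum complexity cover: the simplest consistent description (one selector).
    minimum_complexity = {"Color": {c for c, s in pos}}

    print(characteristic)        # Color is red/yellow/green and Shape is square/triangle
    print(discriminant)          # Color is not blue or orange
    print(minimum_complexity)    # Color is red or yellow or green

All three covers exclude both negative examples; they differ only in how far the positive examples are generalized.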
Generalization Rules for Structured Attributes

Structured Input Variables

In order to apply the previously defined extension-against operator to structured attributes, new generalization rules need to be defined. Let us illustrate the problem by an example that uses a structured attribute "Food," shown in Figure 2. Each non-leaf node denotes a concept that is more general than its children nodes. These relationships need to be taken into consideration when developing a generalization of some facts. Suppose the concept to be learned is exemplified by the statements "John eats strip steak" (a positive example) and "John does not eat vanilla ice cream" (a negative example). Consistent generalizations of these facts exist, for example: that John eats strip steak, steak, cattle, meat, meat or vegetables, or anything but vanilla ice cream. The first statement represents a maximally specific description, the last statement represents a maximally general description, and the remaining ones represent intermediate levels of generalization. A problem arises in determining the generalizations of most interest. We approach this problem by drawing insights from human reasoning.

We tend to assign different levels of significance to nodes in a generalization hierarchy. Some cognitive scientists explain this in part with the idea of basic level nodes, whose children share many sensorially recognizable commonalities (Rosch et al. 1976). Other factors that are important for characterizing nodes are concept typicality (how common a concept's features are among its sibling concepts), and the context in which the concept is being used (Klimesch 1988; Kubat, Bratko, & Michalski 1996).

Figure 2. The domain of a structured attribute "Food." Anchor nodes are shown in bold; nodes marked by + and - are values occurring in positive and negative examples, respectively.

To capture such preferences simply, we introduce the idea of anchor nodes in a hierarchy. Such nodes are the ones that are desirable to use for the given task domain. To illustrate this idea, consider Figure 2 again. In the presented hierarchy, vanilla and rocky road are kinds of ice cream; ice cream is a frozen dessert, which is a dessert, which is a type of food. In everyday usage, depending on the context, we will typically think of them as ice cream or dessert, but not so typically as frozen dessert or food. In designing a knowledge discovery system, we should be able to encode the contextual significance of the nodes into the knowledge representation of the system, so that the created rules will represent desirable levels of abstraction. To this end, nodes in the hierarchy that are considered to be at preferable levels of abstraction are designated as anchor nodes.

Given the information about which nodes are anchors and which are not, different types of descriptions can be created during the generalization phase of knowledge discovery. The meanings of discriminant and characteristic descriptions need to be properly defined. In building a characteristic description, the following rule is assumed: generalize positive examples to the next higher anchor node(s) if this maintains the consistency conditions. For example, consider the nodes in bold in Figure 2 to be anchors. Then in characteristic mode, the extension of the positive attribute value "Strip" against the negative attribute value "Vanilla" would generalize to "Steak." In building a discriminant description from these examples, attribute values are generalized to the most general value that does not cover the nearest anchor node to the value of a negative example. In the example above, the positive value "Strip" would generalize (extend) against the value "Vanilla" to "not Ice Cream." In building the equivalent of minimum complexity descriptions, any of the intermediate anchor nodes could be used if this would simplify the description. For instance, one generalization rule would be to generalize positive examples to the highest anchor nodes possible, such that consistency is maintained; in this example, the description of "Strip" would generalize to "Meat."

Another feature that increases the power of representing structured variables is that one does not need to limit the domain of structured attributes to strict hierarchies (one parent for every node). Instead, an arbitrary lattice may be used when desired. This feature is useful for representing overlapping or orthogonal classification hierarchies. The nature of the particular problem determines how the nodes are generalized. The system generalizes the nodes in the way that produces the most desirable (for example, the simplest) description.

Generalization in structured domains does not increase the complexity of the extension-against operator, save for the fact that the internal nodes in the hierarchy must now be among the attribute's legal value set, while in an unstructured domain they may or may not be present. Any increase in discovery complexity will come during the postprocessing, when different possible generalizations of the discovered cover are examined; in the worst case, this will be bounded in number of applications by the number of internal nodes in the tree times the tree's maximum height.
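The anchor-based generalization modes described above can be illustrated with a small sketch over the Figure 2 hierarchy. Only the branches mentioned in the text are encoded, and the choice of anchors (Steak, Meat, Ice Cream, Dessert) is an assumption consistent with the worked example; this is illustrative Python, not the INLEN-2 algorithm itself.

    # Child -> parent links for the part of the "Food" hierarchy used in the example.
    PARENT = {
        "Strip": "Steak", "Steak": "Cattle", "Cattle": "Meat", "Meat": "Food",
        "Vanilla": "Ice Cream", "Rocky Road": "Ice Cream",
        "Ice Cream": "Frozen Dessert", "Frozen Dessert": "Dessert", "Dessert": "Food",
    }
    ANCHORS = {"Steak", "Meat", "Ice Cream", "Dessert"}   # assumed anchor set

    def ancestors(node):
        chain = []
        while node in PARENT:
            node = PARENT[node]
            chain.append(node)
        return chain                         # ordered from the parent up to the root

    def covers(general, specific):
        return general == specific or general in ancestors(specific)

    def characteristic_generalize(pos_value, neg_value):
        # Climb to the next higher anchor node, provided it still excludes the negative.
        for node in ancestors(pos_value):
            if node in ANCHORS:
                return node if not covers(node, neg_value) else pos_value
        return pos_value

    def discriminant_generalize(pos_value, neg_value):
        # Exclude the nearest anchor node at or above the negative value.
        nearest = next(n for n in [neg_value] + ancestors(neg_value) if n in ANCHORS)
        return "not " + nearest

    print(characteristic_generalize("Strip", "Vanilla"))   # -> Steak
    print(discriminant_generalize("Strip", "Vanilla"))     # -> not Ice Cream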
Structured Output Variables

Typically, dependent variables are numeric or nominal. In the latter case, the values of the dependent variable are treated as independent concepts. Such a representation fails to take advantage of any generalization hierarchies that may exist among the values of a dependent variable. For example, when selecting a personal computer to buy, the candidate models may be grouped into IBM-compatible and Macintosh-compatible. Rather than just choosing one system from among the entire set, it may be simpler to organize the knowledge so as to choose first the general type, and then the specific model within that type. This leads to the need for using structured attributes as dependent variables, and for defining an appropriate method for handling them in the process of rule search and generalization.

Given a structured dependent variable (a decision variable), decision rules or descriptions can relate to nodes of the dependent variable at different levels in the hierarchy. The proposed method focuses first on the top-level nodes, creating rules for them. Subsequently, rules are created for the descendant nodes in the context of their ancestors. For example, in the case of the computer selection problem, the generalization operator would first determine how to choose between IBM- and Macintosh-compatible machines, and then generate rules for distinguishing among the machines in each class, and that class alone. The rule for a specific IBM-compatible machine does not attempt to differentiate it from a particular Macintosh; it is based on the assumption that an IBM-compatible computer is to be selected. Due to this algorithm, the rules at each level of abstraction will tend to be more concise and easier to interpret.

Dynamic Quantization of Continuous Attributes

In some respects, numerical attributes are similar to structured attributes with multiple views. There are different ways to group the values into discrete ranges, and unless definitive background knowledge is available, the optimal organization for a particular discovery problem may not be apparent. Even when numeric values have been quantized into ranges by a domain expert, the expert-generated abstraction schema may not be optimal for the learning task (e.g., Karni & Loksh 1996). Furthermore, a particular set of ranges may be useful for one learning task, but irrelevant to another.

The implemented methodology performs automatic discretization of numeric data using the ChiMerge algorithm (Kerber 1992). With this method, neighboring distinct values of the attribute found in the data are merged into single ranges based on a chi-square (χ²) analysis of the classification of the values. When the classification patterns of adjacent ranges are statistically dependent on one another, the ranges are combined into one.

One important consequence of this algorithm is that the grouping of values into intervals will depend on the classification of the input data. Specifically, given a new learning problem from the same dataset, the training data will likely be grouped into classes much differently, and thus the set of ranges of a given attribute most likely to generate concise and useful knowledge may be much different. For example, given an auto insurance customer database, there may be little correlation between the set of accident-prone clients and the set of customers likely to be interested in purchasing a service offered by the company. It is quite possible that the ranges into which the drivers' ages are divided that are most useful for the first learning task are not appropriate for the second one.
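The merging step of ChiMerge can be sketched as follows. This is a compact illustrative Python implementation written from the algorithm's published description (Kerber 1992), not INLEN-2's code; the stopping threshold, the interval limit, and the toy data are assumptions made for the example.

    from collections import Counter

    def chi2(a: Counter, b: Counter) -> float:
        """Chi-square statistic for the 2 x k class-frequency table of two adjacent intervals."""
        classes = set(a) | set(b)
        n_a, n_b = sum(a.values()), sum(b.values())
        total = n_a + n_b
        stat = 0.0
        for c in classes:
            col = a[c] + b[c]
            for n_row, row in ((n_a, a), (n_b, b)):
                expected = n_row * col / total
                if expected > 0:
                    stat += (row[c] - expected) ** 2 / expected
        return stat

    def chimerge(values, labels, threshold=4.6, max_intervals=6):
        """Return lower bounds of the resulting ranges; threshold ~ chi-square at the
        90% level with 2 degrees of freedom (an illustrative choice)."""
        pairs = sorted(zip(values, labels))
        intervals = []                       # one [lower_bound, class_counts] per distinct value
        for v, c in pairs:
            if intervals and intervals[-1][0] == v:
                intervals[-1][1][c] += 1
            else:
                intervals.append([v, Counter({c: 1})])
        while len(intervals) > 1:
            scores = [chi2(intervals[i][1], intervals[i + 1][1])
                      for i in range(len(intervals) - 1)]
            i = min(range(len(scores)), key=scores.__getitem__)
            # stop once every adjacent pair is statistically distinguishable
            if scores[i] > threshold and len(intervals) <= max_intervals:
                break
            intervals[i][1] += intervals[i + 1][1]
            del intervals[i + 1]
        return [lower for lower, counts in intervals]

    # Toy data made up for illustration: driver ages labeled by a hypothetical risk class.
    ages = [18, 21, 24, 30, 35, 41, 47, 52, 60, 68]
    risk = ["hi", "hi", "hi", "lo", "lo", "lo", "lo", "hi", "hi", "hi"]
    print(chimerge(ages, risk))              # -> [18, 30, 52]: three data-driven ranges

Because the merging is driven entirely by the class labels, rerunning the same procedure with a different target classification generally yields different ranges, which is exactly the dependence discussed above.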
To combat this problem, ChiMerge recomputes the ranges for each numeric variable whenever new sets of data are added or a new discovery problem is specified.

A limitation of the ChiMerge methodology is that it depends on the classification of the input training examples, and therefore cannot be used to quantize an output variable (since that would depend on its already having been divided into classes). There are several ways that an output numeric variable can be discretized into a hierarchy of ranges: a domain expert can suggest ranges, ranges can be determined based on some other output variable, conceptual clustering can be used, etc. A discussion and comparison of these methods will be addressed in a forthcoming paper.

Experiments

The above ideas and algorithms have been implemented in the INLEN-2 system for knowledge discovery in data. INLEN-2 belongs to the INLEN family of knowledge discovery programs that use an integrated multi-operator approach to knowledge discovery (Kaufman, Michalski, & Kerschberg 1991; Michalski et al. 1992). This section illustrates an application of the presented methods to a problem of knowledge discovery in world economics and demographics.

The predominant religion of different countries, as specified in the PEOPLE database of the World Factbook published by the Central Intelligence Agency, is used as an example of a structured attribute. The set of values found in the data contains some natural inclusion relationships. For example, there co-exist labels of "Christian", "Protestant" and "Lutheran" in the database. The organization of some of this attribute's values when structured is shown in Figure 3.

Figure 3. Part of the structure of the PEOPLE database's Religion attribute: Religion subsumes Muslim, Jewish, Buddhist, Shinto, Christian and Indigenous; Muslim subsumes Sunni, Shi'a and Ibadhi; Christian subsumes R. Catholic, Protestant and Orthodox.

If the Religion attribute were set up in an unstructured manner, the statement "Religion is Lutheran" would be regarded as equally antithetical to "Religion is Christian" as to the statement "Religion is Buddhist," leading to the possibility that some contradictions (such as "Religion is Lutheran, but not Christian") might be discovered.
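A small sketch shows how encoding the Figure 3 hierarchy (plus the Lutheran-Protestant inclusion mentioned above) lets a system veto such contradictory conditions. The code is illustrative Python, not INLEN-2's internal representation.

    # Child -> parent links for the fragment of the Religion hierarchy in Figure 3.
    RELIGION_PARENT = {
        "Sunni": "Muslim", "Shi'a": "Muslim", "Ibadhi": "Muslim",
        "R. Catholic": "Christian", "Protestant": "Christian", "Orthodox": "Christian",
        "Lutheran": "Protestant",       # inclusion relation noted in the text
        "Muslim": "Religion", "Jewish": "Religion", "Buddhist": "Religion",
        "Shinto": "Religion", "Christian": "Religion", "Indigenous": "Religion",
    }

    def is_a(value, ancestor):
        while value in RELIGION_PARENT:
            value = RELIGION_PARENT[value]
            if value == ancestor:
                return True
        return False

    def consistent(asserted, excluded):
        """'Religion is X but not Y' is contradictory when X is a kind of Y (or X == Y)."""
        return asserted != excluded and not is_a(asserted, excluded)

    print(consistent("Lutheran", "Christian"))   # False: the contradiction is blocked
    print(consistent("Lutheran", "Buddhist"))    # True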
Experiments using INLEN-2 have indicated very interesting findings regarding the usage of structured and non-structured attributes. Among the findings regarding their use as independent variables were that structuring attributes led to simpler rules than when the attributes were not structured, and that certain patterns were only found when the domains were structured. These findings are illustrated by the results below.

When INLEN-2 learned rules to distinguish the 55 countries with low (less than 1 per 1000 people) population growth rate (PGR) from other countries, in a version of the PEOPLE database in which the attribute "Religion" was not structured, one of the rules it found was as follows:

PGR < 1  (20 examples)
    Literacy is 95% to 99%,
    Life Expectancy is 70 to 80 years,
    Religion is Roman Catholic or Orthodox or Romanian or Lutheran or Evangelical or Anglican or Shinto,
    Net Migration Rate ≤ +20 per 1000 people.

This rule was satisfied by 20 of the 55 countries with low growth rates. When the same experiment was run with "Religion" structured, a similar rule was discovered:

PGR < 1  (14 examples)
    Literacy is 95% to 99%,
    Life Expectancy is 70 to 80 years,
    Religion is Roman Catholic or Orthodox or Shinto,
    Net Migration Rate ≤ +10 per 1000 people.

This rule was satisfied by 14 of the 55 low-growth countries. The removal of some of the Protestant subclasses from the Religion condition and the tightening of the Net Migration Rate condition caused six countries which had been covered by the first rule not to satisfy this one. At a slight cost in consistency, simpler, more encompassing rules than either of the above were found:

PGR < 1  (21 examples, 1 exception)
    Literacy is 95% to 99%,
    Life Expectancy is 70 to 80 years,
    Religion is Christian or Shinto,
    Net Migration Rate ≤ +10 per 1000 people.

Even without relaxing the consistency constraint, a concise rule not generated in the unstructured dataset was discovered that describes 20 countries, including the six omitted in the structured rule:

PGR < 1  (20 examples)
    Birth Rate is 10 to 20 per 1000 people,
    Death Rate is 10 to 15 per 1000 people.

Similar differences were obtained using structured output variables. By classifying events at different levels of generality, rules can classify both in general and in specific terms. This tends to reduce the complexity and increase the significance of the rules at all levels of generalization.

In the PEOPLE database it is difficult to learn a set of rules for predicting a country's likely predominant religion given other demographic attributes without using the hierarchies. Some 30 religious categories are listed for the 170 countries. The conditions making up the rules will likely have very low support levels (informational significance, defined as the number of positive examples satisfying the condition divided by the total number of examples satisfying the condition), because a single condition that exists in most of the countries with a certain predominant religion will typically be found, at least occasionally, elsewhere.

When learning rules for distinguishing between the religions in an unstructured format, the highest support level found in the entire rule base was 37%, for the awkward condition "Literacy is 70-90% or 95-99%", which described 26 of the 50 Roman Catholic countries and 43 of the remainder of the world's countries. In contrast, structuring the classification of religions led to several top-level conditions with higher support levels. One condition, with a 63% support level in its ability to distinguish the 88 Christian countries, was "Population Growth Rate ≤ 2".

More drastic effects were seen at the lower levels of the hierarchy. In the unstructured dataset, five rules, each with two to five conditions, were required to define the 11 Sunni Muslim countries. The only one to describe more than two of the 11 countries was this set of fragmented conditions:

Religion is Sunni_Muslim  (4 examples)
    Literacy is 70% to 90% or less than 30%,
    Infant Mortality Rate is 25 to 40 or greater than 55 per 1000 people,
    Fertility Rate is 1 to 2 or 4 to 5 or 6 to 7,
    Population Growth Rate is 1 to 3 or greater than 4.

The ranges in each of the four conditions are divided into multiple segments, suggesting that this is not at all a strong pattern. In contrast, the structured dataset produced two rules, each with one condition, to distinguish Sunni Muslim countries from other predominantly Islamic nations. The first, "Religion is Sunni_Muslim if Infant Mortality Rate ≥ 40", described all but one of the positive examples (Jordan was the exception), and only one non-Sunni Islamic nation. The second, "Religion is Sunni_Muslim if Birth Rate is 30 to 40", alone discriminated Algeria, Egypt, Jordan and Tajikistan from the rest of the Muslim world.
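The support level figures quoted above follow directly from the definition given earlier (positive examples satisfying a condition divided by all examples satisfying it); a one-function Python sketch:

    def support_level(pos_satisfying: int, neg_satisfying: int) -> float:
        """Informational significance of a condition, as defined in the text."""
        return pos_satisfying / (pos_satisfying + neg_satisfying)

    # "Literacy is 70-90% or 95-99%": 26 Roman Catholic countries vs. 43 others.
    print(round(support_level(26, 43), 2))   # -> 0.38, i.e. roughly the 37% reported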
Experiments also indicate the practical utility of problem-oriented quantization of continuous variables. Advantages include simpler rules that will often have higher accuracy than with a fixed discretization schema. When discretizing continuous attributes in an economic database, INLEN-2 was able to set thresholds that would generate knowledge with high support levels. For example, an allocation of greater than 28% of a country's GNP to agriculture (with that number generated by ChiMerge) was a useful indicator for distinguishing Eastern African countries from other regions of the world.

Conclusion

This paper describes some novel features implemented in the INLEN-2 system. Structured and numeric domains share the common trait that many organizational schemas are possible, and the selection of one can have an impact on the success of the discovery process. Structured variables provide a very useful method for supplying learning programs with background information about a feature domain, when such information is available. A schema for structuring an attribute may be provided either by a domain expert or by a learning system. The implementation of structured attributes can introduce the concepts of generalization, agglomeration, anchor nodes and multiple domain views to a discovery system.

In large databases the structuring of nominal or numeric attributes can assist in the discovery process. In most attribute domains in which there are more than just a few distinct values, structuring of the domain is usually both possible and recommended. There will generally be a way to organize the values according to some classification schema. This allows the import of background knowledge, even into those empirical discovery engines that traditionally rely on a minimum of background knowledge. This domain knowledge, in turn, can result in the discovery of relationships more attuned to the user's existing understanding of the background domain through such techniques as the definition of anchor nodes. Attribute structuring is also useful for output variables. Doing so provides a means for separating the task of determining the general class of the decision from that of determining the specific decision within that class.

By allowing multiple representations of structured and numeric data, adaptive representation selection may be possible, potentially leading to the discovery of relationships that can alter a user's preconceived notions about the domain. The learning engine can select representations for the attribute domains after, rather than before, the learning. These representations may include not only variations on one basis for classification, but also orthogonal classification hierarchies. Similarly, problem-oriented quantization of numeric data may enhance the likelihood of useful results in continuous domains.

Techniques such as anchor nodes and multiple domain views can help create an environment in which a representation space suitable to the problem is selected, while attention is focused on the levels of abstraction of greatest utility to the user. One area for future research is the development of a representation of a node's significance beyond a simple anchor/non-anchor value, and the exploration of thresholds for determining the proper level of generalization in such an environment.

Acknowledgments

This research was conducted in the Machine Learning and Inference Laboratory at George Mason University. The Laboratory's research is supported in part by the Defense Advanced Research Projects Agency under Grant No.
N00014-91-J-1854, administered by the Office of Naval Research, in part by the Defense Advanced Research Projects Agency under Grants No. F49620-92-J-0549 and F49620-95-1-0462, administered by the Air Force Office of Scientific Research, in part by the Office of Naval Research under Grant No. N00014-91-J-1351, and in part by the National Science Foundation under Grants No. DMI-9496192 and IRI-9020266.

References

Fisher, D. 1987. Knowledge Acquisition via Incremental Conceptual Clustering. Machine Learning, 2: 139-172.

Fisher, D. 1995. Optimization and Simplification of Hierarchical Clusterings. Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), Montreal, PQ, 118-123.

Karni, R. and Loksh, S. 1996. Generalization Trees as Discovered Knowledge for Manufacturing Management. Forthcoming.

Kaufman, K., Michalski, R.S. and Kerschberg, L. 1991. Mining for Knowledge in Data: Goals and General Description of the INLEN System. In Piatetsky-Shapiro, G. and Frawley, W.J. (Eds.), Knowledge Discovery in Databases, Menlo Park, CA: AAAI Press, 449-462.

Kerber, R. 1992. ChiMerge: Discretization of Numeric Attributes. Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), San Jose, CA, 123-127.

Klimesch, W. 1988. Struktur und Aktivierung des Gedaechtnisses. Das Vernetzungsmodell: Grundlagen und Elemente einer uebergreifenden Theorie. Bern: Verlag Hans Huber.

Kubat, M., Bratko, I. and Michalski, R.S. 1996. A Review of Machine Learning Techniques. Chapter in Methods and Applications of Machine Learning and Discovery (forthcoming).

Michalski, R.S. and McCormick, B.H. 1971. Interval Generalization of Switching Theory. Proceedings of the 3rd Annual Houston Conference on Computer and System Science, Houston, TX.

Michalski, R.S. 1980. Inductive Learning as Rule-Guided Generalization and Conceptual Simplification of Symbolic Descriptions: Unifying Principles and a Methodology. Workshop on Current Developments in Machine Learning, Carnegie Mellon University, Pittsburgh, PA.

Michalski, R.S. 1983. A Theory and Methodology of Inductive Learning. Chapter in Michalski, R.S., Carbonell, J. and Mitchell, T. (Eds.), Machine Learning: An Artificial Intelligence Approach, Palo Alto: Tioga Publishing Co., 83-134.

Michalski, R.S. 1994. Inferential Theory of Learning: Developing Foundations for Multistrategy Learning. Chapter in Michalski, R.S. and Tecuci, G. (Eds.), Machine Learning: A Multistrategy Approach, San Francisco: Morgan Kaufmann, 3-61.

Michalski, R.S., Kerschberg, L., Kaufman, K. and Ribeiro, J. 1992. Mining for Knowledge in Databases: The INLEN Architecture, Initial Implementation and First Results. Journal of Intelligent Information Systems: Integrating AI and Database Technologies, 1(1): 85-113.

Michalski, R.S., Mozetic, I., Hong, J. and Lavrac, N. 1986. The AQ15 Inductive Learning System: An Overview and Experiments. Report No. UIUCDCS-R-86-1260, Department of Computer Science, University of Illinois, Urbana, IL.

Michalski, R.S. and Stepp, R.E. 1983. Automated Construction of Classifications: Conceptual Clustering versus Numerical Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(4): 396-410.

Reinke, R.E. 1984. Knowledge Acquisition and Refinement Tools for the Advise Meta-Expert System. Master's Thesis, University of Illinois at Urbana-Champaign.

Rosch, E., Mervis, C., Gray, W., Johnson, D. and Boyes-Braem, P. 1976.
Basic Objects in Natural Categories. Cognitive Psychology, 8: 382-439.

Sokal, R.R. and Sneath, P.H. 1973. Principles of Numerical Taxonomy. San Francisco: W.H. Freeman.