
From: KDD-96 Proceedings. Copyright © 1996, AAAI (www.aaai.org). All rights reserved.
A Method for Reasoning with Structured and Continuous Attributes
in the INLEN Multistrategy Knowledge Discovery System

Kenneth A. Kaufman and Ryszard S. Michalski*

Machine Learning and Inference Laboratory
George Mason University, Fairfax, Virginia, 22030, USA
{kaufman, michalsk}@aic.gmu.edu

* Also GMU Departments of Computer Science and Systems Engineering
and the Institute of Computer Science, Polish Academy of Sciences
Abstract

Structured attributes have domains (value sets) that are partially ordered sets, typically hierarchies. This paper presents a method that enables a knowledge discovery program to incorporate background knowledge about hierarchical relationships among attribute values.
Inductive
generalization rules for structured attributes have been
developed that take into consideration the type of nodes
in the domain hierarchy (anchor or non-anchor) and the
type of decision rules to be generated (characteristic,
discriminant or minimum complexity). These generalization rules enhance the ability of the knowledge discovery system INLEN-2 to exploit the semantic content of the domain knowledge in the process of generating hypotheses. If the dependent attribute (e.g., a decision attribute) is structured, the system generates a system of hierarchically organized rules representing relationships between the values of this attribute and independent attributes. Such a situation often occurs in practice when the decision to be assigned to a situation can be at different levels of abstraction (e.g., this is a liver disease, or this is a liver cancer). Continuous attributes (e.g., physical measurements) are quantized into a hierarchy of values (ranges of values arranged into different levels). These methods are illustrated by an example concerning the discovery of patterns in world economics and demographics.
Introduction
Most symbolic learning systems represent information about objects or events in the form of attribute-value vectors. Attributes used in these systems are typically numerical (i.e., have totally ordered value sets) or nominal (i.e., have unordered value sets). In some applications it is useful to use attributes with value sets that are partially ordered, for example, representing a generalization hierarchy of concepts. Theoretically, almost any attribute can be viewed as having a hierarchical domain. For example, the domain of a numeric attribute "Age" can be viewed as a hierarchically organized set of classes: specific numeric values at the lowest level; toddler, child, teen, young adult, middle age, senior and very senior at the second level; and possibly young, adult and mature at the third level. To represent such concepts, a system needs background
knowledge that relates the numerical age with the higher level concepts. In general, the structure of the domain does not have to be fixed; it may change with the context or the problem at hand.
Structuring attributes can prove advantageous for a knowledge discovery system. It allows facts, trends and regularities to be mined both at high and low levels of abstraction, and background knowledge to be stored and generalizations to be made at the appropriate levels.
The idea of grouping values of attributes into a structure of classes in order to reflect semantic relationships characteristic to the given application domain has led to the introduction of structured attributes (Michalski 1980). The domain of a structured attribute is a partially ordered set, typically a hierarchy. Domains of structured variables can be generated by a domain expert (e.g., Karni & Loksh 1996), or by an automatic process using a numerical or conceptual clustering method (e.g., Sokal & Sneath 1973; Michalski & Stepp 1983; Fisher 1987). The structure of the domain can be modified to suit a specific class of problems (e.g., Fisher 1995). Structured attributes can be used as independent (input) variables (Michalski 1980), as well as dependent (output) variables (e.g., Reinke 1984). The roles of structured variables in these two cases differ.
This paper discusses methods for reasoning with structured attributes in the process of data analysis and knowledge discovery. Many of the presented ideas have been adapted for knowledge discovery from the Inferential Theory of Learning, which provides a unifying framework for characterizing learning and discovery processes (Michalski 1994). Among the novel ideas are the use of "anchor nodes" to guide the inductive generalization process, the use of non-hierarchical attribute domains, and the introduction of different kinds of generalization rules for structured attributes. The use of continuous variables in reasoning is closely related to that of structured attributes in that the way their domains are structured is tailored to the particular discovery problem. The presented ideas and methods have been implemented in the INLEN-2 knowledge discovery system, and are illustrated by an application to a knowledge discovery problem in world demographics.
The Extension-Against Operator and Three Types of Descriptions
An inductive generalization rule (or transmutation) takes input information and background knowledge, and hypothesizes more general knowledge (Michalski 1980; 1994). For example, dropping a condition from a decision rule is a generalization transmutation.
A powerful inductive generalization rule used in the AQ learning program is the extension-against operator. If rule R1: [x1 = a] & CTX characterizes positive concept examples E+ and rule R2: [x1 = b] & CTX characterizes negative examples E− (where CTX, a context, stands for any additional conditions), then the extension of R1 against R2,
R1 ⊣ R2,
produces R3: [x1 ≠ b], which is the maximal consistent generalization of R1 (Michalski & McCormick 1971; Michalski 1983).
By repeating the extension-against operator until the resulting rule no longer covers any negative examples, a consistent concept description (one that covers no negative examples) can be generated. Such a process can be applied to generate a description (cover) that is complete and consistent with regard to all the training examples (i.e., it covers all positive examples and no negative examples).
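The core of the operator can be sketched in a few lines. The representation below (rules as maps from attributes to allowed value sets) is an assumption for illustration, not the actual AQ/INLEN-2 implementation; a full AQ learner would also form a disjunction (a "star") over all differing attributes rather than a single conjunction.

```python
# Minimal sketch of the extension-against operator on attribute-value
# vectors. A rule maps each attribute to the set of values it allows;
# attributes absent from a rule are unconstrained.

def extend_against(positive, negative, domains):
    """For each attribute on which the examples differ, the maximal
    consistent generalization is the condition [x != negative_value],
    i.e. the rest of that attribute's domain."""
    return {
        attr: domains[attr] - {negative[attr]}
        for attr in domains
        if positive[attr] != negative[attr]
    }

def covers(rule, example):
    """True if the example satisfies every condition of the rule."""
    return all(example[a] in vals for a, vals in rule.items())

# The example from the text: extending [x1 = a] against [x1 = b]
# produces [x1 != b].
domains = {"x1": {"a", "b", "c"}}
r3 = extend_against({"x1": "a"}, {"x1": "b"}, domains)
assert r3 == {"x1": {"a", "c"}}                      # i.e. [x1 != b]
assert covers(r3, {"x1": "a"}) and not covers(r3, {"x1": "b"})
```

Repeating this against every negative example, and intersecting or disjoining the results, yields the complete and consistent cover described above.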
Another important concept that needs to be explained before introducing the central ideas of this paper is the type of description, as defined in the AQ rule learning methodology (Michalski et al. 1986). By applying the extension-against operator in different ways, one can generate a range of descriptions with different degrees of generality. Here we distinguish three types of descriptions: 1) discriminant covers (in which the extension-against is used to create maximal generalizations; such descriptions specify minimal conditions to discriminate between the concept and non-concept examples); 2) characteristic covers (in which the extension-against is specialized to create maximally specific generalizations; such generalizations specify the maximal number of conditions that characterize positive examples); and 3) minimal complexity covers (in which the extension-against is used to create the simplest possible generalizations).
To illustrate these concepts, consider the diagram shown in Figure 1, which represents a two-dimensional representation space spanned over the three-valued attribute Shape (Sh) and the six-valued attribute Color (Co). Positive and negative concept examples are marked by + and −, respectively. The characteristic (maximally specific) cover is represented by the shaded area. It states: Color is red, yellow or green, and Shape is square or triangle. A discriminant description (maximum generalization) of the same set of examples states: Color is red, yellow, green or white (or, equivalently, Color is not blue or orange). A minimum complexity description of these examples states: Color is red or yellow or green. The three descriptions are equally consistent with the five input examples.
These three types of descriptions are used in the INLEN-2 system for knowledge discovery in databases. In the following section, we present an extension of these ideas to descriptions with structured attributes (both as independent (input) and dependent (output) variables) and continuous independent attributes.
Color is red or yellow or green, and Shape is square or triangle.
Figure 1. A characteristic cover of E+ vs. E−.
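The consistency and completeness of the three description types can be checked mechanically. The example placements below are assumptions (Figure 1's exact cells cannot be recovered from the text); they are merely chosen to be compatible with the five positive examples and the three descriptions quoted above.

```python
# Hypothetical placements of the five positive and three negative
# examples from Figure 1 (assumed, not taken from the original figure).
positives = [("red", "square"), ("yellow", "square"), ("yellow", "triangle"),
             ("green", "triangle"), ("red", "triangle")]
negatives = [("blue", "square"), ("orange", "triangle"), ("blue", "circle")]

# The three covers stated in the text:
characteristic = lambda c, s: c in {"red", "yellow", "green"} and s in {"square", "triangle"}
discriminant   = lambda c, s: c not in {"blue", "orange"}      # Color is not blue or orange
minimal        = lambda c, s: c in {"red", "yellow", "green"}  # simplest consistent form

for cover in (characteristic, discriminant, minimal):
    assert all(cover(*e) for e in positives)       # complete: covers every positive
    assert not any(cover(*e) for e in negatives)   # consistent: covers no negative
```

Note how the discriminant cover also admits unseen cells (e.g., white objects), while the characteristic cover admits only the attribute values actually observed among the positives.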
Generalization Rules for Structured Attributes
Structured Input Variables
In order to apply the previously defined extension-against operator to structured attributes, new generalization rules need to be defined. Let us illustrate the problem by an example that uses a structured attribute "Food" shown in Figure 2. Each non-leaf node denotes a concept that is more general than its children nodes. These relationships need to be taken into consideration when developing a generalization of some facts. Suppose the concept to be learned is exemplified by the statements "John eats strip steak" and "John doesn't eat vanilla ice cream." Different generalizations of these facts exist, for example: that John eats strip steak, steak, cattle, meat, meat or vegetables, or anything but vanilla ice cream. The first statement represents a maximally specific description, the last statement represents a maximally general description, and the remaining ones represent intermediate levels of generalization. A problem arises in determining the generalizations of most interest. We approach this problem by drawing insights from human reasoning.
We tend to assign different levels of significance to nodes in a generalization hierarchy. Some cognitive scientists explain this in part with the idea of basic level nodes, whose children share many sensorially recognizable commonalities (Rosch et al. 1976). Other factors that are important for characterizing nodes are concept typicality (how common are a concept's features among its sibling concepts), and the context in which the concept is being used (Klimesch 1988; Kubat, Bratko, & Michalski 1996).
Spatial, Text, & Multimedia
Anchor nodes are shown in bold. Nodes marked by + and − are values occurring in positive and negative examples, respectively.
Figure 2. The domain of a structured attribute "Food."
To capture such preferences simply, we introduce the idea of anchor nodes in a hierarchy. Such nodes are the ones that are desirable to use for the given task domain. To illustrate this idea, consider Figure 2 again.
In the presented hierarchy, vanilla and rocky road are kinds of ice cream; ice cream is a frozen dessert, which is a dessert, which is a type of food. In everyday usage, depending on the context, we will typically think of them as ice cream or dessert, but not so typically as frozen dessert or food.
In designing a knowledge discovery system, we should be able to encode the contextual significance of the nodes into the knowledge representation of the system, so that the created rules will represent desirable levels of abstraction. To this end, nodes in the hierarchy that are considered to be at preferable levels of abstraction are marked as anchor nodes.
Given the information which nodes are anchors and which are not, different types of descriptions can be created during the generalization phase of knowledge discovery. The meaning of discriminant and characteristic descriptions needs to be properly defined. In building a characteristic description, the following rule is assumed: Generalize positive examples to the next higher anchor node(s) if this maintains consistency conditions. For example, consider the nodes in bold in Figure 2 to be anchors. Then in characteristic mode, the extension of the positive attribute value "Strip" against the negative attribute value "Vanilla" would generalize to "Steak." In building a discriminant description from these examples, attribute values are generalized to the most general value that does not cover the nearest anchor node to the value of a negative example. In the example above, the positive value "Strip" would generalize (extend) against the value "Vanilla" to "not Ice Cream." In building the equivalent of minimum complexity descriptions, any of the intermediate anchor nodes could be used if this would simplify the description. For instance, one generalization rule would be to generalize
positive examples to the highest anchor nodes possible, such that consistency is maintained. In this example, the description of "Strip" would generalize to "Cattle."
Another feature to increase the power of representing structured variables is that one does not need to limit the domain of structured attributes to strict hierarchies (one parent for every node). Instead, an arbitrary lattice may be used when desired. This feature is useful for representing overlapping or orthogonal classification hierarchies. The nature of the particular problem determines how the nodes are generalized. The system generalizes the nodes in the way that produces the most desirable (for example, the simplest) description.
Generalization in structured domains does not increase the complexity of the extension-against operator, save for the fact that the internal nodes in the hierarchy must now be among the attribute's legal value set, while in an unstructured domain they may or may not be present. Any increase in discovery complexity will come during the postprocessing, when different possible generalizations of the discovered cover will be examined; in the worst case, this will be bounded in number of applications by the number of internal nodes in the tree times the tree's maximum height.
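The characteristic-mode rule above (climb to the next higher anchor while maintaining consistency) can be sketched directly. The hierarchy fragment and anchor set below are assumptions based on the "Food" example in Figure 2; they are not the paper's actual data structures.

```python
# Sketch of characteristic-mode generalization with anchor nodes, on an
# assumed fragment of the "Food" hierarchy (child -> parent).
parent = {"Strip": "Steak", "Steak": "Cattle", "Cattle": "Meat",
          "Meat": "Food", "Vanilla": "IceCream", "RockyRoad": "IceCream",
          "IceCream": "FrozenDessert", "FrozenDessert": "Dessert",
          "Dessert": "Food"}
anchors = {"Steak", "Cattle", "IceCream"}   # assumed anchor set

def covers(node, value):
    """True if `node` is an ancestor-or-self of `value` in the hierarchy."""
    while value is not None:
        if value == node:
            return True
        value = parent.get(value)
    return False

def generalize_characteristic(pos_value, neg_values):
    """Climb from the positive value toward the next higher anchor node,
    stopping early if climbing would cover a negative value."""
    node = pos_value
    while node in parent:
        up = parent[node]
        if any(covers(up, n) for n in neg_values):
            break                  # climbing further would lose consistency
        node = up
        if node in anchors:
            break                  # reached the next higher anchor
    return node

# The example from the text: "Strip" extended against "Vanilla"
# generalizes to "Steak" in characteristic mode.
assert generalize_characteristic("Strip", ["Vanilla"]) == "Steak"
```

Discriminant mode would instead climb as high as possible without covering the nearest anchor above the negative value, yielding "not Ice Cream" for the same pair.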
Structured Output Variables
Typically, dependent variables are numeric or nominal. In the latter case, values of the dependent variable are treated as independent concepts. Such a representation fails to take advantage of any generalization hierarchies that may exist among values of a dependent variable. For example, when selecting a personal computer to buy, the candidate models may be grouped into IBM-compatible and Macintosh-compatible. Rather than just choosing one system from among the entire set, it may be simpler to organize the knowledge to choose first the general type, and then the specific model of that type. This demonstrates the need for using structured attributes as dependent variables and defining an appropriate method for handling them in the process of rule search and generalization.
Given a structured dependent variable (a decision variable), decision rules or descriptions can relate to nodes of the dependent variable at different levels in the hierarchy. The proposed method focuses first on the top-level nodes, creating rules for them. Subsequently, rules are created for the descendant nodes in the context of their ancestors. For example, in the case of the computer selection problem, the generalization operator would first determine how to choose between IBM- and Macintosh-compatible machines, and then generate rules for distinguishing among the machines in each class, and that class alone. The rule for a specific IBM-compatible machine does not attempt to differentiate it from a particular Macintosh; it will be based on the assumption that an IBM-compatible computer is to be selected. Due to this algorithm, the rules at each level of abstraction will tend to be more concise and easier to interpret.
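The two-level control flow described above can be sketched as follows. The `learn` function is a stand-in for any attributional rule learner (such as AQ), and the class names and examples are hypothetical; the point is only which positives and negatives each learning call receives.

```python
# Sketch of rule learning for a structured decision attribute: first
# separate the top-level classes, then discriminate each leaf only
# against its siblings within the same parent class.

def learn(pos, neg):
    """Placeholder learner: records the example sets it would be asked
    to discriminate. A real system would induce decision rules here."""
    return {"pos": pos, "neg": neg}

def learn_structured(examples, hierarchy):
    """examples: list of (features, leaf_label); hierarchy: parent -> leaves."""
    rules = {}
    for parent_cls, leaves in hierarchy.items():
        in_cls  = [e for e in examples if e[1] in leaves]
        out_cls = [e for e in examples if e[1] not in leaves]
        rules[parent_cls] = learn(in_cls, out_cls)   # top-level rule
        for leaf in leaves:                          # siblings only
            pos = [e for e in in_cls if e[1] == leaf]
            sib = [e for e in in_cls if e[1] != leaf]
            rules[leaf] = learn(pos, sib)
    return rules

# Hypothetical computer-selection data in the spirit of the example above.
hierarchy = {"IBM": ["IBM_A", "IBM_B"], "Mac": ["Mac_A", "Mac_B"]}
examples = [({"ram": 8}, "IBM_A"), ({"ram": 4}, "IBM_B"),
            ({"ram": 8}, "Mac_A"), ({"ram": 16}, "Mac_B")]
rules = learn_structured(examples, hierarchy)
# A leaf rule never sees examples from the other top-level class:
assert [e[1] for e in rules["IBM_A"]["neg"]] == ["IBM_B"]
```

Because each leaf-level call excludes the other branch entirely, the induced rules tend to be shorter, matching the conciseness claim in the text.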
Dynamic Quantization of Continuous Attributes
In some respects, numerical attributes are similar to structured attributes with multiple views. There are different ways to group the values into discrete ranges, and unless definitive background knowledge is available, the optimal organization for a particular discovery problem may not be apparent. Even when numeric values have been quantized into ranges by a domain expert, the expert-generated abstraction schema may not be optimal for the learning task (e.g., Karni & Loksh 1996). Furthermore, a particular set of ranges may be useful for one learning task, but irrelevant to another.
The implemented methodology performs automatic discretization of numeric data using the ChiMerge algorithm (Kerber 1992). With this method, neighboring distinct values of the attribute found in the data are merged into single ranges based on a χ2 analysis of the classification of the values. When the classification patterns of adjacent ranges are not statistically significantly different from one another, the ranges are combined into one.
One important consequence of this algorithm is that the grouping of values into intervals will depend on the classification of the input data. Specifically, given a new learning problem from the same dataset, the training data will likely be grouped into classes much differently, and thus the set of ranges of a given attribute most likely to generate concise and useful knowledge may be much different. For example, given an auto insurance customer database, there may be little correlation between the set of accident-prone clients and the set of customers who are likely to be interested in purchasing a service offered by the company. It is quite possible that the ranges into which the drivers' ages are divided that are most useful for the first learning task are not appropriate for the second one. To combat this problem, ChiMerge recomputes the ranges for each numeric variable whenever new sets of data are added or a new discovery problem is specified.
A limitation of the ChiMerge methodology is that it is dependent on the classification of input training examples, and that it therefore cannot be used to quantize an output variable (since that would depend on its already having been divided into classes). There are several ways that an output numeric variable can be discretized into a hierarchy of ranges: a domain expert can suggest ranges, ranges can be determined based on some other output variable, conceptual clustering can be used, etc. A discussion and comparison of these methods will be addressed in a forthcoming paper.
Experiments
The above ideas and algorithms have been implemented in the INLEN-2 system for knowledge discovery in data. INLEN-2 belongs to the INLEN family of knowledge discovery programs that use an integrated multi-operator approach to knowledge discovery (Kaufman, Michalski, & Kerschberg 1991; Michalski et al. 1992). This section illustrates an application of the presented methods to a problem of knowledge discovery in world economics and demographics.
The predominant religion of different countries, as specified in the PEOPLE database of the World Factbook published by the Central Intelligence Agency, is used as an example of a structured attribute. The set of values found in the data contains some natural inclusion relationships. For example, there co-exist labels of "Christian", "Protestant" and "Lutheran" in the database. The organization of some of this attribute's values when structured is shown in Figure 3.
Religion: Muslim (Sunni, Shi'a, Ibadhi), Jewish, Buddhist, Shinto, Christian (R. Catholic, Protestant, Orthodox), Indigenous.
Figure 3. Part of the structure of the PEOPLE database's Religion attribute.
If the Religion attribute were set up in an unstructured manner, the statement "Religion is Lutheran" would be regarded equally as antithetical to "Religion is Christian" as to the statement "Religion is Buddhist," leading to the possibility that some contradictions (such as "Religion is Lutheran, but not Christian") might be discovered.
Experiments using INLEN-2 have indicated very interesting findings regarding the usage of structured and non-structured attributes. Among the findings regarding their use as independent variables were that structuring attributes led to simpler rules than when the attributes were not structured, and that certain patterns were only found when the domains were structured. These findings are illustrated by the results below.
When INLEN-2 learned rules to distinguish the 55 countries with low (less than 1 per 1000 people) population growth rate (PGR) from other countries, in a version of the PEOPLE database in which the attribute "Religion" was not structured, one of the rules it found was as follows:
PGR < 1 (20 examples):
Literacy = 95% to 99%,
Life Expectancy is 70 to 80 years,
Religion is Roman Catholic or Orthodox or Romanian or Lutheran or Evangelical or Anglican or Shinto,
Net Migration Rate ≤ +20 per 1000 people.
This rule was satisfied by 20 of the 55 countries with low growth rates.
When the same experiment was run with "Religion" structured, a similar rule was discovered:
PGR < 1 (14 examples):
Literacy = 95% to 99%,
Life Expectancy is 70 to 80 years,
Religion is Roman Catholic or Orthodox or Shinto,
Net Migration Rate ≤ +10 per 1000 people.
This rule was satisfied by 14 of the 55 low-growth countries. The removal of some of the Protestant subclasses from the Religion condition and the tightening of the Net Migration Rate condition caused six countries which had been covered by the first rule not to satisfy this rule. At a slight cost in consistency, simpler, more encompassing rules than either of the above were found:
PGR < 1 (21 examples, 1 exception):
Literacy = 95% to 99%,
Life Expectancy is 70 to 80 years,
Religion is Christian or Shinto,
Net Migration Rate ≤ +10 per 1000 people.
Even when not relaxing the consistency constraint, a concise rule not generated in the unstructured dataset was discovered that describes 20 countries including the six omitted in the structured rule:
PGR < 1 (20 examples):
Birth Rate = 10 to 20 per 1000 people,
Death Rate is 10 to 15 per 1000 people.
Similar differences were obtained using structured output variables. By classifying events at different levels of generality, rules can classify both in general and in specific terms. This tends to reduce the complexity and increase the significance of the rules at all levels of generalization.
In the PEOPLE database it is difficult to learn a set of rules for predicting a country's likely predominant religion given other demographic attributes without using the hierarchies.
There are 30 religious categories listed for the 170 countries. The conditions making up the rules will likely have very low support levels (informational significance, defined as the number of positive examples satisfying the condition divided by the total number of examples satisfying the condition) because a single condition that exists in most of the countries with a certain predominant religion will typically be found, at least occasionally, elsewhere.
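The support measure just defined is straightforward to compute. The country records below are hypothetical placeholders, used only to exercise the definition:

```python
# Support ("informational significance") of a condition for a target class:
# positive examples satisfying the condition divided by all examples
# satisfying it. Data below are hypothetical.
def support(condition, examples, target):
    satisfying = [e for e in examples if condition(e)]
    if not satisfying:
        return 0.0
    positives = [e for e in satisfying if e["religion"] == target]
    return len(positives) / len(satisfying)

countries = [{"religion": "Catholic", "literacy": 97},
             {"religion": "Catholic", "literacy": 85},
             {"religion": "Muslim",   "literacy": 60},
             {"religion": "Muslim",   "literacy": 96}]

s = support(lambda e: e["literacy"] >= 95, countries, "Catholic")
assert s == 0.5   # 1 Catholic of the 2 countries satisfying the condition
```

A condition with high support is thus one that rarely fires outside the target class, which is what structuring the religion hierarchy improves below.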
When learning rules for distinguishing between the religions in an unstructured format, the highest support level found in the entire rule base was 37% for the awkward condition "Literacy is 70-90% or 95-99%", which described 26 of the 50 Roman Catholic countries, and 43 of the remainder of the world's countries. In contrast, structuring the classification of religions led to several top-level conditions with higher support levels. One condition, with a 63% support level in its ability to distinguish the 88 Christian countries, was "Population Growth Rate ≤ 2".
More drastic effects were seen at the lower levels of the hierarchy. In the unstructured dataset, five rules, each with two to five conditions, were required to define the 11 Sunni Muslim countries. The only one to describe more than two of the 11 countries was this set of fragmented conditions:
Religion is Sunni_Muslim (4 examples) if
Literacy = 70-90% or less than 30%,
Infant Mortality Rate is 25 to 40 or greater than 55 per 1000 people,
Fertility Rate is 1 to 2 or 4 to 5 or 6 to 7,
Population Growth Rate is 1 to 3 or greater than 4.
The ranges in each of the four conditions are divided into multiple segments, suggesting that this is not at all a strong pattern. In contrast, the structured dataset produced two rules, each with one condition, to distinguish Sunni Muslim countries from other predominantly Islamic nations. The first, "Religion is Sunni_Muslim if Infant Mortality Rate ≥ 40", described all but one of the positive examples (Jordan was the exception), and only one non-Sunni Islamic nation. The second, "Religion is Sunni_Muslim if Birth Rate is 30 to 40", alone discriminated Algeria, Egypt, Jordan and Tajikistan from the rest of the Muslim world.
Experiments also indicate the practical utility of problem-oriented quantization of continuous variables. Advantages include simpler rules that will often have higher accuracy than with a fixed discretization schema. When determining ranges for the continuous attributes in an economic database, INLEN-2 was able to set thresholds that would generate knowledge with high support levels. For example, the allocation of a country's GNP to agriculture of greater than 28% (with that number generated by ChiMerge) was a useful indicator for distinguishing Eastern African countries from other regions of the world.
Conclusion
This paper describes some novel features implemented in the INLEN-2 system for knowledge discovery. Structured and numeric domains share the common trait that many organizational schemas are possible, and the selection of one can have an impact on the success of the discovery process. Structured variables provide a very useful method for providing learning programs with background information about a feature domain, when such is available. A schema for structuring an attribute may be provided either by a domain expert or by a learning system. The implementation of structured attributes can introduce the concepts of generalization, agglomeration, anchor nodes and multiple domain views to a discovery system.
In large databases the structuring of nominal or numeric attributes can assist in the discovery process. In most attribute domains in which there are more than just a few distinct values, structuring of the domain is usually both possible and recommended. There will generally be a way to organize the values according to some classification schema. This allows the import of background knowledge, even to those empirical discovery engines that traditionally rely on a minimum of background knowledge. This domain knowledge, in turn, can result in the discovery of relationships more attuned to the user's existing understanding of the background domain through such techniques as the definition of anchor nodes. Attribute structuring is also useful for output variables. Doing so provides a means for separating the tasks of determining the general class of the decision and determining the specific decision within that class.
By allowing multiple representations of structured and numeric data, adaptive representation selection may be possible, potentially leading to discovering relationships that can alter a user's preconceived notions about the domain. The learning engine can select representations for the attribute domains after, rather than before, the learning. These representations may include not only variations on one basis for classification, but also orthogonal classification hierarchies. Similarly, problem-oriented quantization of numeric data may enhance the likelihood of useful results in continuous domains.
Techniques such as anchor nodes and multiple domain views can help create an environment in which a representation space suitable to the problem is selected, while attention is focused on the levels of abstraction of greatest utility to the user. One area for future research is the development of a representation of a node's significance beyond a simple anchor/non-anchor value, and the exploration of thresholds for determining the proper level of generalization in such an environment.
Acknowledgments
This research was conducted in the Machine Learning and Inference Laboratory at George Mason University. The Laboratory's research is supported in part by the Defense Advanced Research Projects Agency under Grant No. N00014-91-J-1854 administered by the Office of Naval Research, in part by the Defense Advanced Research Projects Agency under Grants No. F49620-92-J-0549 and F49620-95-1-0462 administered by the Air Force Office of Scientific Research, in part by the Office of Naval Research under Grant No. N00014-91-J-1351, and in part by the National Science Foundation under Grants No. DMI-9496192 and IRI-9020266.
References
Fisher, D. 1987. Knowledge Acquisition via Incremental Conceptual Clustering. Machine Learning, 2:139-172.
Fisher, D. 1995. Optimization and Simplification of Hierarchical Clusterings. Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), Montreal, PQ, 118-123.
Karni, R. and Loksh, S. 1996. Generalization Trees as Discovered Knowledge for Manufacturing Management. Forthcoming.
Kaufman, K., Michalski, R.S. and Kerschberg, L. 1991. Mining For Knowledge in Data: Goals and General Description of the INLEN System. In Piatetsky-Shapiro, G. and Frawley, W.J. (Eds.), Knowledge Discovery in Databases, Menlo Park, CA: AAAI Press, 449-462.
Kerber, R. 1992. ChiMerge: Discretization of Numeric Attributes. Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), San Jose, CA, 123-127.
Klimesch, W. 1988. Struktur und Aktivierung des Gedaechtnisses. Das Vernetzungsmodell: Grundlagen und Elemente einer uebergreifenden Theorie. Bern: Verlag Hans Huber.
Kubat, M., Bratko, I. and Michalski, R.S. 1996. A Review of Machine Learning Techniques. Chapter in Methods and Applications of Machine Learning and Discovery (forthcoming).
Michalski, R.S. and McCormick, B.H. 1971. Interval Generalization of Switching Theory. Proceedings of the 3rd Annual Houston Conference on Computer and System Science, Houston, TX.
Michalski, R.S. 1980. Inductive Learning as Rule-Guided Generalization and Conceptual Simplification of Symbolic Descriptions: Unifying Principles and a Methodology. Workshop on Current Developments in Machine Learning, Carnegie Mellon University, Pittsburgh, PA.
Michalski, R.S. 1983. A Theory and Methodology of Inductive Learning. Chapter in Michalski, R.S., Carbonell, J. and Mitchell, T. (Eds.), Machine Learning: An Artificial Intelligence Approach, Palo Alto: Tioga Publishing Co., 83-134.
Michalski, R.S. 1994. Inferential Theory of Learning: Developing Foundations for Multistrategy Learning. Chapter in Machine Learning: A Multistrategy Approach, Michalski, R.S. and Tecuci, G. (Eds.), San Francisco: Morgan Kaufmann, 3-61.
Michalski, R.S., Kerschberg, L., Kaufman, K. and Ribeiro, J. 1992. Mining for Knowledge in Databases: The INLEN Architecture, Initial Implementation and First Results. Journal of Intelligent Information Systems: Integrating AI and Database Technologies, 1(1): 85-113.
Michalski, R.S., Mozetic, I., Hong, J. and Lavrac, N. 1986. The AQ15 Inductive Learning System: An Overview and Experiments. Report No. UIUCDCS-R-86-1260, Department of Computer Science, University of Illinois, Urbana, IL.
Michalski, R.S. and Stepp, R.E. 1983. Automated Construction of Classifications: Conceptual Clustering versus Numerical Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(4): 396-410.
Reinke, R.E. 1984. Knowledge Acquisition and Refinement Tools for the Advise Meta-Expert System. Master's Thesis, University of Illinois at Urbana-Champaign.
Rosch, E., Mervis, C., Gray, W., Johnson, D. and Boyes-Braem, P. 1976. Basic Objects in Natural Categories. Cognitive Psychology, 8:382-439.
Sokal, R.R. and Sneath, P.H. 1973. Principles of Numerical Taxonomy. San Francisco: W.H. Freeman.