From: AAAI-87 Proceedings. Copyright ©1987, AAAI (www.aaai.org). All rights reserved.
Improving Inference Through Conceptual Clustering

Douglas Fisher
Department of Information and Computer Science
University of California
Irvine, California 92717

Abstract
Conceptual clustering is an important way to summarize data in an understandable manner. However, the recency of the conceptual clustering paradigm has allowed little exploration of conceptual clustering as a means of improving performance. This paper presents COBWEB, a conceptual clustering system that organizes data to maximize inference abilities. It does this by capturing inter-correlations of object attributes at classification tree nodes and generating inferences as a by-product of classification. Results from the domains of soybean and thyroid disease diagnosis support the success of this approach.
I. Introduction

Machine learning is concerned with improving performance through automated knowledge acquisition and refinement [Dietterich, 1982]. Learning filters and incorporates environmental observations into a knowledge base that is used to facilitate performance at some task. Assumptions surrounding the environment, the knowledge base, and the performance task all have important ramifications on the design of a learning algorithm.
This paper is concerned with conceptual clustering, a task of machine learning that has not been traditionally discussed in the larger context of intelligent processing. Conceptual clustering systems [Michalski and Stepp, 1983; Fisher and Langley, 1985; Cheng and Fu, 1985] accept a number of object descriptions (events, observations, facts) and produce a classification scheme over the observed objects. Importantly, conceptual clustering methods do not require the guidance of a teacher to direct the formation of the classification (as with learning from examples), but use an evaluation function to discover classes with good conceptual descriptions. These evaluation functions generally favor classes exhibiting many differences between objects of different classes, and few differences between objects of the same class. As with other forms of learning, the context surrounding the conceptual clustering task can have important implications on the design of these systems.
Perhaps the most important contextual factor surrounding clustering is the performance task that is to benefit from clustering. The generality of classification as a means of guiding inference is manifest in recent discussions of problem solving as classification [Clancey, 1984]. While most clustering systems do not explicitly address this performance task, exceptions do exist. In particular, Cheng and Fu [1985] and Fu and Buchanan [1985] have used clustering techniques to organize expert system knowledge. Generalizing on their use of conceptual clustering, classifications produced by clustering systems can be a basis for effective inference about unseen object properties.

This paper describes the COBWEB system for conceptual clustering. COBWEB's design was motivated by both environmental and performance concerns. However, this paper is primarily concerned with performance issues - in particular, COBWEB forms classifications that tend to maximize the amount of information that can be inferred from knowledge of class membership, and it generates inferences as a by-product of classification.¹ The following section motivates and develops an evaluation function, used by COBWEB to guide class and concept formation, that favors classes that maximize inference ability. Section 3 describes the COBWEB algorithm. The remainder of the paper focuses on the utility of COBWEB generated classification trees for inference, concentrating particularly on soybean disease diagnosis.

¹COBWEB is also distinguished from other systems in that it is incremental. Issues surrounding COBWEB's performance as an incremental system are given in [Fisher, 1987].
II. Category Utility

COBWEB uses a measure of concept quality called category utility [Gluck and Corter, 1985] to guide the formation of classes and concepts.
While our primary interest in category utility is that it favors classes that maximize inference ability, Gluck and Corter originally derived category utility as a means of predicting the basic level effect, a psychological construct observed during human classification. These effects stem from certain classes in hierarchical classification schemes, at a level called the basic level, that seems to be where inference abilities are maximized.
Category utility rewards information-rich categories, and is thus of generic value, but it can also be viewed (and more easily developed) as a tradeoff of intra-class object similarity and inter-class dissimilarity. Objects are described in terms of (nominal) attribute-value pairs (e.g., Color = red and Size = large). For an attribute-value pair, $A_i = V_{ij}$, and class, $C_k$, intra-class similarity is measured in terms of the conditional probability $P(A_i = V_{ij} \mid C_k)$; the larger this probability, the greater the proportion of class members sharing the value, and thus the more predictable the value is of class members. Inter-class dissimilarity is measured in terms of $P(C_k \mid A_i = V_{ij})$; the fewer the classes that share this value, the larger this probability, and thus the more predictive the value is of the class. Attribute-value predictability and predictiveness are combined into a measure of partition quality:

$$\sum_k \sum_i \sum_j P(A_i = V_{ij}) \, P(C_k \mid A_i = V_{ij}) \, P(A_i = V_{ij} \mid C_k),$$

summed over all classes (k), attributes (i), and values (j). The probability $P(A_i = V_{ij})$ weights the importance of individual values, in essence saying that it is more important to increase the predictability and predictiveness of frequently occurring values.
More precisely, note that for any i, j, and k, $P(A_i = V_{ij}) P(C_k \mid A_i = V_{ij}) = P(C_k) P(A_i = V_{ij} \mid C_k)$ (Bayes rule), so by substitution the above function equals

$$\sum_k P(C_k) \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2.$$

The inner sum, $\sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2$, is the expected number of attribute values that can be correctly guessed for an arbitrary member of class $C_k$.²
Finally, Gluck and Corter define category utility to be the increase in the expected number of attribute values that can be correctly guessed given knowledge of a partition ($P(C_k) \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2$) over the expected number of correct guesses with no such knowledge ($\sum_i \sum_j P(A_i = V_{ij})^2$). Formally, the category utility of a partition $\{C_1, ..., C_n\}$ is

$$CU(\{C_1, C_2, ..., C_n\}) = \frac{\sum_k P(C_k) \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2 \;-\; \sum_i \sum_j P(A_i = V_{ij})^2}{n}.$$

The denominator, n, is the number of categories in a partition, and averaging over categories allows comparison of different size partitions.

²This assumes a probability matching guessing strategy, meaning that an attribute value is guessed with a probability equal to its probability of occurring, as opposed to a probability maximizing strategy, which assumes that the most frequently occurring value is always guessed (see [Fisher, 1987]).

Category utility is defined over distributions of attribute values and thus assumes a probabilistic representation of object classes, in contrast to the logical (generally conjunctive) concept representations typically used in AI systems [Lebowitz, 1982; Smith and Medin, 1981]. Probabilistic representations subsume these types of logical representations, as a simple mapping exists from logical to probabilistic concepts. This increased generality comes at the cost of storing the probabilities; however, given that $P(C_k)$ is known for each category and $P(A_i = V_{ij} \mid C_k)$ for all attribute values, each probability can be computed from two integer counts, thus only increasing storage requirements by a constant factor.
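To make the measure concrete, here is a minimal sketch of category utility computed from those integer counts. It is not code from the paper; the data layout (a partition as a list of classes, each class a list of attribute-value dictionaries) and the function name are assumptions for illustration.

```python
from collections import Counter

def category_utility(partition):
    """CU({C1,...,Cn}) per Gluck and Corter [1985]: the increase in the
    expected number of correctly guessed attribute values, given the
    partition, averaged over its n classes. Every probability below is
    a ratio of two integer counts."""
    n_objects = sum(len(cls) for cls in partition)
    # P(Ai = Vij): pooled value counts over all objects.
    pooled = Counter((a, v) for cls in partition for obj in cls
                     for a, v in obj.items())
    # Expected number of correct guesses with no class knowledge.
    base = sum((c / n_objects) ** 2 for c in pooled.values())
    cu = 0.0
    for cls in partition:
        p_class = len(cls) / n_objects                   # P(Ck)
        within = Counter((a, v) for obj in cls for a, v in obj.items())
        # sum_i sum_j P(Ai = Vij | Ck)^2: expected correct guesses for an
        # arbitrary member of Ck under probability matching.
        expected = sum((c / len(cls)) ** 2 for c in within.values())
        cu += p_class * (expected - base)
    return cu / len(partition)

# e.g. category_utility([[{'Color': 'red'}, {'Color': 'red'}],
#                        [{'Color': 'blue'}]])  ->  0.222...
```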
III. COBWEB

COBWEB incrementally incorporates objects into a classification hierarchy. Given an initially empty hierarchy, a hierarchical classification tree is formed over an incrementally presented series of objects, where each node is a probabilistic concept representing an object class (e.g., Birds: BodyCover = feathers (1.0) and Transport = fly (0.88) and ...). The incorporation of an object is basically a process of classifying the object by descending the tree along an appropriate path, updating distributional information along the way, and performing one of several possible operators at each level.
A. Placing an Object in an Existing Class

Perhaps the most natural way of updating a classification tree to accommodate a new object is to simply place the object into one of the root's children. That is, after updating the distributional information at the root, the object is tentatively placed in each child, and the partition that results from adding the object to a given node is evaluated using category utility. The node to which adding the object results in the best partition is the best existing host for the new object.
B. Creating a New Class

In addition to placing objects in existing classes, there is a way to create new classes. Specifically, the quality of the partition resulting from placing the object in the best existing host is compared to the partition resulting from creating a new singleton class containing the object. Class creation is performed if it yields a better partition (by category utility). This operator allows COBWEB to adjust the number of classes at a partition to fit the regularities of the environment; the number of classes is not bounded by a system parameter (e.g., as in CLUSTER/2).
Table 1: COBWEB

FUNCTION COBWEB (Object, Root { of tree })
  1) Update counts of the Root.
  2) IF Root is a leaf
     THEN Return the expanded leaf to accommodate the new Object
     ELSE Find that child of Root that best hosts Object and perform one of the following:
        2a) Create a new class if appropriate and call COBWEB (Object, Root).
        2b) Merge nodes if appropriate and call COBWEB (Object, Merged node).
        2c) Split a node if appropriate and call COBWEB (Object, Root).
        2d) IF none of the above (2a, b, or c) THEN call COBWEB (Object, best child of Root).
C. Merging and Splitting

While operators 1 and 2 are effective in many cases, by themselves they are very sensitive to initial input ordering. To guard against the effects of initially skewed data, COBWEB also includes two operators for node merging and splitting. The function of merging is to take two nodes of a level (of n nodes) and 'combine' them in hopes that the resultant partition (of n-1 nodes) is of better quality. The merging of two nodes simply involves creating a new node and combining the attribute-value distributions of the nodes being merged. The two original nodes are made children of the newly created node. Although merging could be attempted on all possible node pairs every time an object is observed, such a strategy would be unnecessarily redundant and costly. Instead, when an object is incorporated, only merging the two best hosts (as indicated by category utility) is evaluated.

Besides node merging, node splitting may also serve to increase partition quality. A node of a partition (of n nodes) may be deleted and its children promoted, resulting in a partition of n+m-1 nodes, where the deleted node had m children. Splitting is considered only for the children of the best host among the existing categories.
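The merge operator itself is simple bookkeeping over counts. A minimal sketch follows, assuming a node carries an object count, a table of attribute-value counts, and a child list; the names are illustrative, not COBWEB's actual code.

```python
class Node:
    def __init__(self):
        self.count = 0        # objects classified under this node
        self.av = {}          # (attribute, value) -> integer count
        self.children = []

def merge(a, b):
    """Combine two sibling nodes: the new node's distribution is the sum
    of theirs, and the originals become its children (an n-node partition
    becomes an (n-1)-node partition)."""
    m = Node()
    m.count = a.count + b.count
    for node in (a, b):
        for av_pair, c in node.av.items():
            m.av[av_pair] = m.av.get(av_pair, 0) + c
    m.children = [a, b]
    return m
```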
COBWEB's control structure is summarized in Table 1. As an object is incorporated, at most one operator is applied at each tree level. Compositions of these primitive operators can be viewed as transforming a single classification tree. Fisher [1987] adopts the view that COBWEB is hill-climbing (without backtracking) through the space of possible classification trees. In order to maintain robustness, operators are not restricted to building the tree in a strictly top-down or bottom-up fashion; the inverse operators of merging and splitting allow COBWEB to move bidirectionally in this space, thus allowing an approximation of backtracking. This strategy keeps update cost small (B² log_B n, where B is the average branching factor of the tree and n is the number of previously classified objects), while maintaining learning robustness. Fisher [1987] addresses the strengths and weaknesses of this approach in more detail.
IV. The Utility of Classification Trees for Inference

COBWEB forms classifications that tend to maximize the amount of information that can be inferred from category membership. The efficacy of this heuristic depends on the assumption that regularities or 'hidden causes' [Pearl, 1985; Cheng and Fu, 1985] exist in the environment, and that important object properties are dependent on these regularities rather than independent of them, so that the regularities can be extracted by a conceptual clustering system.

The utility of COBWEB classification trees for inference was tested in several domains, including soybean disease diagnosis. This is a domain of 47 soybean disease cases [Stepp, 1984], in which each case (object) was described along 35 attributes (e.g., Precipitation = low, Root-condition = rotted, ...). Four disease designations (e.g., Diagnostic-condition = Charcoal rot) were also included in the data - Diaporthe stem rot, Charcoal rot, Rhizoctonia root rot, and Phytophthora rot - making a total of 36 attributes in each object description.³

An experiment was conducted to see whether the tree constructed by COBWEB could be used for effective disease diagnosis. Soybean cases were incrementally presented to COBWEB and organized into a classification tree. After every 5th instance, the remaining unseen cases were classified (but not incorporated) with respect to the classification tree constructed up until that point. Test instances contained no information regarding 'Diagnostic condition', but the value of this attribute was inferred as a by-product of classification. Specifically, classification terminated when the test object was matched against a leaf of the classification tree; this leaf represented the previously observed object that best matched the test object. The Diagnostic condition of the test object was guessed to be the corresponding condition of the leaf. The experiment was terminated after one half of the domain (of 47 cases) had been incorporated.

³While Diagnostic condition was included in each object description, it was simply treated as another attribute. Diagnostic condition was not treated as a teacher-imposed class designation as in learning from examples.
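For concreteness, this protocol can be phrased as a short driver over the sketches above. The `classify` routine below descends by a crude attribute-overlap score rather than by tentative incorporation, so it is only a stand-in for the paper's leaf matching, and the 'frequency-based' baseline discussed later in this section is included for comparison; all names are illustrative.

```python
from collections import Counter

def classify(node, obj):
    """Descend to the leaf whose distribution best overlaps obj."""
    while node.children:
        node = max(node.children,
                   key=lambda ch: sum(ch.av.get((a, v), 0) / ch.count
                                      for a, v in obj.items()))
    return node

def run_trial(cases, target='Diagnostic-condition'):
    root, curve = Node(), []
    for t, case in enumerate(cases, start=1):
        cobweb(root, case)                     # incorporate one case
        if t % 5 == 0 and t < len(cases):      # test on the unseen remainder
            hits = 0
            for test in cases[t:]:
                masked = {a: v for a, v in test.items() if a != target}
                leaf_node = classify(root, masked)
                # guess the leaf's recorded value for the target attribute
                guess = next((v for (a, v) in leaf_node.av if a == target),
                             None)
                hits += (guess == test[target])
            curve.append(hits / len(cases[t:]))
    return curve

def frequency_baseline(seen, unseen, target='Diagnostic-condition'):
    """Always guess the most frequently observed value of the target."""
    guess = Counter(c[target] for c in seen).most_common(1)[0][0]
    return sum(c[target] == guess for c in unseen) / len(unseen)
```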
The graph of Figure 1 gives the results of the experiment. It shows that after 5 randomly selected instances, the classification could be used to correctly diagnose disease (over the remaining 42 unseen cases) 88% of the time. After 10 instances, 100% correct diagnosis was achieved and maintained. While these results seem impressive, they follow from the regularity of this domain. In fact, when COBWEB was run on the data with no information of Diagnostic condition at all, the four classes were 'rediscovered' as nodes in the resultant tree. This indicates that Diagnostic condition participates in a network of correlated attributes; in organizing classes around these attribute correlations, classes corresponding to the various diagnostic conditions are generated (Figure 2).
[Figure 1: Success at inferring 'Diagnostic condition' - % of correct diagnoses versus number of incorporated soybean cases.]

[Figure 2: A partial tree over soybean cases.]

[Figure 3: Prediction over all attributes - % of correct predictions (COBWEB versus frequency-based) versus number of observed objects.]
The success at inferring Diagnostic condition implies a relationship between an attribute's dependence on other attributes and the utility of COBWEB classification trees for induction over that attribute. To further characterize this relationship, the induction test conducted for Diagnostic condition was repeated for each of the remaining 35 attributes. The results (including Diagnostic condition) were averaged over all attributes and are presented in Figure 3. On average, correct induction of attribute values for unseen objects levels off at 88% using the COBWEB generated classification tree.

To put these results into perspective, Figure 3 also graphs the averaged results of a simpler, but reasonable, inferencing strategy. This 'frequency-based' method dictates that one always guess the most frequently occurring value of the unknown attribute. Averaged results using this strategy level off at 72% correct prediction, placing it 16% under the more complicated classification strategy.

While averaged results are informative, the primary interest is determining a relationship between attribute inter-dependencies and the ability to correctly predict an attribute's value. The dependence of an attribute, $A_M$, on other attributes, $A_i$, is given as a function of

$$\sum_{j_M} \left[ P(A_M = V_{Mj_M} \mid A_i = V_{ij_i})^2 - P(A_M = V_{Mj_M})^2 \right],$$

averaged over all attributes, $A_i$, not equal to $A_M$. This measures the average increase in the ability to guess a value of $A_M$ given that one knows the value of a second attribute. If $A_M$ is independent of all other attributes, then $P(A_M = V_{Mj_M} \mid A_i = V_{ij_i}) = P(A_M = V_{Mj_M})$ for all $A_i$, and thus $P(A_M = V_{Mj_M} \mid A_i = V_{ij_i})^2 - P(A_M = V_{Mj_M})^2 = 0$.
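This dependence measure can be read as the following computation over the case set. One detail the prose leaves implicit is how the values of the second attribute are combined; the sketch below weights them by $P(A_i = V_{ij_i})$, which is one natural reading, and the names are illustrative.

```python
from collections import Counter

def dependence(cases, target):
    """Average increase in guessability of `target` given knowledge of
    one other attribute, under a probability-matching strategy."""
    n = len(cases)
    base = sum((c / n) ** 2
               for c in Counter(obj[target] for obj in cases).values())
    others = [a for a in cases[0] if a != target]
    total = 0.0
    for a in others:
        groups = {}                 # value of attribute a -> target values
        for obj in cases:
            groups.setdefault(obj[a], []).append(obj[target])
        # E_j[ sum_jM P(AM=VMjM | Ai=Viji)^2 ] - sum_jM P(AM=VMjM)^2
        cond = sum((len(g) / n) *
                   sum((c / len(g)) ** 2 for c in Counter(g).values())
                   for g in groups.values())
        total += cond - base
    return total / len(others)
```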
In Figure 4 the advantage afforded by the COBWEB tree over the frequency-based method is shown as a function of attribute dependence. Each point represents one of the 36 attributes used to describe soybean cases. The graph indicates a significant positive correlation between an attribute's dependence on other attributes and the degree that COBWEB trees facilitate correct inference of that attribute's values. For example, Diagnostic condition participates in dependencies with many other attributes and is also the most predictable attribute.

[Figure 4: Prediction as a function of attribute dependence - % increase in correct prediction versus attribute dependence; labeled points include Diagnostic-condition and Root-condition.]
V. Concluding Remarks

To summarize, COBWEB captures the important inter-correlations between attributes and summarizes these correlations at classification tree nodes. In doing so, COBWEB promotes inference of attribute values in proportion to the attributes' participation in inter-correlations. Experimentation in the soybean domain strongly supports this claim; similar results have been obtained using thyroid disease data [Fisher, 1987].

Inference in the studies above assumed that classification proceeded all the way to leaves before predicting a missing attribute value. Further studies have indicated, however, that for the inductive task of predicting properties of previously unseen objects, classification need only proceed to about one eighth the depth of the tree to obtain comparable inference results. This behavior emerges as a result of using intermediate node default values to determine when attribute value prediction is cost effective (cheap, but reasonably accurate). In COBWEB, default values occur at a level where the attribute approximates conditional independence from other attributes - in this case, knowing the value of other attributes will not aid in further classification, and prediction might as well occur at this level.
In this light, COBWEB can be viewed as an incremental and satisficing version of a system by Pearl [1985].

Finally, this work casts conceptual clustering as a useful tool for problem-solving by assigning it the generic, but well-defined, performance task of inferring unknown attribute values. The future should find that as conceptual clustering methods utilize more complex representation languages [Stepp, 1984], so too can their behavior be interpreted as improving more sophisticated problem-solving tasks.

Acknowledgements

Discussions with Dennis Kibler suggested a performance task for conceptual clustering. Thanks also go to Jeff Schlimmer, Pat Langley, Rogers Hall, and anonymous AAAI reviewers for numerous ideas and helpful comments. This work was supported by Hughes Aircraft Company.
References

[Cheeseman, 1985] P. Cheeseman. In defense of probability. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 1002-1009). Los Angeles, CA: Morgan Kaufmann.

[Cheng and Fu, 1985] Y. Cheng & K. Fu. Conceptual clustering in knowledge organization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7, 592-598.

[Clancey, 1984] W. J. Clancey. Classification problem solving. Proceedings of the National Conference on Artificial Intelligence (pp. 49-55). Austin, TX: William Kaufmann, Inc.

[Dietterich, 1982] T. Dietterich. Chapter 14: Learning and inductive inference. In P. Cohen & E. Feigenbaum (Eds.), The handbook of artificial intelligence. Los Altos, CA: William Kaufmann, Inc.

[Fisher and Langley, 1985] D. Fisher & P. Langley. Approaches to conceptual clustering. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 691-697). Los Angeles, CA: Morgan Kaufmann.

[Fisher, 1987] D. Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning, in press.

[Fu and Buchanan, 1985] L. Fu & B. Buchanan. Learning intermediate concepts in constructing a hierarchical knowledge base. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 659-666). Los Angeles, CA: Morgan Kaufmann.

[Gluck and Corter, 1985] M. Gluck & J. Corter. Information, uncertainty, and the utility of categories. Proceedings of the Seventh Annual Conference of the Cognitive Science Society (pp. 283-287). Irvine, CA: Lawrence Erlbaum Associates.

[Lebowitz, 1982] M. Lebowitz. Correcting erroneous generalizations. Cognition and Brain Theory, 5, 367-381.

[Michalski and Stepp, 1983] R. Michalski & R. Stepp. Automated construction of classifications: Conceptual clustering versus numerical taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5, 396-409.

[Pearl, 1985] J. Pearl. Learning hidden causes from empirical data. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 567-572). Los Angeles, CA: Morgan Kaufmann.

[Smith and Medin, 1981] E. Smith & D. Medin. Categories and concepts. Cambridge, MA: Harvard University Press.

[Stepp, 1984] R. Stepp. Conjunctive conceptual clustering: A methodology and experimentation (Technical Report UIUCDCS-R-84-1189). Doctoral dissertation, Urbana, IL: University of Illinois, Department of Computer Science.