Uploaded by Hugo Marquez

1

advertisement
Pattern Recognition
with Fuzzy Objective
Function Algorithms
ADVANCED APPLICATIONS IN PATTERN RECOGNITION
General editor: Morton Nadler
A STRUCTURAL ANALYSIS OF COMPLEX AERIAL
PHOTOGRAPHS
Makoto Nagao and Takashi Matsuyama
PATTERN RECOGNITION WITH FUZZY OBJECTIVE FUNCTION
ALGORITHMS
James C. Bezdek
James C. Bezdek holds a B.S. in Civil
Engineering from the University of
Nevada (Reno) and a Ph.D. in Applied
Mathematics from Cornell University.
His professional interests currently include mathematical theories underlying
problems in cluster analysis, cluster validity, feature selection, and classifier design; applications in medical diagnosis,
geologic shape analysis, and sprinkler
networks; and the numerical analysis of
low-velocity hydraulic systems. Professor
Bezdek has held research grants from
NSF and ONR and has been a pattern
recognition consultant for several industrial firms. He is an active member of
many professional groups, including the
IEEE Computer Society.
A Continuation Order Plan is available for this series. A continuation order will bring delivery of each
new volume immediately upon publication. Volumes are billed only upon actual shipment. For further
information please contact the publisher.
Pattern Recognition
with Fuzzy Objective
Function Algorithms
James C. Bezdek
Utah State University
Logan, Utah
PLENUM PRESS. NEW YORK AND LONDON
Library of Congress Cataloging in Publication Data
Bezdek, James C. 1939Pattern recognition with fuzzy objective function algorithms.
(Advanced applications in pattern recognition)
Bibliography: p.
Includes indexes.
1. Pattern perception. 2. Fuzzy-algorithms. 3. Cluster analysis. I. Title. II. Series.
Q327.B48
001.53 '4
81-4354
AACR2
ISBN 978-1-4757-0452-5
ISBN 978-1-4757-0450-1 (eBook)
DOI 10.1007/978-1-4757-0450-1
© 1981 Plenum Press, New York
Softcover reprint of the hardcover 18t edition 1981
A Division of Plenum Publishing Corporation
233 Spring Street, New York, N.Y. 10013
All rights reserved
No part of this book may be reproduced, stored in a retrieval system, or transmitted,
in any form or by any means, electronic, mechanical, photocopying, microfilming,
recording, or otherwise, without written permission from the Publisher
Foreword
The fuzzy set was conceived as a result of an attempt to come to grips with
the problem of pattern recognition in the context of imprecisely defined
categories. In such cases, the belonging of an object to a class is a matter of
degree, as is the question of whether or not a group of objects form a cluster.
A pioneering application of the theory of fuzzy sets to cluster analysis
was made in 1969 by Ruspini. It was not until 1973, however, when the
appearance of the work by Dunn and Bezdek on the Fuzzy ISODATA (or
fuzzy c-means) algorithms became a landmark in the theory of cluster
analysis, that the relevance of the theory of fuzzy sets to cluster analysis and
pattern recognition became clearly established.
Since then, the theory of fuzzy clustering has developed rapidly and
fruitfully, with the author of the present monograph contributing a major
share of what we know today. In their seminal work, Bezdek and Dunn have
introduced the basic idea of determining the fuzzy clusters by minimizing an
appropriately defined functional, and have derived iterative algorithms for
computing the membership functions for the clusters in question. The
important issue of convergence of such algorithms has become much better
understood as a result of recent work which is described in the monograph.
Further, farreaching generalizations of the original fuzzy c-means
algorithms have been recently discovered that generate linear varieties of
arbitrary dimension up to one less than the dimension of data space. These
algorithms (and convex combinations of them) are discussed. Their potential
for a wide range of applications is suggested; it seems but a matter of time
until these applications are made.
Although the results of experimental studies reported in this book
indicate that fuzzy clustering techniques often have significant advantages
over more conventional methods, universal acceptance of the theory of
fuzzy sets as a natural basis for pattern recognition and cluster analysis is not
likely to materialize in the very near future. The traditional view, that
probability theory is all that is needed to deal with imprecision, is too deeply
entrenched to be abandoned without resistance. But those of us who believe
that fuzziness and randomness are two distinct facets of uncertainty feel that
v
vi
Foreword
it is only a matter of time before this dualistic view of imprecision becomes
accepted as a useful tool in mathematical models of real processes. This
monograph is an important step towards that end.
Berkeley, California
L. A. Zadeh
Preface
It is tempting to assert that the basic aim of all science is the recognition of
patterns. Scientists study observed groups of variables, trying to isolate and
identify functional relationships-qualitative and quantitative. These associations provide mathematical models which are in turn used to infer
objective properties of the process being modeled. Traditional approaches
to this course of scientific investigation have been radically altered by
modern data processing capabilities: we examine data and postulate interrelationships whose level of complexity would have been unthinkable even
three decades ago. This capability has stirred a concomitant interest in the
notion of precision: precision in nature, in the data we gather from nature, in
our machine representation of the data, in the models we construct from the
data, in the inferences we draw from the models, and, ultimately, in Our
philosophical perception of the idea itself.
This monograph has as its cornerstones the notions of imprecision and
information in mathematical modeling. Lotfi Zadeh introduced in 1965 an
axiomatic structure-the fuzzy set-which offers a new way to capture these
two ideas mathematically. Many researchers assume that "fuzziness" either
proposes to replace statistical uncertainty, or can, by a suitable change of
view, be interpreted as the same thing. Neither of these presumptions is
accurate! Fuzzy models are compatible but distinct companions to classical
ones: sometimes more useful; sometimes equally appropriate; and sometimes unsuitable to represent the process under study.
Since their inception, fuzzy sets have advanced in a wide variety of
disciplines: e.g., control theory, topology, linguistics, optimization, category
theory, and automata. Nowhere, however, has their impact been more
profound or natural than in pattern recognition, which, in turn, is often a
basis for decision theory. In particular, my aim is to discuss the evolution of
fuzziness as a basis for feature selection, cluster analysis, and classification
problems. It is not my intent to survey a wide variety of research concerning
these subjects; nor will the reader find collected here a group of related (but
uncorrelated) papers. Instead, this volume is specifically organized around
two fundamental structures-fuzzy relations and partitions-their theory,
vii
viii
Preface
and their applications to the pattern recognition problems mentioned above.
These same two structures also provide a basis for a variety of models and
algorithms in group decision theory-a variety so great, in fact, that I have
decided to include herein only those results concerning decision theory
which bear directly on the pattern recognition problems just mentioned.
Some of this material is already "obsolete"; it is included to make the
presentation historically and chronologically continuous (presumably, this is
the soundest pedagogical path to follow). My choice in this matter is
motivated by a desire to collect and unify those concepts that appear to play
a major role in current research trends. This choice may, of course, prove
historically to have been shortsighted, and at the very least, will occasion the
omission of many bright and innovative ideas, for which I apologize a priori.
Moreover, some of the material included here is the center of great
controversy at present: for this I do. not apologize. This is not only predictable, but exactly as it should be. New ideas need to be challenged, sifted,
modified, and-if necessary-discarded, as their scientific value becomes
evident. Indeed, the method of science is to successively approximate
physical processes by various models, always hoping that the next step
improves the accuracy and usefulness of previous models. Whether fuzzy
models assume a place in this iterative procedure remains to be seen. (I hope
they do and, of course, believe they will.) In any case, I have little doubt that
many of the results presented herein will have been subsumed by more
general ones, or perhaps by alternatives of more evident utility, even before
this book is published.
My purposes in writing this monograph are simple:
(i) to synthesize the evolution of fuzzy relations and partitions as
axiomatic bases for models in feature selection, clustering, and
classification;
(ii) to give a careful, unified development of the mathematics involved,
to the extent that this is possible and desirable;
(iii) to demonstrate that this theory can be applied to real physical
problems with measurable success.
About Theorems. Mathematical detail sometimes bewilders the uninitiated, who often cannot afford the time needed to master the underpinnings of concise, thorny proofs. On the other hand, the erection of any
theory ultimately requires a foundation based on precise formulations of
previous results. Accordingly, I have tried, at what seem to be crucial
junctions, to divide proofs into two parts: first, an informal "analysis" of the
meaning, application, and key ideas in the proof of the result; and then, a
formal proof. My hope is that this diversion from the usual course will enable
Preface
ix
readers impatient for the applications to press on with an intuitive understanding of the theory, while at the same time not disappointing those with
both the time and inclination to follow detailed arguments. The level of rigor
will vary from none, to sloppy, to annoyingly painstaking, even within a
single section-unmistakable signs of "growing pains" for any evolving
subject area! We find art and science intermingled here: the art will diminish
if the subject flourishes; nonetheless, its charm as a harbinger of the straight
and narrow to follow remains intact. One of the nicest surprises ahead is
that some of the most useful results have a very straightforward theoretical
basis.
The Numbering System. The following abbreviations are used
throughout:
A = Algorithm
C = Corollary
D = Definition
E = Example
Fig. = Figure
H = Homework
P = Proposition
S = Section
T = Theorem
Items are numbered consecutively within a section, which is the
fundamental unit of organization. For example, (T5.2) is Theorem 2 of S5
(Section 5); Fig. 8.1 is Figure 1 of Section 8. Following the table of contents
there is a list of special symbols and the page on which they are defined. The
end of a proof or example is indicated bye. Each section closes with remarks
and bibliographic comments. Bibliographic references appear in parentheses without decimals-(33) is reference 33; (3.3) is equation 3 of
Section 3.
Exercises. Homework problems are given at the ends of sections.
Altogether there are several hundred problems which draw upon and extend
the text material in various ways. The range of difficulty is quite wide, with a
view towards accommodating the needs of a variety of readers. Many
problems (with answers) are simple calculations, applications or examples of
definitions or formulas; these I have assigned as homework in beginning
graduate courses in pattern recognition. Also included are many published
theorems which are peripherally related to the central issues of the text; the
more difficult ones are accompanied by references, and thus afford an
excellent means for easing into literature of interest. (Exercises of this type
often lead toward conjectures and discoveries of new facts.) Finally, I have
included some problems for which answers are not known. This type of
question makes an excellent class project, and it too can lead to exciting
results, both within the classroom and without. Several exercises ask for a
"handheld" run of a particular algorithm. This type of exercise strengthens
one's understanding of the method in question. However, the algorithms
discussed herein are intended to be coded as computer programs, and I have
described them in a manner which seems most useful in this respect, rather
x
Preface
than providing detailed listings, which are very seldom compatible with
specific facilities.
Acknowledgments. Among the countless people to whom I am
indebted for various reasons in connection with this volume, several must be
mentioned explicitly: Lotfi Zadeh, Joe Dunn, Mort Nadler, Laurel Van
Sweden, and L. S. Marchand have, in different ways, made this task both
possible and pleasant.
Utah State University
Logan, Utah
James C. Bezdek
Contents
Notation . . . . . . . . . . .
X111
1. Models for Pattern Recognition . . . . .
1
S 1 Pattern Recognition . . . . . . . .
S2 Some Notes on Mathematical Models
S3 Uncertainty
. . . . . . . .
1
5
7
2. Partitions and Relations
15
S4 The Algebra of Fuzzy Sets
S5 Fuzzy Partition Spaces
S6 Properties of Fuzzy Partition Spaces
S7 Fuzzy Relations
3. Objective Function Clustering.
15
23
28
39
43
S8 Cluster Analysis: An Overview
S9 Density Functionals . . . . .
S10 Likelihood Functionals: Congenital Heart Disease.
Sll Least-Squares Functionals: Fuzzy c-Means . . . .
S12 Descent and Convergence of Fuzzy c-Means . . .
S13 Feature Selection for Binary Data: Important Medical Symptoms
4. Cluster Validity. . . . . . . . . . . . . . . . .
43
48
60
65
80
86
95
S14 Cluster Validity: An Overview . . . . . . . .
S15 The Partition Coefficient: Numerical Taxonomy.
S16 Classification Entropy . . . . . . . . . . . .
S17 The Proportion Exponent: Numerical Taxonomy Revisited.
S18 Normalization and Standardization of Validity Functionals
S19 The Separation Coefficient: Star-Tracker Systems
S20 Separation Indices and Fuzzy c-Means .
95
99
109
119
126
137
145
5. Modified Objective Function Algorithms. . . . . . . . . . . . .
155
S21 Affinity Decomposition: An Induced Fuzzy Partitioning Approach
S22 Shape Descriptions with Fuzzy Covariance Matrices . . . . . .
155
165
xi
xii
Contents
S23 The Fuzzy c -Varieties Clustering Algorithms
S24 Convex Combinations: Fuzzy c-Elliptotypes
174
192
6. Selected Applications in Classifier Design. . . .
203
S25 An Overview of Classifier Design: Bayesian Classifiers. . . . . .
S26 A Heuristic Approach to Parametric Estimation: Bayesian Classifiers
for Mixed Normal Distributions . . . . . . . . . . .
S27 Nearest Prototype Classifiers: Diagnosis of Stomach Disease
S28 Fuzzy Classification with Nearest Memberships . . . . . .
REFERENCES
ALGORITHM INDEX.
AUTHOR INDEX. .
SUBJECT INDEX . .
203
216
228
235
241
249
251
253
Notation
Page
Data set
{. , . , .... } Set description by list .
C
Set containment
E'
Belongs to
In Natural logarithm
e Inverse of In .
X
G)
{xip(x)}
Y
h:X~
00
[a, b]
Pr(A)
Pr(AIB)
P(X)
p=;,q
p~q
p~q
A
(1
u
E
-
'rJ
3
:3
0
0
8'(X)
1·1
~
/\
1
2
15
15
9
9
Combinatorial: n things taken k at a time
8
Set description by property p(x)
Function h maps X into Y
Infinity
Closed real interval a ~ x ~ b
Probability of A
Probability of A given B
Power set of X .
p implies q
P is implied by q
p if and only if q
Complement of set A .
Intersection
Union
Does not belong to .
By definition
For all
There is (also, for some) .
Such that
Empty set.
Zero vector
Family of subsets of X
Cardinality
Correspondence or equivalence
Minimum.
10
10
10
10
10
12
15
15
15
15
15
15
15
15
16
16
16
16
16
16
16
16
17
17
xiii
Notation
xiv
v
ij
]
C!:
L
Ven
~
Me
Mfc
~p
c!
(A{
conv(S)
AxB
Be
U
iJ
(x, y)
d(x, y)
~+
aJ(U)
aUik
Maximum.
Zero function
One function
Isomorphism (also, approximately)
Proper subset
Summation
Vector space of c x n matrices.
Real numbers
Hard c-partition space
Fuzzy c -partition space
Space of p-tuples of real numbers
c -factorial.
Transpose of matrix A
,
Convex hull of S .
Cartesian product of A and B
Standard basis of ~c
Matrix U considered as a vector
Geometric centroid of U
Inner product of vectors x, y .
Distance from x to y
The real interval [0, CX))
{,(x; v)
Partial derivative of J with respect to Uik evaluated
at U
Norm of x.
Trace of matrix A
Gradient with respect to x .
Directional derivative of f at x in direction v .
m ~k
m approaches k from above .
Ilxll
Tr(A)
Vx
n Product.
fog
BO(x, r)
{y ~ f(y)}
Oij
B(x, r)
dia(A)
dis(A,B)
8B(x, r)
g(xli)
n(l.L, (72)
Functional composition .
Open ball of radius r centered at x
The function taking y to f( y) .
Kronecker's delta
Closed ball of radius r centered at x
Diameter of set A
Distance between sets A, B
Boundary ball of radius r centered at x
Class conditional probability density
Uni\ariate normal density with mean I.L, variance
(r
Page
17
17
17
17
18
22
24
24
24
26
29
29
29
29
31
32
33
33
35
48
48
49
54
55
67
69
70
38
80
37
83
104
37
145
145
180
207
208
Notation
g(xli; 9d
n (J1,~)
xv
Parametric form of g(xli) . . . . . . . . . . .
Multivariate normal density with mean J1 covariance matrix ~ . . . . . . . . . . . . . . . .
216
216
Download