Pattern Recognition with Fuzzy Objective Function Algorithms ADVANCED APPLICATIONS IN PATTERN RECOGNITION General editor: Morton Nadler A STRUCTURAL ANALYSIS OF COMPLEX AERIAL PHOTOGRAPHS Makoto Nagao and Takashi Matsuyama PATTERN RECOGNITION WITH FUZZY OBJECTIVE FUNCTION ALGORITHMS James C. Bezdek James C. Bezdek holds a B.S. in Civil Engineering from the University of Nevada (Reno) and a Ph.D. in Applied Mathematics from Cornell University. His professional interests currently include mathematical theories underlying problems in cluster analysis, cluster validity, feature selection, and classifier design; applications in medical diagnosis, geologic shape analysis, and sprinkler networks; and the numerical analysis of low-velocity hydraulic systems. Professor Bezdek has held research grants from NSF and ONR and has been a pattern recognition consultant for several industrial firms. He is an active member of many professional groups, including the IEEE Computer Society. A Continuation Order Plan is available for this series. A continuation order will bring delivery of each new volume immediately upon publication. Volumes are billed only upon actual shipment. For further information please contact the publisher. Pattern Recognition with Fuzzy Objective Function Algorithms James C. Bezdek Utah State University Logan, Utah PLENUM PRESS. NEW YORK AND LONDON Library of Congress Cataloging in Publication Data Bezdek, James C. 1939Pattern recognition with fuzzy objective function algorithms. (Advanced applications in pattern recognition) Bibliography: p. Includes indexes. 1. Pattern perception. 2. Fuzzy-algorithms. 3. Cluster analysis. I. Title. II. Series. Q327.B48 001.53 '4 81-4354 AACR2 ISBN 978-1-4757-0452-5 ISBN 978-1-4757-0450-1 (eBook) DOI 10.1007/978-1-4757-0450-1 © 1981 Plenum Press, New York Softcover reprint of the hardcover 18t edition 1981 A Division of Plenum Publishing Corporation 233 Spring Street, New York, N.Y. 10013 All rights reserved No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written permission from the Publisher Foreword The fuzzy set was conceived as a result of an attempt to come to grips with the problem of pattern recognition in the context of imprecisely defined categories. In such cases, the belonging of an object to a class is a matter of degree, as is the question of whether or not a group of objects form a cluster. A pioneering application of the theory of fuzzy sets to cluster analysis was made in 1969 by Ruspini. It was not until 1973, however, when the appearance of the work by Dunn and Bezdek on the Fuzzy ISODATA (or fuzzy c-means) algorithms became a landmark in the theory of cluster analysis, that the relevance of the theory of fuzzy sets to cluster analysis and pattern recognition became clearly established. Since then, the theory of fuzzy clustering has developed rapidly and fruitfully, with the author of the present monograph contributing a major share of what we know today. In their seminal work, Bezdek and Dunn have introduced the basic idea of determining the fuzzy clusters by minimizing an appropriately defined functional, and have derived iterative algorithms for computing the membership functions for the clusters in question. The important issue of convergence of such algorithms has become much better understood as a result of recent work which is described in the monograph. Further, farreaching generalizations of the original fuzzy c-means algorithms have been recently discovered that generate linear varieties of arbitrary dimension up to one less than the dimension of data space. These algorithms (and convex combinations of them) are discussed. Their potential for a wide range of applications is suggested; it seems but a matter of time until these applications are made. Although the results of experimental studies reported in this book indicate that fuzzy clustering techniques often have significant advantages over more conventional methods, universal acceptance of the theory of fuzzy sets as a natural basis for pattern recognition and cluster analysis is not likely to materialize in the very near future. The traditional view, that probability theory is all that is needed to deal with imprecision, is too deeply entrenched to be abandoned without resistance. But those of us who believe that fuzziness and randomness are two distinct facets of uncertainty feel that v vi Foreword it is only a matter of time before this dualistic view of imprecision becomes accepted as a useful tool in mathematical models of real processes. This monograph is an important step towards that end. Berkeley, California L. A. Zadeh Preface It is tempting to assert that the basic aim of all science is the recognition of patterns. Scientists study observed groups of variables, trying to isolate and identify functional relationships-qualitative and quantitative. These associations provide mathematical models which are in turn used to infer objective properties of the process being modeled. Traditional approaches to this course of scientific investigation have been radically altered by modern data processing capabilities: we examine data and postulate interrelationships whose level of complexity would have been unthinkable even three decades ago. This capability has stirred a concomitant interest in the notion of precision: precision in nature, in the data we gather from nature, in our machine representation of the data, in the models we construct from the data, in the inferences we draw from the models, and, ultimately, in Our philosophical perception of the idea itself. This monograph has as its cornerstones the notions of imprecision and information in mathematical modeling. Lotfi Zadeh introduced in 1965 an axiomatic structure-the fuzzy set-which offers a new way to capture these two ideas mathematically. Many researchers assume that "fuzziness" either proposes to replace statistical uncertainty, or can, by a suitable change of view, be interpreted as the same thing. Neither of these presumptions is accurate! Fuzzy models are compatible but distinct companions to classical ones: sometimes more useful; sometimes equally appropriate; and sometimes unsuitable to represent the process under study. Since their inception, fuzzy sets have advanced in a wide variety of disciplines: e.g., control theory, topology, linguistics, optimization, category theory, and automata. Nowhere, however, has their impact been more profound or natural than in pattern recognition, which, in turn, is often a basis for decision theory. In particular, my aim is to discuss the evolution of fuzziness as a basis for feature selection, cluster analysis, and classification problems. It is not my intent to survey a wide variety of research concerning these subjects; nor will the reader find collected here a group of related (but uncorrelated) papers. Instead, this volume is specifically organized around two fundamental structures-fuzzy relations and partitions-their theory, vii viii Preface and their applications to the pattern recognition problems mentioned above. These same two structures also provide a basis for a variety of models and algorithms in group decision theory-a variety so great, in fact, that I have decided to include herein only those results concerning decision theory which bear directly on the pattern recognition problems just mentioned. Some of this material is already "obsolete"; it is included to make the presentation historically and chronologically continuous (presumably, this is the soundest pedagogical path to follow). My choice in this matter is motivated by a desire to collect and unify those concepts that appear to play a major role in current research trends. This choice may, of course, prove historically to have been shortsighted, and at the very least, will occasion the omission of many bright and innovative ideas, for which I apologize a priori. Moreover, some of the material included here is the center of great controversy at present: for this I do. not apologize. This is not only predictable, but exactly as it should be. New ideas need to be challenged, sifted, modified, and-if necessary-discarded, as their scientific value becomes evident. Indeed, the method of science is to successively approximate physical processes by various models, always hoping that the next step improves the accuracy and usefulness of previous models. Whether fuzzy models assume a place in this iterative procedure remains to be seen. (I hope they do and, of course, believe they will.) In any case, I have little doubt that many of the results presented herein will have been subsumed by more general ones, or perhaps by alternatives of more evident utility, even before this book is published. My purposes in writing this monograph are simple: (i) to synthesize the evolution of fuzzy relations and partitions as axiomatic bases for models in feature selection, clustering, and classification; (ii) to give a careful, unified development of the mathematics involved, to the extent that this is possible and desirable; (iii) to demonstrate that this theory can be applied to real physical problems with measurable success. About Theorems. Mathematical detail sometimes bewilders the uninitiated, who often cannot afford the time needed to master the underpinnings of concise, thorny proofs. On the other hand, the erection of any theory ultimately requires a foundation based on precise formulations of previous results. Accordingly, I have tried, at what seem to be crucial junctions, to divide proofs into two parts: first, an informal "analysis" of the meaning, application, and key ideas in the proof of the result; and then, a formal proof. My hope is that this diversion from the usual course will enable Preface ix readers impatient for the applications to press on with an intuitive understanding of the theory, while at the same time not disappointing those with both the time and inclination to follow detailed arguments. The level of rigor will vary from none, to sloppy, to annoyingly painstaking, even within a single section-unmistakable signs of "growing pains" for any evolving subject area! We find art and science intermingled here: the art will diminish if the subject flourishes; nonetheless, its charm as a harbinger of the straight and narrow to follow remains intact. One of the nicest surprises ahead is that some of the most useful results have a very straightforward theoretical basis. The Numbering System. The following abbreviations are used throughout: A = Algorithm C = Corollary D = Definition E = Example Fig. = Figure H = Homework P = Proposition S = Section T = Theorem Items are numbered consecutively within a section, which is the fundamental unit of organization. For example, (T5.2) is Theorem 2 of S5 (Section 5); Fig. 8.1 is Figure 1 of Section 8. Following the table of contents there is a list of special symbols and the page on which they are defined. The end of a proof or example is indicated bye. Each section closes with remarks and bibliographic comments. Bibliographic references appear in parentheses without decimals-(33) is reference 33; (3.3) is equation 3 of Section 3. Exercises. Homework problems are given at the ends of sections. Altogether there are several hundred problems which draw upon and extend the text material in various ways. The range of difficulty is quite wide, with a view towards accommodating the needs of a variety of readers. Many problems (with answers) are simple calculations, applications or examples of definitions or formulas; these I have assigned as homework in beginning graduate courses in pattern recognition. Also included are many published theorems which are peripherally related to the central issues of the text; the more difficult ones are accompanied by references, and thus afford an excellent means for easing into literature of interest. (Exercises of this type often lead toward conjectures and discoveries of new facts.) Finally, I have included some problems for which answers are not known. This type of question makes an excellent class project, and it too can lead to exciting results, both within the classroom and without. Several exercises ask for a "handheld" run of a particular algorithm. This type of exercise strengthens one's understanding of the method in question. However, the algorithms discussed herein are intended to be coded as computer programs, and I have described them in a manner which seems most useful in this respect, rather x Preface than providing detailed listings, which are very seldom compatible with specific facilities. Acknowledgments. Among the countless people to whom I am indebted for various reasons in connection with this volume, several must be mentioned explicitly: Lotfi Zadeh, Joe Dunn, Mort Nadler, Laurel Van Sweden, and L. S. Marchand have, in different ways, made this task both possible and pleasant. Utah State University Logan, Utah James C. Bezdek Contents Notation . . . . . . . . . . . X111 1. Models for Pattern Recognition . . . . . 1 S 1 Pattern Recognition . . . . . . . . S2 Some Notes on Mathematical Models S3 Uncertainty . . . . . . . . 1 5 7 2. Partitions and Relations 15 S4 The Algebra of Fuzzy Sets S5 Fuzzy Partition Spaces S6 Properties of Fuzzy Partition Spaces S7 Fuzzy Relations 3. Objective Function Clustering. 15 23 28 39 43 S8 Cluster Analysis: An Overview S9 Density Functionals . . . . . S10 Likelihood Functionals: Congenital Heart Disease. Sll Least-Squares Functionals: Fuzzy c-Means . . . . S12 Descent and Convergence of Fuzzy c-Means . . . S13 Feature Selection for Binary Data: Important Medical Symptoms 4. Cluster Validity. . . . . . . . . . . . . . . . . 43 48 60 65 80 86 95 S14 Cluster Validity: An Overview . . . . . . . . S15 The Partition Coefficient: Numerical Taxonomy. S16 Classification Entropy . . . . . . . . . . . . S17 The Proportion Exponent: Numerical Taxonomy Revisited. S18 Normalization and Standardization of Validity Functionals S19 The Separation Coefficient: Star-Tracker Systems S20 Separation Indices and Fuzzy c-Means . 95 99 109 119 126 137 145 5. Modified Objective Function Algorithms. . . . . . . . . . . . . 155 S21 Affinity Decomposition: An Induced Fuzzy Partitioning Approach S22 Shape Descriptions with Fuzzy Covariance Matrices . . . . . . 155 165 xi xii Contents S23 The Fuzzy c -Varieties Clustering Algorithms S24 Convex Combinations: Fuzzy c-Elliptotypes 174 192 6. Selected Applications in Classifier Design. . . . 203 S25 An Overview of Classifier Design: Bayesian Classifiers. . . . . . S26 A Heuristic Approach to Parametric Estimation: Bayesian Classifiers for Mixed Normal Distributions . . . . . . . . . . . S27 Nearest Prototype Classifiers: Diagnosis of Stomach Disease S28 Fuzzy Classification with Nearest Memberships . . . . . . REFERENCES ALGORITHM INDEX. AUTHOR INDEX. . SUBJECT INDEX . . 203 216 228 235 241 249 251 253 Notation Page Data set {. , . , .... } Set description by list . C Set containment E' Belongs to In Natural logarithm e Inverse of In . X G) {xip(x)} Y h:X~ 00 [a, b] Pr(A) Pr(AIB) P(X) p=;,q p~q p~q A (1 u E - 'rJ 3 :3 0 0 8'(X) 1·1 ~ /\ 1 2 15 15 9 9 Combinatorial: n things taken k at a time 8 Set description by property p(x) Function h maps X into Y Infinity Closed real interval a ~ x ~ b Probability of A Probability of A given B Power set of X . p implies q P is implied by q p if and only if q Complement of set A . Intersection Union Does not belong to . By definition For all There is (also, for some) . Such that Empty set. Zero vector Family of subsets of X Cardinality Correspondence or equivalence Minimum. 10 10 10 10 10 12 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 17 17 xiii Notation xiv v ij ] C!: L Ven ~ Me Mfc ~p c! (A{ conv(S) AxB Be U iJ (x, y) d(x, y) ~+ aJ(U) aUik Maximum. Zero function One function Isomorphism (also, approximately) Proper subset Summation Vector space of c x n matrices. Real numbers Hard c-partition space Fuzzy c -partition space Space of p-tuples of real numbers c -factorial. Transpose of matrix A , Convex hull of S . Cartesian product of A and B Standard basis of ~c Matrix U considered as a vector Geometric centroid of U Inner product of vectors x, y . Distance from x to y The real interval [0, CX)) {,(x; v) Partial derivative of J with respect to Uik evaluated at U Norm of x. Trace of matrix A Gradient with respect to x . Directional derivative of f at x in direction v . m ~k m approaches k from above . Ilxll Tr(A) Vx n Product. fog BO(x, r) {y ~ f(y)} Oij B(x, r) dia(A) dis(A,B) 8B(x, r) g(xli) n(l.L, (72) Functional composition . Open ball of radius r centered at x The function taking y to f( y) . Kronecker's delta Closed ball of radius r centered at x Diameter of set A Distance between sets A, B Boundary ball of radius r centered at x Class conditional probability density Uni\ariate normal density with mean I.L, variance (r Page 17 17 17 17 18 22 24 24 24 26 29 29 29 29 31 32 33 33 35 48 48 49 54 55 67 69 70 38 80 37 83 104 37 145 145 180 207 208 Notation g(xli; 9d n (J1,~) xv Parametric form of g(xli) . . . . . . . . . . . Multivariate normal density with mean J1 covariance matrix ~ . . . . . . . . . . . . . . . . 216 216