Fuzzy Pattern Recognition

Overview of Pattern Recognition

Pattern Recognition Procedure
[Diagram: unknown speech/image/data -> feature extraction -> feature reduction -> either classification (supervised; class labels known; judged by performance criteria) or clustering (unsupervised or self-organizing; the resulting clusters are judged by cluster validity measures).]

• Supervised learning for classification: the class label is known for a set of samples. Find the decision boundary from the given samples, then classify unknown data.
• Unsupervised learning for clustering: a set of data is given; find the groups or the grouping boundary.
• Reinforcement learning (reward/penalty): only an "unkind teacher" is available, so learning proceeds by a trial-and-error scheme.

Classification and Clustering
• Classification problem: which class should a new sample be assigned to?
• Clustering problem: how should the data be partitioned, and into how many clusters?
[Figure: a labeled two-class data set illustrating classification, next to an unlabeled data set illustrating clustering.]

Pattern Recognition Algorithms
• Based on the statistical approach
  - Parametric approach: Bayes classifier with Gaussian densities; nonlinear boundary or decision function
  - Nonparametric approach for density estimation: Parzen window, K-nearest-neighbor method
• Based on neural networks
  - Classification: multilayer perceptron, ART, Neocognitron, ...
  - Clustering: SOM (Self-Organizing Map)
• Fuzzy pattern recognition
  - Classification: rule-based classifier, fuzzy perceptron, fuzzy K-NN algorithm
  - Clustering: fuzzy c-means, possibilistic c-means, fuzzy c-shell clustering, fuzzy rough clustering
  - Cluster validity: validity measures based on fuzzy set theory

Fuzzy Classification

Rule-Based Classifier
Idea: a nonlinear partition of the feature space, expressed by fuzzy rules over linguistic terms S (small), M (medium), and L (large) on the features x1 and x2:
  Rule 1: If x1 is S and x2 is S, then class 1
  Rule 2: If x1 is S and x2 is not S, then class 2
  Rule 3: If x1 is M and x2 is S, then class 2
  Rule 4: If x1 is M and x2 is not S, then class 1
  Rule 5: If x1 is not S and x2 is L, then class 1
  Rule 6: If x1 is L and x2 is S, then class 1
  Rule 7: If x1 is L and x2 is M, then class 2
How to find the rules from sample data:
• Project the labeled training data onto the feature axes and design the membership functions, or
• Apply fuzzy clustering and project the clusters to obtain the membership functions.

Fuzzy K-Nearest Neighbor Algorithm

Crisp K-NN algorithm:
W = {x_1, x_2, ..., x_n}: a set of n labeled patterns
BEGIN
  Input y, an unknown sample
  Set K, 1 ≤ K ≤ n
  Initialize i = 1
  DO UNTIL (the K nearest neighbors are found)
    Compute the distance from y to x_i
    Update the current set of K nearest neighbors
  END DO UNTIL
  Determine the majority class represented in the set of K nearest neighbors
  IF (a tie exists) THEN
    Compute the sum of distances from y to the neighbors in each tied class
    IF (no tie exists among the sums) THEN classify y to the class with the minimum sum
    ELSE classify y to the last class found with the minimum sum
    END IF
  ELSE
    Classify y to the majority class
  END IF
END

Fuzzy K-NN algorithm:
The class memberships of the labeled samples are known in advance: u_{ij} is the degree of belongingness of x_j to class i, for all i = 1, 2, ..., C and j = 1, 2, ..., n. These u_{ij} can be calculated by fuzzy clustering or other methods.
BEGIN
  Input y, an unknown sample
  Set K, 1 ≤ K ≤ n
  Initialize i = 1
  DO UNTIL (the K nearest neighbors are found)
    Compute the distance from y to x_i
    Update the current set of K nearest neighbors
  END DO UNTIL
  FOR i = 1 to C
    Compute the membership of y in the i-th class:
      u_i(y) = \frac{\sum_{j=1}^{K} u_{ij}\,(1/\|y - x_j\|^{2/(m-1)})}{\sum_{j=1}^{K} (1/\|y - x_j\|^{2/(m-1)})}    (m > 1)
  END FOR
  Classify y to the class with the maximum u_i(y)
END
As m approaches 1, the closer neighbors are weighted far more heavily than those farther away; as m grows large, the neighbors are more evenly weighted.
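To make the fuzzy K-NN computation concrete, here is a minimal NumPy sketch of the membership rule above. It is a sketch under stated assumptions, not part of the original slides: the name fuzzy_knn, the argsort neighbor search, and the small constant that guards against division by zero for coincident points are illustrative choices.

import numpy as np

def fuzzy_knn(y, X, U, K=3, m=2.0):
    """Classify one sample y with the fuzzy K-NN rule.
    X: (n, d) labeled patterns; U: (C, n) known memberships u_ij;
    K: number of neighbors; m > 1: fuzzifier."""
    dist = np.linalg.norm(X - y, axis=1)            # ||y - x_j|| for every pattern
    nn = np.argsort(dist)[:K]                       # indices of the K nearest neighbors
    w = 1.0 / np.maximum(dist[nn], 1e-12) ** (2.0 / (m - 1.0))  # inverse-distance weights
    u_y = (U[:, nn] * w).sum(axis=1) / w.sum()      # u_i(y) for each class i
    return int(np.argmax(u_y)), u_y                 # winning class and all memberships

If only crisp labels are available, U can be initialized with u_ij = 1 for the labeled class and 0 elsewhere; fuzzy memberships obtained from fuzzy clustering are a drop-in replacement, as noted above.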
Fuzzy Nearest Prototype Classification

Crisp and Fuzzy Nearest Prototype Classification
[Figure: prototypes of class 1 and class 2 with the induced decision boundary.]

• Crisp version
W = {Z_1, Z_2, ..., Z_C}: a set of C prototype vectors
BEGIN
  Input x, a vector to be classified
  FOR all i = 1, ..., C
    Find the distance from Z_i to x
  END FOR
  Determine the minimum distance and classify x to that class
END

• Fuzzy version
W = {Z_1, Z_2, ..., Z_C}: a set of C prototype vectors
BEGIN
  Input x, a vector to be classified
  FOR all i = 1, ..., C
    Find the distance from Z_i to x
  END FOR
  FOR all i = 1, ..., C
    Compute u_i(x) = \frac{1/\|x - Z_i\|^{2/(m-1)}}{\sum_{j=1}^{C} (1/\|x - Z_j\|^{2/(m-1)})}
  END FOR
END

Fuzzy Perceptron

Crisp Single-Layer Perceptron (Two-Class Problem)
Find a linear decision boundary for separable data.
X = {x_1, x_2, ..., x_N | x_i ∈ R^d, i = 1, 2, ..., N}: set of N d-dimensional input vectors
C = {c_{x_1}, c_{x_2}, ..., c_{x_N} | c_{x_i} ∈ {-1, 1}}: associated label set
X_1 = {x_i | c_{x_i} = 1} and X_{-1} = {x_i | c_{x_i} = -1}: sets of positive and negative examples
|X_1| = N_1 and |X_{-1}| = N_{-1}, with N_1 + N_{-1} = N
x_i = (x_i^{(1)}, x_i^{(2)}, ..., x_i^{(d)}, 1)^t: augmented input vector for each x_i
w = (w_1, w_2, ..., w_d, w_{d+1})^t: linear separating hyperplane
Learning rule:
  w(t+1) = w(t) + η (c_{x(t)} - C_{w(t)}(x(t))) x(t)
  C_{w(t)}(x(t)) = 1 if w(t)^t x(t) ≥ 0, and -1 otherwise

Fuzzy Perceptron
u_k(x_i) ∈ [0, 1]: degree of belongingness of x_i to class k. For the two-class case, u_1(x_i) + u_{-1}(x_i) = 1.
For x_i ∈ X_1:
  u_1(x_i) = 0.5 + \frac{\exp(f (d_{-1}(x_i) - d_1(x_i))/d) - \exp(-f)}{2(\exp(f) - \exp(-f))},   u_{-1}(x_i) = 1 - u_1(x_i)
For x_i ∈ X_{-1}:
  u_{-1}(x_i) = 0.5 + \frac{\exp(f (d_1(x_i) - d_{-1}(x_i))/d) - \exp(-f)}{2(\exp(f) - \exp(-f))},   u_1(x_i) = 1 - u_{-1}(x_i)
where
  d_1(x_i) = \|x_i - \text{mean of the positive class}\|
  d_{-1}(x_i) = \|x_i - \text{mean of the negative class}\|
  d = \|\text{mean of the positive class} - \text{mean of the negative class}\|
  f = a constant

Learning rule:
  w(t+1) = w(t) + η |u_1(x(t)) - u_{-1}(x(t))|^m (c_{x(t)} - C_{w(t)}(x(t))) x(t)
Advantages
• Generalizes the crisp algorithm: for crisp memberships, |u_1(x(t)) - u_{-1}(x(t))| = 1 and the crisp rule is recovered.
• Elegant termination in the non-separable case, where the crisp perceptron does not terminate in finite time.

Termination of the Fuzzy Perceptron
If the remaining misclassifications are all caused by very fuzzy data, terminate the learning. The very fuzzy data are those with
  u_1(x_i) ∈ [0.5 - ε, 0.5 + ε],   ε = \frac{1 - \exp(-f)}{2(\exp(f) - \exp(-f))}
Note: the fuzzy perceptron can be combined with kernel-based methods (J.H. Chen & C.S. Chen, IEEE Trans. on Neural Networks, 2002).
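The following NumPy sketch combines the membership assignment, the fuzzy learning rule, and the termination condition from this section. It is one illustrative reading of the slides, not a reference implementation: the function names, the default constants, and the choice to realize termination by skipping updates on very fuzzy points are assumptions.

import numpy as np

def fp_memberships(X, c, f=2.0):
    """Two-class memberships from distances to the class means (labels in {-1, +1})."""
    mu_p, mu_n = X[c == 1].mean(axis=0), X[c == -1].mean(axis=0)
    d = np.linalg.norm(mu_p - mu_n)                  # separation of the class means
    d_p = np.linalg.norm(X - mu_p, axis=1)           # d_1(x_i)
    d_n = np.linalg.norm(X - mu_n, axis=1)           # d_{-1}(x_i)
    diff = np.where(c == 1, d_n - d_p, d_p - d_n)    # positive when x is nearer its own class
    u_own = 0.5 + (np.exp(f * diff / d) - np.exp(-f)) / (2 * (np.exp(f) - np.exp(-f)))
    u_own = np.clip(u_own, 0.5, 1.0)                 # membership in the sample's own class
    return np.where(c == 1, u_own, 1.0 - u_own)      # u_1(x_i); u_{-1} = 1 - u_1

def fuzzy_perceptron(X, c, eta=0.1, m=2.0, f=2.0, eps=0.05, max_epochs=1000):
    u1 = fp_memberships(X, c, f)
    Xa = np.hstack([X, np.ones((len(X), 1))])        # augmented inputs
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        updated = False
        for x, ci, u in zip(Xa, c, u1):
            pred = 1.0 if w @ x >= 0 else -1.0       # C_w(x)
            if pred != ci and abs(u - 0.5) > eps:    # skip "very fuzzy" misclassifications
                w += eta * abs(2.0 * u - 1.0) ** m * (ci - pred) * x  # |u_1 - u_{-1}| = |2u_1 - 1|
                updated = True
        if not updated:                              # only very fuzzy points remain wrong
            break
    return w

The fuzzy factor |u_1 - u_{-1}|^m shrinks the update for points near the class boundary, which is what allows the loop to settle even when the data are not linearly separable.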
Fuzzy C-Means Clustering

Objective
The aim of the iterative algorithm is to decrease the value of an objective function.
Notation:
  samples: x_1, x_2, ..., x_n
  prototypes: p_1, p_2, ..., p_k
  squared L2 distance: d^2 = \|x_i - p_j\|^2 = \sum_{s} (x_{is} - p_{js})^2

Crisp objective:
  \sum_{i=1}^{n} \min_{j \in \{1, 2, ..., k\}} \|x_i - p_j\|^2
Fuzzy objective:
  \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m \|p_i - x_j\|^2

Crisp C-Means Algorithm
• Initialize k seed prototypes p_1, p_2, ..., p_k
• Grouping: assign each sample to its nearest prototype, forming non-overlapping clusters
• Centering: the centers of the clusters become the new prototypes
• Repeat the grouping and centering steps until convergence

Why the algorithm converges:
• Grouping: assigning samples to their nearest prototypes decreases the objective \sum_{i=1}^{n} \min_{j} \|x_i - p_j\|^2.
• Centering: also decreases the objective, because
  \sum_{i=1}^{m} \|y_i - w\|^2 \ge \sum_{i=1}^{m} \|y_i - \bar{y}\|^2
  with equality only if w = \bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i.

Membership matrix U_{c×n}: u_{ij} is the grade of membership of sample j with respect to prototype i.
Crisp membership:
  u_{ij} = 1 if \|p_i - x_j\|^2 = \min_k \|p_k - x_j\|^2, and u_{ij} = 0 otherwise
Fuzzy membership:
  u_{ij} ∈ [0, 1] with \sum_{i=1}^{c} u_{ij} = 1 for all j = 1, ..., n

Objective Function of FCM
  J = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2 = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \|p_i - x_j\|^2
Introducing a Lagrange multiplier λ for the constraint \sum_{i=1}^{c} u_{ij} = 1, the objective for one sample becomes
  J = \sum_{i=1}^{c} u_{ij}^m d_{ij}^2 - λ \left( \sum_{i=1}^{c} u_{ij} - 1 \right)

Setting the partial derivatives to zero:
  ∂J/∂λ = \sum_{k=1}^{c} u_{kj} - 1 = 0
  ∂J/∂u_{ij} = m u_{ij}^{m-1} d_{ij}^2 - λ = 0  ⟹  u_{ij} = \left( \frac{λ}{m d_{ij}^2} \right)^{1/(m-1)}
Substituting the second equation into the first:
  \left( \frac{λ}{m} \right)^{1/(m-1)} \sum_{k=1}^{c} \left( \frac{1}{d_{kj}^2} \right)^{1/(m-1)} = 1  ⟹  \left( \frac{λ}{m} \right)^{1/(m-1)} = \frac{1}{\sum_{k=1}^{c} (1/d_{kj}^2)^{1/(m-1)}}
Therefore, the updating rule is
  u_{ij} = \frac{(1/d_{ij}^2)^{1/(m-1)}}{\sum_{k=1}^{c} (1/d_{kj}^2)^{1/(m-1)}} = \frac{1}{\sum_{k=1}^{c} (d_{ij}^2 / d_{kj}^2)^{1/(m-1)}}

Setting the derivative of J with respect to p_i to zero:
  ∂J/∂p_i = \frac{∂}{∂p_i} \sum_{j=1}^{n} u_{ij}^m \|p_i - x_j\|^2 = \sum_{j=1}^{n} u_{ij}^m \frac{∂}{∂p_i} (p_i - x_j)^T (p_i - x_j) = 2 \sum_{j=1}^{n} u_{ij}^m (p_i - x_j) = 0
Update rule for p_i:
  p_i = \frac{\sum_{j=1}^{n} u_{ij}^m x_j}{\sum_{j=1}^{n} u_{ij}^m}

To summarize:
  u_{ij} = \frac{1}{\sum_{k=1}^{c} (d_{ij} / d_{kj})^{2/(m-1)}},   p_i = \frac{\sum_{j=1}^{n} u_{ij}^m x_j}{\sum_{j=1}^{n} u_{ij}^m}

[Figure: K-means vs. fuzzy c-means partitions of the same data.]

Extensions:
• Gustafson-Kessel algorithm
• Cluster validity to determine the number of clusters
• Extraction of a rule base from fuzzy clusters

Possibilistic C-Means

Problem of FCM: Equal Evidence = Ignorance
[Figure: points A and B both receive u_1 = u_2 = 0.5 under FCM, although A lies between the two clusters (equal evidence) while B is a distant outlier (ignorance).]

Objective function of FCM:
  J = \sum_{i=1}^{c} \sum_{j} u_{ij}^m d_{ij}^2   subject to \sum_{i=1}^{c} u_{ij} = 1
The constraint from Ruspini, that the memberships of a datum over all classes sum to 1, is too restrictive a condition for noisy data.

Objective function of PCM:
  J = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^m d_{ij}^2 + \sum_{i=1}^{C} η_i \sum_{j=1}^{N} (1 - u_{ij})^m
The first term minimizes the intra-cluster distances; the second makes the memberships as large as possible.

Necessary condition:
  u_{ij} = \frac{1}{1 + (d_{ij}^2 / η_i)^{1/(m-1)}}
Determination of η_i:
• Average cluster distance:
  η_i = K \frac{\sum_{j=1}^{N} u_{ij}^m d_{ij}^2}{\sum_{j=1}^{N} u_{ij}^m}
• Based on an alpha-cut:
  η_i = \frac{\sum_{x_j ∈ (Λ_i)_α} u_{ij}^m d_{ij}^2}{\sum_{x_j ∈ (Λ_i)_α} u_{ij}^m},   where (Λ_i)_α is the α-cut of cluster i.
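Before turning to the PCM details, here is a compact NumPy sketch of the two coupled FCM update rules summarized above. The random initialization of U, the convergence test on the membership change, and the numerical floor on zero distances are implementation choices, not part of the derivation.

import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """X: (n, d) samples; returns prototypes P (c, d) and memberships U (c, n)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                               # enforce sum_i u_ij = 1
    for _ in range(max_iter):
        Um = U ** m
        P = Um @ X / Um.sum(axis=1, keepdims=True)   # p_i = sum_j u_ij^m x_j / sum_j u_ij^m
        d2 = ((X[None, :, :] - P[:, None, :]) ** 2).sum(axis=2)   # d_ij^2, shape (c, n)
        inv = np.maximum(d2, 1e-12) ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)                # u_ij = 1 / sum_k (d_ij^2/d_kj^2)^(1/(m-1))
        if np.abs(U_new - U).max() < tol:
            return P, U_new
        U = U_new
    return P, U

Each iteration performs exactly one application of the two necessary conditions, so the objective J is non-increasing from one pass to the next.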
[Figure: PCM membership u_{ij} as a function of d_{ij}^2 / η_i; the membership equals 0.5 when d_{ij}^2 = η_i and decays toward 0 as the distance grows.]

Cluster centers:
  c_i = \frac{\sum_{j=1}^{N} u_{ij}^m x_j}{\sum_{j=1}^{N} u_{ij}^m}

Other distance measures:
• Inner-product-induced distance: d_{ij}^2 = (x_j - c_i)^t A_i (x_j - c_i), as in Gustafson-Kessel (see previous page)
• Spherical-shell clusters: d_{ij}^2 = (\|x_j - c_i\| - r_i)^2
  Approximate prototype: writing the shell as x^t x + p_i^t \hat{x} = 0 with \hat{x} = (x^t, 1)^t and p_i = (-2 c_i^t, \; c_i^t c_i - r_i^2)^t, the prototype is
  p_i = -\frac{1}{2} H_i^{-1} w_i,   H_i = \sum_{j=1}^{N} u_{ij}^m \hat{x}_j \hat{x}_j^t,   w_i = \sum_{j=1}^{N} u_{ij}^m \|x_j\|^2 \hat{x}_j
  from which the center c_i and the radius r_i are recovered.

2-Pass Algorithm:
  Initialize the PC partition
  DO UNTIL (the change in the PC partition is small)
    Update the prototypes
    Update the PC partition using average cluster distances
  END DO UNTIL
  Based on the resulting PC partition:
  DO UNTIL (the change in the PC partition is small)
    Update the prototypes
    Update the PC partition using alpha-cut distances
  END DO UNTIL

Advantages
• Robust to noisy data
• Potentially useful for extracting a fuzzy rule base
[Figures: FCM-based c-shell vs. PCM-based c-shell clustering results.]

Other Notions of Distance

Weights on features:
  d(v_i, x_k) = \sum_{s=1}^{p} ω_{is}^t (x_k^{(s)} - v_i^{(s)})^2,   \sum_{s=1}^{p} ω_{is} = 1,   i ∈ {1, 2, ..., C}
Optimal weights:
  ω_{is} = \frac{1}{\sum_{r=1}^{p} \left( \frac{\sum_{k=1}^{n} u_{ik}^m (x_k^{(s)} - v_i^{(s)})^2}{\sum_{k=1}^{n} u_{ik}^m (x_k^{(r)} - v_i^{(r)})^2} \right)^{1/(t-1)}}
where t is the weight exponent.

[Figures: FCM with Euclidean distance vs. FCM with the adaptive, feature-weighted distance.]
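To close the section, here is a sketch of the possibilistic updates discussed above, in the spirit of the 2-pass algorithm: it is seeded with an FCM result, fixes each η_i once from the average-cluster-distance rule, and then iterates the possibilistic conditions. The alpha-cut refinement pass is omitted, and the interface and names are assumptions layered on the fcm sketch from the previous section.

import numpy as np

def pcm(X, P, U, m=2.0, K=1.0, max_iter=100, tol=1e-5):
    """X: (n, d); P, U: prototypes and memberships from a prior FCM run."""
    d2 = ((X[None, :, :] - P[:, None, :]) ** 2).sum(axis=2)
    Um = U ** m
    eta = K * (Um * d2).sum(axis=1) / Um.sum(axis=1)     # eta_i = K * average cluster distance
    for _ in range(max_iter):
        Um = U ** m
        P = Um @ X / Um.sum(axis=1, keepdims=True)       # same center update as FCM
        d2 = ((X[None, :, :] - P[:, None, :]) ** 2).sum(axis=2)
        U_new = 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))  # no sum-to-1 constraint
        if np.abs(U_new - U).max() < tol:
            return P, U_new
        U = U_new
    return P, U

Usage mirrors the 2-pass idea: P0, U0 = fcm(X, c) followed by P, U = pcm(X, P0, U0). Because the cluster memberships are decoupled, a noise point far from every prototype receives a low membership in all clusters instead of being forced to split 0.5/0.5, which is exactly the robustness advantage claimed above.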