Fuzzy rule-based system derived from similarity to prototypes

Włodzisław Duch, Marcin Blachnik

Department of Informatics, Nicolaus Copernicus University, Poland
School of Computer Engineering, Nanyang Technological University, Singapore
Division of Computer Methods, Department of Electrotechnology, The Silesian University of Technology, Poland

Plan
1. What is it all about?
2. Fuzzy rule systems and prototype rule based systems.
3. From prototype rules to fuzzy rules and vice versa, with examples.
4. Results of applications on real datasets.
5. Conclusions.

Motivation
When understanding data or situations, recognizing objects, or making diagnoses, people frequently use similarity to known cases and rarely use logical reasoning, but soft computing experts use logic instead of similarity ...
Relations between similarity and logic are not clear.
Q1: How to obtain the same decision borders in Fuzzy Logic systems and Prototype Rule Based systems?
Q2: What type of similarity measure corresponds to typical fuzzy membership functions, and vice versa?
Q3: How to transform one type of system into the other while preserving the decision borders?
Q4: Are there any advantages to such transformations?
Q5: Can we understand data better using prototypes instead of logical rules?

Fuzzy Rule Based System
Learning process includes:
– for each feature, select the shapes of the membership functions and the number of these functions;
– optimize the parameters of the membership functions (such as positions and spreads) using training data;
– aggregate input information and calculate the final rule activations for each category;
– assign membership degrees to output classes;
– write the set of F-rules and interpret them.
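The F-rule steps listed above can be illustrated with a minimal sketch. It assumes Gaussian membership functions, product aggregation, and a winner-takes-all class assignment; the rule parameters in `RULES` are hypothetical hand-set values, and the optimization step on training data is left out entirely.

```python
import math

def gaussian_mf(x, center, spread):
    """Gaussian membership degree of x for one feature, in (0, 1]."""
    return math.exp(-((x - center) / spread) ** 2)

def rule_activation(x, centers, spreads):
    """Aggregate per-feature memberships with the product T-norm."""
    activation = 1.0
    for xi, c, s in zip(x, centers, spreads):
        activation *= gaussian_mf(xi, c, s)
    return activation

def classify(x, rules):
    """Return the class label of the rule with the highest activation."""
    best = max(rules, key=lambda r: rule_activation(x, r["centers"], r["spreads"]))
    return best["label"]

# Hypothetical rules: one per class, positions and spreads set by hand
# (in a real system they would be optimized on training data).
RULES = [
    {"label": "A", "centers": [0.0, 0.0], "spreads": [1.0, 1.0]},
    {"label": "B", "centers": [3.0, 3.0], "spreads": [1.0, 1.0]},
]

print(classify([0.5, 0.2], RULES))  # prints "A"
print(classify([2.8, 3.1], RULES))  # prints "B"
```

Each dictionary in `RULES` reads as one F-rule: "If x1 is near center1 and x2 is near center2 Then class = label".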

Prototype Rule Based System
Learning process involves:
– specify the number and positions of prototypes;
– select similarity or dissimilarity (distance) functions (we use distance functions);
– calculate the distance (similarity) to each prototype;
– assign the output class using a P-rule.
One choice is the nearest prototype rule:
If P = arg min_{P'} D(X,P') Then Class(X) = Class(P)
It is similar to the fuzzy logic rule:
If R = arg max_k MembF_k(X) Then Class(X) = Class(R)
Another form of P-rules is based on a similarity threshold:
If D(X,P) ≤ d_P Then C
With a suitable distance D(X,P), crisp logic rules are obtained; for example, a threshold on the Chebyshev (L∞) distance defines a hyperrectangle, which is a conjunction of interval conditions.

Advantages of prototype based rules
– Inspired by cognitive psychology: it may be easier to understand prototypes and similarity than fuzzy rules.
– P-rules may be defined for nominal features using probabilistic distance measures (such as VDM), while F-rules require numerical inputs.
– Many algorithms for prototype selection and optimization exist, but they have not been applied to understanding data, and their relation to fuzzy rules has not been explored.
– In applications to real datasets, P-rules give excellent results with a small number of prototypes.

Value Difference Metric (VDM)
VDM is a probability difference measure. For one attribute:
$$ d_{VDM}(x_j, r_j)^q = \sum_{i=1}^{K} \left| p(C_i \mid x_j) - p(C_i \mid r_j) \right|^q $$
For many attributes:
$$ D_{VDM}(X, R)^q = \sum_{j=1}^{N} d_{VDM}(x_j, r_j)^q $$
The VDM measure can also be applied to continuous features, in the simplest way using discretization and interpolation, or using other probability estimation techniques (Gaussian smoothing, Parzen windows, etc.).

P-rules <=> F-rules
Condition: preserve classification borders.
Q: How are membership functions and distance functions related? Can one obtain new, interesting membership functions from known distance functions and vice versa?
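As a hedged illustration of the nearest prototype rule combined with the VDM measure defined above, the sketch below estimates p(C_i | x_j) by simple counting on a toy nominal dataset. The data, the prototypes, the function names, and the uniform fallback for unseen values are all assumptions made for this example; they are not taken from the talk.

```python
from collections import Counter, defaultdict

def fit_vdm(samples, labels):
    """Estimate p(C | x_j = v) by counting, for every feature j and value v."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)  # (feature index, value) -> class counts
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            counts[(j, v)][c] += 1
    probs = {}
    for key, ctr in counts.items():
        total = sum(ctr.values())
        probs[key] = [ctr[c] / total for c in classes]
    return probs, len(classes)

def vdm_distance(x, r, probs, n_classes, q=2):
    """D_VDM(X, R)^q: sum over features j and classes i of
    |p(C_i | x_j) - p(C_i | r_j)|^q."""
    # Assumption: values never seen in training get a uniform class distribution.
    unseen = [1.0 / n_classes] * n_classes
    total = 0.0
    for j, (xj, rj) in enumerate(zip(x, r)):
        px = probs.get((j, xj), unseen)
        pr = probs.get((j, rj), unseen)
        total += sum(abs(a - b) ** q for a, b in zip(px, pr))
    return total

def nearest_prototype(x, prototypes, probs, n_classes):
    """If P = arg min D(X, P') Then Class(X) = Class(P).
    Comparing D^q instead of D does not change the arg min."""
    best = min(prototypes, key=lambda p: vdm_distance(x, p[0], probs, n_classes))
    return best[1]

# Toy nominal data (hypothetical): features are (colour, shape).
TRAIN_X = [("red", "round"), ("red", "round"), ("red", "long"),
           ("green", "long"), ("green", "long"), ("green", "round")]
TRAIN_Y = ["apple", "apple", "apple", "pear", "pear", "pear"]

PROBS, N_CLASSES = fit_vdm(TRAIN_X, TRAIN_Y)
PROTOTYPES = [(("red", "round"), "apple"), (("green", "long"), "pear")]

print(nearest_prototype(("red", "long"), PROTOTYPES, PROBS, N_CLASSES))     # prints "apple"
print(nearest_prototype(("green", "round"), PROTOTYPES, PROBS, N_CLASSES))  # prints "pear"
```

The prototypes here are picked by hand; in practice they would come from a prototype selection or optimization algorithm such as LVQ.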
For all additive distance functions, the exponential transformation changes the distances D of P-rules into products of membership functions (MFs) of F-rules: MF = exp(-D).
Example: the Euclidean distance is equivalent to Gaussian MFs.
$$ D^2(X, P) = \sum_{i=1}^{N} W_i (X_i - P_i)^2 $$
$$ F = \exp\left(-D^2(X, P)\right) = \exp\left(-\sum_{i=1}^{N} W_i (X_i - P_i)^2\right) = \prod_{i=1}^{N} \exp\left(-W_i (X_i - P_i)^2\right) $$
The algebraic (product) T-norm is obtained with Gaussian MFs:
$$ \mu(X_i; P_i, W_i) = \exp\left(-W_i (X_i - P_i)^2\right); \qquad F = \prod_{i=1}^{N} \mu(X_i; P_i, W_i) $$

Visualization
[Figure: decision borders and the MFs for attributes 1 and 2, for the Euclidean distance function and for the square of the Canberra distance function.]

VDM distance => membership functions
[Figure: decision borders and the MFs for attributes 1 and 2, for the DVDM distance function and for the IVDM distance function.]

Inverse transformation
For all product T-norms: D = -ln(F).
Advantages: new types of distance functions are generated.
Example: distances generated from triangular functions, where σ is the half-width of the triangular function.
$$ F = \prod_{i=1}^{N} \mu_i(x_i), \qquad \mu_i(x_i) = \begin{cases} 1 - (x_i - p_i)/\sigma & x_i \in [p_i;\ p_i + \sigma) \\ 1 - (p_i - x_i)/\sigma & x_i \in (p_i - \sigma;\ p_i) \\ 0 & \text{otherwise} \end{cases} $$
$$ D = -\ln(F) = -\ln \prod_{i=1}^{N} \mu_i(x_i) = \sum_{i=1}^{N} d_i(x_i), \qquad d_i(x_i) = \begin{cases} -\ln\left(1 - (x_i - p_i)/\sigma\right) & x_i \in [p_i;\ p_i + \sigma) \\ -\ln\left(1 - (p_i - x_i)/\sigma\right) & x_i \in (p_i - \sigma;\ p_i) \\ \infty & \text{otherwise} \end{cases} $$

Applications to real data
1. Gene expression data for 2 types of leukaemia (Golub et al., Science 286 (1999) 531-537)
Description: 2 classes, 1100 features, 3 most relevant selected.
Used methods: 1 prototype/class LVQ, DVDM similarity measure.
Results (number of misclassified vectors):

Data set    Golub et al.    P-rules
Train       3               0
Test        5               3

2. Searching for Promoters in DNA strings
Description: 2 classes, 57 features, all symbolic.
Used methods: 9 prototypes for promoters, 12 for non-promoters, generated using C-means + LVQ, with the VDM similarity measure.
Results: 5 misclassified vectors in the leave-one-out test.

Conclusions
A first step towards understanding the relations between fuzzy and similarity-based systems has been made.
Prototype rules can be expressed using fuzzy rules and vice versa.
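The equivalence between prototype rules and fuzzy rules can be checked numerically. The following is a minimal sketch, assuming a weighted squared Euclidean distance with Gaussian MFs in one direction, and triangular MFs with a shared half-width `sigma` in the other; the sample point, prototypes, and weights are hypothetical.

```python
import math

def sq_euclidean(x, p, w):
    """Additive distance: D^2(X, P) = sum_i W_i (X_i - P_i)^2."""
    return sum(wi * (xi - pi) ** 2 for xi, pi, wi in zip(x, p, w))

def gaussian_product(x, p, w):
    """F-rule activation: product over features of exp(-W_i (X_i - P_i)^2)."""
    f = 1.0
    for xi, pi, wi in zip(x, p, w):
        f *= math.exp(-wi * (xi - pi) ** 2)
    return f

def triangular_mf(x, p, sigma):
    """Triangular MF centred at p with half-width sigma."""
    return max(0.0, 1.0 - abs(x - p) / sigma)

def triangular_distance(x, p, sigma):
    """Inverse transformation D = -ln(F) for a product of triangular MFs;
    infinite as soon as one feature falls outside the MF support."""
    total = 0.0
    for xi, pi in zip(x, p):
        mu = triangular_mf(xi, pi, sigma)
        if mu == 0.0:
            return math.inf
        total -= math.log(mu)
    return total

# Hypothetical point, prototypes, and feature weights.
X = [1.0, 2.0]
PROTOS = {"P1": [0.0, 0.0], "P2": [2.0, 3.0]}
W = [0.5, 1.0]

# P-rule decision (smallest distance) and F-rule decision (largest activation).
by_distance = min(PROTOS, key=lambda k: sq_euclidean(X, PROTOS[k], W))
by_membership = max(PROTOS, key=lambda k: gaussian_product(X, PROTOS[k], W))
print(by_distance, by_membership)  # both select "P2"
```

Because exp(-D) is strictly decreasing in D, the arg min over distances and the arg max over activations always pick the same prototype, which is why the decision borders are preserved.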
New possibilities in both fields:
– new types of membership functions;
– new types of distance functions.
The VDM measure used in P-rules leads to a natural shape of membership functions in fuzzy logic for symbolic data.
Expert knowledge can be captured in both types of rules, but sometimes it is easier to express as P-rules and sometimes as F-rules.
Many open problems remain.

Thank you for lending your ears ...
Speaker: Marcin Blachnik