Knowledge acquisition and processing: new methods for neuro-fuzzy systems
Danuta Rutkowska
Department of Computer Engineering, Technical University of Częstochowa, Poland
E-mail: drutko@kik.pcz.czest.pl
SOFSEM 2004

Cognitive Technologies
Knowledge Acquisition and Inference in the Framework of Soft Computing and Computing with Words

Soft Computing, Computing with Words, ...
• Soft computing
• Computing with words
• Perception-based systems
• Computational Intelligence
• Artificial Intelligence
• Cognitive sciences
• Neural networks
• Fuzzy systems
• Evolutionary algorithms
• Intelligent systems

Soft computing techniques
• Neuro-computing
• Rough sets
• Fuzzy logic
• Evolutionary algorithms
• Uncertain variables
• Probabilistic techniques

Cognition
The word “cognition” comes from the Latin word “cognitio”, which means “knowledge”. Cognitive sciences concern thinking, perception, reasoning, creation of meaning, and other functions of the human mind.

Soft computing and cognition
The principal aim of soft computing is to exploit the tolerance of uncertainty and vagueness in the area of cognitive reasoning.
[Nauck D., Kruse R.: NEFCLASS-J – A JAVA-Based Soft Computing Tool, in: B. Azvine et al. (Eds.), Intelligent Systems and Soft Computing, LNAI 1804, Springer-Verlag, Heidelberg, New York (2000), pp. 139-160]

Artificial Intelligence and cognition
The aim of artificial intelligence is to develop paradigms or algorithms that allow machines to perform tasks that involve cognition when performed by humans.
[A.P. Sage (ed.), Concise Encyclopedia of Information Processing in Systems and Organization, Pergamon Press, New York, 1990]

Perception and fuzzy systems
Perception is very important in human cognition. Systems that incorporate perceptions expressed in words are fuzzy systems, introduced by Prof. L.A. Zadeh.

Perception-based systems
Fuzzy systems are rule-based systems (knowledge-based systems) that can be viewed as perception-based systems.
The rule base of a fuzzy system is composed of fuzzy IF-THEN rules that are similar to the rules humans use in their reasoning.

Learning by examples
Learning by examples is one of the simplest cognitive capabilities of a young child. Artificial neural networks with an inductive, supervised learning algorithm imitate this cognitive behaviour.

Machine learning
Machine learning research has the potential to make a profound contribution to the theory and practice of expert systems, as well as to other areas of artificial intelligence. Its application to the problem of deriving rule sets from examples is already helping to circumvent the knowledge acquisition bottleneck.
[P. Jackson, Introduction to Expert Systems, Addison Wesley, 1999, Chapter 20, p. 399]

Inductive learning
The most common form of supervised learning task is called induction. An inductive learning program is one which is capable of learning from examples by a process of generalization.
[P. Jackson, Introduction to Expert Systems, Addison Wesley, 1999, Chapter 20, p. 381]

Neural network (MLP) [figure]
Model of an artificial neuron [figure]
RBF network [figure]
Gaussian function [figure]
Normalized RBF network [figure]
General neuro-fuzzy architecture [figure]

Fuzzy reasoning for k-th rule
k-th rule, with antecedent “x is $A^k$” and consequent “y is $B^k$”:
$R^{(k)}$: IF $x$ is $A^k$ THEN $y$ is $B^k$, $k = 1, \ldots, N$
input variable: $x = [x_1, \ldots, x_n]^T \in X \subset R^n$
output variable: $y \in Y \subset R$
$A^k = A_1^k \times \cdots \times A_n^k$
fuzzy relation: $A^k \to B^k$
fuzzification: input value $\bar{x} = [\bar{x}_1, \ldots, \bar{x}_n]^T \in X$, input fuzzy set $A'$ with
$\mu_{A'}(x) = 1$ if $x = \bar{x}$, and $\mu_{A'}(x) = 0$ if $x \neq \bar{x}$
k-th output fuzzy set: $\mu_{\bar{B}^k}(y) = \mu_{A^k \to B^k}(\bar{x}, y)$

Aggregation and defuzzification
aggregation for Mamdani approach (S-norm): $\mu_{B'}(y) = S_{k=1}^{N}\, \mu_{\bar{B}^k}(y)$
aggregation for logical approach (T-norm): $\mu_{B'}(y) = T_{k=1}^{N}\, \mu_{\bar{B}^k}(y)$
where $B'$ is the output fuzzy set for all $N$ rules.
defuzzification (output value): $\bar{y} = \frac{\sum_{k=1}^{N} \bar{y}^k\, \mu_{B'}(\bar{y}^k)}{\sum_{k=1}^{N} \mu_{B'}(\bar{y}^k)}$
where $\bar{y}^k$ is the centre of the consequent fuzzy set $B^k$.

Fuzzy implications: Mamdani, logical [figure]
Mamdani approach [figure]
Logical approach [figure]
An example of a neuro-fuzzy network [figure]
More general form of this network [figure]
Another example of the NF network [figure]

T-norm
A triangular norm T is a function of
two arguments T: [0,1]×[0,1]→[0,1] that satisfies the following conditions for a,b,c,d∈[0,1]:
• Monotonicity: T(a,b) ≤ T(c,d) for a ≤ c, b ≤ d
• Commutativity: T(a,b) = T(b,a)
• Associativity: T(T(a,b),c) = T(a,T(b,c))
• Boundary conditions: T(a,0) = 0; T(a,1) = a

T-conorm (S-norm)
A T-conorm (S-norm) is a function of two arguments S: [0,1]×[0,1]→[0,1] that satisfies the following conditions for a,b,c,d∈[0,1]:
• Monotonicity: S(a,b) ≤ S(c,d) for a ≤ c, b ≤ d
• Commutativity: S(a,b) = S(b,a)
• Associativity: S(S(a,b),c) = S(a,S(b,c))
• Boundary conditions: S(a,0) = a; S(a,1) = 1

Neuro-fuzzy inference systems (NFIS)
Approaches to designing NFIS:
• Mamdani
• Logical
• Takagi-Sugeno

Fuzzy-logic inference system
[figure: x → fuzzifier → fuzzy inference engine → defuzzifier → y; the fuzzy rule base (IF ... THEN ...) feeds the fuzzy inference engine]

Fuzzy-logic inference system: fuzzifier [figure]
Fuzzy-logic inference system: fuzzy rule base [figure]
Fuzzy-logic inference system: fuzzy inference engine [figure]
Fuzzy-logic inference system: defuzzifier [figure]

General architecture of Neuro-Fuzzy Inference System
[figure: NFIS network with layers I-IV, inputs $x_1, \ldots, x_n$, implication blocks $I_{k,r}$, aggregation blocks $agr_r$ and output $\bar{y}$]

Flexible neuro-fuzzy system: Mamdani approach
Implications, e.g. [formulas on slide]
Aggregations of rules, e.g. [formulas on slide]
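The inference chain described above (singleton fuzzification, a T-norm for the antecedent and the Mamdani implication, an S-norm for aggregation, and centre-based defuzzification) can be sketched in a few lines. This is only an illustration under stated assumptions: the Gaussian membership functions, the two-rule base `RULES`, and all function names are hypothetical and not taken from the talk.

```python
import math


def t_min(a, b):
    """Minimum T-norm."""
    return min(a, b)


def t_prod(a, b):
    """Product T-norm."""
    return a * b


def s_max(a, b):
    """Maximum S-norm (T-conorm)."""
    return max(a, b)


def s_prob(a, b):
    """Probabilistic-sum S-norm (T-conorm)."""
    return a + b - a * b


def gaussian(x, center, width):
    """Gaussian membership function (an assumed shape)."""
    return math.exp(-((x - center) / width) ** 2)


# Hypothetical rule base: two rules over two inputs, Gaussian antecedent
# and consequent fuzzy sets. The values are made up for illustration.
RULES = [
    {"centers": (0.0, 0.0), "widths": (1.0, 1.0), "y_center": 0.0, "y_width": 1.0},
    {"centers": (2.0, 2.0), "widths": (1.0, 1.0), "y_center": 4.0, "y_width": 1.0},
]


def firing_strength(x, rule, t_norm):
    """Membership of the crisp input in A^k = A_1^k x ... x A_n^k."""
    strength = 1.0  # neutral element, since T(a, 1) = a
    for xi, c, w in zip(x, rule["centers"], rule["widths"]):
        strength = t_norm(strength, gaussian(xi, c, w))
    return strength


def aggregated_membership(y, strengths, rules, t_norm, s_norm):
    """mu_B'(y): S-norm over the rule outputs T(strength_k, mu_B^k(y))."""
    agg = 0.0  # neutral element, since S(a, 0) = a
    for tau, rule in zip(strengths, rules):
        mu_bk = t_norm(tau, gaussian(y, rule["y_center"], rule["y_width"]))
        agg = s_norm(agg, mu_bk)
    return agg


def infer(x, rules=RULES, t_norm=t_min, s_norm=s_max):
    """Mamdani-type inference with defuzzification over consequent centres."""
    strengths = [firing_strength(x, r, t_norm) for r in rules]
    centers = [r["y_center"] for r in rules]
    weights = [aggregated_membership(c, strengths, rules, t_norm, s_norm)
               for c in centers]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(c * w for c, w in zip(centers, weights)) / total
```

For example, `infer((1.0, 1.0))` lies halfway between the two consequent centres because both rules fire equally. Passing `t_norm=t_prod, s_norm=s_prob` swaps in a different pair of triangular norms without changing the rest of the pipeline.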
Definition: Fuzzy implication
A fuzzy implication is a function I: [0,1]²→[0,1] satisfying the following conditions:
(I1) if a1 ≤ a3 then I(a1,a2) ≥ I(a3,a2), for all a1,a2,a3∈[0,1]
(I2) if a2 ≤ a3 then I(a1,a2) ≤ I(a1,a3), for all a1,a2,a3∈[0,1]
(I3) I(0,a2) = 1, for all a2∈[0,1] (falsity implies anything)
(I4) I(a1,1) = 1, for all a1∈[0,1] (anything implies tautology)
(I5) I(1,0) = 0 (booleanity)

Fuzzy implications
Name            Implication I(a,b)
Kleene-Dienes   $\max\{1-a,\, b\}$
Łukasiewicz     $\min\{1,\, 1-a+b\}$
Reichenbach     $1 - a + ab$
Fodor           $1$ if $a \le b$; $\max\{1-a,\, b\}$ if $a > b$
Sharp           $1$ if $a \le b$; $0$ if $a > b$
Goguen          $1$ if $a = 0$; $\min\{1,\, b/a\}$ if $a > 0$
Gödel           $1$ if $a \le b$; $b$ if $a > b$
Yager           $1$ if $a = 0$; $b^a$ if $a > 0$
Zadeh           $\max\{\min\{a,\, b\},\, 1-a\}$
Willmott        $\min\{\max\{1-a,\, b\},\, \max\{a,\, 1-b,\, \min\{1-a,\, b\}\}\}$

Flexible neuro-fuzzy system: Logical approach
Implications, e.g. [formulas on slide]
Aggregations of rules, e.g. [formulas on slide]

Flexible neuro-fuzzy system: AND-type compromise NFIS
$I(a,b) = (1-\lambda)\, T(a,b) + \lambda\, S(1-a,\, b)$, $\lambda \in [0,1]$
e.g. $I(a,b) = (1-\lambda) \min\{a,\, b\} + \lambda \max\{1-a,\, b\}$
$\lambda$       System
0       Mamdani type
1       Logical type
(0,1)   Compromise (Mamdani and logical)

Flexible neuro-fuzzy system: OR-type compromise NFIS
Parameter value   System
0                 Mamdani type
1                 Logical type
0.5               Undefined
(0,0.5)           “More Mamdani”
(0.5,1)           “More logical”

Flexible neuro-fuzzy system
L. Rutkowski and K. Cpałka, “Flexible Neuro-Fuzzy Systems”, IEEE Trans. Neural Networks, vol. 14, pp.
554-574, May 2003

Flexible neuro-fuzzy system: Soft NFIS (1/2)
$\tilde{T}(a; \alpha) = (1-\alpha)\, \frac{1}{n} \sum_{i=1}^{n} a_i + \alpha\, T(a)$
$\tilde{S}(a; \alpha) = (1-\alpha)\, \frac{1}{n} \sum_{i=1}^{n} a_i + \alpha\, S(a)$
$\tilde{I}(a,b; \alpha) = (1-\alpha)\, \frac{1}{2}(a + b) + \alpha\, T(a,b)$
$\tilde{I}(a,b; \alpha) = (1-\alpha)\, \frac{1}{2}(1 - a + b) + \alpha\, S(1-a,\, b)$
with each soft parameter $\alpha \in [0,1]$.

Flexible neuro-fuzzy system: Soft NFIS (2/2) [figure]

Flexible neuro-fuzzy system: NFIS realized by parameterised families of triangular norms (1/2)
The Dombi triangular norms, $p \in (0, \infty)$ [formulas on slide]

Flexible neuro-fuzzy system: NFIS realized by parameterised families of triangular norms (2/2) [figure]

Flexible neuro-fuzzy system: NFIS realized by triangular norms with weighted arguments (1/2)
$T^{*}(a_1, a_2; w_1, w_2) = T(1 - w_1(1 - a_1),\, 1 - w_2(1 - a_2))$
$S^{*}(a_1, a_2; w_1, w_2) = S(w_1 a_1,\, w_2 a_2)$
$T^{*}(a_1, a_2; 0, w_2) = T(1,\, 1 - w_2(1 - a_2)) = 1 - w_2(1 - a_2)$
$S^{*}(a_1, a_2; 0, w_2) = S(0,\, w_2 a_2) = w_2 a_2$
$T^{*}(a_1, a_2; w_1, 0) = T(1 - w_1(1 - a_1),\, 1) = 1 - w_1(1 - a_1)$
$S^{*}(a_1, a_2; w_1, 0) = S(w_1 a_1,\, 0) = w_1 a_1$
$w_1, w_2 \in [0,1]$

Flexible neuro-fuzzy system: NFIS realized by triangular norms with weighted arguments (2/2) [figure]

Flexible neuro-fuzzy system: Glass Identification – experimental results
Experiment  Flexibility parameter  Initial value  Final value after learning  RMSE / mistakes [%] (learning sequence)  RMSE / mistakes [%] (testing sequence)
i           $\nu$                  0.5            1.0000                      0.2395 / 6.66                            0.2553 / 7.81
ii          $\nu$                  0.5            1.0000                      0.2392 / 7.33                            0.2483 / 7.81
iii         $\nu$                  0              not listed                  0.2845 / 10.00                           0.2196 / 7.81
iv          $\nu$                  0.5            1.0000                      0.1856 / 3.33                            0.2191 / 6.25
            $p^{\tau}$             10             9.9953
            $p^{I}$                10             9.9998
            $p^{agr}$              10             9.9999
            $\alpha^{\tau}$        1              0.9576
            $\alpha^{I}$           1              0.9931
            $\alpha^{agr}$         1              0.8482
v           $\nu$                  0.5            1.0000                      0.1784 / 2.00                            0.2596 / 6.25
            $p^{\tau}$             10             9.9601
            $p^{I}$                10             9.9997
            $p^{agr}$              10             9.9836
            $\alpha^{\tau}$        1              0.9213
            $\alpha^{I}$           1              0.9939
            $\alpha^{agr}$         1              0.8456
            $w^{\tau}$             1              next slide
            $w^{agr}$              1              next slide

Flexible neuro-fuzzy system: Glass Identification – weights representation
[figure: weights $w$ for $i = 1, \ldots, 9$ and $k = 1, \ldots, 2$]
Weights representation in the Glass Identification problem (dark areas correspond to low values and vice versa)

Flexible neuro-fuzzy system: Glass Identification – comparison table
Method                     Testing Acc. [%]
Dong and Kothari (IG)      92.86
Dong and Kothari (IG+LA)   93.09
Dong and Kothari (GR)      92.86
Dong and Kothari (GR+LA)   93.10
our result                 93.75

Neuro-fuzzy relational system
[figure: inputs $x_1, \ldots, x_N$, fuzzy sets $A_1, \ldots, A_K$, relation elements $r_{k,m}$ combined by T-norm and S-norm nodes, consequent centres $b_1, \ldots, b_M$, division node, output y]

Neuro-fuzzy relational system with fuzzy matrix R [figure]

Neuro-fuzzy connectionist system (basic architecture)
[figure: three layers L1, L2, L3; inputs $x_1, \ldots, x_N$, antecedent fuzzy sets $A_i^k$, consequent centres $\bar{y}^1, \ldots, \bar{y}^K$, division node, output y]

Rule generation
The neuro-fuzzy networks reflect fuzzy IF-THEN rules. The network architectures are created based on the rules. How can the rules be obtained?
Basic questions:
• How many rules?
• What kind of membership functions (Gaussian, triangular, trapezoidal, etc.)?
• How should the parameter values of the membership functions (centers, widths) be determined?

Many methods
There are many methods of rule generation. However, when the rules obtained by these methods are applied in neuro-fuzzy systems for classification, most of them result in some misclassifications.

Perception-based approach
This method generates fuzzy IF-THEN rules from a data set by means of fuzzy granulation. The neuro-fuzzy systems that use these rules perform without misclassifications.

Multi-stage classification
The perception-based approach makes it possible to generate fuzzy rules and perform a multi-stage classification without misclassifications. The method is illustrated on the IRIS example.

IRIS data set: 150 data items containing measurements of iris flowers from three species: Setosa, Versicolor, and Virginica; 50 data items for each species. The data describe four features of the iris flowers: sepal length, sepal width, petal length, petal width.
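One of the basic questions listed above concerns the shape of the membership functions. As a minimal sketch, the three shapes named there can be written as plain functions and evaluated on a single iris feature such as petal length in centimeters. The function names and every parameter value below are illustrative assumptions, not values from the talk.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership: 1 at the center, decaying with distance."""
    return math.exp(-((x - center) / width) ** 2)


def triangular(x, left, peak, right):
    """Triangular membership; assumes left < peak < right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership; assumes a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical fuzzy set for a petal length, in centimeters.
petal_length = 3.25
degree = trapezoidal(petal_length, 3.0, 3.5, 4.5, 5.1)
```

Centers and widths (or the corner points of the triangle and trapezoid) are exactly the parameters that a rule-generation method or a learning procedure has to determine.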
Ranges of the measurements of iris flowers (in centimeters)
Sepal length   4.3 – 7.9
Sepal width    2.0 – 4.4
Petal length   1.0 – 6.9
Petal width    0.1 – 2.5

Ranges within the classes
               Setosa      Versicolor   Virginica
Sepal length   4.3 – 5.8   4.9 – 7.0    4.9 – 7.9
Sepal width    2.3 – 4.4   2.0 – 3.4    2.2 – 3.8
Petal length   1.0 – 1.9   3.0 – 5.1    4.5 – 6.9
Petal width    0.1 – 0.6   1.0 – 1.8    1.4 – 2.5

Granulated ranges of sepal length
4.3 – 4.9   Setosa
4.9 – 5.8   Setosa, Versicolor, Virginica
5.8 – 7.0   Versicolor, Virginica
7.0 – 7.9   Virginica

Granulated ranges of sepal width
2.0 – 2.2   Versicolor
2.2 – 2.3   Versicolor, Virginica
2.3 – 3.4   Setosa, Versicolor, Virginica
3.4 – 3.8   Setosa, Virginica
3.8 – 4.4   Setosa

Granulated ranges of petal length
1.0 – 1.9   Setosa
3.0 – 4.5   Versicolor
4.5 – 5.1   Versicolor, Virginica
5.1 – 6.9   Virginica

Granulated ranges of petal width
0.1 – 0.6   Setosa
1.0 – 1.4   Versicolor
1.4 – 1.8   Versicolor, Virginica
1.8 – 2.5   Virginica

Linguistic labels for sepal length
4.3 – 4.9   short sepal         $A_{11}$
4.9 – 5.8   medium long sepal   $A_{12}$
5.8 – 7.0   long sepal          $A_{13}$
7.0 – 7.9   very long sepal     $A_{14}$

Linguistic labels for sepal width
2.0 – 2.2   very narrow sepal   $A_{21}$
2.2 – 2.3   narrow sepal        $A_{22}$
2.3 – 3.4   medium wide sepal   $A_{23}$
3.4 – 3.8   wide sepal          $A_{24}$
3.8 – 4.4   very wide sepal     $A_{25}$

Linguistic labels for petal length
1.0 – 1.9   very short petal    $A_{31}$
3.0 – 4.5   medium long petal   $A_{32}$
4.5 – 5.1   long petal          $A_{33}$
5.1 – 6.9   very long petal     $A_{34}$

Linguistic labels for petal width
0.1 – 0.6   very narrow petal   $A_{41}$
1.0 – 1.4   medium wide petal   $A_{42}$
1.4 – 1.8   wide petal          $A_{43}$
1.8 – 2.5   very wide petal     $A_{44}$

Rule 1
IF sepal is (short or medium long) and (medium wide or wide or very wide) and petal is (very short) and (very narrow) THEN Setosa
IF $x_1$ is $A_1^1$ and $x_2$ is $A_2^1$ and $x_3$ is $A_3^1$ and $x_4$ is $A_4^1$ THEN Setosa
$A_1^1 = A_{11} \cup A_{12}$
$A_2^1 = A_{23} \cup A_{24} \cup A_{25}$
$A_3^1 = A_{31}$
$A_4^1 = A_{41}$

Rule 2
IF sepal is (medium long or long) and (very narrow or narrow or medium wide) and petal is (medium long or long) and (medium wide or wide) THEN Versicolor
IF $x_1$ is $A_1^2$ and $x_2$ is $A_2^2$ and $x_3$ is $A_3^2$ and $x_4$ is $A_4^2$ THEN Versicolor
$A_1^2 = A_{12} \cup A_{13}$
$A_2^2 = A_{21} \cup A_{22} \cup A_{23}$
$A_3^2 = A_{32} \cup A_{33}$
$A_4^2 = A_{42} \cup A_{43}$

Rule 3
IF sepal is (medium long or long or very long) and (narrow or medium wide or wide) and petal is (long or very long) and (wide or very wide) THEN Virginica
IF $x_1$ is $A_1^3$ and $x_2$ is $A_2^3$ and $x_3$ is $A_3^3$ and $x_4$ is $A_4^3$ THEN Virginica
$A_1^3 = A_{12} \cup A_{13} \cup A_{14}$
$A_2^3 = A_{22} \cup A_{23} \cup A_{24}$
$A_3^3 = A_{33} \cup A_{34}$
$A_4^3 = A_{43} \cup A_{44}$

NF network for the iris classification [figure]

Results of the 1st stage classification
• 50 data vectors correctly classified to Setosa
• 32 data vectors correctly classified to Versicolor
• 42 data vectors correctly classified to Virginica
• 26 data vectors with the “I do not know” decision: Versicolor or Virginica
These 26 data vectors participate in the 2nd stage of the classification.

2nd stage classification
Two fuzzy IF-THEN rules are formulated from the granulated ranges obtained for the data vectors that received the “I do not know” decision in the 1st stage. The NF network in the 2nd stage is reduced to the components associated with the Versicolor and Virginica classes.

Results of the 2nd stage classification
• 12 data vectors correctly classified to Versicolor
• 1 data vector correctly classified to Virginica
• 13 data vectors with the “I do not know” decision: Versicolor or Virginica
These 13 data vectors participate in the 3rd stage of the classification. Two new rules are created.

Results of the 3rd stage classification
• 4 data vectors correctly classified to Versicolor
• 5 data vectors correctly classified to Virginica
• 4 data vectors with the “I do not know” decision: Versicolor or Virginica
These 4 data vectors participate in the 4th stage of the classification. Two new rules are created.

Results of the 4th stage classification
• 2 data vectors correctly classified to Versicolor
• 2 data vectors correctly classified to Virginica
All data vectors are correctly classified after 4 stages of the classification. No misclassifications!
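The granulation and the staged “I do not know” procedure described above can be sketched as follows. This is a simplified illustration, not the author's implementation: it assumes crisp intervals instead of fuzzy sets, so a class is a candidate only when the value lies inside its range, and the names `granulate`, `classify`, `ranges_from_data` and `multi_stage` are hypothetical. The `CLASS_RANGES` values are copied from the “Ranges within the classes” table.

```python
UNKNOWN = "I do not know"

# Per-feature class ranges in centimeters (from the slides).
CLASS_RANGES = {
    "sepal length": {"Setosa": (4.3, 5.8), "Versicolor": (4.9, 7.0), "Virginica": (4.9, 7.9)},
    "sepal width": {"Setosa": (2.3, 4.4), "Versicolor": (2.0, 3.4), "Virginica": (2.2, 3.8)},
    "petal length": {"Setosa": (1.0, 1.9), "Versicolor": (3.0, 5.1), "Virginica": (4.5, 6.9)},
    "petal width": {"Setosa": (0.1, 0.6), "Versicolor": (1.0, 1.8), "Virginica": (1.4, 2.5)},
}


def granulate(ranges):
    """Cut one feature axis at every class-range endpoint and label each granule."""
    points = sorted({p for bounds in ranges.values() for p in bounds})
    granules = []
    for lo, hi in zip(points, points[1:]):
        classes = [c for c, (c_lo, c_hi) in ranges.items() if c_lo <= lo and hi <= c_hi]
        if classes:  # gaps covered by no class are dropped
            granules.append(((lo, hi), classes))
    return granules


def classify(sample, class_ranges):
    """Return the only class whose ranges contain every feature value, else UNKNOWN."""
    candidates = None
    for feature, value in sample.items():
        matching = {c for c, (lo, hi) in class_ranges[feature].items() if lo <= value <= hi}
        candidates = matching if candidates is None else candidates & matching
    if candidates is not None and len(candidates) == 1:
        return next(iter(candidates))
    return UNKNOWN


def ranges_from_data(samples, labels):
    """Per-feature [min, max] range of each class in the given labelled data."""
    ranges = {}
    for sample, label in zip(samples, labels):
        for feature, value in sample.items():
            per_class = ranges.setdefault(feature, {})
            lo, hi = per_class.get(label, (value, value))
            per_class[label] = (min(lo, value), max(hi, value))
    return ranges


def multi_stage(samples, labels, max_stages=10):
    """Re-derive ranges from the undecided vectors and classify again, stage by stage."""
    pending = list(range(len(samples)))
    decisions = {}
    for stage in range(1, max_stages + 1):
        ranges = ranges_from_data([samples[i] for i in pending],
                                  [labels[i] for i in pending])
        undecided = []
        for i in pending:
            decision = classify(samples[i], ranges)
            if decision == UNKNOWN:
                undecided.append(i)
            else:
                decisions[i] = (decision, stage)
        no_progress = len(undecided) == len(pending)
        pending = undecided
        if not pending or no_progress:  # stop when done or when a stage resolves nothing
            break
    return decisions, pending


demo_samples = [{"x": v} for v in (1, 2, 5, 4, 6, 7)]
demo_labels = ["A", "A", "A", "B", "B", "B"]
demo_decisions, demo_pending = multi_stage(demo_samples, demo_labels)
```

Because every labelled vector lies inside the ranges of its own class, a stage in this sketch can only answer with the correct class or with “I do not know”, which mirrors the no-misclassification property claimed for the data used to build the rules. In the toy `demo_samples`, the overlapping values 5 and 4 are left undecided in stage 1 and resolved in stage 2.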
IRIS data: P1, P2
[scatter plot: P1 (sepal length) vs. P2 (sepal width); series: Setosa, Versicolor, Virginica]

IRIS data: P1, P3
[scatter plot: P1 (sepal length) vs. P3 (petal length); series: Setosa, Versicolor, Virginica]

IRIS data: P2, P4
[scatter plot: P2 (sepal width) vs. P4 (petal width); series: Setosa, Versicolor, Virginica]

IRIS data: P3, P4
[scatter plot: P3 (petal length) vs. P4 (petal width); series: Setosa, Versicolor, Virginica]

Diagnosis of a tumor of the mucous membrane of the uterus
Attributes (9 attributes):
• period of time after menopause
• BMI (Body Mass Index)
• LH (luteinizing hormone)
• FSH (follicle-stimulating hormone)
• PRL (prolactin)
• E1 (estrone)
• E2 (estradiol)
• Aromatase
• estrogenic receptor
Data: 52 records of positive diagnosis, 13 records of negative diagnosis
Diagnosis: negative (class 0), positive (class 1)

Ranges of the attribute values
a1   0.5 – 34
a2   20 – 46
a3   0.5 – 120.3
a4   1.36 – 155.4
a5   2.4 – 128.1
a6   156 – 542
a7   0.04 – 1.48
a8   2.28 – 11.85
a9   0.72 – 3.85

Ranges within the classes
     Class 0        Class 1
a1   0.5 – 20       0.5 – 34
a2   20 – 46        20 – 45
a3   1.2 – 53.9     0.5 – 120.3
a4   1.63 – 88.2    1.36 – 155.4
a5   3.4 – 128.1    2.4 – 76.6
a6   170 – 412      156 – 542
a7   0.04 – 0.27    0.05 – 1.48
a8   2.28 – 10.51   3 – 11.85
a9   0.72 – 1.05    0.91 – 3.85

Rules for the medical diagnosis
IF $x_1$ is $A_1^k$ and ... and $x_9$ is $A_9^k$ THEN Class $k$, $k = 0, 1$

NF network for the medical diagnosis
[figure: inputs $x_1, \ldots, x_9$ (Attribute 1 to Attribute 9), fuzzy sets $A_1^0, \ldots, A_9^0$ and $A_1^1, \ldots, A_9^1$, product (П) nodes for Class 0 and Class 1]

Results: correct diagnosis
3 cases received the “I do not know” response after the first stage of classification; 62 of the 65 input vectors were diagnosed correctly.
(95.4% correct decisions, 4.6% “I do not know”)
The “I do not know” answers, which mean that the diagnosis may be positive or negative, refer to cases that are difficult to recognize because they belong to overlapping regions.

Conclusions (perception-based classification)
The perception-based approach makes it possible to generate fuzzy IF-THEN rules in the same way as humans do, and to perform the multi-stage classification without misclassifications.

Final conclusions
Neuro-fuzzy systems are soft computing methods that combine artificial neural networks and fuzzy systems. Various connectionist architectures of neuro-fuzzy systems can be constructed. Knowledge acquisition concerns fuzzy IF-THEN rules and is performed by a learning process. The systems realize an inference (fuzzy reasoning) based on these rules.