Fuzzy Inference Systems: Review

Fuzzy Models
• A fuzzy rule has the form: If <antecedent> then <consequent>.

Basic Configuration of a Fuzzy Logic System
• Input → Fuzzification → Inferencing → Defuzzification → Output
• During training, the output is compared with a target: Error = Target − Output

Types of Rules
• Mamdani–Assilian model
  R1: If x is A1 and y is B1 then z is C1
  R2: If x is A2 and y is B2 then z is C2
  Ai, Bi and Ci are fuzzy sets defined on the universes of x, y and z respectively.
• Takagi–Sugeno model
  R1: If x is A1 and y is B1 then z = f1(x, y)
  R2: If x is A2 and y is B2 then z = f2(x, y)
  For example: fi(x, y) = ai·x + bi·y + ci

Mamdani Fuzzy Models: The Reasoning Scheme
• Both the antecedent and the consequent are fuzzy.
• Example rule base (mineral-exploration example):
  Rule 1: IF FeO is high AND SiO2 is low AND Granite is proximal AND Fault is proximal, THEN metal is high
  Rule 2: IF FeO is average AND SiO2 is high AND Granite is intermediate AND Fault is proximal, THEN metal is average
  Rule 3: IF FeO is low AND SiO2 is high AND Granite is distal AND Fault is distal, THEN metal is low
• [Figure: the rules evaluated for the inputs FeO = 60%, SiO2 = 60%, distance to Granite = 5 km, distance to Fault = 1 km; each rule's consequent fuzzy set, defined on the metal universe (0 t to 1000 t), is cut at the rule's firing strength (implication); Metal = ?]

Defuzzifier
• Since the consequent is fuzzy, it has to be defuzzified.
• Converts the fuzzy output of the inference engine to a crisp value, using membership functions analogous to the ones used by the fuzzifier.
• Five commonly used defuzzification methods:
  – Centroid of area (COA)
  – Bisector of area (BOA)
  – Mean of maximum (MOM)
  – Smallest of maximum (SOM)
  – Largest of maximum (LOM)
• [Figure: the clipped consequents of Rules 1–3 are aggregated (Max) and the aggregate is defuzzified by finding its centroid.]
• Formula for the centroid:
  z* = ( Σ_{i=0}^{n} x_i · μ(x_i) ) / ( Σ_{i=0}^{n} μ(x_i) )
• In the example, defuzzification by the centroid gives 125 tonnes of metal.

Sugeno Fuzzy Models
• Also known as the TSK fuzzy model (Takagi, Sugeno & Kang, 1985).

Fuzzy Rules of the TSK Model
• The antecedent is fuzzy (fuzzy sets), but the consequent is a crisp function:
  If x is A and y is B then z = f(x, y)
• f(x, y) is very often a polynomial in x and y.
• The order of a Takagi–Sugeno fuzzy inference system = the order of the polynomial used.

The Reasoning Scheme: Examples
R1: if X is small and Y is small then z = x + y + 1
R2: if X is small and Y is large then z = y + 3
R3: if X is large and Y is small then z = x + 3
R4: if X is large and Y is large then z = x + y + 2

Takagi–Sugeno System
1. IF x is f1x(x) AND y is f1y(y) THEN z1 = p10 + p11·x + p12·y
2. IF x is f2x(x) AND y is f1y(y) THEN z2 = p20 + p21·x + p22·y
3. IF x is f1x(x) AND y is f2y(y) THEN z3 = p30 + p31·x + p32·y
4. IF x is f2x(x) AND y is f2y(y) THEN z4 = p40 + p41·x + p42·y
• The firing strength (= output of the IF part) of each rule is:
  s1 = f1x(x) AND f1y(y);  s2 = f2x(x) AND f1y(y);  s3 = f1x(x) AND f2y(y);  s4 = f2x(x) AND f2y(y)
• Output of each rule (= firing strength × consequent function):
  o1 = s1·z1;  o2 = s2·z2;  o3 = s3·z3;  o4 = s4·z4
• Overall output of the fuzzy inference system:
  z = (o1 + o2 + o3 + o4) / (s1 + s2 + s3 + s4)
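As a concrete illustration of the weighted-average output formula above, here is a minimal Python sketch that evaluates the four example rules. The "small"/"large" membership functions and the choice of min as the AND operator are assumptions made for the example; the slides do not specify them.

def small(v):   # membership in "small" on [0, 10] (assumed shape)
    return max(0.0, min(1.0, (10.0 - v) / 10.0))

def large(v):   # membership in "large" on [0, 10] (assumed shape)
    return max(0.0, min(1.0, v / 10.0))

def tsk_output(x, y):
    # Rule list: (membership for X, membership for Y, crisp consequent z = f(x, y))
    rules = [
        (small, small, lambda x, y: x + y + 1),   # R1
        (small, large, lambda x, y: y + 3),       # R2
        (large, small, lambda x, y: x + 3),       # R3
        (large, large, lambda x, y: x + y + 2),   # R4
    ]
    num, den = 0.0, 0.0
    for mu_x, mu_y, f in rules:
        s = min(mu_x(x), mu_y(y))       # firing strength of the rule (AND = min)
        num += s * f(x, y)              # firing strength x consequent value
        den += s
    return num / den                    # overall (weighted-average) output

print(tsk_output(3.0, 7.0))             # crisp output for x = 3, y = 7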
Sugeno System (mineral-exploration example)
Rule 1: IF FeO is high AND SiO2 is low AND Granite is proximal AND Fault is proximal,
        THEN Gold(R1) = p1(FeO%) + q1(SiO2%) + r1(Distance2Granite) + s1(Distance2Fault) + t1
Rule 2: IF FeO is average AND SiO2 is high AND Granite is intermediate AND Fault is proximal,
        THEN Gold(R2) = p2(FeO%) + q2(SiO2%) + r2(Distance2Granite) + s2(Distance2Fault) + t2
Rule 3: IF FeO is low AND SiO2 is high AND Granite is distal AND Fault is distal,
        THEN Gold(R3) = p3(FeO%) + q3(SiO2%) + r3(Distance2Granite) + s3(Distance2Fault) + t3
[Figure: the three rules evaluated for the inputs FeO = 60%, SiO2 = 60%, distance to Granite = 5 km, distance to Fault = 1 km; each rule yields a firing strength s1, s2, s3; Metal = ?]

Sugeno System: Output
• Each rule contributes its firing strength s_i and its rule output Gold(R_i).
• Overall output:
  Output = ( s1·Gold(R1) + s2·Gold(R2) + s3·Gold(R3) ) / ( s1 + s2 + s3 )

A Neural Fuzzy System
• Implements a fuzzy inference system (FIS) in the framework of neural networks.
• Node layers, from the inputs x and y to the output: fuzzification nodes → antecedent nodes → consequent nodes → output node.

Fuzzification Nodes
• Represent the term sets of the features.
• If we have two features x and y, and two linguistic values (say BIG and SMALL) defined on each of them, then we have 4 fuzzification nodes: BIG and SMALL on x, and BIG and SMALL on y.
• We use Gaussian membership functions for fuzzification because they are differentiable; triangular and trapezoidal membership functions are NOT differentiable.

Fuzzification Nodes (contd.)
• Gaussian membership function:
  z = exp( −(x − c)² / (2σ²) )
• c (centre) and σ (spread) are the two free parameters of the membership function, and they need to be determined.
• How to determine c and σ? Two strategies:
  1) keep c and σ fixed, or
  2) update c and σ through a tuning algorithm.

Consequent Nodes
• z = p·x + q·y + k
• p, q and k are the three free parameters of the consequent polynomial function.
• How to determine p, q and k? Two strategies:
  1) keep them fixed, or
  2) update them through a tuning algorithm.

[Figure: the neural fuzzy network. Fuzzification nodes compute the memberships μx1, μx2 (SMALL, BIG on x) and μy1, μy2 (SMALL, BIG on y); antecedent nodes (e.g. "If x is SMALL and y is SMALL") produce the firing strengths w1…w4; consequent nodes compute the rule outputs, e.g. z4 = p4·x + q4·y + k4; the output node combines them as
  O = (w1·z1 + w2·z2 + w3·z3 + w4·z4) / (w1 + w2 + w3 + w4),
and training minimizes the error against the target t: Error = ½(t − O)².]

ANFIS Architecture
• Squares: adaptive nodes. Circles: fixed nodes.

ANFIS Architecture: Layer 1 (Adaptive)
• Contains adaptive nodes, each with a Gaussian membership function:
  f(x) = exp( −(x − c)² / (2σ²) )
• Number of nodes = number of variables × number of linguistic values.
  In the previous example there are 4 nodes (2 variables × 2 linguistic values each).
• Two parameters to be estimated per node: the mean (centre) and the standard deviation (spread).
• These are called the premise parameters.
• Number of premise parameters = 2 × number of nodes = 8 in the example.

ANFIS Architecture: Layer 2 (Fixed)
• Contains fixed nodes, each applying the product operator (a T-norm operator).
• Returns the firing strength of each if-then rule:
  s1 = f1x·f1y;  s2 = f2x·f1y;  s3 = f1x·f2y;  s4 = f2x·f2y
• The firing strengths can be normalized. In ANFIS, each node returns a normalized firing strength:
  s̄1 = s1 / (s1 + s2 + s3 + s4)   (and similarly for s̄2, s̄3, s̄4)
• Fixed nodes: no parameters to be estimated.
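A minimal sketch of Layers 1 and 2, assuming illustrative centre and spread values (in practice the premise parameters are learned): Gaussian memberships, product T-norm firing strengths, and their normalization.

import math

def gauss_mf(x, c, sigma):
    # Layer 1 (adaptive): Gaussian membership value with premise parameters
    # c (centre) and sigma (spread).
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# Premise parameters for SMALL/BIG on each input (illustrative values only).
params = {
    "x_small": (2.0, 2.0), "x_big": (8.0, 2.0),   # (centre, spread)
    "y_small": (2.0, 2.0), "y_big": (8.0, 2.0),
}

def layer2_normalized_firing_strengths(x, y):
    f1x = gauss_mf(x, *params["x_small"]); f2x = gauss_mf(x, *params["x_big"])
    f1y = gauss_mf(y, *params["y_small"]); f2y = gauss_mf(y, *params["y_big"])
    # Layer 2 (fixed): product T-norm gives the firing strength of each rule.
    s = [f1x * f1y, f2x * f1y, f1x * f2y, f2x * f2y]
    total = sum(s)
    # Normalized firing strengths s_bar_k = s_k / (s1 + s2 + s3 + s4).
    return [sk / total for sk in s]

print(layer2_normalized_firing_strengths(3.0, 7.0))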
ANFIS Architecture: Layer 3 (Adaptive)
• Each node contains an adaptive polynomial and returns the output of one fuzzy if-then rule:
  z1 = s̄1·(p10 + p11·x + p12·y)
  z2 = s̄2·(p20 + p21·x + p22·y)
  z3 = s̄3·(p30 + p31·x + p32·y)
  z4 = s̄4·(p40 + p41·x + p42·y)
• Number of nodes = number of if-then rules.
• The parameters p_ki are called the consequent parameters.

ANFIS Architecture: Layer 4 (Fixed)
• A single node that sums up the outputs of the previous layer:
  z = z1 + z2 + z3 + z4
• No parameters to be estimated.

ANFIS Training
• z = z1 + z2 + z3 + z4, with z_k = s̄_k·(p_k0 + p_k1·x + p_k2·y).
• The output is linear in the consequent parameters p_ki if the premise parameters, and therefore the firing strengths s̄_k of the fuzzy if-then rules, are fixed.
• ANFIS uses a hybrid learning procedure (Jang and Sun, 1995) to estimate the premise and consequent parameters.
• The hybrid learning procedure estimates the consequent parameters (keeping the premise parameters fixed) in a forward pass, and the premise parameters (keeping the consequent parameters fixed) in a backward pass.
  – Forward pass: propagate information forward until Layer 3, then estimate the consequent parameters by the least squares estimator.
  – Backward pass: propagate the error signals backwards and update the premise parameters by gradient descent.

ANFIS Training: Least Squares Estimation
1. Data are assembled in the form (x_n, y_n).
2. We assume a linear relation between x and y: y = a·x + b.
3. This can be extended to n dimensions: y = a1·x1 + a2·x2 + a3·x3 + … + b.
• The problem: find the values of the coefficients a_i such that the linear combination best fits the data.

• Given data {(x1, y1), …, (xN, yN)}, the error associated with saying y = a·x + b is
  E(a, b) = Σ_{n=1}^{N} ( y_n − (a·x_n + b) )²
  This is just N times the variance of the data {y1 − (a·x1 + b), …, yN − (a·xN + b)}.
• The goal is to find the values of a and b that minimize the error; in other words, set the partial derivatives of the error with respect to a and b to zero:
  ∂E/∂a = −2 Σ_n x_n ( y_n − a·x_n − b ) = 0
  ∂E/∂b = −2 Σ_n ( y_n − a·x_n − b ) = 0
• We may rewrite these as the normal equations:
  a·Σ x_n² + b·Σ x_n = Σ x_n·y_n
  a·Σ x_n + b·N = Σ y_n
• The values of a and b that minimize the error therefore satisfy the matrix equation
  [ Σx²  Σx ] [ a ]   [ Σxy ]
  [ Σx   N  ] [ b ] = [ Σy  ]
• Hence a and b are estimated using
  [ a ]   [ Σx²  Σx ]⁻¹ [ Σxy ]
  [ b ] = [ Σx   N  ]   [ Σy  ]

ANFIS Training: Least Squares Estimation (worked example)
• For the following data, find the least squares estimator:

  SNo     X     Y     X²     XY
  1       2     9      4     18
  2       3    11      9     33
  3       4    13     16     52
  4       6    17     36    102
  5       8    21     64    168
  6       1     7      1      7
  7       2     9      4     18
  8      11    27    121    297
  9      14    33    196    462
  TOTAL  51   147    451   1157
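A minimal sketch, using the worked-example data above, of solving the normal equations and cross-checking the result with a generic least squares routine:

import numpy as np

# Least squares fit y = a*x + b for the worked example above,
# via the normal equations [[Sxx, Sx], [Sx, N]] [a, b]^T = [Sxy, Sy]^T.
x = np.array([2, 3, 4, 6, 8, 1, 2, 11, 14], dtype=float)
y = np.array([9, 11, 13, 17, 21, 7, 9, 27, 33], dtype=float)

N = len(x)
A = np.array([[np.sum(x * x), np.sum(x)],
              [np.sum(x),     N        ]])
rhs = np.array([np.sum(x * y), np.sum(y)])
a, b = np.linalg.solve(A, rhs)
print(a, b)    # -> 2.0, 5.0 (this data set lies exactly on y = 2x + 5)

# Cross-check with a generic least squares solver.
coeffs, *_ = np.linalg.lstsq(np.column_stack([x, np.ones(N)]), y, rcond=None)
print(coeffs)  # -> [2. 5.]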
ANFIS Training: Least Squares Estimation of the Consequent Parameters
• Rule outputs:
  z1 = s̄1·(p10 + p11·x + p12·y);  z2 = s̄2·(p20 + p21·x + p22·y);
  z3 = s̄3·(p30 + p31·x + p32·y);  z4 = s̄4·(p40 + p41·x + p42·y)
• Output:
  o = z1 + z2 + z3 + z4
    = s̄1·(p10 + p11·x + p12·y) + s̄2·(p20 + p21·x + p22·y) + s̄3·(p30 + p31·x + p32·y) + s̄4·(p40 + p41·x + p42·y)
• Let the training data be the samples (x_n, y_n) with targets t_n, for n = 1, …, 6.
• The error, with the normalized firing strengths evaluated at each training sample, is
  E = Σ_{n=1}^{6} ( t_n − o_n )²,
  where o_n is the model output at sample n:
  o_n = s̄1·(p10 + p11·x_n + p12·y_n) + s̄2·(p20 + p21·x_n + p22·y_n) + s̄3·(p30 + p31·x_n + p32·y_n) + s̄4·(p40 + p41·x_n + p42·y_n)
• Setting the partial derivatives with respect to the consequent parameters to zero:
  ∂E/∂p10 = −2 Σ_n ( t_n − o_n )·s̄1 = 0
  ⋮
  ∂E/∂p42 = −2 Σ_n ( t_n − o_n )·s̄4·y_n = 0
• Simplify and use the least squares estimator (LSE) to solve for the consequent parameters.

ANFIS Training: Gradient Descent
• After the least squares estimation of all consequent parameters, plug in the parameter values, the firing-strength values and the variable (x, y) values.
• Error propagated back to Layer 1:
  E = ½ (target output − actual output)²
• Let the centre and spread parameters of node i in Layer 1 be represented by c_i (centre) and σ_i (spread).
• By the chain rule,
  ∂E/∂c_i = (∂E/∂O)·(∂O/∂s_i)·(∂s_i/∂f_i)·(∂f_i/∂c_i)
  ∂E/∂σ_i = (∂E/∂O)·(∂O/∂s_i)·(∂s_i/∂f_i)·(∂f_i/∂σ_i)
  where O is the output, s_i is the firing strength and f_i is the fuzzy membership function.
• These expressions are used to update the centre and spread parameters.
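A minimal sketch of the backward pass for a hypothetical one-input, two-rule first-order Sugeno system (all parameter values are illustrative). For brevity it approximates ∂E/∂c_i and ∂E/∂σ_i by finite differences rather than the closed-form chain-rule expressions above; the gradient-descent update rule itself is the same.

import math

def gauss_mf(x, c, sigma):
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# One-input, two-rule first-order Sugeno system (illustrative values).
premise = [[2.0, 1.5], [6.0, 1.5]]          # [centre, spread] per membership function
consequent = [(1.0, 0.5), (2.0, -1.0)]      # fixed (p, r): rule output = p*x + r

def output(x):
    s = [gauss_mf(x, c, sig) for c, sig in premise]       # firing strengths
    total = sum(s)
    sbar = [sk / total for sk in s]                        # normalized firing strengths
    return sum(sb * (p * x + r) for sb, (p, r) in zip(sbar, consequent))

def error(x, target):
    return 0.5 * (target - output(x)) ** 2                 # E = 1/2 (t - O)^2

def backward_pass(x, target, lr=0.05, h=1e-6):
    # Gradient-descent update of the premise parameters (centres and spreads).
    # Central finite differences stand in for the chain-rule derivatives.
    grads = [[0.0, 0.0] for _ in premise]
    for i in range(len(premise)):
        for j in range(2):
            premise[i][j] += h
            e_plus = error(x, target)
            premise[i][j] -= 2 * h
            e_minus = error(x, target)
            premise[i][j] += h                             # restore original value
            grads[i][j] = (e_plus - e_minus) / (2 * h)
    for i in range(len(premise)):
        for j in range(2):
            premise[i][j] -= lr * grads[i][j]              # c <- c - lr*dE/dc; sigma likewise

for _ in range(100):
    backward_pass(x=4.0, target=5.0)
print(output(4.0))   # moves toward the target as the premise parameters adapt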