Lecture 8: Neurofuzzy Systems

Fuzzy Inference Systems
Review Fuzzy Models
If <antecedent> then <consequent>.
Basic Configuration of a Fuzzy Logic System
[Figure: block diagram. Input → Fuzzification → Inferencing → Defuzzification → Output; during training, Error = Target − Output.]
Types of Rules
Mamdani-Assilian Model
R1: If x is A1 and y is B1 then z is C1
R2: If x is A2 and y is B2 then z is C2
Ai, Bi and Ci are fuzzy sets defined on the universes of x, y and z respectively.
Takagi-Sugeno Model
R1: If x is A1 and y is B1 then z = f1(x, y)
R2: If x is A2 and y is B2 then z = f2(x, y)
For example: fi(x, y) = ai x + bi y + ci
Mamdani Fuzzy Models
The Reasoning Scheme
Both antecedent and consequent are fuzzy.
[Figure: graphical Mamdani reasoning for a mineral-prospectivity example. Antecedent membership functions: FeO (ticks at 30%, 50%, 70%), SiO2 (40%, 55%, 70%), distance to Granite (0, 10, 20 km), distance to Fault (0, 5, 10 km). Each rule's firing strength is applied to its consequent set over metal tonnage (ticks at 0 t, 100 t, 1000 t) by the implication (max) operator.]
Rule 1: IF FeO is high AND SiO2 is low AND Granite is proximal AND Fault is proximal, THEN metal is high
Rule 2: IF FeO is average AND SiO2 is high AND Granite is intermediate AND Fault is proximal, THEN metal is average
Rule 3: IF FeO is low AND SiO2 is high AND Granite is distal AND Fault is distal, THEN metal is low
Inputs: FeO = 60%, SiO2 = 60%, Granite = 5 km, Fault = 1 km. Metal = ?
Defuzzifier
Since the consequent is fuzzy, it has to be defuzzified.
• Converts the fuzzy output of the inference engine to a crisp value, using membership functions analogous to the ones used by the fuzzifier.
• Five commonly used defuzzifying methods:
– Centroid of area (COA)
– Bisector of area (BOA)
– Mean of maximum (MOM)
– Smallest of maximum (SOM)
– Largest of maximum (LOM)
Defuzzifier
[Figure: the clipped output sets of Rule 1, Rule 2 and Rule 3 are aggregated (max) into a single fuzzy set, whose centroid gives the crisp output.]
Formula for the centroid:
$x^{*} = \frac{\sum_{i=0}^{n} x_i \, \mu(x_i)}{\sum_{i=0}^{n} \mu(x_i)}$
For the example above, the centroid is 125 tonnes of metal.
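A minimal numerical sketch of this scheme: clip each rule's consequent set at its firing strength, aggregate with max, and take the centroid. The triangular membership shapes and the firing strengths below are illustrative assumptions, not the exact values from the figure, so the output will not be exactly 125 t.

```python
import numpy as np

# Minimal Mamdani inference with centroid defuzzification.
# The triangular membership shapes and the rule firing strengths are
# illustrative assumptions, not the exact values from the figure.

def tri(v, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    return np.maximum(np.minimum((v - a) / (b - a), (c - v) / (c - b)), 0.0)

metal = np.linspace(0.0, 1000.0, 1001)   # output universe: 0-1000 t

# Assumed firing strengths of Rules 1-3 (normally obtained by
# AND-combining the antecedent membership degrees of the inputs).
s = [0.2, 0.6, 0.3]

# Consequent sets: metal is high / average / low.
high = tri(metal, 500.0, 1000.0, 1500.0)
aver = tri(metal, 0.0, 500.0, 1000.0)
low = tri(metal, -500.0, 0.0, 500.0)

# Implication: clip each consequent at its rule's firing strength,
# then aggregate the clipped sets with max.
agg = np.maximum.reduce([np.minimum(si, mf) for si, mf in zip(s, [high, aver, low])])

# Centroid of area: x* = sum(x * mu(x)) / sum(mu(x)).
centroid = np.sum(metal * agg) / np.sum(agg)
print(f"Defuzzified metal estimate: {centroid:.0f} t")
```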
Sugeno Fuzzy Models
• Also known as TSK fuzzy model
– Takagi, Sugeno & Kang, 1985
Fuzzy Rules of TSK Model
While antecedent is fuzzy, consequent is crisp
If x is A and y is B then z = f(x, y)
Fuzzy Sets: A and B are fuzzy sets on the input universes.
Crisp Function: f(x, y) is very often a polynomial function w.r.t. x and y.
The order of a Takagi-Sugeno type fuzzy inference system = the order of the polynomial used.
The Reasoning Scheme
Examples
R1: if X is small and Y is small then z = x + y + 1
R2: if X is small and Y is large then z = y + 3
R3: if X is large and Y is small then z = x + 3
R4: if X is large and Y is large then z = x + y + 2
TAKAGI-SUGENO SYSTEM
1. IF x is f1x(x) AND y is f1y(y) THEN z1 = p10 + p11x + p12y
2. IF x is f2x(x) AND y is f1y(y) THEN z2 = p20 + p21x + p22y
3. IF x is f1x(x) AND y is f2y(y) THEN z3 = p30 + p31x + p32y
4. IF x is f2x(x) AND y is f2y(y) THEN z4 = p40 + p41x + p42y
The firing strength (= output of the IF part) of each rule is:
s1 = f1x(x) AND f1y(y)
s2 = f2x(x) AND f1y(y)
s3 = f1x(x) AND f2y(y)
s4 = f2x(x) AND f2y(y)
Output of each rule (= firing strength × consequent function):
1. o1 = s1 ∙ z1
2. o2 = s2 ∙ z2
3. o3 = s3 ∙ z3
4. o4 = s4 ∙ z4
Overall output of the fuzzy inference system is:
$z = \frac{o_1 + o_2 + o_3 + o_4}{s_1 + s_2 + s_3 + s_4}$
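A minimal sketch of this computation for the four example rules R1-R4 from the previous slide. The Gaussian membership functions standing in for "small" and "large" (centres 0 and 10, spread 3) and the product realisation of AND are illustrative assumptions:

```python
import numpy as np

# Minimal Takagi-Sugeno inference for the four example rules R1-R4.
# The "small" and "large" membership functions are assumed Gaussians.

def small(v):
    return np.exp(-(v - 0.0) ** 2 / (2 * 3.0 ** 2))

def large(v):
    return np.exp(-(v - 10.0) ** 2 / (2 * 3.0 ** 2))

def tsk_output(x, y):
    # Firing strengths: AND realised as the product T-norm.
    s = np.array([
        small(x) * small(y),   # R1: X small, Y small
        small(x) * large(y),   # R2: X small, Y large
        large(x) * small(y),   # R3: X large, Y small
        large(x) * large(y),   # R4: X large, Y large
    ])
    # Crisp consequents from the rules above.
    z = np.array([x + y + 1, y + 3, x + 3, x + y + 2])
    # Weighted average of the rule outputs.
    return np.sum(s * z) / np.sum(s)

print(tsk_output(3.0, 7.0))
```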
Sugeno system
Rule 1: IF FeO is high AND SiO2 is low AND Granite is proximal AND Fault is proximal,
THEN Gold = p1(FeO%) + q1(SiO2%) + r1(Distance2Granite) + s1(Distance2Fault) + t1
Rule 2: IF FeO is average AND SiO2 is high AND Granite is intermediate AND Fault is proximal,
THEN Gold = p2(FeO%) + q2(SiO2%) + r2(Distance2Granite) + s2(Distance2Fault) + t2
Rule 3: IF FeO is low AND SiO2 is high AND Granite is distal AND Fault is distal,
THEN Gold = p3(FeO%) + q3(SiO2%) + r3(Distance2Granite) + s3(Distance2Fault) + t3
Sugeno system
[Figure: graphical Sugeno reasoning for the gold example. The antecedent membership functions are the same as in the Mamdani example (FeO: 30-70%, SiO2: 40-70%, Granite: 0-20 km, Fault: 0-10 km). Each rule yields a firing strength (s1, s2, s3) and a crisp consequent Gold(R1), Gold(R2), Gold(R3) computed from its polynomial.]
Inputs: FeO = 60%, SiO2 = 60%, Granite = 5 km, Fault = 1 km. Metal = ?
Sugeno system: Output
Firing strength s1, rule output: Gold(R1) = p1(FeO%) + q1(SiO2%) + r1(Distance2Granite) + s1(Distance2Fault) + t1
Firing strength s2, rule output: Gold(R2) = p2(FeO%) + q2(SiO2%) + r2(Distance2Granite) + s2(Distance2Fault) + t2
Firing strength s3, rule output: Gold(R3) = p3(FeO%) + q3(SiO2%) + r3(Distance2Granite) + s3(Distance2Fault) + t3
$\text{Output} = \frac{s_1 \cdot \text{Gold}(R1) + s_2 \cdot \text{Gold}(R2) + s_3 \cdot \text{Gold}(R3)}{s_1 + s_2 + s_3}$
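As a purely illustrative numeric check (these firing strengths and rule outputs are assumed, not read off the figure): with $s_1 = 0.6$, $s_2 = 0.3$, $s_3 = 0.1$ and Gold(R1) = 200 t, Gold(R2) = 100 t, Gold(R3) = 20 t,
$\text{Output} = \frac{0.6 \cdot 200 + 0.3 \cdot 100 + 0.1 \cdot 20}{0.6 + 0.3 + 0.1} = \frac{152}{1.0} = 152 \text{ t}$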
A neural fuzzy system
Implements an FIS in the framework of NNs.
[Figure: network with inputs x and y feeding fuzzification nodes, then antecedent nodes, then output nodes.]
Fuzzification Nodes
Represent the term sets of the features.
If we have two features x and y, with two linguistic values, say BIG and SMALL, defined on each of them, then we have 4 fuzzification nodes.
[Figure: membership functions BIG and SMALL on the universe of x, and BIG and SMALL on the universe of y.]
We use Gaussian membership functions for fuzzification because they are differentiable; triangular and trapezoidal membership functions are NOT differentiable (at their breakpoints).
Fuzzification Nodes (Contd.)
$z = \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
μ and σ are the two free parameters of the membership function, which need to be determined.
How to determine μ and σ? Two strategies:
1) Fix μ and σ in advance.
2) Update μ and σ through a tuning algorithm.
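A one-function sketch of such a node; the centre and spread values in the call are arbitrary illustrations:

```python
import numpy as np

# Gaussian membership function of a fuzzification node. mu (centre)
# and sigma (spread) are its two free parameters; here they are simply
# fixed by hand (strategy 1 above). The values are illustrative.

def gaussian_mf(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Degree to which x = 60 belongs to a set centred at 70 with spread 10:
print(gaussian_mf(60.0, mu=70.0, sigma=10.0))  # ~0.607
```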
Consequent nodes
z = px + qy + k
p, q and k are the three free parameters of the consequent polynomial function.
How to determine p, q and k? Two strategies:
1) Fix them in advance.
2) Update them through a tuning algorithm.
[Figure: the complete neural fuzzy network. Fuzzification nodes compute membership degrees μx1, μx2 (BIG, SMALL on x) and μy1, μy2 (SMALL, BIG on y); antecedent nodes (e.g. "If x is SMALL and y is SMALL") combine them into weights w1, ..., w4; consequent nodes compute e.g. z4 = p4x + q4y + k4; the output node computes O = (w1z1 + w2z2 + w3z3 + w4z4) / (w1 + w2 + w3 + w4). Training compares O against the target t via the error E = ½(t − o)².]
ANFIS Architecture
[Figure: ANFIS network. Squares: adaptive nodes; circles: fixed nodes.]
ANFIS Architecture
Layer 1 (Adaptive)
Contains adaptive nodes, each with a Gaussian membership function:
$f(x) = \exp\left(-\frac{(x - c)^2}{2\sigma^2}\right)$
Number of nodes = number of variables × number of linguistic values.
In the previous example there are 4 nodes (2 variables × 2 linguistic values each).
Two parameters to be estimated per node: mean (centre) and standard deviation (spread).
These are called premise parameters.
Number of premise parameters = 2 × number of nodes = 8 in the example.
ANFIS Architecture
Layer 2 (Fixed)
Contains fixed nodes, each applying the product (T-norm) operator. Returns the firing strength of each If-Then rule:
$s_1 = f_{1x} \cdot f_{1y}; \quad s_2 = f_{2x} \cdot f_{1y}; \quad s_3 = f_{1x} \cdot f_{2y}; \quad s_4 = f_{2x} \cdot f_{2y}$
The firing strength can be normalized. In ANFIS, each node returns a normalized firing strength:
$\bar{s}_1 = \frac{s_1}{s_1 + s_2 + s_3 + s_4}$
Fixed nodes: no parameters to be estimated.
ANFIS Architecture
Layer 3 (Adaptive)
Each node contains an adaptive polynomial, and returns the output of one fuzzy If-Then rule:
$z_1 = \bar{s}_1 (p_{10} + p_{11}x + p_{12}y)$
$z_2 = \bar{s}_2 (p_{20} + p_{21}x + p_{22}y)$
$z_3 = \bar{s}_3 (p_{30} + p_{31}x + p_{32}y)$
$z_4 = \bar{s}_4 (p_{40} + p_{41}x + p_{42}y)$
Number of nodes = number of If-Then rules.
The parameters $p_{ki}$ are called consequent parameters.
ANFIS Architecture
Layer 4 (Fixed)
A single node that sums the outputs of the previous layer:
$z = z_1 + z_2 + z_3 + z_4$
No parameters to be estimated.
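Putting Layers 1-4 together, a minimal forward-pass sketch for the two-input, two-label case; all centres, spreads and consequent parameters below are illustrative values, not trained ones:

```python
import numpy as np

# Minimal ANFIS forward pass: 2 inputs x 2 linguistic values = 4 rules.
# All parameter values below are illustrative assumptions.

c = np.array([2.0, 8.0, 2.0, 8.0])     # centres: x-small, x-big, y-small, y-big
sig = np.array([2.0, 2.0, 2.0, 2.0])   # spreads
P = np.array([                          # consequent params [pk0, pk1, pk2]
    [1.0, 1.0, 1.0],
    [3.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
    [2.0, 1.0, 1.0],
])

def anfis_forward(x, y):
    # Layer 1: Gaussian membership degrees.
    f = np.exp(-((np.array([x, x, y, y]) - c) ** 2) / (2 * sig ** 2))
    f1x, f2x, f1y, f2y = f
    # Layer 2: product T-norm firing strengths, then normalization.
    s = np.array([f1x * f1y, f2x * f1y, f1x * f2y, f2x * f2y])
    s_bar = s / s.sum()
    # Layer 3: rule outputs weighted by normalized firing strengths.
    z = s_bar * (P[:, 0] + P[:, 1] * x + P[:, 2] * y)
    # Layer 4: sum of rule outputs.
    return z.sum()

print(anfis_forward(3.0, 7.0))
```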
ANFIS Training
$z = z_1 + z_2 + z_3 + z_4$, where $z_k = \bar{s}_k (p_{k0} + p_{k1}x + p_{k2}y)$ for $k = 1, \dots, 4$.
The output is linear in the consequent parameters $p_{ki}$ if the premise parameters, and therefore the firing strengths $s_k$ of the fuzzy if-then rules, are fixed.
ANFIS uses a hybrid learning procedure (Jang and Sun, 1995) for estimation of the premise and consequent parameters.
The hybrid learning procedure estimates the consequent parameters (keeping the premise parameters fixed) in a forward pass, and the premise parameters (keeping the consequent parameters fixed) in a backward pass.
ANFIS Training
The forward pass: propagate information forward up to Layer 3, then estimate the consequent parameters by the least-squares estimator.
The backward pass: propagate the error signals backwards and update the premise parameters by gradient descent.
[Figure: ANFIS network. Squares: adaptive nodes; circles: fixed nodes.]
ANFIS Training : Least Square Estimation
1. Data is assembled in the form (xn, yn).
2. We assume that there is a linear relation between x and y: y = ax + b.
3. This can be extended to n dimensions: y = a1x1 + a2x2 + a3x3 + … + b.
The problem: given the data, find values of the coefficients ai such that the linear combination best fits the data.
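A minimal numpy sketch of this fit (the five data points are illustrative; they happen to lie exactly on y = 2x + 5):

```python
import numpy as np

# Least-squares fit of y = a*x + b. The data points are illustrative.
x = np.array([2.0, 3.0, 4.0, 6.0, 8.0])
y = np.array([9.0, 11.0, 13.0, 17.0, 21.0])

# Design matrix: one column for x (slope a), one column of ones (intercept b).
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a, b)  # -> 2.0 5.0
```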
ANFIS Training : Least Square Estimation
Given data {(x1, y1), ..., (xN, yN)}, we may define the error associated with saying y = ax + b by:
$E(a, b) = \sum_{n=1}^{N} \left( y_n - (a x_n + b) \right)^2$
This is just N times the variance of the data {y1 − (ax1 + b), ..., yN − (axN + b)}.
The goal is to find the values of a and b that minimize the error; in other words, to set the partial derivatives of the error w.r.t. a and b to zero.
ANFIS Training : Least Square Estimation
Which gives us:
$\frac{\partial E}{\partial a} = -2 \sum_{n=1}^{N} x_n \left( y_n - (a x_n + b) \right) = 0, \qquad \frac{\partial E}{\partial b} = -2 \sum_{n=1}^{N} \left( y_n - (a x_n + b) \right) = 0$
We may rewrite them as:
$a \sum x^2 + b \sum x = \sum xy, \qquad a \sum x + b \sum 1 = \sum y$
The values of a and b which minimize the error satisfy the following matrix equation:
$\begin{pmatrix} \sum x^2 & \sum x \\ \sum x & \sum 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum xy \\ \sum y \end{pmatrix}$
Hence a and b are estimated using:
$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum x^2 & \sum x \\ \sum x & \sum 1 \end{pmatrix}^{-1} \begin{pmatrix} \sum xy \\ \sum y \end{pmatrix}$
ANFIS Training : Least Square Estimation
For the following data, find the least-squares estimator, i.e. evaluate
$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum x^2 & \sum x \\ \sum x & \sum 1 \end{pmatrix}^{-1} \begin{pmatrix} \sum xy \\ \sum y \end{pmatrix}$

SNo     X     Y    X^2    XY
1       2     9      4    18
2       3    11      9    33
3       4    13     16    52
4       6    17     36   102
5       8    21     64   168
6       1     7      1     7
7       2     9      4    18
8      11    27    121   297
9      14    33    196   462
TOTAL  51   147    451  1157
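Plugging the column totals into the matrix equation (with $\sum 1 = N = 9$):
$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 451 & 51 \\ 51 & 9 \end{pmatrix}^{-1} \begin{pmatrix} 1157 \\ 147 \end{pmatrix} = \frac{1}{451 \cdot 9 - 51^2} \begin{pmatrix} 9 & -51 \\ -51 & 451 \end{pmatrix} \begin{pmatrix} 1157 \\ 147 \end{pmatrix} = \frac{1}{1458} \begin{pmatrix} 2916 \\ 7290 \end{pmatrix} = \begin{pmatrix} 2 \\ 5 \end{pmatrix}$
so the fitted line is y = 2x + 5, which reproduces every (X, Y) pair in the table exactly.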
ANFIS Training : Least Square Estimation
$z_k = \bar{s}_k (p_{k0} + p_{k1}x + p_{k2}y), \quad k = 1, \dots, 4$
$\text{Output } o = z_1 + z_2 + z_3 + z_4 = \bar{s}_1(p_{10} + p_{11}x + p_{12}y) + \bar{s}_2(p_{20} + p_{21}x + p_{22}y) + \bar{s}_3(p_{30} + p_{31}x + p_{32}y) + \bar{s}_4(p_{40} + p_{41}x + p_{42}y)$
Let the training data be:
$\begin{pmatrix} x_1 & y_1 & t_1 \\ x_2 & y_2 & t_2 \\ x_3 & y_3 & t_3 \\ x_4 & y_4 & t_4 \\ x_5 & y_5 & t_5 \\ x_6 & y_6 & t_6 \end{pmatrix}$
The squared error on each sample is:
$E_1 = \left( t_1 - [\bar{s}_1(p_{10} + p_{11}x_1 + p_{12}y_1) + \bar{s}_2(p_{20} + p_{21}x_1 + p_{22}y_1) + \bar{s}_3(p_{30} + p_{31}x_1 + p_{32}y_1) + \bar{s}_4(p_{40} + p_{41}x_1 + p_{42}y_1)] \right)^2$
$\vdots$
$E_6 = \left( t_6 - [\bar{s}_1(p_{10} + p_{11}x_6 + p_{12}y_6) + \bar{s}_2(p_{20} + p_{21}x_6 + p_{22}y_6) + \bar{s}_3(p_{30} + p_{31}x_6 + p_{32}y_6) + \bar{s}_4(p_{40} + p_{41}x_6 + p_{42}y_6)] \right)^2$
Set the partial derivatives of the total error $E = \sum_n E_n$ w.r.t. each consequent parameter to zero:
$\frac{\partial E}{\partial p_{10}} = -2 \sum_{n} \left( t_n - [\,\cdots\,] \right) \bar{s}_1 = 0$
$\vdots$
$\frac{\partial E}{\partial p_{42}} = -2 \sum_{n} \left( t_n - [\,\cdots\,] \right) \bar{s}_4 y_n = 0$
Simplify and use LSE.
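One way to read "simplify and use LSE": stack one row per training sample with columns $\bar{s}_k, \bar{s}_k x_n, \bar{s}_k y_n$ for each rule, so the output is a linear system $A\,p = t$ in the 12 consequent parameters. The data and the placeholder firing strengths below are illustrative assumptions:

```python
import numpy as np

# Consequent-parameter estimation as one linear least-squares problem.
# Each training sample contributes a row of the design matrix A:
#   [s1, s1*x, s1*y, s2, s2*x, s2*y, s3, s3*x, s3*y, s4, s4*x, s4*y]
# (normalized firing strengths), so that A @ p = t for
# p = [p10, p11, p12, p20, ..., p42].

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=6)   # illustrative training inputs
Y = rng.uniform(0.0, 10.0, size=6)
T = 2.0 * X + 3.0 * Y + 1.0          # illustrative targets

rows = []
for x, y in zip(X, Y):
    # The normalized firing strengths would come from Layers 1-2;
    # placeholder values summing to 1 are used here.
    s_bar = np.array([0.4, 0.3, 0.2, 0.1])
    rows.append(np.concatenate([[s, s * x, s * y] for s in s_bar]))

A = np.vstack(rows)                  # 6 samples x 12 parameters
# With fewer samples than parameters the system is under-determined;
# lstsq then returns the minimum-norm solution.
p, *_ = np.linalg.lstsq(A, T, rcond=None)
print(p.reshape(4, 3))               # one row [pk0, pk1, pk2] per rule
```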
ANFIS Training : Gradient descent
After the least-squares estimation of all consequent parameters, plug in the parameter values, the firing-strength values, and the variable (x, y) values.
Error at Layer 1: $E = \frac{(\text{Target Output} - \text{Actual Output})^2}{2}$
Let the centre and spread parameters in Layer 1 be represented by $c_i$ (centre) and $\sigma_i$ (spread). By the chain rule,
$\frac{\partial E}{\partial c_i} = \frac{\partial E}{\partial O} \frac{\partial O}{\partial z_i} \frac{\partial z_i}{\partial \bar{s}_i} \frac{\partial \bar{s}_i}{\partial f_i} \frac{\partial f_i}{\partial c_i}$
where O is the output and $f_i$ is the fuzzy membership function. Similarly,
$\frac{\partial E}{\partial \sigma_i} = \frac{\partial E}{\partial O} \frac{\partial O}{\partial z_i} \frac{\partial z_i}{\partial \bar{s}_i} \frac{\partial \bar{s}_i}{\partial f_i} \frac{\partial f_i}{\partial \sigma_i}$
The above expressions are used to update the centre and spread parameters.
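A minimal sketch of one such update for a single Gaussian node, using the analytic derivatives of $f(x) = \exp(-(x-c)^2/(2\sigma^2))$ with respect to c and σ. The upstream factor dE_df (the product of the first four chain-rule terms) is an assumed placeholder value that the backward pass would supply:

```python
import numpy as np

# One gradient-descent step for the centre and spread of a Gaussian
# membership node. dE_df stands for the product of the upstream
# chain-rule factors (dE/dO * dO/dz_i * dz_i/ds_i * ds_i/df_i); here
# it is an assumed placeholder, not a computed value.

def update_premise(x, c, sigma, dE_df, lr=0.01):
    f = np.exp(-((x - c) ** 2) / (2 * sigma ** 2))
    df_dc = f * (x - c) / sigma ** 2            # df/dc
    df_dsigma = f * (x - c) ** 2 / sigma ** 3   # df/dsigma
    c_new = c - lr * dE_df * df_dc
    sigma_new = sigma - lr * dE_df * df_dsigma
    return c_new, sigma_new

print(update_premise(x=60.0, c=70.0, sigma=10.0, dE_df=0.5))
```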