Knowledge acquisition and processing:
new methods for neuro-fuzzy systems
Danuta Rutkowska
Department of Computer Engineering
Technical University of Częstochowa, Poland
E-mail: drutko@kik.pcz.czest.pl
SOFSEM 2004
Cognitive Technologies
Knowledge Acquisition and Inference
in the Framework of Soft Computing
and Computing with Words
Soft Computing, Computing with Words, ...
• Soft computing
• Computing with words
• Perception-based systems
• Computational Intelligence
• Artificial Intelligence
• Cognitive sciences
• Neural networks
• Fuzzy systems
• Evolutionary algorithms
• Intelligent systems
Soft computing techniques
• Neuro-computing
• Fuzzy logic
• Rough sets
• Evolutionary algorithms
• Uncertain variables
• Probabilistic techniques
Cognition
The word "cognition"
comes from the Latin word
"cognitio", which means
"knowledge".
Cognitive sciences concern
thinking, perception, reasoning,
creation of meaning, and other
functions of a human mind.
Soft computing and cognition
The principal aim of soft computing
is to exploit the tolerance of uncertainty
and vagueness in the area of cognitive
reasoning.
[Nauck D., Kruse R.: NEFCLASS-J – A JAVA-Based
Soft Computing Tool, in: B. Azvine et al. (Eds.),
Intelligent Systems and Soft Computing, LNAI 1804,
Springer-Verlag, Heidelberg, New York (2000),
pp. 139-160]
Artificial Intelligence and cognition
The aim of artificial intelligence
is to develop paradigms or algorithms
that allow machines to perform tasks
that involve cognition when performed
by humans
[A.P. Sage (ed.), Concise Encyclopedia of
Information Processing in Systems and Organization,
Pergamon Press, New York, 1990]
Perception and fuzzy systems
Perception is very important
in human cognition.
The systems that incorporate
perceptions expressed by words
are fuzzy systems, introduced
by Prof. L.A. Zadeh.
Perception-based systems
Fuzzy systems are rule-based systems
(knowledge-based systems) that can be
viewed as perception-based systems.
The rule base of a fuzzy system
is composed of fuzzy IF-THEN rules
that are similar to the rules used
by humans in their reasoning.
Learning by examples
Learning by examples
is one of the simplest
cognitive capabilities
of a young child.
Artificial neural networks
with an inductive, supervised
learning algorithm imitate
this cognitive behaviour.
Machine learning
Machine learning research has the potential
to make a profound contribution to the
theory and practice of expert systems,
as well as to other areas of artificial
intelligence. Its application to the
problem of deriving rule sets from
examples is already helping to circumvent
the knowledge acquisition bottleneck.
[P. Jackson, Introduction to Expert Systems,
Addison Wesley, 1999, Chapter 20, p.399]
Inductive learning
The most common form of
supervised learning task
is called induction.
An inductive learning program
is one which is capable of
learning from examples by a
process of generalization.
[P. Jackson, Introduction to Expert Systems,
Addison Wesley, 1999, Chapter 20, p.381]
Neural network (MLP)
Model of an artificial neuron
RBF network
Gaussian function
Normalized RBF network
General neuro-fuzzy architecture
Fuzzy reasoning for k-th rule
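k-th rule (antecedent: $x$ is $A^k$; consequent: $y$ is $B^k$):
$R^k$: IF $x$ is $A^k$ THEN $y$ is $B^k$, $k = 1, \ldots, N$
input variable: $x = [x_1, \ldots, x_n]^T \in X \subset \mathbb{R}^n$
output variable: $y \in Y \subset \mathbb{R}$
$A^k = A_1^k \times \cdots \times A_n^k$, with membership function $\mu_{A^k}(x)$
fuzzification
input value: $\bar{x} = [\bar{x}_1, \ldots, \bar{x}_n]^T \in X$
input fuzzy set: $\mu_{A'}(x) = 1$ if $x = \bar{x}$, and $0$ if $x \neq \bar{x}$
fuzzy relation
k-th output fuzzy set: $\mu_{\bar{B}^k}(y) = \mu_{A^k \to B^k}(\bar{x}, y)$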
Aggregation and defuzzification
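aggregation for logical approach (T-norm):
$\mu_{B'}(y) = T_{k=1}^{N} \, \mu_{\bar{B}^k}(y)$
aggregation for Mamdani approach (S-norm):
$\mu_{B'}(y) = S_{k=1}^{N} \, \mu_{\bar{B}^k}(y)$
where $B'$ is the output fuzzy set for all N rules
defuzzification (output value $\bar{y}$):
$\bar{y} = \frac{\sum_{k=1}^{N} \bar{y}^k \, \mu_{B'}(\bar{y}^k)}{\sum_{k=1}^{N} \mu_{B'}(\bar{y}^k)}$
where $\bar{y}^k$ is the centre of the consequent fuzzy set $B^k$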
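The defuzzification step can be illustrated with a small sketch. It assumes the centres of the consequent fuzzy sets and the membership grades evaluated at those centres are already available as plain lists; the function name `center_average` is hypothetical and not taken from the slides.

```python
def center_average(centres, memberships):
    """Weighted average of consequent centres.

    centres     -- centres of the consequent fuzzy sets, one per rule
    memberships -- membership grades of the output fuzzy set at those centres
    """
    total = sum(memberships)
    if total == 0:
        # No rule contributes, so the weighted average is undefined.
        raise ValueError("all membership grades are zero")
    return sum(c * m for c, m in zip(centres, memberships)) / total


# Two rules with centres 1.0 and 3.0 that contribute equally.
print(center_average([1.0, 3.0], [0.5, 0.5]))
```

With equal grades the output lies midway between the two centres; a larger grade pulls the output towards the corresponding centre.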
Fuzzy implications: Mamdani, logical
Mamdani
approach
logical
approach
An example of a neuro-fuzzy network
More general form of this network
Another example of the NF network
T-norm
A triangular norm T is a function of two
arguments T: [0,1]×[0,1]→[0,1]
which satisfies the following conditions
for a,b,c,d∈[0,1]:
Monotonicity: T(a,b) ≤ T(c,d) for a ≤ c, b ≤ d
Commutativity: T(a,b) = T(b,a)
Associativity: T(T(a,b),c) = T(a,T(b,c))
Boundary conditions: T(a,0) = 0; T(a,1) = a
T-conorm (S-norm)
A T-conorm (S-norm) is a function of two
arguments S: [0,1]×[0,1]→[0,1],
which satisfies the following conditions
for a,b,c,d∈[0,1]
Monotonicity: S(a,b) ≤ S(c,d) for a ≤ c, b ≤ d
Commutativity: S(a,b) = S(b,a)
Associativity: S(S(a,b),c) = S(a,S(b,c))
Boundary conditions: S(a,0) = a; S(a,1) = 1
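The four conditions can be checked numerically on a coarse grid. The sketch below uses two well-known pairs (min/max and product/probabilistic sum) as examples; the helper `satisfies_axioms` and the grid resolution are assumptions made for illustration, and a grid check is only a sanity test, not a proof.

```python
import itertools

def t_min(a, b):   # minimum T-norm
    return min(a, b)

def t_prod(a, b):  # product T-norm
    return a * b

def s_max(a, b):   # maximum S-norm
    return max(a, b)

def s_prob(a, b):  # probabilistic sum S-norm
    return a + b - a * b

GRID = [i / 4 for i in range(5)]  # 0, 0.25, 0.5, 0.75, 1

def satisfies_axioms(op, neutral, absorbing, tol=1e-12):
    """Check commutativity, associativity, monotonicity and boundary
    conditions of a two-argument operator on GRID.
    For a T-norm use neutral=1, absorbing=0; for an S-norm the reverse."""
    for a, b, c in itertools.product(GRID, repeat=3):
        if abs(op(a, b) - op(b, a)) > tol:
            return False
        if abs(op(op(a, b), c) - op(a, op(b, c))) > tol:
            return False
        # Monotonicity in one argument; commutativity covers the other.
        if b <= c and op(a, b) > op(a, c) + tol:
            return False
    return all(
        abs(op(a, neutral) - a) <= tol and abs(op(a, absorbing) - absorbing) <= tol
        for a in GRID
    )

print(satisfies_axioms(t_min, 1.0, 0.0), satisfies_axioms(s_prob, 0.0, 1.0))
```

The arithmetic mean, for comparison, fails the boundary condition T(a,1) = a and is therefore not a T-norm.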
Neuro-fuzzy inference systems (NFIS)
APPROACHES TO DESIGN NFIS
MAMDANI
LOGICAL
TAKAGI - SUGENO
Fuzzy-logic inference system
[Block diagram: x → FUZZIFIER → FUZZY INFERENCE ENGINE → DEFUZZIFIER → y; the FUZZY RULE BASE (IF ... THEN ...) is connected to the FUZZY INFERENCE ENGINE]
Fuzzy-logic inference system: fuzzifier
Fuzzy-logic inference system: fuzzy rule base
Fuzzy-logic inference system: fuzzy inference engine
Fuzzy-logic inference system: defuzzifier
General architecture of Neuro-Fuzzy Inference System
[Figure: NFIS network with layers I–IV; inputs $x_1, \ldots, x_n$; implication nodes $I_{k,r}(\bar{x}, \bar{y}^r)$ and aggregation nodes $agr_r(\bar{x}, \bar{y}^r)$ for $k, r = 1, \ldots, N$; output $\bar{y}$]
Flexible neuro-fuzzy system: Mamdani approach
IMPLICATIONS
e.g.
AGGREGATIONS OF RULES
e.g.
Definition: Fuzzy implication
A fuzzy implication is a function I: [0,1]² → [0,1]
satisfying the following conditions:
(I1) if a1 ≤ a3 then I(a1,a2) ≥ I(a3,a2), for all a1, a2, a3 ∈ [0,1]
(I2) if a2 ≤ a3 then I(a1,a2) ≤ I(a1,a3), for all a1, a2, a3 ∈ [0,1]
(I3) I(0,a2) = 1, for all a2 ∈ [0,1] (falsity implies anything)
(I4) I(a1,1) = 1, for all a1 ∈ [0,1] (anything implies tautology)
(I5) I(1,0) = 0 (booleanity)
Fuzzy implications
NAME | IMPLICATION I(a,b)
KLEENE-DIENES | $\max\{1 - a,\, b\}$
ŁUKASIEWICZ | $\min\{1,\, 1 - a + b\}$
REICHENBACH | $1 - a + a \cdot b$
FODOR | $1$ if $a \le b$; $\max\{1 - a,\, b\}$ if $a > b$
SHARP | $1$ if $a \le b$; $0$ if $a > b$
GOGUEN | $\min\{1,\, b/a\}$ if $a > 0$; $1$ if $a = 0$
GÖDEL | $1$ if $a \le b$; $b$ if $a > b$
YAGER | $1$ if $a = 0$; $b^a$ if $a > 0$
ZADEH | $\max\{\min\{a, b\},\, 1 - a\}$
WILLMOTT | $\min\{\max\{1 - a,\, b\},\, \max\{a,\, 1 - b,\, \min\{1 - a,\, b\}\}\}$
Flexible neuro-fuzzy system: Logical approach
IMPLICATIONS
e.g.
AGGREGATIONS OF RULES
e.g.
Flexible neuro-fuzzy system: AND-type compromise NFIS
$I(a, b) = (1 - \lambda)\, T(a, b) + \lambda\, S(1 - a, b)$, $\lambda \in [0, 1]$
$I(a, b) = (1 - \lambda) \min\{a, b\} + \lambda \max\{1 - a, b\}$
$\lambda$ | SYSTEM
0 | MAMDANI TYPE
1 | LOGICAL TYPE
(0,1) | COMPROMISE (MAMDANI AND LOGICAL)
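The AND-type compromise can be sketched as a convex combination of a Mamdani-type term and a logical-type term. The sketch assumes the min/max instance shown on the slide; the parameter name `lam` is chosen here for illustration.

```python
def compromise_implication(a, b, lam):
    """Convex combination of min(a, b) (Mamdani type, lam = 0)
    and max(1 - a, b) (logical type, lam = 1)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1 - lam) * min(a, b) + lam * max(1 - a, b)


for lam in (0.0, 0.5, 1.0):
    print(lam, compromise_implication(0.5, 0.25, lam))
```

Because the parameter enters the formula smoothly, it can be treated as one more trainable value in a learning procedure, which is the idea behind the flexible system.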
Flexible neuro-fuzzy system: OR-type compromise NFIS
PARAMETER VALUE | SYSTEM
0 | MAMDANI TYPE
1 | LOGICAL TYPE
0.5 | UNDEFINED
(0,0.5) | “MORE MAMDANI”
(0.5,1) | “MORE LOGICAL”
Flexible neuro-fuzzy system
L. Rutkowski and K. Cpałka, "Flexible Neuro-Fuzzy Systems", IEEE Trans.
Neural Networks, vol. 14, pp. 554-574, May 2003
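Flexible neuro-fuzzy system: Soft NFIS (1/2)
$\tilde{T}\{a; \alpha\} = (1 - \alpha)\, \frac{1}{n} \sum_{i=1}^{n} a_i + \alpha\, T\{a\}$
$\tilde{S}\{a; \alpha\} = (1 - \alpha)\, \frac{1}{n} \sum_{i=1}^{n} a_i + \alpha\, S\{a\}$
$\tilde{I}(a, b; \beta) = (1 - \beta)\, \frac{1}{2} (a + b) + \beta\, T(a, b)$
$\tilde{I}(a, b; \beta) = (1 - \beta)\, \frac{1}{2} (1 - a + b) + \beta\, S(1 - a, b)$
$\alpha \in [0, 1]$
$\beta \in [0, 1]$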
Flexible neuro-fuzzy system: Soft NFIS (2/2)
Flexible neuro-fuzzy system: NFIS realized by parameterised families of triangular norms (1/2)
THE DOMBI TRIANGULAR NORMS
$p \in (0, \infty)$
Flexible neuro-fuzzy system: NFIS realized by parameterised families of triangular norms (2/2)
Flexible neuro-fuzzy system: NFIS realized by triangular norms with weighted arguments (1/2)
$T^*\{a_1, a_2; w_1, w_2\} = T\{1 - w_1 (1 - a_1),\, 1 - w_2 (1 - a_2)\}$
$S^*\{a_1, a_2; w_1, w_2\} = S\{w_1 a_1,\, w_2 a_2\}$
$T^*\{a_1, a_2; 0, w_2\} = T\{1,\, 1 - w_2 (1 - a_2)\} = 1 - w_2 (1 - a_2)$
$S^*\{a_1, a_2; 0, w_2\} = S\{0,\, w_2 a_2\} = w_2 a_2$
$T^*\{a_1, a_2; w_1, 0\} = T\{1 - w_1 (1 - a_1),\, 1\} = 1 - w_1 (1 - a_1)$
$S^*\{a_1, a_2; w_1, 0\} = S\{w_1 a_1,\, 0\} = w_1 a_1$
$w_1, w_2 \in [0, 1]$
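The effect of the weights can be seen in a short sketch. It assumes min and max as the underlying norms; the function names are hypothetical.

```python
def weighted_t_norm(a1, a2, w1, w2, t_norm=min):
    """T-norm with weighted arguments: a weight of 0 replaces the
    argument by 1, the neutral element of a T-norm."""
    return t_norm(1 - w1 * (1 - a1), 1 - w2 * (1 - a2))

def weighted_s_norm(a1, a2, w1, w2, s_norm=max):
    """S-norm with weighted arguments: a weight of 0 replaces the
    argument by 0, the neutral element of an S-norm."""
    return s_norm(w1 * a1, w2 * a2)


# With w1 = 0 the first argument no longer influences the result.
print(weighted_t_norm(0.5, 0.25, 0.0, 1.0))
print(weighted_s_norm(0.5, 0.25, 0.0, 1.0))
```

A weight therefore acts as an importance factor: 1 keeps the argument fully in play and 0 switches it off, with intermediate values in between.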
Flexible neuro-fuzzy system: NFIS realized by
triangular norms with weighted arguments (2/2)
Flexible neuro-fuzzy system: Glass Identification – experimental results
EXPERIMENT | FLEXIBILITY PARAMETERS (INITIAL VALUE → FINAL VALUE AFTER LEARNING) | RMSE (LEARNING / TESTING SEQUENCE) | MISTAKES [%] (LEARNING / TESTING SEQUENCE)
i | 0.5 → 1.0000 | 0.2395 / 0.2553 | 6.66% / 7.81%
ii | 0.5 → 1.0000 | 0.2392 / 0.2483 | 7.33% / 7.81%
iii | 0 | 0.2845 / 0.2196 | 10.00% / 7.81%
iv | $\nu$: 0.5 → 1.0000; $p_\tau$: 10 → 9.9953; $p_I$: 10 → 9.9998; $p_{agr}$: 10 → 9.9999; $\alpha_\tau$: 1 → 0.9576; $\alpha_I$: 1 → 0.9931; $\alpha_{agr}$: 1 → 0.8482 | 0.1856 / 0.2191 | 3.33% / 6.25%
v | $\nu$: 0.5 → 1.0000; $p_\tau$: 10 → 9.9601; $p_I$: 10 → 9.9997; $p_{agr}$: 10 → 9.9836; $\alpha_\tau$: 1 → 0.9213; $\alpha_I$: 1 → 0.9939; $\alpha_{agr}$: 1 → 0.8456; $w_\tau$: 1 → next slide; $w_{agr}$: 1 → next slide | 0.1784 / 0.2596 | 2.00% / 6.25%
Flexible neuro-fuzzy system: Glass Identification – weights representation
[Figure: weights representation in the Glass Identification problem for $k = 1, \ldots, 2$ and $i = 1, \ldots, 9$ (dark areas correspond to low values and vice versa)]
Flexible neuro-fuzzy system:
Glass Identification – comparison table
Method | Testing Acc. [%]
Dong and Kothari (IG) | 92.86
Dong and Kothari (IG+LA) | 93.09
Dong and Kothari (GR) | 92.86
Dong and Kothari (GR+LA) | 93.10
our result | 93.75
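Neuro-fuzzy relational system
[Figure: network with inputs $x_1, \ldots, x_N$; antecedent membership nodes $\mu_{A^1}(\bar{x}), \ldots, \mu_{A^K}(\bar{x})$; relation elements $r_{k,m}$ ($k = 1, \ldots, K$; $m = 1, \ldots, M$) combined by T-norm and S-norm nodes; parameters $b_1, \ldots, b_M$; a division node (div); output $y$]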
Neuro-fuzzy relational system with fuzzy matrix R
Neuro-fuzzy connectionist system (basic architecture)
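[Figure: three-layer network (L1, L2, L3) with inputs $x_1, \ldots, x_N$; membership nodes $A_i^k$ ($i = 1, \ldots, N$; $k = 1, \ldots, K$); parameters $y^1, \ldots, y^K$; a division node (div); output $y$]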
Rule generation
The neuro-fuzzy networks
reflect fuzzy IF-THEN rules.
The network architectures
are created based on the rules.
How can the rules be obtained?
Basic questions:
• How many rules?
• What kind of membership functions
(Gaussian, triangular, trapezoidal, etc.)?
• How to determine the parameter values
of the membership functions (centers, widths)?
Many methods
There are many methods
of rule generation.
However, most of the rules
obtained by these methods,
when applied in neuro-fuzzy
systems for classification,
result in some misclassifications.
Perception-based approach
This method generates
fuzzy IF-THEN rules,
from a data set, by use
of fuzzy granulation.
The neuro-fuzzy systems,
which utilize these rules,
perform without misclassifications.
Multi-stage classification
The perception-based approach
makes it possible to generate fuzzy rules
and perform a multi-stage
classification without
misclassifications.
This method will be illustrated
on the IRIS example.
IRIS data set:
150 data items that contain measurements
of iris flowers from three species of iris:
Setosa, Versicolor, and Virginica;
50 data items for each of the iris species.
The data include information about four
features of the iris flowers: sepal length,
sepal width, petal length, petal width.
Ranges of the measurements
of iris flowers (in centimeters)
Sepal length | 4.3 – 7.9
Sepal width | 2.0 – 4.4
Petal length | 1.0 – 6.9
Petal width | 0.1 – 2.5
Ranges within the classes
Feature | Setosa | Versicolor | Virginica
Sepal length | 4.3 – 5.8 | 4.9 – 7.0 | 4.9 – 7.9
Sepal width | 2.3 – 4.4 | 2.0 – 3.4 | 2.2 – 3.8
Petal length | 1.0 – 1.9 | 3.0 – 5.1 | 4.5 – 6.9
Petal width | 0.1 – 0.6 | 1.0 – 1.8 | 1.4 – 2.5
Granulated ranges of sepal length
4.3 – 4.9 | Setosa
4.9 – 5.8 | Setosa, Versicolor, Virginica
5.8 – 7.0 | Versicolor, Virginica
7.0 – 7.9 | Virginica
Granulated ranges of sepal width
2.0 – 2.2 | Versicolor
2.2 – 2.3 | Versicolor, Virginica
2.3 – 3.4 | Setosa, Versicolor, Virginica
3.4 – 3.8 | Setosa, Virginica
3.8 – 4.4 | Setosa
Granulated ranges of petal length
1.0 – 1.9 | Setosa
3.0 – 4.5 | Versicolor
4.5 – 5.1 | Versicolor, Virginica
5.1 – 6.9 | Virginica
Granulated ranges of petal width
0.1 – 0.6 | Setosa
1.0 – 1.4 | Versicolor
1.4 – 1.8 | Versicolor, Virginica
1.8 – 2.5 | Virginica
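The granulated ranges can be derived mechanically from the per-class ranges by cutting the axis at every range endpoint. The sketch below is one possible way to do this, not necessarily the procedure used by the author; the function name `granulate` is hypothetical.

```python
def granulate(class_ranges):
    """Split the axis at all class-range endpoints and list, for each
    resulting granule, the classes whose range covers it."""
    cuts = sorted({v for lo, hi in class_ranges.values() for v in (lo, hi)})
    granules = []
    for lo, hi in zip(cuts, cuts[1:]):
        classes = [
            name
            for name, (c_lo, c_hi) in class_ranges.items()
            if c_lo <= lo and hi <= c_hi
        ]
        if classes:  # skip gaps that no class covers
            granules.append(((lo, hi), classes))
    return granules


# Petal length ranges within the classes, taken from the slides.
petal_length = {
    "Setosa": (1.0, 1.9),
    "Versicolor": (3.0, 5.1),
    "Virginica": (4.5, 6.9),
}
for granule, classes in granulate(petal_length):
    print(granule, classes)
```

Granules covered by a single class lead directly to a decision, while granules shared by several classes are the source of the "I do not know" answers handled in later stages.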
Linguistic labels for sepal length
4.3 – 4.9 | short sepal | $A_{11}$
4.9 – 5.8 | medium long sepal | $A_{12}$
5.8 – 7.0 | long sepal | $A_{13}$
7.0 – 7.9 | very long sepal | $A_{14}$
Linguistic labels for sepal width
2.0 – 2.2 | very narrow sepal | $A_{21}$
2.2 – 2.3 | narrow sepal | $A_{22}$
2.3 – 3.4 | medium wide sepal | $A_{23}$
3.4 – 3.8 | wide sepal | $A_{24}$
3.8 – 4.4 | very wide sepal | $A_{25}$
Linguistic labels for petal length
1.0 – 1.9 | very short petal | $A_{31}$
3.0 – 4.5 | medium long petal | $A_{32}$
4.5 – 5.1 | long petal | $A_{33}$
5.1 – 6.9 | very long petal | $A_{34}$
Linguistic labels for petal width
0.1 – 0.6 | very narrow petal | $A_{41}$
1.0 – 1.4 | medium wide petal | $A_{42}$
1.4 – 1.8 | wide petal | $A_{43}$
1.8 – 2.5 | very wide petal | $A_{44}$
Rule 1
IF sepal is (short or medium long) and
(medium wide or wide or very wide)
and petal is very short and very narrow
THEN Setosa
IF $x_1$ is $A_1^1$ and $x_2$ is $A_2^1$ and $x_3$ is $A_3^1$ and $x_4$ is $A_4^1$ THEN Setosa
$A_1^1 = A_{11} \cup A_{12}$
$A_2^1 = A_{23} \cup A_{24} \cup A_{25}$
$A_3^1 = A_{31}$
$A_4^1 = A_{41}$
Rule 2
IF sepal is (medium long or long) and
(very narrow or narrow or medium wide)
and petal is (medium long or long) and
(medium wide or wide) THEN Versicolor
IF $x_1$ is $A_1^2$ and $x_2$ is $A_2^2$ and $x_3$ is $A_3^2$ and $x_4$ is $A_4^2$ THEN Versicolor
$A_1^2 = A_{12} \cup A_{13}$
$A_2^2 = A_{21} \cup A_{22} \cup A_{23}$
$A_3^2 = A_{32} \cup A_{33}$
$A_4^2 = A_{42} \cup A_{43}$
Rule 3
IF sepal is (medium long or long or very
long) and (narrow or medium wide or
wide) and petal is (long or very long) and
(wide or very wide) THEN Virginica
IF $x_1$ is $A_1^3$ and $x_2$ is $A_2^3$ and $x_3$ is $A_3^3$ and $x_4$ is $A_4^3$ THEN Virginica
$A_1^3 = A_{12} \cup A_{13} \cup A_{14}$
$A_2^3 = A_{22} \cup A_{23} \cup A_{24}$
$A_3^3 = A_{33} \cup A_{34}$
$A_4^3 = A_{43} \cup A_{44}$
NF network for the iris classification
Results of the 1st stage classification
50 data vectors correctly classified to Setosa
32 data vectors correctly classified to Versicolor
42 data vectors correctly classified to Virginica
26 data vectors – „I do not know” decision:
Versicolor or Virginica
These data vectors participate in the 2nd stage
of the classification.
2nd stage classification
Two fuzzy IF-THEN rules are formulated,
based on the granulated ranges, obtained
for the data vectors with the „I do not know”
decision in the 1st stage.
The NF network in the 2nd stage is reduced
to the components associated with the
Versicolor and Virginica classes.
Results of the 2nd stage classification
12 data vectors correctly classified to Versicolor
1 data vector correctly classified to Virginica
13 data vectors – „I do not know” decision:
Versicolor or Virginica
These data vectors participate in the 3rd stage
of the classification. Two new rules are created.
Results of the 3rd stage classification
4 data vectors correctly classified to Versicolor
5 data vectors correctly classified to Virginica
4 data vectors – „I do not know” decision:
Versicolor or Virginica
These data vectors participate in the 4th stage
of the classification. Two new rules are created.
Results of the 4th stage classification
2 data vectors correctly classified to Versicolor
2 data vectors correctly classified to Virginica
All data vectors are correctly classified
after 4 stages of the classification.
No misclassifications!
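A single stage of this scheme can be sketched with crisp interval rules built from the "Ranges within the classes" table. This is a simplification made for illustration: the slides use fuzzy sets and a neuro-fuzzy network, whereas the sketch below only tests whether each feature falls inside a class range. The names `CLASS_RANGES` and `classify` are hypothetical.

```python
# Feature order: sepal length, sepal width, petal length, petal width.
CLASS_RANGES = {
    "Setosa":     [(4.3, 5.8), (2.3, 4.4), (1.0, 1.9), (0.1, 0.6)],
    "Versicolor": [(4.9, 7.0), (2.0, 3.4), (3.0, 5.1), (1.0, 1.8)],
    "Virginica":  [(4.9, 7.9), (2.2, 3.8), (4.5, 6.9), (1.4, 2.5)],
}

def classify(x, rules=CLASS_RANGES):
    """Return the class whose rule is the only one that fires;
    otherwise (several rules or none) return "I do not know"."""
    fired = [
        name
        for name, ranges in rules.items()
        if all(lo <= value <= hi for value, (lo, hi) in zip(x, ranges))
    ]
    return fired[0] if len(fired) == 1 else "I do not know"


print(classify((5.0, 3.5, 1.4, 0.2)))
print(classify((6.0, 3.0, 4.8, 1.6)))
```

In a multi-stage setting, the vectors that receive the "I do not know" answer would be collected, new ranges would be computed from them alone, and `classify` would be applied again with the reduced rule set.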
IRIS data: P1, P2
[Scatter plot: P2 (sepal width) versus P1 (sepal length) for Setosa, Versicolor, and Virginica]
IRIS data: P1, P3
[Scatter plot: P3 (petal length) versus P1 (sepal length) for Setosa, Versicolor, and Virginica]
IRIS data: P2, P4
[Scatter plot: P4 (petal width) versus P2 (sepal width) for Setosa, Versicolor, and Virginica]
IRIS data: P3, P4
[Scatter plot: P4 (petal width) versus P3 (petal length) for Setosa, Versicolor, and Virginica]
Diagnosis of a tumor of the mucous
membrane of the uterus
Attributes (9 attributes):
• period of time after menopause
• BMI (Body Mass Index)
• LH (luteinizing hormone)
• FSH (follicle-stimulating hormone)
• PRL (prolactin)
• E1 (estrone)
• E2 (estradiol)
• aromatase
• estrogenic receptor
Data:
52 records of positive diagnosis
13 records of negative diagnosis
Diagnosis:
negative (class 0), positive (class 1)
Ranges of the attribute values
a1 | 0.5 – 34
a2 | 20 – 46
a3 | 0.5 – 120.3
a4 | 1.36 – 155.4
a5 | 2.4 – 128.1
a6 | 156 – 542
a7 | 0.04 – 1.48
a8 | 2.28 – 11.85
a9 | 0.72 – 3.85
Ranges within the classes
Attribute | Class 0 | Class 1
a1 | 0.5 – 20 | 0.5 – 34
a2 | 20 – 46 | 20 – 45
a3 | 1.2 – 53.9 | 0.5 – 120.3
a4 | 1.63 – 88.2 | 1.36 – 155.4
a5 | 3.4 – 128.1 | 2.4 – 76.6
a6 | 170 – 412 | 156 – 542
a7 | 0.04 – 0.27 | 0.05 – 1.48
a8 | 2.28 – 10.51 | 3 – 11.85
a9 | 0.72 – 1.05 | 0.91 – 3.85
Rules for the medical diagnosis
IF $x_1$ is $A_1^k$ and $\ldots$ and $x_9$ is $A_9^k$ THEN Class $k$
$k = 0, 1$
NF network for the medical diagnosis
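[Figure: network with inputs $x_1$ (Attribute 1), $x_2$ (Attribute 2), ..., $x_9$ (Attribute 9); membership nodes $A_1^0, A_2^0, \ldots, A_9^0$ feeding a product node (П) for Class 0, and $A_1^1, A_2^1, \ldots, A_9^1$ feeding a product node (П) for Class 1]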
Results: correct diagnosis
3 cases with the “I do not know” response
after the first stage of classification;
62 correct diagnoses for all 65 input vectors
(95.4% correct decisions, 4.6% “I do not know”).
The “I do not know” answers, which mean
positive or negative diagnosis, refer to the
cases that are difficult to recognize
because they belong to overlapping regions.
Conclusions (perception-based classification)
The perception-based approach makes it
possible to generate fuzzy IF-THEN rules
in the same way as humans do, and to
perform the multi-stage classification
without misclassifications.
Final conclusions
Neuro-fuzzy systems are soft computing
methods utilizing artificial neural
networks and fuzzy systems.
Various connectionist architectures of
neuro-fuzzy systems can be constructed.
The knowledge acquisition concerns
fuzzy IF-THEN rules, and is performed
by a learning process.
The systems realize an inference (fuzzy
reasoning) based on these rules.