Research Center for Artificial Intelligence
Department of Computer Science
University of Craiova, Romania
Rules-Based Reasoning
Under Uncertainty and Imprecision
Ion Iancu
Editura Universitaria
2010
Scientific referees:
Prof. dr. George Georgescu,
Universitatea din Bucureşti
Prof. dr. Dumitru Buşneag,
Universitatea din Craiova
CONTENTS

Preface
1 Fuzzy sets
1.1 Basic notions
1.2 Operations on fuzzy sets
1.3 Fuzzy numbers
1.4 Fuzzy relations
2 Uncertainty
2.1 Possibility and necessity measures
2.2 Belief and plausibility functions
2.3 Dempster's rule of combination
2.4 Approximations of the basic probability assignment
2.5 Algorithms for generalized belief functions
3 Uncertain and imprecise knowledge representation
3.1 Linguistic representation
3.2 Facts representation
3.3 Implications
3.4 Rules representation
4 Reasoning with imprecise and/or uncertain rules
4.1 Generalized Modus Ponens rule
4.2 Uncertain Generalized Modus Ponens reasoning
4.3 Uncertain hybrid reasoning
4.4 Fuzzy logic control
4.5 Extended Mamdani fuzzy logic controller
4.5.1 The proposed model
4.5.2 An application
4.6 Fuzzy classifier system
References
PREFACE
Many rule-based expert systems or applications of Artificial
Intelligence (AI) are based on two-valued logic. However, classifications
in the real world often do not have such sharp boundaries. For instance,
the characteristic of intelligence is, in many cases, only true to a degree.
Classical two-valued logic is not designed to deal with properties that are
a matter of degree. Multi-valued logic, in which an attribute can be
possessed to a degree from [0, 1], solved some of these vagueness issues.
In the 1930s the philosopher Max Black drew the first fuzzy-set
membership functions and called the uncertainty of these structures
vagueness. In 1951 Menger coined the term ensemble flou, which has
become the French counterpart of fuzzy set, with flou meaning hazy,
vague, fuzzy. But multi-valued logic systems were not used extensively
because they did not go far enough. A very important event was the
landmark paper of Zadeh (1965), in which he presented enough
mathematical theory to work with the concept of fuzzy sets.
The main difference between Zadeh's fuzzy logic and multi-valued logic
consists in the fact that in fuzzy logic one can deal with fuzzy quantifiers
like very, few, most, and fuzzy logic truth itself can be fuzzy. It is clear
that fuzzy logic provides a system which is flexible enough to serve as a
framework for linguistic control. When rule-based AI systems were first
conceived in the middle of the 1950s they were supposed to provide the
ability to simulate human decision in an uncertain environment.
In a typical rule-based AI system, the knowledge is acquired,
stored and processed as facts and rules, not numerical entities. The
collection of rules is defined as the knowledge base. The framework of
the if-then rules is chained through, firing a premise only if it is true.
Knowledge is searched through logical paths of the knowledge tree via an
inference process. These systems exploit structured knowledge when it is
available, but in most cases the experts are unable to define the
propositional rules in the format required to approximate the behavior of
an expert.
Fuzzy systems were introduced in the mid-1960s and have the
ability to learn the system knowledge, using numerical or linguistic data
as input, and to produce an estimation of the input-output relationships.
A fuzzy system fires each fuzzy rule in parallel, but with different
degrees, and then infers a conclusion from a weighted combination of the
consequents of the fired rules. The main purpose of fuzzy logic systems
is to deal with systems which are inherently fuzzy: the processes are too
complicated to be fully understood in terms of exact mathematical models
and are therefore incapable of being controlled in a precise way using
classical control techniques. Fuzzy logic systems have demonstrated,
since the mid-1970s (Mamdani & Assilian, 1975), the ability to deal with
the goals and constraints that are required by these ill-defined fuzzy
systems.
A problem area is a candidate for fuzzy logic if (Pedrycz, 1983):
(1) the considered system is complex or ill-defined;
(2) there are major difficulties in creating an exact mathematical model;
(3) there is extensive experience and intuition available from process
operators;
(4) lack of measurements, due to costs or noise, makes it impossible to
apply conventional statistical and/or control methods.
Fuzzy systems can be of two types: the more common rule-based
systems, or relational-based systems, which permit numerical analysis.
In rule-based fuzzy systems a series of rules is developed that relate the
fuzzy input membership functions to the fuzzy output membership
functions. The rules can be formulated using the same if-then rules
typical of expert systems, or by using look-up tables, which consolidate
all the rule-base information.
Rule-based systems are most commonly used in applications of
fuzzy control. Other names for the if-then rules are production,
premise-action or antecedent-consequent rules. The rules describe in qualitative
terms how the controller output will behave when it is subjected to
various inputs. The consequent part of the rule assigns a value to the
output set, based on the conditional part of the statement. The degree of
this assignment modifies the value of the output membership by applying
it to the degree of truth for the conditional expression. Each rule produces
a fuzzy output set and the union of these sets is the overall output.
The biggest problem with rule-based systems is to obtain the
appropriate rules and then to ensure that the rules are consistent and
complete (Graham et al., 1988). Adaptive techniques are available for
some types of applications which allow the rule-based systems to learn
and to self-modify (Graham et al., 1989). Also, genetic algorithm
techniques allow the process to self-generate the rules (Karr, 1991a,
1991b; Nakashima, 2000).
Professor Lotfi Zadeh introduced possibility theory in 1978 as
an alternative to probability theory and an extension of his theory of
fuzzy sets and fuzzy logic. D. Dubois and H. Prade further contributed to
its development in a series of papers. Approximate reasoning, based on
fuzzy set theory and possibility theory, provides several techniques to
reason with fuzzy and uncertain concepts in knowledge-based systems.
Applying fuzzy techniques in knowledge-based systems can provide a
knowledge representation and inference that is closer to human reasoning
than conventional knowledge-based systems founded on "classical logic".
This comes very close to the field of natural language understanding and
processing. Fuzzy logic and approximate reasoning enable us to model
human reasoning by means of computer implementations.
Because uncertainty and imprecision cannot be ignored when
modeling human expertise, we present some possibilities to work with
imprecise and/or uncertain knowledge in inferential processes. The paper
is organized as follows:
• The chapter "Fuzzy Sets" contains the basics of fuzzy set theory
that are necessary for a correct understanding of the rest of this paper. We
present the fuzzy set definition and representation, and operations with
fuzzy sets using Zadeh's formulas and t-norm, t-conorm and negation
operators. Also, a special type of fuzzy sets, referred to as fuzzy numbers,
is described, restricted to the LR representation. Finally, some notions
about fuzzy relations are detailed: definition, operations, and the
composition of a fuzzy relation with another fuzzy relation or with a fuzzy set.
• The chapter “Uncertainty” describes two possibilities to quantify
the uncertainty: the possibility and necessity measures, denoted as Π and
N , and the belief and plausibility functions, referred to as Bel and Pls ,
respectively.
• The chapter "Uncertain and Imprecise Knowledge Representation"
is dedicated to the knowledge representation that will be used in the
inferential methods from the last part of the paper, such as: linguistic
variables, linguistic hedges, the canonical form of an elementary
proposition, and rules representation by fuzzy implications.
• The last chapter presents some methods for reasoning and their
applications in fuzzy logic control and fuzzy classification.
Because the domain analyzed in this book is very large, the proofs of
the theorems are omitted in order to present a large amount of theoretical
concepts. The corresponding proofs can be found in the works included in
the references.
This volume is addressed to graduate students and to anyone
interested in approximate reasoning and its applications.
1 FUZZY SETS

1.1 BASIC NOTIONS
A classical (crisp) set is defined as a collection of elements
x ∈ X. An element can either belong or not belong to a set A, A ⊆ X.
Such a classical set can be described in different ways: one can enumerate
its elements, use an analytical representation (for instance,
A = {x / x ≤ 5}), or use the membership (characteristic) function. The
characteristic function χ_A of a subset A ⊆ X is a mapping
χ_A : X → {0, 1},
where the value zero represents non-membership and the value one
represents membership. The truth or falsity of the statement "x is in A"
is determined by the ordered pair (x, χ_A(x)): the statement is true if the
second element of the ordered pair is 1 and false if it is 0.
Fuzzy sets were introduced by Zadeh (1965) in order to represent
and manipulate data that was not precise, but rather fuzzy. Similarly to
the crisp case, a fuzzy subset A of a set X is defined as a collection of
ordered pairs with the first element from X and the second element from
the interval [0, 1]; the set X is referred to as the universe of discourse for
the fuzzy subset A.
Definition 1.1. (Zadeh, 1965) If X is a nonempty set then a fuzzy set A
in X is defined by its membership function μ_A : X → [0, 1], where
μ_A(x) represents the membership degree of the element x in the fuzzy
set A; then A is represented as A = {(x, μ_A(x)) / x ∈ X}.
Example 1.1. We can define the set of integers "close to 1" by
A = {(−2, 0.25), (−1, 0.5), (0, 0.75), (1, 1), (2, 0.75), (3, 0.5), (4, 0.25)}
or by
μ_A(x) = 1 / (1 + (x − 1)^2).
If A is a fuzzy set in X then we often use the notations
A = μ_A(x1)/x1 + μ_A(x2)/x2 + … + μ_A(xn)/xn = Σ_{i=1}^{n} μ_A(xi)/xi
and
A = μ_A(x1)/x1 + μ_A(x2)/x2 + … = ∫_X μ_A(x)/x
for the discrete case and the continuous case, respectively.
Definition 1.2. A fuzzy subset A of a classical set X is called normal if
there exists x ∈ X such that μ_A(x) = 1; otherwise A is subnormal.
A nonempty fuzzy set A can always be normalized by dividing μ_A(x)
by sup_x μ_A(x).
Definition 1.3. Let A be a fuzzy subset of X ; the support of A , denoted
with supp ( A) , is the crisp subset of X given by
supp ( A) = {x ∈ X / μ A ( x ) > 0}.
Definition 1.4. An α-level set or α-cut of a fuzzy set A of X is the
non-fuzzy set Aα defined by
Aα = {x ∈ X / μ_A(x) ≥ α} if α > 0, and Aα = cl(supp(A)) if α = 0,
where cl(supp(A)) is the closure of the support of A; the 1-level set of
A is named the core of A.
Example 1.2. (Fullér, 1995, 1998) For X = {−2, −1, 0, 1, 2, 3, 4} and
A = 0.0/−2 + 0.3/−1 + 0.6/0 + 1.0/1 + 0.6/2 + 0.3/3 + 0.0/4
we have
Aα = {−1, 0, 1, 2, 3} if 0 ≤ α ≤ 0.3,
Aα = {0, 1, 2} if 0.3 < α ≤ 0.6,
Aα = {1} if 0.6 < α ≤ 1.
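To make the α-cut computation concrete, here is a minimal Python sketch (ours, not part of the original text) that reproduces Example 1.2; representing a finite fuzzy set as a dict from elements to membership degrees is an assumption of the sketch.

A = {-2: 0.0, -1: 0.3, 0: 0.6, 1: 1.0, 2: 0.6, 3: 0.3, 4: 0.0}

def alpha_cut(fuzzy_set, alpha):
    # For alpha > 0: {x : mu(x) >= alpha}; for alpha = 0 the closure of the
    # support, which for a finite universe is the support itself.
    if alpha == 0:
        return {x for x, mu in fuzzy_set.items() if mu > 0}
    return {x for x, mu in fuzzy_set.items() if mu >= alpha}

print(alpha_cut(A, 0.3))   # {-1, 0, 1, 2, 3}
print(alpha_cut(A, 0.45))  # {0, 1, 2}
print(alpha_cut(A, 0.8))   # {1}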
1.2 OPERATIONS ON FUZZY SETS
The classical set-theoretic operations from ordinary set theory can
be extended, in different ways, to fuzzy sets via their membership
functions. The basic operations were suggested by Zadeh (1965).
Let A and B be fuzzy subsets of a nonempty (crisp) set X.
Definition 1.5. The intersection of A and B is defined as
μ A∩ B ( x ) = min{μ A ( x ), μ B ( x )}, ∀x ∈ X
Definition 1.6. The union of A and B is defined as
μ A∪ B ( x ) = max{μ A ( x ), μ B ( x )}, ∀x ∈ X
Definition 1.7. The complement ¬A of a fuzzy set A is defined as
μ ¬A ( x ) = 1 − μ A ( x )
Example 1.3. (Zimmermann, 1991) Let A and B be two fuzzy subsets
of X = {1, 2 , L , 10}
A = {(1, 0.2 ) , (2 , 0.5) , (3, 0.8) , (4 , 1) , (5, 0.7 ) , (6 , 0.3)}
B = {(3, 0.2 ) , (4 , 0.4 ) , (5, 0.6 ) , (6 , 0.8) , (7 , 1) , (8, 1)}.
Then
C = A ∩ B = {(3, 0.2 ) , (4 , 0.4 ) , (5, 0.6 ) , (6 , 0.3)}
D = A ∪ B = {(1, 0.2 ) , (2 , 0.5) , (3, 0.8) , (4 , 1) , (5, 0.7 ) , (6 , 0.8) , (7 , 1) , (8, 1)}
¬B = {(1, 1) , (2 , 1) , (3, 0.8) , (4 , 0.6 ) , (5, 0.4 ) , (6 , 0.2 ) , (9 , 1) , (10 , 1)} .
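The Zadeh operations of Definitions 1.5-1.7 can be checked on Example 1.3 with a short Python sketch (ours; elements with degree 0 are simply omitted from the result, as in the text):

X = range(1, 11)
A = {1: 0.2, 2: 0.5, 3: 0.8, 4: 1.0, 5: 0.7, 6: 0.3}
B = {3: 0.2, 4: 0.4, 5: 0.6, 6: 0.8, 7: 1.0, 8: 1.0}

mu = lambda F, x: F.get(x, 0.0)  # outside the support the degree is 0

inter = {x: min(mu(A, x), mu(B, x)) for x in X if min(mu(A, x), mu(B, x)) > 0}
union = {x: max(mu(A, x), mu(B, x)) for x in X if max(mu(A, x), mu(B, x)) > 0}
not_B = {x: 1 - mu(B, x) for x in X if 1 - mu(B, x) > 0}

print(inter)  # {3: 0.2, 4: 0.4, 5: 0.6, 6: 0.3}, the set C above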
Example 1.4. (Zimmermann, 1991) We consider the fuzzy sets
A = "x is considerably larger than 10",
B = "x is approximately 11",
given by their membership functions
μ_A(x) = 0 for x ≤ 10, and μ_A(x) = (1 + (x − 10)^{-2})^{-1} for x > 10,
μ_B(x) = (1 + (x − 11)^4)^{-1} for x ∈ X.
Then
μ_{A∩B}(x) = min{ (1 + (x − 10)^{-2})^{-1}, (1 + (x − 11)^4)^{-1} } for x > 10, and μ_{A∩B}(x) = 0 for x ≤ 10;
μ_{A∪B}(x) = max{ (1 + (x − 10)^{-2})^{-1}, (1 + (x − 11)^4)^{-1} } for x ∈ X.
Figure 1.1 Intersection and union of two fuzzy sets
Other operations with fuzzy sets are the algebraic operations. Some of
these are presented below.
Definition 1.8. Let A1, …, An be fuzzy sets in X1, …, Xn; the Cartesian
product is a fuzzy set in the product space X1 × … × Xn with the
membership function
μ_{A1×…×An}(x) = min{ μ_{Ai}(xi) / x = (x1, …, xn), xi ∈ Xi }.
Definition 1.9. The m-th power of a fuzzy set A is defined by
μ_{A^m}(x) = [μ_A(x)]^m, x ∈ X.
Definition 1.10. The algebraic (probabilistic) sum C = A + B is defined
as C = {(x, μ_{A+B}(x)) / x ∈ X}, where
μ_{A+B}(x) = μ_A(x) + μ_B(x) − μ_A(x) × μ_B(x).
Definition 1.11. The bounded sum C = A ⊕ B is defined by
C = {(x, μ_{A⊕B}(x)) / x ∈ X}, where
μ_{A⊕B}(x) = min{1, μ_A(x) + μ_B(x)}.
Definition 1.12. The bounded difference C = A Θ B,
C = {(x, μ_{AΘB}(x)) / x ∈ X}, is given by
μ_{AΘB}(x) = max{0, μ_A(x) + μ_B(x) − 1}.
Definition 1.13. The algebraic product C = A ∗ B is defined as
C = {(x, μ_{A∗B}(x)) / x ∈ X}, where
μ_{A∗B}(x) = μ_A(x) × μ_B(x).
Example 1.5. (Zimmermann, 1991) Let A = {(3, 0.5), (5, 1), (7 , 0.6 )} and
B = {(3, 1), (5, 0.6 )} . Then, according to the above definitions, we obtain
A × B = { [(3, 3),0.5] ,[(5, 3),1], [(7 , 3) , 0.6], [(3,5) , 0.5], [(5, 5) , 0.6],[(7 , 5), 0.6] }
A 2 = {(3, 0.25) , (5, 1) , (7 , 0.36)}
A + B = {(3, 1) , (5, 1) , (7 , 0.6 )}
A ⊕ B = {(3, 1) , (5, 1) , (7 , 0.6 )}
AΘB = {(3, 0.5) , (5, 0.6 )}
A ∗ B = {(3, 0.5) , (5, 0.6 )}.
Definition 1.14. The inclusion and the equality operations are defined by
A = B ⇔ μ_A(x) = μ_B(x), ∀x ∈ X,
A ⊆ B ⇔ μ_A(x) ≤ μ_B(x), ∀x ∈ X.
Another way to define the intersection and the union of two fuzzy sets
was proposed by Bellman and Giertz in 1973, by interpreting the
intersection as "logical and" and the union as "logical or". Triangular
norms and triangular conorms are used in order to model the logical
connectives "and" and "or", respectively. Triangular norms and conorms
were introduced by Schweizer and Sklar (1960) in order to model
distances in probabilistic metric spaces.
Definition 1.15. A function T : [0, 1]× [0 , 1] → [0 , 1] is a t-norm iff it is
commutative, associative, non-decreasing and T (x , 1) = x ∀x ∈ [0 , 1] . A
continuous t-norm T is called Archimedean if T ( x, x ) < x ∀x ∈ (0, 1) .
The most important t-norms are:
• Minimum: T_m(x, y) = min{x, y}
• Lukasiewicz: T_L(x, y) = max{x + y − 1, 0}
• Probabilistic: T_p(x, y) = xy
• Weak: T_w(x, y) = min{x, y} if max{x, y} = 1, and T_w(x, y) = 0 otherwise.
Definition 1.16. A function S : [0 , 1]× [0 , 1] → [0 , 1] is a t-conorm iff it is
commutative, associative, non-decreasing and S (x , 0 ) = x ∀x ∈ [0 , 1] . A
continuous t-conorm S is called Archimedean if S ( x , x ) > x ∀x ∈ (0 , 1) .
The basic t-conorms are:
• Maximum: S_m(x, y) = max{x, y}
• Lukasiewicz: S_L(x, y) = min{x + y, 1}
• Probabilistic: S_p(x, y) = x + y − xy
• Strong: S_s(x, y) = max{x, y} if min{x, y} = 0, and S_s(x, y) = 1 otherwise.
The names weak t-norm and strong t-conorm result from the following
inequalities:
T_w(x, y) ≤ T(x, y) ≤ min{x, y}
max{x, y} ≤ S(x, y) ≤ S_s(x, y)
for every t-norm T and t-conorm S .
In addition, for every t-norm T and t-conorm S ,
T (0 , 0 ) = S (0 , 0) = 0 , T (1, 1) = S (1, 1) = 1 .
Definition 1.17. A strong negation is an involutive decreasing function
from [0 , 1] into itself.
The relation between a t-norm T and a t-conorm S, via a strong
negation, is given by the next theorem.
Theorem 1.1. (Alsina, Trillas & Valverde, 1980) If T is a t-norm and
C is a strong negation then
S(x, y) = C(T(C(x), C(y)))
is a t-conorm and, reciprocally,
T(x, y) = C(S(C(x), C(y)));
namely, T and S are C-dual.
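The C-duality of Theorem 1.1 is easy to check numerically; the following Python sketch (ours, with the strong negation C(x) = 1 − x) verifies on a small grid that each basic t-conorm is the dual of the corresponding t-norm:

t_norms = {
    "minimum":       lambda x, y: min(x, y),
    "Lukasiewicz":   lambda x, y: max(x + y - 1, 0),
    "probabilistic": lambda x, y: x * y,
}
t_conorms = {
    "minimum":       lambda x, y: max(x, y),
    "Lukasiewicz":   lambda x, y: min(x + y, 1),
    "probabilistic": lambda x, y: x + y - x * y,
}
C = lambda x: 1 - x

for name, T in t_norms.items():
    S = t_conorms[name]
    for x in (0.0, 0.3, 0.7, 1.0):
        for y in (0.0, 0.5, 1.0):
            # S(x, y) = C(T(C(x), C(y))) up to rounding
            assert abs(S(x, y) - C(T(C(x), C(y)))) < 1e-12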
Ling (1965) proved that any continuous Archimedean t-norm can
be written in the form
T(x, y) = f^{(-1)}(f(x) + f(y))
where f : [0, 1] → [0, ∞) is a strictly decreasing continuous function and
f^{(-1)} is the pseudo-inverse of f defined by
f^{(-1)}(x) = 1 if x ∈ [0, f(1)],
f^{(-1)}(x) = f^{-1}(x) if x ∈ [f(1), f(0)],
f^{(-1)}(x) = 0 if x ∈ [f(0), ∞).
A representation for strong negations was given by Trillas (1979)
in the form
C(x) = t^{-1}(t(1) − t(x))
where t : [0, 1] → [0, ∞) is a continuous and strictly increasing (or
decreasing) function with t(0) = 0 (or t(0) = 1) and t(1) a finite
number.
Instead of "+" and "−" from Ling's and Trillas's formulas we can
use more general operations, obtaining in this way t-norms and negations
in an easier way.
Theorem 1.2. (Iancu, 1997) Let f : [0, 1] → I ⊆ [0, ∞) be a continuous
strictly decreasing function and Δ : I × I → I with the following
properties:
(2.1) Δ(x, y) = Δ(y, x)
(2.2) Δ(x, Δ(y, z)) = Δ(Δ(x, y), z)
(2.3) Δ(x, y) ≤ Δ(x, z) if y ≤ z, with equality iff y = z
(2.4) Δ is continuous
(2.5) Δ(x, e) = x
for all x, y, z ∈ I and e = f(1). Then
T(x, y) = f^{(-1)}(Δ(f(x), f(y))), ∀x, y ∈ [0, 1]
is a t-norm, where f^{(-1)} is the pseudo-inverse of f.
Example 1.6. For I = [0, ∞), Δ(x, y) = x + y + xy and f(x) = 1 − x we
obtain the t-norm T(x, y) = max(2x + 2y − xy − 2, 0).
Theorem 1.3. (Iancu, 1997) Let I ⊆ R and Δ : I × I → I be an application
satisfying the following conditions, for all x, y, z ∈ I:
(3.1)-(3.4) identical to (2.1)-(2.4)
(3.5) there is e ∈ I such that Δ(x, e) = x ∀x ∈ I
(3.6) ∀x ∈ I there is x' ∈ I such that Δ(x, x') = e, and ϕ : I → I,
ϕ(x) = x' is a continuous strictly decreasing function
(3.7) let J = [e, ∞) ⊆ I and let t : [0, 1] → J be a continuous strictly
increasing function with t(0) = e and t(1) a finite number.
Then C(x) = t^{-1}(Δ(t(1), ϕ(t(x)))) is a strong negation for x ∈ [0, 1].
Example 1.7. For I = R, Δ(x, y) = x + y − 1, e = 1, J = [1, ∞),
t(x) = (2x + 1)/(x + 1) and ϕ(x) = 2 − x one obtains
C(x) = (1 − x)/(1 + 3x).
Observation 1.1. The simultaneous use of the functions Δ and f
(respectively t) in the previous theorems allows us to obtain t-norms
(respectively negations) more easily than in the case Δ = +. For
instance, if Δ(x, y) = x + y then we cannot work with functions of the type
f(x) = ax + b in order to obtain the t-norm from Example 1.6; more
complicated forms are necessary.
If the ordinary t-norms (t-conorms) are used for combining
information with great (small) belief degrees then the obtained results are
often at odds with reality (Iancu, 1997). We consider the combination of
n facts having the same belief degree d; the previous observation is
illustrated in Tables 1 and 2, and a short computational check follows the
tables.
t-norm                                      d     n     result
xy                                          0.9   5     0.59049000000
                                                  10    0.34867844010
                                                  20    0.12157665459
                                            0.8   5     0.32768000000
                                                  10    0.10737418240
                                                  20    0.01152921504
MAX(0, x + y − 1)                           0.9   5     0.5
                                                  ≥10   0
                                            0.8   ≥5    0
xy / (x + y − xy)                           0.9   5     0.64285714286
                                                  10    0.47368421053
                                                  20    0.31034482758
                                            0.8   5     0.44444444444
                                                  10    0.28571428571
                                                  20    0.16666666667
λxy / (1 − (1 − λ)(x + y − xy)), λ = 1/2    0.9   5     0.53656519764
                                                  10    0.23700106268
                                                  20    0.03550161915
                                            0.8   5     0.23272727273
                                                  10    0.03409185491
                                                  20    0.00060127649

Table 1

t-conorm                                             d     n     result
x + y − xy                                           0.1   5     0.40951000000
                                                           10    0.65132155990
                                                           20    0.87842334541
                                                     0.2   5     0.67232000000
                                                           10    0.89262581760
                                                           20    0.98847078495
MIN(1, x + y)                                        0.1   5     0.5
                                                           ≥10   1
                                                     0.2   ≥5    1
(x + y − 2xy) / (1 − xy)                             0.1   5     0.35714285714
                                                           10    0.52631578947
                                                           20    0.68965517242
                                                     0.2   5     0.55555555556
                                                           10    0.71428571428
                                                           20    0.83333333333
(λ(x + y) + xy(1 − 2λ)) / (λ + xy(1 − λ)), λ = 1/2   0.1   5     0.46343480236
                                                           10    0.76299893732
                                                           20    0.96449838084
                                                     0.2   5     0.76727272727
                                                           10    0.96590814509
                                                           20    0.99939872350

Table 2
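Several table entries can be reproduced by iterating a t-operator over n copies of the same degree d, as in this Python sketch (ours; folding pairwise from the left is an assumption, justified by associativity):

from functools import reduce

def iterate(op, d, n):
    # combine n copies of d with the binary operator op
    return reduce(op, [d] * n)

t_prod = lambda x, y: x * y          # t-norm xy
s_prob = lambda x, y: x + y - x * y  # t-conorm x + y - xy

print(iterate(t_prod, 0.9, 5))   # 0.59049..., first row of Table 1
print(iterate(t_prod, 0.8, 20))  # 0.01152921504..., drifts toward 0
print(iterate(s_prob, 0.1, 10))  # 0.65132155990..., drifts toward 1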
Starting from a given t-norm (t-conorm) we can obtain a new t-norm
(t-conorm) which gives results in accordance with reality when it is
used for combining information with great (small) belief degrees. The
first result with this property is given in (Pacholczyk, 1987), where the
following operators with threshold a ∈ (0, 1) are presented:
- the negation
C_a(x) = 1 − ((1 − a)/a) x if x ≤ a, and C_a(x) = a(1 − x)/(1 − a) if x ≥ a;
- the t-norm
T_a(x, y) = (a/(1 − a)) T(((1 − a)/a) x, ((1 − a)/a) y) if x ≤ a and y ≤ a,
T_a(x, y) = min(x, y) if x > a or y > a,
corresponding to the t-norm T(x, y);
- the t-conorm
S_a(x, y) = S(x, y) if x ≥ a and y ≥ a,
S_a(x, y) = max(x, y) if x < a or y < a,
corresponding to the t-conorm S(x, y).
The t-norm T_a and the t-conorm S_a obtained from T(x, y) = xy and
S(x, y) = x + y − xy, respectively, have been used with very satisfactory
results (Pacholczyk, 1987) for constructing an expert system that
processes uncertain queries - the SEQUI system.
In a series of papers (Iancu, 1997a; 1997b; 1998a; 1999a; 1999b;
2003; 2005) this result was generalized, working in an ordered field
(I, ⊕, ⊗), I ⊂ R, and with an arbitrary number n ≥ 1 of thresholds; in this
way, various classes of operators with threshold are obtained.
We remain in the conditions of Theorem 1.3 and denote Δ(x, y) = x ⊕ y.
We take ⊗ : I × I → I with the following properties:
i) x ⊗ y < x ⊗ z iff y < z, ∀x, y, z ∈ I and x > e;
ii) (I, ⊕, ⊗) is a field.
We denote by Θx and 1/x the inverse element of x corresponding to ⊕
and ⊗, respectively. To simplify the writing we denote x ⊗ (1/y) = x/y
and x ⊗ x = x^2.
Theorem 1.4. (Iancu, 2003; 2005) If ⊕, Θ, ⊗ and t have the previous
meaning and n ∈ N, n ≥ 1, 0 < a1 < a2 < … < an < 1,
δ(i) = (t(a_{n−i}) Θ t(a_{n−i+1})) / (t(a_{i+1}) Θ t(a_i))
and
θ(i) = (t(a_{i+1}) ⊗ t(a_{n−i+1}) Θ t(a_i) ⊗ t(a_{n−i})) / (t(a_{i+1}) Θ t(a_i)),
then
C_{a1,…,an}(x) =
  t^{-1}( t(1) Θ ((t(1) Θ t(an)) / t(a1)) ⊗ t(x) )   if x ≤ a1,
  t^{-1}( t(x) ⊗ δ(i) ⊕ θ(i) )   if a_i ≤ x ≤ a_{i+1} and 1 ≤ i < n,
  t^{-1}( ((t(1) Θ t(x)) ⊗ t(a1)) / (t(1) Θ t(an)) )   if x ≥ an,

C*_{a1,…,an}(x) =
  t^{-1}( (t(a1) ⊗ t(an) ⊗ (t(1) ⊕ t(x))) / ((t(1) ⊕ t(a1) Θ t(an)) ⊗ t(x) ⊕ t(a1) ⊗ t(an)) )   if x ≤ a1,
  t^{-1}( t(x) ⊗ δ(i) ⊕ θ(i) )   if a_i ≤ x ≤ a_{i+1} and 1 ≤ i < n,
  t^{-1}( (t(a1) ⊗ t(an) ⊗ (t(1) Θ t(x))) / ((t(1) ⊕ t(a1) Θ t(an)) ⊗ t(x) Θ t(a1) ⊗ t(an)) )   if x ≥ an,

and

C**_{a1,…,an}(x) =
  t^{-1}( (t(a1) ⊗ t(an) ⊗ t(1)) / (t(a1) ⊗ t(an) ⊕ (t(1) Θ t(an)) ⊗ t(x)) )   if x ≤ a1,
  t^{-1}( t(x) ⊗ δ(i) ⊕ θ(i) )   if a_i ≤ x ≤ a_{i+1} and 1 ≤ i < n,
  t^{-1}( (t(a1) ⊗ t(an) ⊗ (t(1) Θ t(x))) / ((t(1) Θ t(an)) ⊗ t(x)) )   if x ≥ an

are strong negations and they have
t^{-1}( (t(a_{[n/2]}) ⊕ t(a_{[(n+1)/2]})) / 2 )
as fixed point, where [x] is the greatest integer which is smaller than or
equal to x.
Observation 1.2. It is easy to verify the following properties:
(i) x ≤ a1 ⇔ N_{a1,…,an}(x) ≥ an
(ii) x ≥ an ⇔ N_{a1,…,an}(x) ≤ a1
(iii) x ∈ [a_i, a_{i+1}] ⇔ N_{a1,…,an}(x) ∈ [a_{n−i}, a_{n−i+1}] ∀i ∈ {1, 2, …, n − 1}
where N_{a1,…,an} ∈ {C_{a1,…,an}, C*_{a1,…,an}, C**_{a1,…,an}}.
Observation 1.3. Relation (ii) says that if the confidence of a
proposition p is greater than or equal to the threshold an then the
confidence of non p is smaller than or equal to the threshold a1. This
observation can be used for handling the confidences associated with the
information from a knowledge base.
Example 1.8. (Iancu, 2003; 2005) For ⊕ = +, ⊗ = ×, n = 2 and
t(x) = 2x/(x + 1) one obtains
C_{a1,a2}(x) =
  (2a1a2x + a2x − x + a1a2 + a1) / (2a1x − a2x + x + a1a2 + a1)   if x ≤ a1,
  ((a1a2 − 1)x + a1 + a2 + 2a1a2) / (x(a1 + a2 + 2) + 1 − a1a2)   if a1 ≤ x ≤ a2,
  (1 − x)a1(1 + a2) / (x(1 + 2a1 − a2) + 1 − 2a1a2 − a2)   if x ≥ a2,

C*_{a1,a2}(x) =
  a1a2(1 + 3x) / (x(1 + 3a1 − a2) + a1a2)   if x ≤ a1,
  ((a1a2 − 1)x + a1 + a2 + 2a1a2) / (x(a1 + a2 + 2) + 1 − a1a2)   if a1 ≤ x ≤ a2,
  (1 − x)a1a2 / (x(1 + 3a1 − a2) − 3a1a2)   if x ≥ a2

and

C**_{a1,a2}(x) =
  a1a2(1 + x) / (x(1 + a1 − a2) + a1a2)   if x ≤ a1,
  ((a1a2 − 1)x + a1 + a2 + 2a1a2) / (x(a1 + a2 + 2) + 1 − a1a2)   if a1 ≤ x ≤ a2,
  (1 − x)a1a2 / (x(1 + a1 − a2) − a1a2)   if x ≥ a2.
Theorem 1.5. (Iancu, 2005) Let S be a t-conorm and S' ∈ {S_M, S}. For
0 < a1 < a2 < … < an < 1,
S_{a1,…,an;S'}(x, y) = max(x, y) if x < a1 or y < a1,
S_{a1,…,an;S'}(x, y) = S(x, y) if x ≥ an and y ≥ an,
S_{a1,…,an;S'}(x, y) = S'(x, y) otherwise
is a t-conorm.
Theorem 1.6. (Iancu, 2005) Let (T, S) be a pair (t-norm, t-conorm) dual
with respect to C(x) = t^{-1}(t(1) Θ t(x)), let S' ∈ {S_M, S} and let T' be
the dual of S' with respect to the same negation, i.e. T' ∈ {T_M, T}. We
use the following notations:
k = t(a1) / (t(1) Θ t(an)),
ς = t(1) ⊕ t(a1) Θ t(an),
α(z) = t^{-1}( (1/k) ⊗ t(z) ),
γ(z) = t^{-1}( (t(z) ⊗ (t(1) ⊕ t(a1)) ⊗ (t(1) Θ t(an))) / ((t(1) ⊕ t(a1) Θ t(an)) ⊗ t(z) ⊕ t(a1) ⊗ t(an)) ),
β(z, i) = t^{-1}( t(1) Θ t(z) ⊗ δ(i) Θ θ(i) ),
where δ(i) and θ(i) have the same meaning as in Theorem 1.4.
For 0 < a1 < a2 < … < an < 1 we define
T_{a1,…,an;T'}(x, y) =
  t^{-1}( k ⊗ t(T(α(x), α(y))) )   if x ≤ a1 and y ≤ a1;
  t^{-1}( k ⊗ t(T'(α(x), β(y, i))) )   if x ≤ a1 and y ∈ (a_i, a_{i+1}], 1 ≤ i ≤ n − 1;
  t^{-1}( k ⊗ t(T'(β(x, i), α(y))) )   if y ≤ a1 and x ∈ (a_i, a_{i+1}], 1 ≤ i ≤ n − 1;
  t^{-1}( (t(1) Θ t(T'(β(x, i), β(y, j)))) ⊗ δ(k) ⊕ θ(k) )   if x ∈ (a_i, a_{i+1}], y ∈ (a_j, a_{j+1}], l = max(i, j) and there is an integer k ∈ [n − l, n − 1] such that T'(β(x, i), β(y, j)) ∈ [C(a_{k+1}), C(a_k));
  t^{-1}( k ⊗ t(T'(β(x, i), β(y, j))) )   if x ∈ (a_i, a_{i+1}], y ∈ (a_j, a_{j+1}] and T'(β(x, i), β(y, j)) < C(an);
  min(x, y)   if x > an or y > an
and
T*_{a1,…,an;T'}(x, y) =
  t^{-1}( (t(a1) ⊗ t(an) ⊗ t(T(γ(x), γ(y)))) / (ς ⊗ (t(1) Θ t(T(γ(x), γ(y)))) Θ t(a1) ⊗ t(an)) )   if x ≤ a1 and y ≤ a1;
  t^{-1}( (t(a1) ⊗ t(an) ⊗ t(T'(γ(x), β(y, i)))) / (ς ⊗ (t(1) Θ t(T'(γ(x), β(y, i)))) Θ t(a1) ⊗ t(an)) )   if x ≤ a1 and y ∈ (a_i, a_{i+1}], 1 ≤ i ≤ n − 1;
  t^{-1}( (t(a1) ⊗ t(an) ⊗ t(T'(β(x, i), γ(y)))) / (ς ⊗ (t(1) Θ t(T'(β(x, i), γ(y)))) Θ t(a1) ⊗ t(an)) )   if x ∈ (a_i, a_{i+1}], 1 ≤ i ≤ n − 1 and y ≤ a1;
  t^{-1}( (t(1) Θ t(T'(β(x, i), β(y, j)))) ⊗ δ(k) ⊕ θ(k) )   if x ∈ (a_i, a_{i+1}], y ∈ (a_j, a_{j+1}], l = max(i, j) and there is an integer k ∈ [n − l, n − 1] such that T'(β(x, i), β(y, j)) ∈ [C(a_{k+1}), C(a_k));
  t^{-1}( (t(a1) ⊗ t(an) ⊗ t(T'(β(x, i), β(y, j)))) / (ς ⊗ (t(1) Θ t(T'(β(x, i), β(y, j)))) Θ t(a1) ⊗ t(an)) )   if x ∈ (a_i, a_{i+1}], y ∈ (a_j, a_{j+1}] and T'(β(x, i), β(y, j)) < C(an);
  min(x, y)   if x > an or y > an.
Then
(i) T_{a1,…,an;T'} is a t-norm C_{a1,…,an}-dual with the t-conorm S_{a1,…,an;S'};
(ii) T*_{a1,…,an;T'} is a t-norm C*_{a1,…,an}-dual with the t-conorm S_{a1,…,an;S'}.
Example 1.9. (Iancu, 2005) By particularization, one can obtain new
extensions of other known t-operators. For instance, for n = 2, a = a1,
b = a2, ⊕ = +, ⊗ = ×, t(x) = x and T' = T_M = min, Theorem 1.6 yields
T_{a/b}(x, y) = (a/(1 − b)) T(((1 − b)/a) x, ((1 − b)/a) y)   if x ≤ a and y ≤ b,
T_{a/b}(x, y) = min(x, y)   otherwise,
which is an extension of Pacholczyk's (1987) t-norm, and
T*_{a/b}(x, y) = ab T(γ(x), γ(y)) / ((1 − a + b)(1 − T(γ(x), γ(y))) − ab)   if x ≤ a and y ≤ b,
T*_{a/b}(x, y) = min(x, y)   otherwise,
where γ(z) = (1 + a)(1 − b)z / ((1 + a − b)z + ab); this is a t-norm with
1-threshold a and parameter b.
Theorem 1.7. (Iancu, 2003) Let (T, S) be a pair (t-norm, t-conorm) dual
with respect to C(x) = t^{-1}(t(1) Θ t(x)), let S' ∈ {S_M, S} and let T' be
the dual of S' with respect to the same negation, i.e. T' ∈ {T_M, T}. We
use the following notations:
k = (t(a1) ⊗ t(an)) / (t(1) Θ t(an)),
α(z) = t^{-1}( ((t(1) Θ t(an)) ⊗ t(z) ⊗ t(1)) / (t(a1) ⊗ t(an) ⊕ (t(1) Θ t(an)) ⊗ t(z)) )
and
β(z, i) = t^{-1}( t(1) Θ t(z) ⊗ δ(i) Θ θ(i) ),
where δ(i) and θ(i) have the same meaning as in Theorem 1.4.
For 0 < a1 < a2 < … < an < 1 we define
T**_{a1,…,an;T'}(x, y) =
  t^{-1}( k ⊗ t(T(α(x), α(y))) / (t(1) Θ t(T(α(x), α(y)))) )   if x ≤ a1 and y ≤ a1;
  t^{-1}( k ⊗ t(T'(α(x), β(y, i))) / (t(1) Θ t(T'(α(x), β(y, i)))) )   if x ≤ a1 and y ∈ (a_i, a_{i+1}], 1 ≤ i ≤ n − 1;
  t^{-1}( k ⊗ t(T'(β(x, i), α(y))) / (t(1) Θ t(T'(β(x, i), α(y)))) )   if x ∈ (a_i, a_{i+1}], 1 ≤ i ≤ n − 1 and y ≤ a1;
  t^{-1}( (t(1) Θ t(T'(β(x, i), β(y, j)))) ⊗ δ(k) ⊕ θ(k) )   if x ∈ (a_i, a_{i+1}], y ∈ (a_j, a_{j+1}], l = max(i, j) and there is an integer k ∈ [n − l, n − 1] such that T'(β(x, i), β(y, j)) ∈ [C(a_{k+1}), C(a_k));
  t^{-1}( k ⊗ t(T'(β(x, i), β(y, j))) / (t(1) Θ t(T'(β(x, i), β(y, j)))) )   if x ∈ (a_i, a_{i+1}], y ∈ (a_j, a_{j+1}] and T'(β(x, i), β(y, j)) < C(an);
  min(x, y)   if x > an or y > an.
Then T**_{a1,…,an;T'} is a t-norm C**_{a1,…,an}-dual with the t-conorm S_{a1,…,an;S'}.
Example 1.10. (Iancu, 2003) For n = 2, a = a1, b = a2, ⊕ = +, ⊗ = ×,
t(x) = x and T' = T_M = min, Theorem 1.7 gives a new parametric
extension for t-norms with 1-threshold:
T_{a/b}(x, y) = (ab/(1 − b)) T(α(x), α(y))   if x ≤ a and y ≤ b,
T_{a/b}(x, y) = min(x, y)   otherwise,
where α(z) = (1 − b)z / ((1 − b)z + ab), a is the threshold and b is the
parameter.
As presented in some of our papers (for instance, (Iancu, 1997a)),
the operators with thresholds can be used with good results for
combining the belief degrees attached to the information from a
knowledge base. The choice of the thresholds allows us to obtain
results in accordance with reality even if the operators used for
constructing the operators with threshold do not have this property.
1.3 FUZZY NUMBERS
An important concept in fuzzy set theory is the extension principle,
used to extend pointwise operations to operations involving fuzzy sets.
Definition 1.18. Let X be the Cartesian product X = X1 × X2 × … × Xn,
let A1, …, An be fuzzy sets in X1, …, Xn, respectively, and let f be a
mapping from X to a universe Y, y = f(x1, …, xn). Then the extension
principle allows us to define a fuzzy set B in Y by
B = {(y, μ_B(y)) / y = f(x1, …, xn), (x1, …, xn) ∈ X}
where
μ_B(y) = sup_{(x1,…,xn) ∈ f^{-1}(y)} min{ μ_{A1}(x1), …, μ_{An}(xn) } if f^{-1}(y) ≠ ∅,
μ_B(y) = 0 otherwise.
For n = 1 the extension principle reduces to
B = f(A) = {(y, μ_B(y)) / y = f(x), x ∈ X}
where
μ_B(y) = sup_{x ∈ f^{-1}(y)} μ_A(x) if f^{-1}(y) ≠ ∅, and μ_B(y) = 0 otherwise.
Example 1.11. For A = {(−1, 0.5), (0, 0.8), (1, 1), (2, 0.4)} and f(x) = x^2
it results B = {(0, 0.8), (1, 1), (4, 0.4)}.
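For discrete fuzzy sets the extension principle is a one-line sup over preimages; the following Python sketch (ours) reproduces Example 1.11:

def extend(f, A):
    B = {}
    for x, mu in A.items():
        y = f(x)
        B[y] = max(B.get(y, 0.0), mu)  # sup over the preimage of y
    return B

A = {-1: 0.5, 0: 0.8, 1: 1.0, 2: 0.4}
print(extend(lambda x: x * x, A))  # {1: 1.0, 0: 0.8, 4: 0.4}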
Definition 1.19. A fuzzy number A is a fuzzy set of the real line with a
normal, fuzzy convex and continuous membership function of bounded
support. The set of fuzzy numbers will be denoted by F(R).
Definition 1.20. A quasi fuzzy number A is a fuzzy set of the real line
with a normal, fuzzy convex and continuous membership function
satisfying the limit conditions lim_{t→∞} μ_A(t) = 0 and lim_{t→−∞} μ_A(t) = 0.
Definition 1.21. A fuzzy number A is called positive (negative) if its
membership function satisfies μ A ( x ) = 0 , ∀x < 0 (∀x > 0 ) .
Definition 1.22. A binary operation ∗ is called increasing (decreasing) if
x1 > y1 and x2 > y2 imply x1 ∗ x2 > y1 ∗ y2 (respectively x1 ∗ x2 < y1 ∗ y2).
From the extension principle it results that:
a) for f(x) = −x the opposite of a fuzzy number A is
−A = {(x, μ_{−A}(x)) / x ∈ X}, where μ_{−A}(x) = μ_A(−x);
b) if f(x) = 1/x then the inverse of a fuzzy number A is given by
A^{-1} = {(x, μ_{A^{-1}}(x)) / x ∈ X}, where μ_{A^{-1}}(x) = μ_A(1/x);
c) for λ ∈ R − {0} and f(x) = λx the scalar multiplication of a fuzzy
number is given by λA = {(x, μ_{λA}(x)) / x ∈ X}, where μ_{λA}(x) = μ_A(x/λ).
The extensions of the algebraic operations +, −, ×, : to operations on
fuzzy numbers will be denoted by ~+, ~−, ~× and ~:, respectively.
Some important properties of the ~+ and ~× operations, for real fuzzy
numbers, are:
for the ~+ operation:
1. ~−(A ~+ B) = (~−A) ~+ (~−B)
2. ~+ is commutative
3. ~+ is associative
4. 0 ∈ R ⊆ F(R) is the neutral element for ~+, that is,
A ~+ 0 = A ∀A ∈ F(R)
5. for ~+ there does not exist an inverse element, that is,
∃A ∈ F(R) − R : A ~+ (~−A) ≠ 0 ∈ R;
for the ~× operation:
1. (~−A) ~× B = ~−(A ~× B)
2. ~× is commutative
3. ~× is associative
4. 1 ∈ R ⊆ F(R) is the neutral element for ~×, that is,
A ~× 1 = A ∀A ∈ F(R)
5. for ~× there does not exist an inverse element, that is,
∃A ∈ F(R) − R : A ~× A^{-1} ≠ 1.
Theorem 1.8. (Dubois & Prade, 1980) If A and B are fuzzy numbers
with continuous and surjective membership functions from R to [0, 1] and
∗ is a continuous increasing (decreasing) binary operation, then A ~∗ B is
a fuzzy number whose membership function is continuous and surjective
from R to [0, 1].
The membership function of A ~∗ B can be determined from the
membership functions of A and B according to the following theorem.
Theorem 1.9. (Dubois & Prade, 1980) If A, B ∈ F(R) have continuous
membership functions then the extension principle for the binary
operation ∗ : R × R → R gives the fuzzy number A ~∗ B:
μ_{A~∗B}(z) = sup_{z = x∗y} min{ μ_A(x), μ_B(y) }.
Example 1.12. Let A = {(1, 0.3), (2, 1), (3, 0.4)} and
B = {(2, 0.7), (3, 1), (4, 0.2)} be fuzzy numbers. Then
A ~× B = {(2, 0.3), (3, 0.3), (4, 0.7), (6, 1), (8, 0.2), (9, 0.4), (12, 0.2)}.
Frequently, applications work with triangular or trapezoidal fuzzy
numbers, given by the following definitions.
Definition 1.23. A fuzzy set A is called a triangular fuzzy number with
center a, left width α > 0 and right width β > 0, denoted as
A = (a, α, β), if its membership function is
μ_A(x) = 1 − (a − x)/α if x ∈ [a − α, a],
μ_A(x) = 1 − (x − a)/β if x ∈ [a, a + β],
μ_A(x) = 0 otherwise.
A triangular fuzzy number with center a is seen as a fuzzy quantity "x is
approximately equal to a".
Definition 1.24. A fuzzy set A is called a trapezoidal fuzzy number with
tolerance interval [a, b], left width α > 0 and right width β > 0, denoted
as A = (a, b, α, β), if its membership function is
μ_A(x) = 1 − (a − x)/α if x ∈ [a − α, a],
μ_A(x) = 1 if x ∈ [a, b],
μ_A(x) = 1 − (x − b)/β if x ∈ [b, b + β],
μ_A(x) = 0 otherwise.
A trapezoidal fuzzy number with tolerance interval [a, b] is seen as a
fuzzy quantity "x is approximately in the interval [a, b]".
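A short Python sketch (ours) of the trapezoidal membership function; a triangular number (a, α, β) is the special case b = a, and the numeric example "x is approximately in [2, 4]" is an illustration, not from the text:

def trapezoidal(x, a, b, alpha, beta):
    if a - alpha <= x <= a:
        return 1 - (a - x) / alpha
    if a <= x <= b:
        return 1.0
    if b <= x <= b + beta:
        return 1 - (x - b) / beta
    return 0.0

# "x is approximately in [2, 4]" with left width 1 and right width 2:
print(trapezoidal(1.5, 2, 4, 1, 2))  # 0.5
print(trapezoidal(3.0, 2, 4, 1, 2))  # 1.0
print(trapezoidal(5.0, 2, 4, 1, 2))  # 0.5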
Computational efficiency is of particular importance when fuzzy
sets are used to solve real problems. The LR-representation of fuzzy
numbers, suggested by Dubois and Prade (1979), increases computational
efficiency without limiting generality.
Definition 1.25. Any fuzzy number A can be represented as
μ_A(x) = L((a − x)/α) if x ∈ [a − α, a],
μ_A(x) = 1 if x ∈ [a, b],
μ_A(x) = R((x − b)/β) if x ∈ [b, b + β],
μ_A(x) = 0 otherwise,
where [a, b] is the core of A and L, R : [0, 1] → [0, 1] are continuous and
non-increasing shape functions with L(0) = R(0) = 1 and L(1) = R(1) = 0;
such a representation is referred to as A = (a, b, α, β)_{LR}.
Definition 1.26. Any quasi fuzzy number A can be represented as
μ_A(x) = L((a − x)/α) if x ≤ a,
μ_A(x) = 1 if x ∈ [a, b],
μ_A(x) = R((x − b)/β) if x ≥ b,
where [a, b] is the core of A and L, R : [0, ∞) → [0, 1] are continuous
and non-increasing shape functions with L(0) = R(0) = 1 and
lim_{x→∞} L(x) = lim_{x→∞} R(x) = 0; such a representation is referred
to as A = (a, b, α, β)_{LR}.
For a = b we use the notation A = (a, α, β)_{LR} and A is named a
quasi-triangular fuzzy number. If L(x) = R(x) = 1 − x then instead of
A = (a, b, α, β)_{LR} the notation A = (a, b, α, β) is used. In what follows
we will refer, for short, to quasi fuzzy numbers as fuzzy numbers.
Example 1.13. (Zimmermann, 1991) Let L(x) = 1/(1 + x^2),
R(x) = 1/(1 + 2x), α = 2, β = 3, a = 5. Then A = (5, 2, 3)_{LR} is given by
μ_A(x) = L((5 − x)/2) = 1/(1 + ((5 − x)/2)^2) if x ≤ 5,
μ_A(x) = R((x − 5)/3) = 1/(1 + 2(x − 5)/3) if x ≥ 5.
The operations with LR fuzzy numbers are considerably simplified:
Dubois and Prade gave exact formulas for ~+ and ~− and approximate
expressions for ~× and ~:.
Theorem 1.10. (Dubois & Prade, 1980) Let A = (a, α, β)_{LR} and
B = (b, γ, δ)_{LR} be LR fuzzy numbers. Then
1) (a, α, β)_{LR} ~+ (b, γ, δ)_{LR} = (a + b, α + γ, β + δ)_{LR}
2) ~−(a, α, β)_{LR} = (−a, β, α)_{LR}
3) (a, α, β)_{LR} ~− (b, γ, δ)_{LR} = (a − b, α + δ, β + γ)_{LR}.
Example 1.14. For L(x) = R(x) = 1/(1 + x^2), A = (1, 0.5, 0.8)_{LR} and
B = (2, 0.6, 0.2)_{LR} it results that A ~+ B = (3, 1.1, 1)_{LR} and
A ~− B = (−1, 0.7, 1.4)_{LR}.
Theorem 1.11. (Dubois & Prade, 1980) If A and B are LR fuzzy
numbers, then
(a, α, β)_{LR} ~× (b, γ, δ)_{LR} ≅ (ab, aγ + bα, aδ + bβ)_{LR} if A and B are
positive;
(a, α, β)_{LR} ~× (b, γ, δ)_{LR} ≅ (ab, bα − aδ, bβ − aγ)_{LR} if A < 0 and B > 0;
(a, α, β)_{LR} ~× (b, γ, δ)_{LR} ≅ (ab, −bβ − aδ, −bα − aγ)_{LR} if A and B are
negative.
Example 1.15. (Zimmermann, 1991) For A = (2, 0.2, 0.1)_{LR},
B = (3, 0.1, 0.3)_{LR} and
L(x) = R(x) = 1 if −1 ≤ x ≤ 1, and 0 otherwise,
we obtain
μ_A(x) = L((2 − x)/0.2) if x ≤ 2, and μ_A(x) = R((x − 2)/0.1) if x ≥ 2,
i.e. μ_A(x) = 1 if 1.8 ≤ x ≤ 2.2, and 0 otherwise;
μ_B(x) = L((3 − x)/0.1) if x ≤ 3, and μ_B(x) = R((x − 3)/0.3) if x ≥ 3,
i.e. μ_B(x) = 1 if 2.7 ≤ x ≤ 3.3, and 0 otherwise;
therefore A and B are positive numbers.
According to the last theorem, it results that
A ~× B ≅ (2 × 3, 2 × 0.1 + 3 × 0.2, 2 × 0.3 + 3 × 0.1)_{LR} = (6, 0.8, 0.9)_{LR}.
1.4 FUZZY RELATIONS
Fuzzy relations are natural extensions of classical relations and
they are important because they describe interactions between variables.
We consider only binary relations, because the extension to n-ary
relations is straightforward.
Definition 1.27. Let X and Y be nonempty sets. A fuzzy relation R is a
fuzzy subset of X × Y: R = {((x, y), μ_R(x, y)) / (x, y) ∈ X × Y}, where
μ_R(x, y) is the degree of membership of (x, y) in R.
Example 1.16. Let X = Y = R and R = "considerably larger than". This
relation can be defined by
μ_R(x, y) = 0 if x ≤ y, and μ_R(x, y) = (1 + (y − x)^{-2})^{-1} if x > y.
The same relation can be defined using a table, where X = {x1, x2, x3}
and Y = {y1, y2, y3, y4}:

       y1    y2    y3    y4
x1     0.8   1     0.1   0.7
x2     0     0.8   0     0
x3     0.9   1     0.7   0.8
The operations with fuzzy relations can be defined by analogy with the
operations on fuzzy sets.
Definition 1.28. Let R and S be two fuzzy relations in the same product
space X × Y; then their union/intersection is defined by
μ_{R∪S}(x, y) = max{ μ_R(x, y), μ_S(x, y) },
μ_{R∩S}(x, y) = min{ μ_R(x, y), μ_S(x, y) }, ∀(x, y) ∈ X × Y.
We can use any t-norm T and t-conorm S instead of the min and
max operations, respectively. Some other operations, such as the
projection and the cylindrical extension of fuzzy relations, can also be
useful.
Definition 1.29. Let R be a fuzzy relation on X × Y. The projection of R
on X is a fuzzy subset of X, defined by
Π_X(R) = { (x, sup_y μ_R(x, y)) / (x, y) ∈ X × Y }.
The total projection is given by
Π_T(R) = sup_x sup_y { μ_R(x, y) / (x, y) ∈ X × Y }.
Definition 1.30. The largest relation in X × Y for which the projection is
R is called the cylindrical extension of R.
Example 1.17. An example of a fuzzy relation and its projections is

       y1    y2    y3    y4    y5    y6    Π_X
x1     0.1   0.2   0.3   0.6   1     0.2   1
x2     0.3   0.5   0.4   1     0.5   0.6   1
x3     0.4   0.8   1     0.7   0.8   0.8   1
Π_Y    0.4   0.8   1     1     1     0.8   Π_T = 1

The cylindrical extension of Π_Y is

       y1    y2    y3    y4    y5    y6
x1     0.4   0.8   1     1     1     0.8
x2     0.4   0.8   1     1     1     0.8
x3     0.4   0.8   1     1     1     0.8
Fuzzy relations in different product spaces can be combined by the
operation “composition”. Different versions of “composition” have been
suggested but the sup-min composition is the best known and the most
frequently used.
Definition 1.31. Let R and S be two fuzzy relations on X × Y and
Y × Z, respectively. Their sup-min composition is defined as
R ∘ S = { [(x, z), sup_{y∈Y} min{ R(x, y), S(y, z) }] / x ∈ X, y ∈ Y, z ∈ Z }.
Example 1.18. (Fullér, 1995) The sup-min composition of the relations

       y1    y2    y3    y4              z1    z2    z3
x1     0.5   0.1   0.1   0.7       y1    0.4   0.9   0.3
x2     0     0.8   0     0         y2    0     0.4   0
x3     0.9   1     0.7   0.8       y3    0.9   0.5   0.8
                                   y4    0.6   0.7   0.5

is

       z1    z2    z3
x1     0.6   0.7   0.5
x2     0     0.4   0
x3     0.7   0.9   0.7
Changing the operation min from the last definition into a t-norm
T one obtains the sup-T composition. Following (Zadeh, 1973), the
sup-min composition of a fuzzy set and a fuzzy relation can be obtained
as follows.
Definition 1.32. Let T be a t-norm. The membership function of the
composition of a fuzzy set A in X and a fuzzy relation R in X × Y is
defined by
μ_{A∘R}(y) = sup_{x∈X} T(μ_A(x), μ_R(x, y)), ∀y ∈ Y.
2 UNCERTAINTY

2.1 POSSIBILITY AND NECESSITY MEASURES
In 1974 Sugeno introduced the concept of fuzzy measure, as in the
next definition.
Definition 2.1. Given the universe X (supposed to be finite, for the sake
of simplicity), a fuzzy measure is a set function g from the set 2^X of
subsets of X to the interval [0, 1] such that
i) g(∅) = 0, ii) g(X) = 1,
iii) ∀A, B ∈ 2^X, if A ⊆ B then g(A) ≤ g(B).
The following inequalities hold:
∀A, B ∈ 2^X, g(A ∩ B) ≤ min(g(A), g(B)),
∀A, B ∈ 2^X, g(A ∪ B) ≥ max(g(A), g(B)).
In order to combine the uncertainties of two events A and B we use the
relations
1) A ∩ B = ∅ ⇒ g(A ∪ B) = g(A) ∗ g(B), where ∗ is a t-conorm;
2) A ∪ B = X ⇒ g(A ∩ B) = g(A) ⊥ g(B), where ⊥ is a t-norm.
The most important fuzzy measures are:
• the possibility measure Π: ∀A, B ∈ 2^X, Π(A ∪ B) = max(Π(A), Π(B));
• the necessity measure N: ∀A, B ∈ 2^X, N(A ∩ B) = min(N(A), N(B)).
It is easy to verify that N is a necessity measure if and only if Π, defined
as Π(A) = 1 − N(¬A) ∀A ⊆ X, is a possibility measure.
Let x1, x2, …, xn be the elementary events of X; the values
π_i = Π({x_i}), i ∈ {1, 2, …, n}, define the possibility distribution of Π.
The possibility measure Π can be defined using a possibility distribution
π : X → [0, 1] with
Π(A) = sup_{x∈A} π(x).
It results that the necessity measure is defined by the relation
N(A) = inf{ 1 − π(x) / x ∉ A }.
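A small Python sketch (ours) of these definitions for a finite universe; it also illustrates the property max(Π(A), Π(¬A)) = 1 listed below:

pi = {"a": 1.0, "b": 0.7, "c": 0.2, "d": 0.0}  # a possibility distribution

def possibility(A):
    return max(pi[x] for x in A)

def necessity(A):
    outside = [x for x in pi if x not in A]
    return min(1 - pi[x] for x in outside) if outside else 1.0

A = {"a", "b"}
print(possibility(A))  # 1.0
print(necessity(A))    # 0.8 = min(1 - 0.2, 1 - 0.0)
print(max(possibility(A), possibility({"c", "d"})))  # max(Pi(A), Pi(not A)) = 1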
Some important properties of these measures are:
max(Π(A), Π(¬A)) = 1, min(N(A), N(¬A)) = 0,
N(A) + N(¬A) ≤ 1, Π(A) + Π(¬A) ≥ 1,
Π(A) ≥ N(A)
or, more generally,
g(A) ∗ g(¬A) = 1, g'(A) ⊥ g'(¬A) = 0,
where ∗ is a t-conorm, ⊥ is a t-norm, g is a fuzzy measure and
g'(A) = 1 − g(¬A).
Example 2.1. If E ⊆ X is a certain event then we can define a
possibility measure and a necessity measure as follows:
1) Π_E(A) = 1 if A ∩ E ≠ ∅, and 0 if A ∩ E = ∅,
where Π_E(A) = 1 has the meaning: the event A is possible;
2) N_E(A) = 1 if E ⊆ A, and 0 otherwise,
where N_E(A) = 1 says: the event A is necessarily true.
Having a basic fuzzy event defined by a fuzzy set A, the possibility
and the necessity measures for an event defined by the fuzzy set F are
(Dubois, 1983):
Π_{T;A}(F) = sup_{x∈X} T(μ_F(x), μ_A(x)),
N_{S,C;A}(F) = inf_{x∈X} S(μ_F(x), C(μ_A(x))).
If the t-norm T and the t-conorm S are C-dual then
Π_{T;A}(F) = C(N_{S,C;A}(¬F)).
The inequality between Π and N from the crisp case is, generally, not
true in the fuzzy case; such a case is (Yager, 1983a):
C(x) = 1 − x, T(x, y) = min(x, y), S(x, y) = max(x, y), max_{x∈X} μ_A(x) ≤ 0.5,
when
N_{S,C;A}(F) > Π_{T;A}(F) ∀F ∈ 2^X.
Following (Iancu, 1988b) we present a general method for constructing
possibility and necessity measures.
Definition 2.2. We call a maximization operator (or Pedrycz's operator)
associated to a t-norm T the application τ_T : [0, 1] × [0, 1] → [0, 1] with
the properties:
x τ_T y ≤ z τ_T y if x ≤ z,
T(x, y) τ_T y ≥ x,
T(x τ_T y, y) ≤ x.
Theorem 2.1. (Iancu, 1988b) Let f : [0, 1] → I ⊆ J ⊆ R be a continuous
and strictly decreasing function and Δ : J × J → J an application which
satisfies the conditions (3.1)-(3.6) from Theorem 1.3 for x, y, z ∈ J
and e = f(1). Then there exists the maximization operator associated with
the t-norm T(x, y) = f^{(-1)}(Δ(f(x), f(y))) and it is given by
x τ_T y = 1 if x ≥ y, and x τ_T y = f^{-1}(Δ(f(x), ϕ(f(y)))) if x < y.
An R-implication associated with a t-norm T is defined by
I_T(x, y) = sup{ z ∈ [0, 1] / T(x, z) ≤ y }, x, y ∈ [0, 1].
We consider the following R-implications:
a →_{R;T} b = sup{ x ∈ [0, 1] / T(a, x) ≤ b },
a →_{R;T,C} b = C(inf{ x ∈ [0, 1] / S(b, x) ≥ a })
= sup{ x ∈ [0, 1] / T(C(b), x) ≤ C(a) } = C(b) →_{R;T} C(a),
where T is a t-norm, C is a strict negation and S is the t-conorm C-dual
with T.
Using Pedrycz's operator we obtain
a →_{R;T} b = b τ_T a = 1 if a ≤ b, and b τ_T a = f^{-1}(Δ(f(b), ϕ(f(a)))) if a > b;
a →_{R;T,C} b = C(a) τ_T C(b) = 1 if a ≤ b, and
C(a) τ_T C(b) = f^{-1}(Δ(f(C(a)), ϕ(f(C(b))))) if a > b.
We define
Γ_{T;A}(B) = inf_{x∈X} { μ_A(x) →_{R;T} μ_B(x) },
L_{T,C;A}(B) = inf_{x∈X} { μ_A(x) →_{R;T,C} μ_B(x) },
Λ_{T,C;A}(B) = C(Γ_{T;A}(¬B)),
V_{T,C;A}(B) = C(L_{T,C;A}(¬B)).
Theorem 2.2. (Iancu, 1988b) ΓT ; A and LT ,C ; A are necessity measures
and Λ T ,C ; A and VT ,C ; A are possibility measures.
Example 2.2. (Iancu, 1988b) For
f(x) = 2 − x, Δ(x, y) = 2x + 2y − xy − 2, ϕ(x) = (3 − 2x)/(2 − x)
we have T(x, y) = xy; then, for C(x) = 1 − x we obtain
Γ_{T;A}(B) = inf_{x∈X} { 1 if μ_A(x) ≤ μ_B(x); μ_B(x)/μ_A(x) if μ_A(x) > μ_B(x) },
L_{T,C;A}(B) = inf_{x∈X} { 1 if μ_A(x) ≤ μ_B(x); (1 − μ_A(x))/(1 − μ_B(x)) if μ_A(x) > μ_B(x) },
Λ_{T,C;A}(B) = sup_{x∈X} { 0 if μ_A(x) ≤ 1 − μ_B(x); (μ_A(x) + μ_B(x) − 1)/μ_A(x) if μ_A(x) > 1 − μ_B(x) },
V_{T,C;A}(B) = sup_{x∈X} { 0 if μ_A(x) ≤ 1 − μ_B(x); (μ_A(x) + μ_B(x) − 1)/μ_B(x) if μ_A(x) > 1 − μ_B(x) }.
Observation 2.1. We call R-measures the ones constructed by
R-implications.
Observation 2.2. Using the previous method, the implication
a → b = S(C(a), b) (called S-implication) generates the measures Π
and N.
Observation 2.3. There are some implications which cannot generate
the necessity and possibility measures using the previous method. For
instance, we consider the implication a → b = S(C(a), T(a, b)); for
T(x, y) = xy, S(x, y) = x + y − xy, C(x) = 1 − x and the universe Ω we
obtain
N_{T,C;A}(Ω) = inf_{ω∈Ω} S(C(μ_A(ω)), T(μ_A(ω), μ_Ω(ω)))
= inf_{ω∈Ω} (μ_A^2(ω) − μ_A(ω) + 1) ∈ [3/4, 1];
therefore, generally, N_{T,C;A}(Ω) ≠ 1.
Some properties of R-measures are:
(P1) M(Ω) = 1 and M(∅) = 0 for M ∈ {Γ_{T;A}, L_{T,C;A}, Λ_{T,C;A}, V_{T,C;A}},
(P2) M(B1 ∩ B2) = min(M(B1), M(B2)) for M ∈ {Γ_{T;A}, L_{T,C;A}},
(P3) M(B1 ∪ B2) = max(M(B1), M(B2)) for M ∈ {Λ_{T,C;A}, V_{T,C;A}},
(P4) B1 ⊆ B2 ⇒ M(B1) ≤ M(B2) for M ∈ {Γ_{T;A}, L_{T,C;A}, Λ_{T,C;A}, V_{T,C;A}},
(P5) A1 ⊆ A2 ⇒ M1(B) ≥ M2(B) for M_i ∈ {Γ_{T;A_i}, L_{T,C;A_i}}, i ∈ {1, 2},
(P6) A1 ⊆ A2 ⇒ M1(B) ≤ M2(B) for M_i ∈ {Λ_{T,C;A_i}, V_{T,C;A_i}}, i ∈ {1, 2},
(P7) Γ_{T;A}(B) = 1 ⇔ L_{T,C;A}(B) = 1,
(P8) Λ_{T,C;A}(B) = 0 ⇔ V_{T,C;A}(B) = 0,
(P9) Π_{T;A}(B) = 1 ⇒ Λ_{T,C;A}(B) = 1,
(P10) N_{T,C;A}(B) = 0 ⇒ Γ_{T;A}(B) = 0,
(P11) L_{T,C;A}(B) < 1 ⇔ Γ_{T;A}(B) < 1,
(P12) Λ_{T,C;A}(B) > 0 ⇔ V_{T,C;A}(B) > 0,
(P13) Γ_{T;A}(B) = L_{T,C;A}(B) = 1 ⇒ N_{T,C;A}(B) ≥ γ,
(P14) L_{T,C;A}(B) = V_{T,C;A}(B) = 0 ⇒ Π_{T;A}(B) ≤ γ,
where Ω is the universe, A, B, A1, A2, B1, B2 ∈ 2^Ω and γ is the fixed
point of the negation.
Their justification is immediate. For instance, for (P13) we have
Γ_{T;A}(B) = L_{T,C;A}(B) = 1 ⇒ μ_A(ω) ≤ μ_B(ω) ∀ω ∈ Ω;
since μ_B(ω) ≤ γ or C(μ_B(ω)) ≤ γ, we have
T(μ_A(ω), C(μ_B(ω))) ≤ T(μ_B(ω), C(μ_B(ω))) ≤ T(γ, 1) = γ
and therefore
N_{T,C;A}(B) = C( sup_{ω∈Ω} T(μ_A(ω), C(μ_B(ω))) ) ≥ C(γ) = γ.
For particular cases one can obtain many other properties. For
instance, for the measures given in Example 2.2 we have
(P15) N_{T,C;A}(B) = 1 ⇒ Γ_{T;A}(B) = L_{T,C;A}(B) = 1,
(P16) Π_{T;A}(B) = 0 ⇒ V_{T,C;A}(B) = Λ_{T,C;A}(B) = 0,
(P17) Γ_{T;A}(B) ≤ 0.5 ⇒ Γ_{T;A}(B) ≤ N_{T,C;A}(B),
(P18) N_{T,C;A}(B) < 0.5 ⇒ Γ_{T;A}(B) < 0.5,
(P19) Π_{T;A}(B) > 0.5 ⇒ Λ_{T,C;A}(B) > 0.5,
(P20) Γ_{T;A}(B) > 0 ⇒ N_{T,C;A}(B) > 0,
(P21) Λ_{T,C;A}(B) < 1 ⇒ Π_{T;A}(B) < 1,
(P22) Π_{T;A}(B) < 1 ⇒ L_{T,C;A}(B) = 0.
2.2 BELIEF AND PLAUSIBILITY FUNCTIONS
Independently from the development of fuzzy sets and possibility
theory, Shafer (1976) proposed a theory of evidence in which he
introduced the mathematical concept of belief function. Most of the
measures in this theory can be defined in terms of the basic probability
assignment (bpa) m, which satisfies the following conditions:
m : 2^Ω → [0, 1], m(∅) = 0, Σ_{A⊆Ω} m(A) = 1,
where Ω is the frame of discernment (or universe).
Subsets of Ω with nonzero basic probabilities are called focal
elements. The basic probability assignment m determines the lower
probability and the upper probability of a subset A of Ω, called the belief
function, denoted Bel(A), and the plausibility function, denoted Pls(A),
respectively. These functions are defined by
Definition 2.3. Given the universe Ω, a belief function is an application
Bel : 2^Ω → [0, 1] with the properties
i) Bel(∅) = 0,
ii) Bel(Ω) = 1,
iii) ∀n ∈ N, ∀A_i ⊆ Ω, i ∈ {1, 2, …, n},
Bel(∪_{i=1}^{n} A_i) ≥ Σ_{i=1}^{n} Bel(A_i) − Σ_{i<j} Bel(A_i ∩ A_j) + … + (−1)^{n+1} Bel(∩_{i=1}^{n} A_i).
The plausibility function is given by
Pls(A) = 1 − Bel(¬A), ∀A ⊆ Ω.
These two quantities are obtained from the bpa as follows (Shafer,
1976):
Bel(A) = Σ_{B⊆A} m(B),
Pls(A) = Σ_{A∩B≠∅} m(B).
For computing the belief degree in a fuzzy set A due to a nonfuzzy
bpa (i.e. all focal elements B_j are crisp sets) one can use the formulas
(Smets, 1981)
Bel(A) = Σ_{B_j⊆Ω} m(B_j) × inf_{x∈B_j} μ_A(x),
Pls(A) = Σ_{B_j⊆Ω} m(B_j) × sup_{x∈B_j} μ_A(x).
The probabilistic constraint of a fuzzy focal element A is expressed by
decomposing it into the level sets of A, which form a group of consonant
crisp focals.
The decomposition of a fuzzy focal element A is a collection of
nonfuzzy subsets such that (Dubois & Prade, 1982, 1985a; Yen, 1992):
- they are the level sets of A,
A_{α1} ⊃ A_{α2} ⊃ … ⊃ A_{αn}, α1 < α2 < … < αn;
- their basic probabilities are
m(A_{αi}) = (α_i − α_{i−1}) × m(A), i ∈ {1, 2, …, n}, α0 = 0, αn = 1.
Now, the previous formulas become (Yen, 1992)
Bel(A) = Σ_B m(B) Σ_{αi} (α_i − α_{i−1}) f_{B,A}(α_i),
Pls(A) = Σ_B m(B) Σ_{αi} (α_i − α_{i−1}) g_{B,A}(α_i),
where f_{B,A}(α) = inf_{x∈Bα} μ_A(x) and g_{B,A}(α) = sup_{x∈Bα} μ_A(x).
The following example (Iancu, 1997c) illustrates how one applies
these formulas for computing the generalized belief functions. We want
to determine the uncertainty in the assertion "George's result in
mathematics is very good", denoted as the fuzzy set A. For this we
compute the support pair [Bel(A), Pls(A)], using the following focal
elements:
i) George obtained marks between 60 and 70 (denoted as the fuzzy
set B) with basic probability 0.3, and
ii) George obtained marks about 90 (denoted as the fuzzy set C)
with basic probability 0.7, i.e. m(B) = 0.3, m(C) = 0.7; the frame of
discernment is Ω = [0, 100].
The fuzzy sets A, B and C are characterized below by lists in the form
A(x_i)/x_i:
A = {0.25/40, 0.5/50, 0.75/60, 0.8/70, 0.9/80, 1/90, 1/100},
B = {0.25/30, 0.5/40, 0.75/50, 1/60, 1/70, 0.75/80, 0.5/90, 0.25/100},
C = {0.25/50, 0.5/60, 0.8/70, 1/80, 1/90, 1/100}.
We decompose the fuzzy focal B into the following non-fuzzy focal
elements:
B_{0.25} = {30, 40, 50, 60, 70, 80, 90, 100} with mass 0.25 × m(B),
B_{0.5} = {40, 50, 60, 70, 80, 90} with mass 0.25 × m(B),
B_{0.75} = {50, 60, 70, 80} with mass 0.25 × m(B),
B_1 = {60, 70} with mass 0.25 × m(B),
and the fuzzy focal C into the following non-fuzzy focal elements:
C_{0.25} = {50, 60, 70, 80, 90, 100} with mass 0.25 × m(C),
C_{0.5} = {60, 70, 80, 90, 100} with mass 0.25 × m(C),
C_{0.8} = {70, 80, 90, 100} with mass 0.3 × m(C),
C_1 = {80, 90, 100} with mass 0.2 × m(C).
Then,
m(B) Σ_{αi} (α_i − α_{i−1}) f_{B,A}(α_i)
= m(B) × 0.25 × [f_{B,A}(0.25) + f_{B,A}(0.5) + f_{B,A}(0.75) + f_{B,A}(1)]
= 0.3 × 0.25 × [0 + 0.25 + 0.5 + 0.75] = 0.1125
and
m(C) Σ_{αi} (α_i − α_{i−1}) f_{C,A}(α_i)
= m(C) × [0.25 × f_{C,A}(0.25) + 0.25 × f_{C,A}(0.5) + 0.3 × f_{C,A}(0.8) + 0.2 × f_{C,A}(1)]
= 0.7 × [0.25 × 0.5 + 0.25 × 0.75 + 0.3 × 0.8 + 0.2 × 0.9] = 0.51275;
therefore, Bel(A) = 0.1125 + 0.51275 = 0.62525.
Similarly, Pls(A) = 0.9775.
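The computation above can be automated; the following Python sketch (ours) implements Yen's Bel formula via the level-set decomposition of the fuzzy focals and reproduces Bel(A) = 0.62525:

A = {40: 0.25, 50: 0.5, 60: 0.75, 70: 0.8, 80: 0.9, 90: 1.0, 100: 1.0}
B = {30: 0.25, 40: 0.5, 50: 0.75, 60: 1.0, 70: 1.0, 80: 0.75, 90: 0.5, 100: 0.25}
C = {50: 0.25, 60: 0.5, 70: 0.8, 80: 1.0, 90: 1.0, 100: 1.0}
focals = [(B, 0.3), (C, 0.7)]  # (fuzzy focal, basic probability)

mu = lambda F, x: F.get(x, 0.0)

def bel(A, focals):
    total = 0.0
    for F, mass in focals:
        levels = sorted(set(F.values()))
        prev = 0.0
        for alpha in levels:
            cut = [x for x in F if F[x] >= alpha]           # level set F_alpha
            total += mass * (alpha - prev) * min(mu(A, x) for x in cut)
            prev = alpha
    return total

print(bel(A, focals))  # 0.62525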
The basic operator to update information in probability theory is
conditioning. In the theory of evidence there is a generalization of this
concept, given by Dempster (1967). Other formulas have been proposed
by various authors:
- Dempster's rule:
Pls_D(B|A) = Pls(A ∩ B)/Pls(A), Bel_D(B|A) = 1 − Pls_D(¬B|A);
- the geometrical rule (Suppes & Zanotti, 1977):
Bel_g(B|A) = Bel(A ∩ B)/Bel(A), Pls_g(B|A) = 1 − Bel_g(¬B|A);
- Shafer's strong conditioning rule (Shafer, 1976; Dubois & Prade,
1986a):
Bel_S(B|A) = Bel(A ∩ B)/Bel(A), Pls_S(B|A) = 1 − Bel(A − B)/Bel(A);
- Planchet's weak conditioning rule (Planchet, 1989):
Bel_P(B|A) = (Bel(B) − Bel(B − A))/Pls(A),
Pls_P(B|A) = (Pls(A) + Pls(B) − Pls(A ∪ B))/Pls(A);
- the rule of De Campos, Lamata & Moral (1990) and Fagin & Halpern
(1989):
P*(B|A) = Pls(A ∩ B)/(Pls(A ∩ B) + Bel(A ∩ ¬B)),
P_*(B|A) = Bel(A ∩ B)/(Bel(A ∩ B) + Pls(A ∩ ¬B)),
which are also called the upper and lower probabilities.
2.3 DEMPSTER'S RULE OF COMBINATION
We consider two independent evidential sources with the same frame of discernment Ω. Let Bel1 and Bel2 be the belief functions determined by the basic probability assignments m1 and m2, having the focal elements A1, ..., As and B1, ..., Bt, respectively. According to Dempster's rule (Dempster, 1967), a new belief function Bel = Bel1 ⊕ Bel2 can be obtained from

m12(∅) = 0,
∀A ∈ 2^Ω, A ≠ ∅: m12(A) = (m1 ⊕ m2)(A) = (m1 ∩ m2)(A)/(1 − k),

where

(m1 ∩ m2)(A) = Σ_{i,j: Ai∩Bj=A} m1(Ai) × m2(Bj)

and

k = Σ_{i,j: Ai∩Bj=∅} m1(Ai) × m2(Bj)

is the degree of conflict between the two sources. The effect of the normalizing factor 1 − k is to eliminate the conflicting pieces of information between the two sources to combine. When k = 1 the combined bpa m12 does not exist and the bodies of evidence are said to be in full contradiction.
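As an illustration, Dempster's rule can be sketched in Python for bpa's represented as dictionaries mapping frozenset focal elements to masses (the representation and the example values are our assumptions, chosen only for illustration):

    def dempster(m1, m2):
        combined, k = {}, 0.0
        for A, v1 in m1.items():
            for B, v2 in m2.items():
                C = A & B
                if C:
                    combined[C] = combined.get(C, 0.0) + v1 * v2
                else:
                    k += v1 * v2          # conflicting mass
        if k == 1.0:
            raise ValueError("full contradiction: the combined bpa does not exist")
        return {A: v / (1.0 - k) for A, v in combined.items()}

    m1 = {frozenset({'a'}): 0.6, frozenset({'a', 'b'}): 0.4}
    m2 = {frozenset({'b'}): 0.5, frozenset({'a', 'b'}): 0.5}
    print(dempster(m1, m2))   # {a}: 3/7, {b}: 2/7, {a,b}: 2/7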
Several interesting and valuable alternative rules have been proposed in the literature to circumvent the limitations of Dempster's rule of combination. Because, generally, there is a dependence between the sources, Garibba and Servida (1988) gave a new aggregation method, extending Dempster's rule. The intersection from the last formula is extended to (Garibba & Servida, 1988)

(m1 ∗ m2)(A) = ∨_{i,j: Ai ∗ Bj = A} (m1(Ai) ∧ m2(Bj)),

where ∗ represents a set operator, such as ∩, ∪, ⊂, and ∨ and ∧ are generalized union and intersection operators. Usually the operators ∗ ≡ ∩, ∨ ≡ max, ∧ ≡ min are used. In this case the last formula becomes

(m1 ∩ m2)(A) = max_{i,j: Ai∩Bj=A} min(m1(Ai), m2(Bj))
and its normalized form is

m12(A) = (m1 ∩ m2)(A) / Σ_{B∈2^Ω, B≠∅} (m1 ∩ m2)(B).

Working with n consonant bpa's, the previous rule becomes

m_{1,...,n}(A) = max_{i1,...,in: Ai1∩...∩Ain=A} min(m1(Ai1), ..., mn(Ain)) / Σ_{B∈2^Ω} max_{i1,...,in: Ai1∩...∩Ain=B} min(m1(Ai1), ..., mn(Ain)).
Dubois and Prade (1986b; 1988) and Smets (1993) proposed the formula

m_DP1(∅) = 0,
∀A ∈ 2^Ω, A ≠ ∅: m_DP1(A) = Σ_{i,j: Ai∪Bj=A} m1(Ai) × m2(Bj),

which reflects the disjunctive consensus and is usually preferred when one knows that one of the sources S1 or S2 is mistaken, but without knowing which one.
Murphy's rule of combination (Murphy, 2000; Yager, 1985; Dubois & Prade, 1988) consists in averaging the belief functions associated with m1 and m2:

Bel_M(A) = (Bel1(A) + Bel2(A))/2, ∀A ∈ 2^Ω.
Smets's rule of combination (Smets & Kennes, 1994) of two independent sources of evidence eliminates the division by 1 − k involved in Dempster's rule:

m_S(∅) = k = Σ_{i,j: Ai∩Bj=∅} m1(Ai) × m2(Bj),
∀A ∈ 2^Ω, A ≠ ∅: m_S(A) = Σ_{i,j: Ai∩Bj=A} m1(Ai) × m2(Bj).
Yager's rule of combination (Yager, 1983b; 1985; 1987) admits that in case of conflict the result is not reliable, so that k plays the role of an absolute discounting term added to the weight of ignorance. This rule is given by

m_Y(∅) = 0,
∀A ∈ 2^Ω, A ∉ {∅, Ω}: m_Y(A) = Σ_{i,j: Ai∩Bj=A} m1(Ai) × m2(Bj),
m_Y(Ω) = m1(Ω)m2(Ω) + Σ_{i,j: Ai∩Bj=∅} m1(Ai) × m2(Bj).
Dubois and Prade (1988) defined a new rule of combination, according to the following principle: if one source observes a value in a set X while the other observes this value in a set Y, then the truth lies in X ∩ Y as long as X ∩ Y ≠ ∅. This rule is defined by

m_DP2(∅) = 0,
∀A ∈ 2^Ω, A ≠ ∅:
m_DP2(A) = Σ_{i,j: Ai∩Bj=A, Ai∩Bj≠∅} m1(Ai) × m2(Bj) + Σ_{i,j: Ai∪Bj=A, Ai∩Bj=∅} m1(Ai) × m2(Bj).
Lefevre, Colot and Vanoorenberghe (2000) presented a unified framework that embeds all the existing combination rules involving conjunctive consensus in the same general mechanism of construction:

Step 1. Computation of the total conflicting mass, based on the conjunctive consensus:

k = Σ_{i,j: Ai∩Bj=∅} m1(Ai) × m2(Bj).

Step 2. Reallocation of the conflicting mass to the sets A ⊆ Ω, A ≠ ∅, with some given coefficients wm(A) ∈ [0, 1] such that Σ_{A⊆Ω} wm(A) = 1, according to

m(∅) = wm(∅)·k,
∀A ∈ 2^Ω, A ≠ ∅: m(A) = Σ_{i,j: Ai∩Bj=A} m1(Ai) × m2(Bj) + wm(A)·k.
The particular choice of the set of coefficients wm(·) provides a particular rule of combination. This mechanism yields all existing rules involving conjunctive consensus developed in the literature based on Shafer's model. For instance (Lefevre, Colot & Vanoorenberghe, 2000):

• Dempster's rule is obtained by choosing
wm(∅) = 0 and wm(A) = (1/(1 − k)) Σ_{i,j: Ai∩Bj=A} m1(Ai)m2(Bj);

• Yager's rule is obtained by choosing wm(Ω) = 1 and wm(A) = 0 for A ≠ Ω;

• Smets' rule is obtained by choosing wm(∅) = 1 and wm(A) = 0 for A ≠ ∅;

• the second Dubois and Prade rule is obtained by choosing
wm(A) = (1/k) Σ_{i,j: Ai∪Bj=A, Ai∩Bj=∅} m1(Ai)m2(Bj), ∀A ∈ 2^Ω
(the factor 1/k makes these coefficients sum to 1, since the masses being reallocated are exactly the conflicting ones).

Various examples of the use of these combination rules can be found in (Smarandache & Dezert, 2004).
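The unified mechanism is easy to render in code. The following Python sketch (the function and weight names are ours) computes the conjunctive consensus and the conflict k once, and then lets a weighting function decide where k goes; the three weight choices below correspond to Smets', Yager's and Dempster's rules:

    def combine_with_weights(m1, m2, weights):
        conj, k = {}, 0.0
        for A, v1 in m1.items():
            for B, v2 in m2.items():
                C = A & B
                if C:
                    conj[C] = conj.get(C, 0.0) + v1 * v2
                else:
                    k += v1 * v2
        w = weights(conj, k)               # dict A -> w_m(A), summing to 1
        out = dict(conj)
        for A, wA in w.items():
            out[A] = out.get(A, 0.0) + wA * k
        return out

    # Smets: all conflict goes to the empty set
    smets = lambda conj, k: {frozenset(): 1.0}
    # Yager: all conflict goes to the frame Omega (here assumed to be {'a','b'})
    yager = lambda conj, k: {frozenset({'a', 'b'}): 1.0}
    # Dempster: w_m(A) proportional to the conjunctive consensus
    dempster_w = lambda conj, k: {A: v / (1 - k) for A, v in conj.items()}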
2.4 APPROXIMATIONS OF BASIC ASSIGNMENT PROBABILITY
Given a frame of discernment of size |Ω| = N, a bpa m can have up to 2^N focal elements, all of which have to be represented explicitly to capture the complete information encoded in m. It follows that the combination of two bpa's requires the computation of up to 2^{N+1} intersections. Orponen (1990) showed that the combination of various pieces of evidence using Dempster's rule is #P-complete. Reducing the number of focal elements of the bpa's under consideration while retaining the essence of the information is therefore an important problem for Dempster-Shafer theory. The most important algorithms known in the literature for solving this problem are the following.
The Bayesian approximation

This approximation (Voorbraak, 1989) reduces a given bpa m to a probability distribution m_B:

m_B(A) = Σ_{S: A⊆S} m(S) / Σ_{C⊆Ω} |C|·m(C)   if |A| = 1
         0                                     otherwise
Example 2.3. Let m be a bpa over the frame of discernment Ω = {a, b, c, d, e} with the values

m(A) = 0.33 if A = {a, b}
       0.3  if A = {a, b, c}
       0.27 if A = {b, c, d}
       0.06 if A = {c, d}
       0.04 if A = {d, e}

Applying the Bayesian approximation to m yields the following result:

m_B(A) ≅ 0.245 if A = {a}
         0.35  if A = {b}
         0.245 if A = {c}
         0.143 if A = {d}
         0.015 if A = {e}
This example shows that the Bayesian approximation is not reasonable when the number of focal elements of the input bpa is at most |Ω|: the output may then have as many focal elements as the input.
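A minimal Python sketch of the Bayesian approximation, using the bpa of Example 2.3 represented as a dictionary (the representation is our assumption):

    def bayesian_approximation(m):
        denom = sum(len(A) * v for A, v in m.items())
        singletons = set().union(*m)
        return {x: sum(v for A, v in m.items() if x in A) / denom
                for x in singletons}

    m = {frozenset('ab'): 0.33, frozenset('abc'): 0.3, frozenset('bcd'): 0.27,
         frozenset('cd'): 0.06, frozenset('de'): 0.04}
    print(bayesian_approximation(m))   # a: 0.245..., b: 0.350..., c: 0.245..., ...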
The k-l-x method

The basic idea of this approximation (Tessem, 1993) is to keep in the approximation m_klx at least k and at most l of the focal elements with the highest values in the original bpa, such that the sum of the retained m-values is at least 1 − x, where x ∈ [0, 1). Finally, the retained values are normalized in order to guarantee the basic properties of a bpa.
Example 2.4. For the bpa m given in the previous example and the values k = 2, l = 3 and x = 0.1, the following result is obtained:

m_klx(A) = 11/30 ≅ 0.366 if A = {a, b}
           1/3 ≅ 0.333   if A = {a, b, c}
           3/10 = 0.3    if A = {b, c, d}
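The k-l-x selection can be sketched in the same style, reusing the bpa m from the previous sketch:

    def klx_approximation(m, k, l, x):
        kept, tmass = {}, 0.0
        for A in sorted(m, key=m.get, reverse=True):
            if len(kept) < l and (len(kept) < k or tmass < 1 - x):
                kept[A] = m[A]
                tmass += m[A]
        return {A: v / tmass for A, v in kept.items()}   # normalization

    print(klx_approximation(m, 2, 3, 0.1))
    # {a,b}: 0.3667, {a,b,c}: 0.3333, {b,c,d}: 0.3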
Summarization

This method (Lowrance, Garvey & Strat, 1986) works similarly to k-l-x. Let k be the number of focal elements to be contained in the approximation m_S of a given bpa m, and let M denote the set of the k − 1 subsets of Ω with the highest values in m. Then m_S is given by

m_S(A) = m(A)                    if A ∈ M
         Σ_{A'⊆A, A'∉M} m(A')    if A = A0
         0                       otherwise

where A0 is defined as

A0 = ∪_{A'∉M, m(A')>0} A'.
Example 2.5. For the bpa m from Example 2.3 and k = 3, m_S has the following values:

m_S(A) = 0.33 if A = {a, b}
         0.3  if A = {a, b, c}
         0.37 if A = {b, c, d, e}
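A sketch of the summarization, again reusing the bpa m defined above:

    def summarization(m, k):
        order = sorted(m, key=m.get, reverse=True)
        M, rest = order[:k - 1], order[k - 1:]       # keep the k-1 largest
        out = {A: m[A] for A in M}
        if rest:
            A0 = frozenset().union(*rest)            # merge the remainder into A0
            out[A0] = out.get(A0, 0.0) + sum(m[A] for A in rest)
        return out

    print(summarization(m, 3))   # {a,b}: 0.33, {a,b,c}: 0.3, {b,c,d,e}: 0.37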
The D1 approximation

Let m be a bpa to be approximated and k the desired number of focal elements of the approximated bpa m_D. The following notations are useful:

a) M+ is the set of the k − 1 focal elements of m with the highest values:
M+ = {A1, ..., A_{k−1} ⊆ Ω : ∀A ∉ M+, m(Ai) ≥ m(A), i = 1, 2, ..., k − 1};

b) M− is the set containing all the other focal elements of m:
M− = {A ⊆ Ω : m(A) > 0, A ∉ M+}.

Given a focal element A ∈ M− of m, the collection M_A of its supersets in M+ is computed; if M_A is empty (i.e. M+ contains no supersets of A), then the set

M'_A = {B ∈ M+ : |B| ≥ |A|, B ∩ A ≠ ∅}

is computed, where |A| represents the cardinality of the set A.

The ideas of the D1 algorithm are (Bauer, 1996):
i) all the members of M+ are kept as focal elements of m_D;
ii) for every A ∈ M−, the value m(A) is distributed uniformly among the members of M_A with the smallest cardinality;
iii) if M_A is empty, then the value m(A) is shared among the smallest members of M'_A, the value assigned to a focal element depending on the size of its intersection with A;
iv) the mass-distribution procedure is invoked recursively until all the values m(A) are assigned to members of M+ or the set M'_A becomes empty; in this last case, the remaining value is assigned to Ω, which thus becomes a focal element of m_D.

The approximation m_D of a bpa with n focal elements can be computed in time O(k(n − k)).
Example 2.6. For the bpa m from Example 2.3 and k = 3, the algorithm D1 yields the following values:

m_D(A) = 0.33 if A = {a, b}
         0.51 if A = {a, b, c}
         0.16 if A = {a, b, c, d, e}
A mixed algorithm

An analysis of the approximations of the original bpa is made in (Bauer, 1996), and the conclusion is: the "best" approximation algorithm with respect to decision making does not exist. However, the k-l-x, D1 and Bayesian approximations yield definitely better results than the summarization does. A new algorithm, obtained as a combination of the k-l-x, summarization and D1 algorithms, is proposed in (Iancu, 2008c). Let m be the bpa to be approximated; the combination is constructed in three steps, to obtain a new approximation m_M:

S1) Given the parameters k, l, x, having the same meaning as in the k-l-x method, we keep at least k and at most l focal elements from the original bpa, with the sum of m-values at least 1 − x; let M be the set of these focal elements.

S2) The set M is considered as the set M+ from the D1 algorithm. The focal elements not included in the set M at step S1 play the role of the set M− in the D1 algorithm.

S3) The components of all focal elements A ∈ M− not distributed among the members of M_A or M'_A are included in a new focal element; the m_M-value of this set is computed as the sum of the m-values of its components. This idea is used by the summarization method to construct the set A0.
Example 2.7. For the bpa m from Example 2.3 and k = 2, l = 3 and x = 0.1, the algorithm yields the following values.

Step S1): Removing {c, d} and {d, e} from m, the constraints concerning the number of focal elements and the deleted mass are satisfied. Thus, the following approximation is obtained:

m_M1(A) = 0.33 if A = {a, b}
          0.3  if A = {a, b, c}
          0.27 if A = {b, c, d}

Steps S2) and S3): Step S2 is applied with the following sets as parameters:
M+ = {A1, A2, A3}, M− = {A4, A5},
m_M2(A1) = m_M2({a, b}) = m_M1({a, b}) = 0.33
m_M2(A2) = m_M2({a, b, c}) = m_M1({a, b, c}) = 0.3
m_M2(A3) = m_M2({b, c, d}) = m_M1({b, c, d}) = 0.27
m_M2(A4) = m_M2({c, d}) = m({c, d}) = 0.06
m_M2(A5) = m_M2({d, e}) = m({d, e}) = 0.04

The set A3 ∈ M+ is the unique superset of A4 ∈ M−, so the value of A3 is increased by 0.06. Furthermore, A3 covers half of the elements of A5, which adds another 0.04/2 = 0.02 to the m_M-value of A3. The rest is assigned to {e}, the set constructed in step S3. The approximation m_M of the original bpa m is:

m_M(A) = 0.33 if A = {a, b}
         0.3  if A = {a, b, c}
         0.35 if A = {b, c, d}
         0.02 if A = {e}
An analysis of the error associated with an approximation algorithm can be made using the pignistic probability P induced by a bpa, which can be considered the standard function for decision making in Dempster-Shafer theory (Smets, 1988). It is given by

P({x}) = Σ_{A: x∈A⊆Ω} m(A)/|A|.
The error quantifies the deviation in the pignistic probability induced by an approximated bpa. Let P0 be the pignistic probability induced by the original version of a bpa m and Pm' the one induced by the approximation m'. Then the error measure is defined as

Error(m') = Σ_{A⊆Ω} |P0(A) − Pm'(A)|.
For the approximations from the previous examples, we obtain:
Error (m S ) = 0.3225 ,
Error (mklx ) = 0.19 ,
Error (m D ) = 0.466 ,
Error (m M ) = 0.186 .
One can observe that the best result is given by the approximation m_M. We notice that in all experiments the mixed algorithm gave better results than the D1 algorithm from which it is derived. If we increase the number of focal elements kept by the approximation algorithms, the error decreases, because a greater number of focal elements of the original bpa and of the approximated bpa coincide.
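For completeness, the pignistic probability and an error measure can be sketched as follows (here the error is restricted to singletons, which is sufficient for comparing approximations of the same bpa; the full measure above sums over all subsets):

    def pignistic(m):
        elements = set().union(*m)
        return {x: sum(v / len(A) for A, v in m.items() if x in A)
                for x in elements}

    def error(m_orig, m_approx):
        p0, p1 = pignistic(m_orig), pignistic(m_approx)
        xs = set(p0) | set(p1)
        return sum(abs(p0.get(x, 0.0) - p1.get(x, 0.0)) for x in xs)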
The mixed algorithm can be implemented as follows (a runnable Python rendering of the original pseudocode; a bpa is assumed to be given as a dictionary mapping frozenset focal elements to masses):

    def mixed_approximation(m, k, l, x):
        # P1: focal elements of m, sorted in decreasing order of m-values
        S = sorted(m, key=m.get, reverse=True)
        # P2: keep focal elements while (nf < l) and ((nf < k) or (tmass < 1 - x))
        M_plus, tmass = [], 0.0
        for A in S:
            if len(M_plus) < l and (len(M_plus) < k or tmass < 1 - x):
                M_plus.append(A)
                tmass += m[A]
        # P3: distribute the masses of the remaining focal elements
        M_minus = [A for A in S if A not in M_plus]
        mM = {A: m[A] for A in M_plus}
        R, mR = frozenset(), 0.0            # the residual focal element of step S3
        for A in M_minus:
            MA = [B for B in M_plus if A < B]            # strict supersets of A
            if MA:
                cmin = min(len(B) for B in MA)
                MpA = [B for B in MA if len(B) == cmin]  # minimal members of M_A
                for B in MpA:                            # uniform distribution
                    mM[B] += m[A] / len(MpA)
                continue
            NA = [B for B in M_plus if len(B) >= len(A) and A & B]
            if not NA:
                R, mR = R | A, mR + m[A]
                continue
            cmin = min(len(B) for B in NA)
            NpA = [B for B in NA if len(B) == cmin]      # minimal members of N_A
            for a in A:                                  # each element carries m(A)/|A|
                covering = [B for B in NpA if a in B]
                for B in covering:
                    mM[B] += m[A] / (len(A) * len(covering))
                if not covering:                         # uncovered elements go to R
                    R, mR = R | {a}, mR + m[A] / len(A)
        if mR > 0:
            mM[R] = mM.get(R, 0.0) + mR
        return mM
2.5 ALGORITHMS FOR GENERALIZED BELIEF FUNCTIONS
A major difficulty in applying the generalized Dempster-Shafer theory to intelligent systems lies in computing the functions f_{A,B} and g_{A,B}, where A denotes a continuous fuzzy focal element and B denotes a continuous fuzzy subset of the frame of discernment. The functions f_{A,B} and g_{A,B} have the following properties:
1) f_{A,B}(α) and g_{A,B}(α) are defined on the interval [0, 1];
2) 0 ≤ f_{A,B}(α) ≤ h(B) and 0 ≤ g_{A,B}(α) ≤ h(B), where h(B) = sup_x μ_B(x) is the height of B;
3) f_{A,B} is monotonically nondecreasing;
4) g_{A,B} is monotonically nonincreasing.
Definition 2.3. A take-off point α_t of a function f_{A,B} is the value where the function becomes positive:
f_{A,B}(α) = 0 for 0 ≤ α ≤ α_t,
f_{A,B}(α) > 0 for α > α_t.
Definition 2.4. A saturation point α_s of a function f_{A,B} is the value where the function reaches B's height:
f_{A,B}(α) = sup_x μ_B(x) for α_s ≤ α ≤ 1,
f_{A,B}(α) < sup_x μ_B(x) for α < α_s.

Definition 2.5. A falloff point α_f of a function g_{A,B} is the value where the function becomes lower than B's height:
g_{A,B}(α) = sup_x μ_B(x) for 0 ≤ α ≤ α_f,
g_{A,B}(α) < sup_x μ_B(x) for α > α_f.

Definition 2.6. A reset point α_r of a function g_{A,B} is the value where the function reaches zero:
g_{A,B}(α) > 0 for α < α_r,
g_{A,B}(α) = 0 for α_r ≤ α ≤ 1.
The following theorems facilitate the computation of f_{A,B} and g_{A,B}.

Theorem 2.3. (Yen, 1992) Suppose A is a continuous convex fuzzy subset. Then all its level sets A_α are intervals, for 0 < α ≤ 1.

Theorem 2.4. (Yen, 1992) A convex fuzzy subset B of the frame of discernment X attains its smallest membership value on an interval I = [a, b] ⊂ X at one of I's endpoints, that is, inf_{x∈[a,b]} μ_B(x) = μ_B(c), where c ∈ {a, b}.
Theorem 2.5. (Yen, 1992) Suppose B is a continuous convex fuzzy subset of the frame X, and I is an interval of X. The highest membership value of B within the interval I is either B's height or the membership value of one of I's endpoints, that is,

sup_{x∈I} μ_B(x) = h(B) if I ∩ B_{h(B)} ≠ ∅, and μ_B(x_e) otherwise,

where x_e is an endpoint of I.

From the previous theorems we get the following formulas:

f_{A,B}(α) = min{μ_B(x_L), μ_B(x_R)},
g_{A,B}(α) = h(B) if A_α ∩ B_{h(B)} ≠ ∅, and max{μ_B(x_L), μ_B(x_R)} otherwise,

where A_α = [x_L, x_R].
In order to reduce the effort in computing f_{A,B} and g_{A,B}, a subclass of convex fuzzy sets will be used.

Definition 2.7. A continuous fuzzy subset A of the frame of discernment X is a strong convex fuzzy set if it satisfies the following conditions:
1) A is convex;
2) for all x1, x2 ∈ X with μ_A(x1) ∉ {0, sup_{x∈A} μ_A(x)} and μ_A(x2) ∉ {0, sup_{x∈A} μ_A(x)}, and for all λ ∈ (0, 1), it results that
μ_A(λx1 + (1 − λ)x2) > min{μ_A(x1), μ_A(x2)}.
Theorem 2.6. (Yen, 1992) A continuous fuzzy set A is strongly convex if its membership function is of the form

μ_A(x) = 0        for x ≤ a
         LF_A(x)  for a ≤ x ≤ b, a ≠ b
         h        for b ≤ x ≤ c
         RF_A(x)  for c ≤ x ≤ d, c ≠ d
         0        for x ≥ d

where LF_A(x) is a continuous monotonically increasing function, h is the height of A and RF_A(x) is a continuous monotonically decreasing function.
Computing the function f_{A,B} (Yen, 1992)

We consider that A and B are continuous strong fuzzy sets characterized by a_A, b_A, c_A, d_A, LF_A, RF_A and a_B, b_B, c_B, d_B, LF_B, RF_B, respectively.

General algorithm

1. Compute f_{A,B}(0) = min{μ_B(LF_A^{-1}(0)), μ_B(RF_A^{-1}(0))}.
   If f_{A,B}(0) ≠ 0 then go to step 3, else go to the next step.
2. Find the take-off point α_t. First compute the intersection of A's support and B's support:
   (x1, x2) = support(A) ∩ support(B).
   Then the take-off point is α_t = max{μ_A(x1), μ_A(x2)}.
   Thus, we have f_{A,B}(α) = 0 for 0 ≤ α ≤ α_t.
   If α_t = 1 then stop, else β ← α_t and continue with the next step.
3. If LF_A^{-1}(β) < b_B and RF_A^{-1}(β) > c_B, then find the smallest crossing point α_c by solving the equation
   LF_B(LF_A^{-1}(α_c)) = RF_B(RF_A^{-1}(α_c)).
   If α_c ∈ [β, 1] then go to step 5, else continue with the next step.
4. Compute the saturation point α_s.
   If α_s ∈ [β, 1] and f_{A,B}(α_s) = 1, then f_{A,B}(α) = 1 for α ∈ [α_s, 1] and α_c ← α_s; else α_c ← 1.
5. Define the function on the interval [β, α_c]:
   if μ_B(LF_A^{-1}(β)) < μ_B(RF_A^{-1}(β)), then f_{A,B}(α) = μ_B(LF_A^{-1}(α)) for α ∈ [β, α_c];
   else if μ_B(LF_A^{-1}(β)) > μ_B(RF_A^{-1}(β)), then f_{A,B}(α) = μ_B(RF_A^{-1}(α)) for α ∈ [β, α_c];
   else if dμ_B(LF_A^{-1}(α))/dα at α = β is smaller than dμ_B(RF_A^{-1}(α))/dα at α = β, then f_{A,B}(α) = μ_B(LF_A^{-1}(α)) for α ∈ [β, α_c];
   else f_{A,B}(α) = μ_B(RF_A^{-1}(α)) for α ∈ [β, α_c].
6. If α_c = 1 then stop, else β ← α_c and go to step 3.
Algorithm for linear strong convex fuzzy sets

A linear strong convex fuzzy set is characterized by an L function, defined below with the four parameters a, b, c and d:

L[a, b, c, d](x) = 0                for x ≤ a
                   (x − a)/(b − a)  for a ≤ x ≤ b
                   1                for b ≤ x ≤ c
                   (d − x)/(d − c)  for c ≤ x ≤ d
                   0                for x ≥ d

In this case the algorithm is simplified, because there is at most one crossing point.
1. Compute f_{A,B}(0).
   If f_{A,B}(0) ≠ 0 then α_t ← 0 and go to step 3, else continue.
2. Find the take-off point α_t. Define f_{A,B}(α) = 0 for α ∈ [0, α_t].
3. Find the crossing point α_c by solving the equation
   LF_B(LF_A^{-1}(α_c)) = RF_B(RF_A^{-1}(α_c)).
   If α_c is found, then define f_{A,B}(α) for α ∈ [α_t, α_c] as in step 5 of the general algorithm; else α_c ← α_t.
4. Find the saturation point α_s.
   If α_s is found, then define f_{A,B}(α) for α ∈ [α_c, α_s] as in step 5 of the general algorithm and define f_{A,B}(α) = 1 for α ∈ [α_s, 1];
   else define f_{A,B}(α) for α ∈ [α_c, 1] as in step 5 of the general algorithm.
Example 2.8. Suppose A and B are two linear strong convex continuous fuzzy subsets of a frame of discernment [−100, 100] whose membership functions are characterized below:

μ_A(x) = L[0, 5, 10, 15](x) = 0           for x ≤ 0
                              x/5         for 0 ≤ x ≤ 5
                              1           for 5 ≤ x ≤ 10
                              (15 − x)/5  for 10 ≤ x ≤ 15
                              0           for x ≥ 15

μ_B(x) = L[1, 3, 11, 16](x) = 0           for x ≤ 1
                              (x − 1)/2   for 1 ≤ x ≤ 3
                              1           for 3 ≤ x ≤ 11
                              (16 − x)/5  for 11 ≤ x ≤ 16
                              0           for x ≥ 16

Assume that A is a focal element and we compute the contribution of A to the degree of belief in B. We first compute the following functions:

LF_B(LF_A^{-1}(α)) = LF_B(5α) = (5α − 1)/2,
RF_B(RF_A^{-1}(α)) = RF_B(15 − 5α) = (1 + 5α)/5.
Step 1. Compute f_{A,B}(0):
Since support(A) = [0, 15], we have
f_{A,B}(0) = min{μ_B(0), μ_B(15)} = μ_B(0) = 0.

Step 2. Compute the take-off point α_t:
(x1, x2) = support(A) ∩ support(B) = (1, 15),
α_t = max{μ_A(1), μ_A(15)} = μ_A(1) = 0.2,
and define f_{A,B}(α) = 0 for 0 ≤ α ≤ 0.2.

Step 3. Solving
LF_B(LF_A^{-1}(α_c)) = (5α_c − 1)/2 = (1 + 5α_c)/5 = RF_B(RF_A^{-1}(α_c)),
we get the crossing point α_c = 7/15 and
f_{A,B}(α) = μ_B(LF_A^{-1}(α)) = (5α − 1)/2 for 0.2 ≤ α ≤ 7/15.

Step 4. Solving the equation
f_{A,B}(α_s) = μ_B(RF_A^{-1}(α_s)) = (1 + 5α_s)/5 = 1,
we get the saturation point α_s = 4/5 and, thus, we have
f_{A,B}(α) = μ_B(RF_A^{-1}(α)) = (1 + 5α)/5 for 7/15 ≤ α ≤ 4/5,
f_{A,B}(α) = 1 for 4/5 ≤ α ≤ 1.

Summarizing, we get

f_{A,B}(α) = 0            for 0 ≤ α ≤ 0.2
             (5α − 1)/2   for 0.2 ≤ α ≤ 7/15
             (1 + 5α)/5   for 7/15 ≤ α ≤ 4/5
             1            for 4/5 ≤ α ≤ 1
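The closed-form result can be checked numerically: by Theorem 2.4 and the formula f_{A,B}(α) = min{μ_B(x_L), μ_B(x_R)}, it suffices to evaluate μ_B at the endpoints of the α-cuts of A. A Python sketch for the trapezoidal sets of Example 2.8 (the helper names are ours):

    def mu_trap(x, a, b, c, d):
        # membership of the L[a, b, c, d] distribution (assumes a < b and c < d)
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return (d - x) / (d - c)

    def f_AB(alpha, A=(0, 5, 10, 15), B=(1, 3, 11, 16)):
        a, b, c, d = A
        xL = a + alpha * (b - a)     # LF_A^{-1}(alpha)
        xR = d - alpha * (d - c)     # RF_A^{-1}(alpha)
        return min(mu_trap(xL, *B), mu_trap(xR, *B))

    for alpha in (0.1, 0.3, 0.6, 0.9):
        print(alpha, round(f_AB(alpha), 4))   # 0.0, 0.25, 0.8, 1.0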
Computing g_{A,B} for strong convex fuzzy sets (Yen, 1992)

Similar to the strategy for computing f_{A,B}, the algorithm for computing the function g_{A,B} involves finding a few critical points: falloff points and reset points. Unlike f_{A,B}, g_{A,B} has no crossing points.

Theorem 2.7. (Yen, 1992) Suppose A and B are continuous strong convex fuzzy sets. Then g_{A,B}(α) < h(B) if and only if

[LF_A^{-1}(α), RF_A^{-1}(α)] ∩ [LF_B^{-1}(h(B)), RF_B^{-1}(h(B))] = ∅.
Theorem 2.8. (Yen, 1992) Suppose A and B are continuous strong convex fuzzy sets. If

g_{A,B}(α1) = μ_B(LF_A^{-1}(α1)) < h(B),

then

g_{A,B}(α) = μ_B(LF_A^{-1}(α)) for α1 ≤ α ≤ 1.

Similarly, if

g_{A,B}(α1) = μ_B(RF_A^{-1}(α1)) < h(B),

then

g_{A,B}(α) = μ_B(RF_A^{-1}(α)) for α1 ≤ α ≤ 1.
Using the previous two theorems, the function g_{A,B} is computed with the following algorithm:

1. Compute g_{A,B}(0):
   If B_{h(B)} ∩ [LF_A^{-1}(0), RF_A^{-1}(0)] ≠ ∅, then g_{A,B}(0) = h(B) and continue with the next step;
   else g_{A,B}(0) = max{μ_B(LF_A^{-1}(0)), μ_B(RF_A^{-1}(0))}, α_f ← 0 and go to step 3.
2. Find the falloff point α_f:
   If [LF_A^{-1}(1), RF_A^{-1}(1)] ∩ [b_B, c_B] ≠ ∅, then g_{A,B}(α) = h(B) for 0 ≤ α ≤ 1 and stop;
   else α_f = max{μ_A(b_B), μ_A(c_B)} and g_{A,B}(α) = h(B) for 0 ≤ α ≤ α_f.
3. Compute the reset point α_r:
   If α_r ∈ [α_f, 1] and g_{A,B}(α_r) = 0, then g_{A,B}(α) = 0 for α_r ≤ α ≤ 1; else α_r ← 1.
4. Define the function on the interval [α_f, α_r]:
   If μ_B(LF_A^{-1}(α_f)) < μ_B(RF_A^{-1}(α_f)), then g_{A,B}(α) = μ_B(RF_A^{-1}(α)) for α_f ≤ α ≤ α_r;
   else g_{A,B}(α) = μ_B(LF_A^{-1}(α)) for α_f ≤ α ≤ α_r.
Example 2.9. Suppose A and B are two continuous fuzzy subsets of a frame of discernment [−100, 100] whose membership functions are

μ_A(x) = L[8, 18, 19, 21](x) = 0           for x ≤ 8
                               (x − 8)/10  for 8 ≤ x ≤ 18
                               1           for 18 ≤ x ≤ 19
                               (21 − x)/2  for 19 ≤ x ≤ 21
                               0           for x ≥ 21

μ_B(x) = L[1, 3, 11, 16](x) = 0           for x ≤ 1
                              (x − 1)/2   for 1 ≤ x ≤ 3
                              1           for 3 ≤ x ≤ 11
                              (16 − x)/5  for 11 ≤ x ≤ 16
                              0           for x ≥ 16
Assume that A is a focal element. The function g_{A,B} is computed with the previous algorithm.

Step 1. Compute g_{A,B}(0):
support(A) ∩ B_1 = [8, 21] ∩ [3, 11] = [8, 11] ≠ ∅,
thus g_{A,B}(0) = 1.

Step 2. Compute the falloff point α_f:
B_{h(B)} = [3, 11], [LF_A^{-1}(1), RF_A^{-1}(1)] = [18, 19], [3, 11] ∩ [18, 19] = ∅,
μ_A(LF_B^{-1}(1)) = μ_A(3) = 0,
μ_A(RF_B^{-1}(1)) = μ_A(11) = 0.3,
α_f = max{0, 0.3} = 0.3.
It results that g_{A,B}(α) = 1 for 0 ≤ α ≤ 0.3.

Step 3. Solving the equation
g_{A,B}(α_r) = μ_B(LF_A^{-1}(α_r)) = (8 − 10α_r)/5 = 0,
we get α_r = 0.8 and, thus,
g_{A,B}(α) = 0 for 0.8 ≤ α ≤ 1.

Step 4. Define g_{A,B} on the interval [α_f, α_r]:
Because
μ_B(LF_A^{-1}(α_f)) = 1 > μ_B(RF_A^{-1}(α_f)) = 0,
we have
g_{A,B}(α) = μ_B(LF_A^{-1}(α)) = μ_B(10α + 8) = (8 − 10α)/5 for 0.3 ≤ α ≤ 0.8.

Summarizing, we have

g_{A,B}(α) = 1            for 0 ≤ α ≤ 0.3
             (8 − 10α)/5  for 0.3 ≤ α ≤ 0.8
             0            for 0.8 ≤ α ≤ 1
3 UNCERTAIN AND IMPRECISE KNOWLEDGE REPRESENTATION

3.1 LINGUISTIC VARIABLES
This concept was introduced by Zadeh (1975a, 1975b, 1975c) to provide a means of approximate characterization of phenomena that are too complex or too ill-defined to be amenable to description in conventional quantitative terms. Just as numerical variables take numerical values, in fuzzy logic linguistic variables take linguistic values, which are words (linguistic terms) with associated degrees of membership. Thus, instead of a variable height assuming a numerical value of 1.75 meters, it is treated as a linguistic variable that may assume, for example, the linguistic value "tall" with a degree of membership of 0.92, "very short" with a degree of 0.06, or "very tall" with a degree of 0.7.
Definition 3.1. A linguistic variable V is characterized by: its name x, a universe U, a term set T(x), a syntactic rule G for generating names of values of x, and a semantic rule M for associating meanings with values.

Example 3.1. If the speed of a car is interpreted as a linguistic variable, then its term set could be

T(x) = {slow, moderate, fast, very slow, more or less fast},

where each term is characterized by a fuzzy set in a universe of discourse U = [0, 100]. We might interpret slow as "a speed below about 40 mph", moderate as "a speed close to 55 mph" and fast as "a speed above about 70 mph".
The meaning of a term or, more generally, of a fuzzy set can be modified by a linguistic hedge or a modifier. A modifier is a word such as very, more or less, slightly, etc. If T is a linguistic term and m is a modifier, then the membership function of m T is defined as (m T)(x) = M(T(x)), where M is a transformation associated with the modifier m. According to the modification of the support of the fuzzy set associated with the linguistic term, there are two classes of modifiers: intensive modifiers, which restrict the support, and extensive modifiers, which dilate the support. Examples of intensive modifiers are:

(little A)(x) = (μ_A(x))^1.3, (slightly A)(x) = (μ_A(x))^1.7, (very A)(x) = (μ_A(x))^2,
(extremely A)(x) = (μ_A(x))^3, (very very A)(x) = (μ_A(x))^4.

Figure 3.1: The modifier "very"

Examples of extensive modifiers are:

(more or less A)(x) = (μ_A(x))^{1/2},
(indeed A)(x) = 2(μ_A(x))^2 if 0 ≤ μ_A(x) ≤ 0.5, and 1 − 2(1 − μ_A(x))^2 if 0.5 < μ_A(x) ≤ 1.

Figure 3.2: The modifier "more or less"
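These hedges are straightforward to implement as transformations of membership functions. A Python sketch, with a toy term tall chosen only for illustration:

    def very(mu):         return lambda x: mu(x) ** 2
    def extremely(mu):    return lambda x: mu(x) ** 3
    def more_or_less(mu): return lambda x: mu(x) ** 0.5

    def indeed(mu):
        def h(x):
            v = mu(x)
            return 2 * v * v if v <= 0.5 else 1 - 2 * (1 - v) ** 2
        return h

    tall = lambda h: max(0.0, min(1.0, (h - 1.6) / 0.3))   # a toy membership for "tall"
    print(tall(1.75), very(tall)(1.75), more_or_less(tall)(1.75))
    # 0.5  0.25  0.707...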
We will present some modifiers of trapezoidal distributions. Depending on the transformation M, one can obtain various possibilities for modifying a primary term T. For instance, if

M = max(0, min(1, α × φ_intensive(x) + β)),

φ_intensive(x) = Ψ1(x)  for x ≤ A
                 1      for A ≤ x ≤ B
                 Ψ2(x)  for x ≥ B

where Ψ1 is a non-decreasing function and Ψ2 is a non-increasing function, one obtains various classes of intensive operators (Desprès, 1986):

i1) for α > 1 and α + β = 1 one obtains the class of "λ-precise" operators, defined as
(λ-precise T)(x) = max(0, min(1, λ × φ_intensive(x) + 1 − λ)),
which restrict the support of T.

Figure 3.3: The modifier "λ-precise"
i2) the operators obtained for β = 0 restrict the core; they are named "μ-very" and are defined by
(μ-very T)(x) = max(0, min(1, μ × φ_intensive(x))).

Figure 3.4: The modifier "μ-very"

i3) for α = 1 and 0 < β ≤ 1 one obtains the class "ν-very exact"; these operators restrict both the support and the core and are defined by
(ν-very exact T)(x) = max(0, min(1, φ_intensive(x) + ν)).

Figure 3.5: The modifier "ν-very exact"
ii) Extensive modifiers

These modifiers are obtained for M = min(1, max(0, α × φ_extensive(x) + β)), with φ_extensive having the same form as φ_intensive. Depending on α and β, there are the following classes of extensive operators (Desprès, 1986):
ii1) "π-rather", obtained for 0 < α < 1 and α + β = 1 and defined by
(π-rather T)(x) = min(1, max(0, π × φ_extensive(x) + 1 − π));
this class spreads the support of the primary term.

Figure 3.6: The modifier "π-rather"

ii2) the second class, named "ρ-approximative", is obtained for α > 1 and β = 0; it spreads the core and is defined by
(ρ-approximative T)(x) = min(1, max(0, ρ × φ_extensive(x))).

Figure 3.7: The modifier "ρ-approximative"

ii3) another class spreads both the core and the support and is obtained for α = 1 and 0 < β < 1; it is named "σ-around" and is defined by
(σ-around T)(x) = min(1, max(0, φ_extensive(x) + σ)).

Figure 3.8: The modifier "σ-around"
3.2 FACTS REPRESENTATION
An elementary piece of information can be expressed as a proposition of the form: an attribute of an entity takes a particular value. An elementary proposition can be symbolically expressed by a triple (attribute, object, value). This triple can be reduced to the canonical form "X is A", where X is a variable representing the attribute of the entity and A is the value. The proposition "X is A" can be understood as "the quantity X satisfies the predicate A".

As Zadeh pointed out (1973, 1975a, 1975b, 1975c, 1978, 1979), the semantic content of the proposition "X is A" can be represented by

π_X(u) = μ_A(u), ∀u ∈ U,

where U is the frame of discernment, π_X is the possibility distribution restricting the possible values of X and μ_A is the membership function of the set A. This relation means: the possibility that X may take u as its value is nothing but the membership degree of u in A. Note that if ∃u0 ∈ U such that π_X(u0) = 1 and π_X(u) = 0 for all u ≠ u0, the fact is precise; it is imprecise but non-fuzzy if π_X(u) ∈ {0, 1} ∀u ∈ U; and it is fuzzy if π_X(u) ∈ [0, 1], ∀u ∈ U. The complete absence of information about the value of X is represented by π_X(u) = 1 ∀u ∈ U.
Any fact (precise, imprecise or fuzzy) may be uncertain. To represent our confidence in the truth of a fact, it can be qualified by a numerical degree η ∈ [0, 1], which is a degree of necessity or certainty, i.e. an estimation of the extent to which it is necessary that X is A. The information "X is A" with certainty η is represented by

π_X(u) = max(μ_A(u), 1 − η),

which means that there is a possibility equal to 1 − η that the value of X lies outside supp(A) and a certainty equal to η that X takes its value in supp(A). Note that "X is A" with certainty 0 is equivalent to "X is U".
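A certainty-qualified fact is thus one max away from the underlying membership function. A small Python sketch (the triangular term "about 20" is our illustration):

    def qualified(mu_A, eta):
        # pi_X(u) = max(mu_A(u), 1 - eta): values outside supp(A)
        # remain possible to degree 1 - eta
        return lambda u: max(mu_A(u), 1.0 - eta)

    mu_A = lambda u: max(0.0, 1.0 - abs(u - 20) / 5)   # triangular "about 20"
    pi_X = qualified(mu_A, 0.8)
    print(pi_X(20), pi_X(23), pi_X(40))   # 1.0  0.4  0.2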
Let p be a proposition; provided that p is non-fuzzy (i.e. does not contain any vague predicate), the excluded-middle and non-contradiction laws hold, and p and ¬p can be regarded as mutually exclusive alternatives. Then a possibility distribution (Zadeh, 1978) π can be attached to the set {p, ¬p} by two numbers π(p), π(¬p) ∈ [0, 1], which represent the possibility that p is true and the possibility that ¬p is true, respectively. The normalization condition

max(π(p), π(¬p)) = 1

must hold; it departs from probability theory, expressing that at least one of the alternatives must be completely possible, since the alternatives are mutually exclusive and cover all the possibilities. The quantity n(p) = 1 − π(¬p) can be viewed as a measure of necessity, since it expresses that the necessity of p corresponds to the impossibility of ¬p. As recalled in (Prade, 1985), possibility and necessity measures are particular cases of the plausibility and belief functions studied by Shafer (1976).
The uncertainty of a fuzzy fact A can be expressed, too, as a pair (Bel(A), Pls(A)); if we have a basic fuzzy set F, we can measure the uncertainty of a fuzzy event A as the pair defined by its necessity and possibility degrees (Nec(A), Pos(A)) from the set

{(N_{S,C;F}(A), Π_{T;F}(A)), (Γ_{T;F}(A)), (Λ_{T,C;F}(A)), (L_{T,C;F}(A)), V_{T,C;F}(A)}.

Another possibility is to define the uncertainty as a linguistic variable modeled as a fuzzy number. It is possible to know the probability of occurrence of a fuzzy event; in this case we can transform the probability m into a fuzzy number N = (m, α, β) (Singer, 1990). We assume that the values of μ_N are at most 0.1 if the deviation from the middle value is x = ±0.05m; in other words, we assume that a probability differing from the middle value m by ±5% has a possibility value of only 0.1. With this assumption we have

1 − 0.05m/α = 0.1, hence α = β = m/18.

Therefore, the probability of occurrence m corresponds to the uncertainty given by the fuzzy number (m, m/18, m/18).

The uncertainty can also be expressed using the theory of support logic programming (Baldwin, 1987). In this case, the uncertainty of an assertion A is a support pair [n, p], with n ≤ p, and has the following interpretation: A is necessarily supported to degree n, not A is necessarily supported to degree 1 − p, and p − n measures the unsureness associated with the support for the pair (A, not A).
3.3 IMPLICATIONS
The notion of fuzzy implication plays a major role in the representation of rules. Let p = "x is in A" and q = "y is in B", where A and B are crisp sets. The interpretation of the material implication p → q is that the degree of truth of p → q quantifies to what extent q is at least as true as p, i.e.

τ(p → q) = 1 if τ(p) ≤ τ(q), and 0 otherwise,

where τ(·) denotes the truth value of a proposition.

Further on, we consider that A and B are fuzzy sets in U and V, respectively. The membership function of the implication

X is A → Y is B

should be a function of μ_A(x) and μ_B(y):

μ_{A→B}(u, v) = I(μ_A(u), μ_B(v)).

We shall use the notation μ_{A→B}(u, v) = μ_A(u) → μ_B(v). One possible extension of material implication is

μ_A(u) → μ_B(v) = 1 if μ_A(u) ≤ μ_B(v), and 0 otherwise.

However, it is easy to see that this fuzzy implication operator is not appropriate for real-life applications. Namely, let μ_A(u) = μ_B(v) = 0.6; then μ_A(u) → μ_B(v) = 1. But for μ_A(u) = 0.6 and μ_B(v) = 0.599 we obtain μ_A(u) → μ_B(v) = 0. This example shows that small changes in the input can cause a big deviation in the output.
In order to define an implication, the following definition is very important.

Definition 3.2. A fuzzy implication is a function I : [0, 1]² → [0, 1] satisfying the following conditions:
I1: if x ≤ z then I(x, y) ≥ I(z, y), for all x, y, z ∈ [0, 1]
I2: if y ≤ z then I(x, y) ≤ I(x, z), for all x, y, z ∈ [0, 1]
I3: I(0, y) = 1 (falsity implies anything), for all y ∈ [0, 1]
I4: I(x, 1) = 1 (anything implies tautology), for all x ∈ [0, 1]
I5: I(1, 0) = 0 (Booleanity).

The following properties could be important in some applications:
I6: I(1, x) = x (tautology cannot justify anything), for all x ∈ [0, 1]
I7: I(x, I(y, z)) = I(y, I(x, z)) (exchange principle), for all x, y, z ∈ [0, 1]
I8: x ≤ y if and only if I(x, y) = 1 (implication defines an ordering), for all x, y ∈ [0, 1]
I9: I(x, 0) = N(x), for all x ∈ [0, 1], where N is a strong negation
I10: I(x, y) ≥ y, for all x, y ∈ [0, 1]
I11: I(x, x) = 1 (identity principle), for all x ∈ [0, 1]
I12: I(x, y) = I(N(y), N(x)), for all x, y ∈ [0, 1] and a strong negation N
I13: I is a continuous function.
The most important families of implications are given (Czogala & Leski, 2001) by

Definition 3.3. An S-implication associated with a t-conorm S and a strong negation N is defined by I_{S,N}(x, y) = S(N(x), y).
An R-implication associated with a t-norm T is defined by I_T(x, y) = sup{z ∈ [0, 1] : T(x, z) ≤ y}, ∀x, y ∈ [0, 1].
A QL-implication is defined by I_{T,S,N}(x, y) = S(N(x), T(x, y)).
A t-norm implication associated with a t-norm T is defined by I_T(x, y) = T(x, y).

Generally, QL-implications violate property I1; conditions under which I1 is satisfied by a QL-implication can be found in (Fodor, 1991). Although the t-norm implications do not verify the properties of material implication, they are used as models of implication in many applications of fuzzy logic.
The most important implications, obtained for N(x) = 1 − x, are the following (Czogala & Leski, 2001):

• Kleene-Dienes: I(x, y) = max(1 − x, y), which is an S-implication for S(x, y) = max(x, y) and a QL-implication for T(x, y) = max(0, x + y − 1) and S(x, y) = min(1, x + y);

• Reichenbach: I(x, y) = 1 − x + xy, which is an S-implication for S(x, y) = x + y − xy;

• Lukasiewicz: I(x, y) = min(1 − x + y, 1), which is an S-implication for S(x, y) = min(1, x + y), an R-implication for T(x, y) = max(0, x + y − 1) and a QL-implication for T(x, y) = min(x, y) and S(x, y) = min(1, x + y);

• Rescher-Gaines: I(x, y) = 1 if x ≤ y, and 0 otherwise;

• Godel: I(x, y) = 1 if x ≤ y, and y otherwise, which is the R-implication for T(x, y) = min(x, y);

• Goguen: I(x, y) = min(y/x, 1), which is the R-implication for T(x, y) = xy;

• Zadeh: I(x, y) = max(1 − x, min(x, y)), which is a QL-implication for T(x, y) = min(x, y) and S(x, y) = max(x, y);

• Fodor: I(x, y) = 1 if x ≤ y, and max(1 − x, y) otherwise, which is an R-implication for T = min0, an S-implication for S = max0 and a QL-implication for T = min and S = max0, where

min0(x, y) = 0 if x + y ≤ 1, and min(x, y) if x + y > 1,
max0(x, y) = 1 if x + y ≥ 1, and max(x, y) if x + y < 1.

The Lukasiewicz implication verifies all the properties I1-I13 and the Fodor implication verifies the properties I1-I12. Typical examples of t-norm implications are Mamdani: I(x, y) = min(x, y), and Larsen: I(x, y) = xy.
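For reference, these operators can be collected in a small Python table (a sketch; the truth values passed in are arbitrary):

    implications = {
        "Kleene-Dienes":  lambda x, y: max(1 - x, y),
        "Reichenbach":    lambda x, y: 1 - x + x * y,
        "Lukasiewicz":    lambda x, y: min(1 - x + y, 1),
        "Rescher-Gaines": lambda x, y: 1.0 if x <= y else 0.0,
        "Godel":          lambda x, y: 1.0 if x <= y else y,
        "Goguen":         lambda x, y: 1.0 if x <= y else y / x,
        "Zadeh":          lambda x, y: max(1 - x, min(x, y)),
        "Fodor":          lambda x, y: 1.0 if x <= y else max(1 - x, y),
        "Mamdani":        lambda x, y: min(x, y),
        "Larsen":         lambda x, y: x * y,
    }
    for name, I in implications.items():
        print(name, I(0.6, 0.4))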
3.4 RULES REPRESENTATION
Let X and Y be two variables whose domains are U and V, respectively. A causal link from X to Y is represented as a conditional possibility distribution (Zadeh, 1978; 1979) π_{Y/X}, which restricts the possible values of Y for a given value of X. For the rule

if X is A then Y is B

we have

∀u ∈ U, ∀v ∈ V: π_{Y/X}(v, u) = I(μ_A(u), μ_B(v)) = μ_A(u) → μ_B(v).

Frequently, the causal link from X to Y is described by a set of rules of the form

if X is Ai then Y is Bi, i ∈ {1, 2, ..., m}.

The information given by these rules can be combined as follows (Dubois & Prade, 1985b; 1987):

∀u ∈ U, ∀v ∈ V: π_{Y/X}(v, u) = min_{i∈{1,...,m}} μ_{Ai→Bi}(u, v).

A natural consistency condition for these rules is (Lebailly, Martin-Clouaire & Prade, 1987):

A1 ∩ ... ∩ Am ≠ ∅ ⇒ h(B1 ∩ B2 ∩ ... ∩ Bm) = sup_{v∈V} min(μ_{B1}(v), ..., μ_{Bm}(v)) = 1.
A rule with multiple consequents can be treated as a set of rules with a single conclusion; for instance, the rule

if antecedent then C1 and C2 and ... and Cn

is equivalent to the rules

if antecedent then C1
if antecedent then C2
..................................
if antecedent then Cn.

A rule with a multiple premise can be broken up into simple rules (Demirli & Turksen, 1992) when the rules are represented with any S-implication or any R-implication and the observations are normalized fuzzy sets.
There are various forms for representing a rule "if X is A then Y is B", depending on the type of the sets A and B. For instance (Lebailly, Martin-Clouaire & Prade, 1987):

• for a non-fuzzy condition (represented by a rectangle or by a point) and an arbitrary conclusion (represented by a λ-trapezium):
∀u ∈ U, ∀v ∈ V: π_{Y/X}(v, u) = 1 if u ∉ A, and μ_B(v) if u ∈ A;

• for a fuzzy and certain condition (represented by a trapezoidal distribution) and a non-fuzzy, but possibly uncertain, conclusion (represented by a λ-rectangle):
∀u ∈ U, ∀v ∈ V: π_{Y/X}(v, u) = max(μ_B(v), 1 − min(λ, μ_A(u))),
where λ is the certainty degree associated with the rule;

• for a fuzzy condition (represented by a trapezoidal distribution) and a fuzzy conclusion (represented by a trapezoidal or λ-trapezoidal distribution):
∀u ∈ U, ∀v ∈ V: π_{Y/X}(v, u) = 1 if μ_A(u) ≤ μ_B(v), and μ_B(v) otherwise.
If p and q are non-fuzzy propositions, then the necessity measure Nec(p → q) corresponds to the degree to which it is sufficient that p holds in order to infer that q is true. Likewise, the necessity measure Nec(¬p → ¬q) = Nec(q → p) evaluates to what extent it is necessary that p be true for q to be true. Thus, from

Nec(p → q) ≥ a ∈ [0, 1] and Nec(¬p → ¬q) ≥ a' ∈ [0, 1],

one can represent the rule "if p then q", where p = "X is A" and q = "Y is B", A and B being non-fuzzy, by a conditional possibility distribution (Martin-Clouaire & Prade, 1985):

∀u ∈ U, ∀v ∈ V: π_{Y/X}(v, u) = 1       if u ∈ A, v ∈ B
                                1 − a   if u ∈ A, v ∉ B
                                1 − a'  if u ∉ A, v ∈ B
                                1       if u ∉ A, v ∉ B
A rule

if X is A then Y is B,

with A a non-fuzzy set, for which the degree of sufficiency s is known, is equivalent to

if X is A then Y is B'

with μ_{B'}(v) = max(μ_B(v), 1 − s). Similarly, if the degree of necessity n is known, then μ_{B'}(v) = max(μ_B(v), 1 − n). When A is fuzzy, the rule

"if X is A then Y is B with the degree of certainty s"

can be represented by

π_{Y/X}(v, u) = max(μ_B(v), 1 − min(s, μ_A(u))).

In some cases the expert expresses the uncertainty by a number from [0, 1]; for instance, in DIABETO (Buisson, Farreny & Prade, 1986) the number 0.8 is used in order to express the epithet "possible". In other systems, the uncertainty is given by linguistic terms. Using support pairs (Baldwin, 1987), the uncertainty is expressed by two numbers n, p ∈ [0, 1], n ≤ p, in the form

if X is A then Y is B : [n, p]

with the following interpretation: if the premise is true, then the conclusion is necessarily supported to degree n and the negation of the conclusion is necessarily supported to degree 1 − p (i.e. the conclusion is possibly supported to degree p).
4 REASONING WITH IMPRECISE AND/OR UNCERTAIN RULES

4.1 GENERALIZED MODUS PONENS RULE
Zadeh (1979) introduced the theory of approximate reasoning in order to work with imprecise and uncertain information. The basic problem of approximate reasoning is to find the membership function of the consequence C from the rule base {R1, ..., Rn} and the fact A:

R1: if x is A1 then y is C1
R2: if x is A2 then y is C2
.................................
Rn: if x is An then y is Cn
Fact: x is A
________________________________
Consequence: y is C
In fuzzy logic and approximate reasoning, the most important fuzzy implication inference rule is the Generalized Modus Ponens (GMP), based on the compositional rule of inference suggested by Zadeh (1973).

Definition 4.1. (compositional rule of inference)

Rule: if x is A then y is B
Fact: x is A'
________________________________
Consequence: y is B'

where the consequence B' is determined as a composition of the fact and the fuzzy implication operator,

B' = A' ∘ (A → B),

that is,

μ_{B'}(v) = sup_{u∈U} min(μ_{A'}(u), μ_{A→B}(u, v)),

where A and A' are fuzzy subsets of the universe U and B and B' are fuzzy subsets of the universe V. In many practical cases, instead of the sup-min composition one uses the sup-T composition, where T is a t-norm:

B' = A' ∘_T (A → B), that is, μ_{B'}(v) = sup_{u∈U} T(μ_{A'}(u), μ_{A→B}(u, v)).
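On finite universes the compositional rule of inference is a direct double loop. A Python sketch (the discrete universes and the fuzzy sets below are our illustration):

    def gmp(A_prime, A, B, I, T=min):
        # B'(v) = sup_u T(A'(u), I(A(u), B(v)))
        return {v: max(T(A_prime[u], I(A[u], B[v])) for u in A) for v in B}

    A  = {1: 0.0, 2: 0.5, 3: 1.0, 4: 0.5, 5: 0.0}    # premise "x is about 3"
    Ap = {1: 0.0, 2: 0.25, 3: 1.0, 4: 0.25, 5: 0.0}  # a sharper observation
    B  = {10: 0.0, 20: 1.0, 30: 0.0}                 # conclusion "y is about 20"
    lukasiewicz = lambda x, y: min(1 - x + y, 1)
    print(gmp(Ap, A, B, lukasiewicz))   # {10: 0.25, 20: 1.0, 30: 0.25}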
Suppose that A, B and A' are fuzzy numbers. The Generalized Modus Ponens should satisfy some rational properties.

Property 4.1. Basic property:
if x is A then y is B
x is A' = A
_____________________
y is B' = B

Property 4.2. Total indeterminance:
if x is A then y is B
x is ¬A
_____________________
y is unknown

Property 4.3. Subset:
if x is A then y is B
x is A' ⊂ A
_____________________
y is B' = B

Property 4.4. Superset:
if x is A then y is B
x is A'
_____________________
y is B' ⊃ B
Not every combination (t-norm, implication) satisfies all four properties listed above; for instance, the combination (min, Mamdani) does not verify the total indeterminance and superset properties. Suppose we are given a set of fuzzy rules:

R1: if x is A1 then y is B1
R2: if x is A2 then y is B2
.................................
Rn: if x is An then y is Bn
Fact: x is A'
________________________________
Consequence: y is C

The i-th fuzzy rule

Ri: if x is Ai then y is Bi

is implemented by a fuzzy implication I_i defined as

I_i(u, v) = μ_{Ai→Bi}(u, v) = μ_{Ai}(u) → μ_{Bi}(v).
There are two main approaches to determine the membership function of the consequence C. If the combination operator is denoted by ⊕ ∈ {min, max} (or, more generally, ⊕ ∈ {T, S}), we can:

• combine the rules first:
μ_R(u, v) = μ_{A1→B1}(u, v) ⊕ ... ⊕ μ_{An→Bn}(u, v),
μ_C(v) = sup_{u∈U} T(μ_{A'}(u), μ_R(u, v));

• fire the rules first:
μ_{B'k}(v) = sup_{u∈U} T(μ_{A'}(u), μ_{Ak→Bk}(u, v)), k ∈ {1, 2, ..., n},
μ_C(v) = μ_{B'1}(v) ⊕ μ_{B'2}(v) ⊕ ... ⊕ μ_{B'n}(v).

A question arises: do the two methods give the same result? The answer is given by the following theorems (Fullér, 1995):

Theorem 4.1. If T = min or T(x, y) = xy and ⊕ = max, the two methods give the same result.

Theorem 4.2. If ⊕ = min and T is an arbitrary t-norm, then the conclusion inferred by combining the rules first is included in the conclusion inferred by firing the rules first.
An analysis of the conclusion inferred by GMP reasoning, when between the premise "X is A" and the observation "X is A'" there is one of the relations A ⊆ A', A' = A, A ⊇ A', or A and A' have a partial overlapping, is presented in (Iancu, 1998c; 2008a; 2008b), where one works with fuzzy if-then rules with a single input and a single output and the t-norm

t(x, y) = max(0, (1 + λ)(x + y − 1) − λxy), λ ≥ −1,

as composition operation, using the following set of implication operators:

Reichenbach: I_R(u, v) = 1 − μ_A(u) + μ_A(u)μ_B(v)
Willmott: I_W(u, v) = max(1 − μ_A(u), min(μ_A(u), μ_B(v)))
Mamdani: I_M(u, v) = min(μ_A(u), μ_B(v))
Rescher-Gaines: I_RG(u, v) = 1 if μ_A(u) ≤ μ_B(v), and 0 otherwise
Kleene-Dienes: I_KD(u, v) = max(1 − μ_A(u), μ_B(v))
Brouwer-Gödel: I_BG(u, v) = 1 if μ_A(u) ≤ μ_B(v), and μ_B(v) otherwise
Goguen: I_G(u, v) = 1 if μ_A(u) ≤ μ_B(v), and μ_B(v)/μ_A(u) otherwise
Lukasiewicz: I_L(u, v) = min(1 − μ_A(u) + μ_B(v), 1)
Fodor: I_F(u, v) = 1 if μ_A(u) ≤ μ_B(v), and max(1 − μ_A(u), μ_B(v)) otherwise
Working with the rule

if X is A then Y is B

and the observation

X is A',

where A and A' are fuzzy subsets of the universe U and B is a fuzzy subset of the universe V, the following results are obtained.

Theorem 4.3. If the premise contains the observation (i.e. μ_{A'}(u) ≤ μ_A(u) ∀u ∈ U) then:
1) μ_{B'}(v) = μ_B(v) in each of the cases
  1.1 I = I_R and λ ≥ 0
  1.2 I = I_R, λ < 0 and μ_B(v) ≥ λ/(λ − 1)
  1.3 I = I_W and λ ≥ 0
  1.4 I = I_W, λ < 0 and μ_B(v) ≥ −λ/4
  1.5 I = I_M
  1.6 I ∈ {I_KD, I_L} and λ ≥ 0
  1.7 I = I_KD, λ < 0 and μ_B(v) ≥ −λ/4
  1.8 I ∈ {I_BG, I_G}
  1.9 I = I_F, λ ≥ 0 or μ_B(v) ≥ −λ/4
2) μ_{B'}(v) ≤ μ_B(v) for I = I_RG
3) μ_{B'}(v) < −λ/4 for I ∈ {I_W, I_KD, I_F}, λ < 0 and μ_B(v) < −λ/4
4) μ_{B'}(v) ≤ −(μ_B(v)(1 + λ) − λ)² / (4λ(1 − μ_B(v))) for I = I_R, λ < 0 and μ_B(v) ≤ λ/(λ − 1)
5) μ_{B'}(v) < (−λ(1 + μ_B(v))² + 4(1 + λ)μ_B(v)) / 4 for I = I_L and λ < 0.
Theorem 4.4. If the premise and the observation are identical then:
1) μ_{B'}(v) = μ_B(v) in the cases
  1.1 ∀I ∈ {I_R, I_W, I_M, I_RG, I_KD, I_BG, I_G, I_L} and λ ≥ 0
  1.2 I = I_F, λ ≥ 0 or μ_B(v) ≥ 0.5
2) for λ < 0 we have:
  a) μ_{B'}(v) = μ_B(v) for
    a1) I ∈ {I_M, I_RG, I_BG, I_G}
    a2) I = I_R and λ/(λ − 1) ≤ μ_B(v)
  b) μ_{B'}(v) = max(−λ/4, μ_B(v)) in the cases
    b1) I ∈ {I_W, I_KD}
    b2) I = I_F and μ_B(v) < 0.5
  c) μ_{B'}(v) = −((1 + λ)μ_B(v) − λ)² / (4λ(1 − μ_B(v))) for I = I_R and μ_B(v) ≤ λ/(λ − 1)
  d) μ_{B'}(v) = (−λμ_B(v)² + 2μ_B(v)(λ + 2) − λ) / 4 for I = I_L.
Theorem 4.5. If the observation contains the premise (i.e. μ_A(u) ≤ μ_{A'}(u) ∀u ∈ U) then:
a) μ_{B'}(v) = μ_B(v) if I = I_M;
b) μ_{B'}(v) ≥ μ_B(v) if I ∈ {I_R, I_W, I_RG, I_KD, I_BG, I_G, I_L, I_F}.
Theorem 4.6. If A and A' have a partial overlapping then:
1) μ_{B'}(v) = 1 if core(A') ⊄ support(A) and I ∈ {I_R, I_W, I_RG, I_KD, I_BG, I_G, I_L}
2) μ_{B'}(v) ≥ μ_B(v) if core(A') ⊆ support(A) and I ∈ {I_KD, I_BG, I_G, I_L}, or I = I_W and μ_B(v) ≤ 0.5, or I = I_R and λ ≥ 0
3) μ_{B'}(v) ≤ μ_B(v) if I = I_M
4) μ_{B'}(v) ∈ [μ_B(v), −λ + (1 + λ)μ_B(v)) if core(A') ⊆ support(A), I = I_R and λ < 0
5) μ_{B'}(v) ∈ [0, 1] if core(A') ⊆ support(A) and I = I_RG
6) μ_{B'}(v) ≥ 0.5 if core(A') ⊆ support(A), I = I_W and μ_B(v) > 0.5
7) μ_{B'}(v) = 1 if core(A') ⊄ A_{μ_B(v)} and I = I_F;
   μ_{B'}(v) ≥ μ_B(v) if core(A') ⊂ A_{μ_B(v)} and I = I_F,
where A_α denotes the α-cut of A.
Remark 4.1. If the observation is more precise than the premise of the rule, then it gives more information than the premise. However, it does not seem reasonable to think that the Generalized Modus Ponens allows one to obtain a conclusion more precise than that of the rule. The result of the inference is valid if μ_{B'}(v) = μ_B(v) ∀v ∈ V. Sometimes, the deduction operation allows the reinforcement of the conclusion, as in the following example (Mizumoto & Zimmerman, 1982).

Rule: If the tomato is red then the tomato is ripe.
Observation: This tomato is very red.

If we know that the maturity degree increases with respect to color, we can infer

this tomato is very ripe.

On the other hand, in the example

Rule: If the melon is ripe then it is sweet.
Observation: The melon is very ripe.

we do not infer that

the melon is very sweet

because it can be so ripe that it is rotten.
Remark 4.2. These examples show that the expert must choose the deduction operation depending on the knowledge base. If he has no supplementary information about the connection between the variation of the premise and that of the conclusion, he must be satisfied with the conclusion μ_{B'}(v) = μ_B(v), ∀v ∈ V. Theorem 4.3 says that for this we can choose λ ≥ 0.

Remark 4.3. When the observation and the premise of the rule coincide, the desirable behavior of the fuzzy deduction is to obtain an identical conclusion. But Theorem 4.4, in the case λ < 0, can give a different conclusion. This fact indicates the appearance of an uncertainty in the conclusion, which is totally unreasonable. In order to avoid this possibility we suggest using the value λ ≥ 0.
Remark 4.4. The result obtained by Theorem 4.5 is very general and it does not offer enough information about the inferred conclusion. The result of the inference depends on the compatibility between the observation and the premise of the rule. To express this compatibility, the following quantities (Desprès, 1986) are frequently used:

(a) D.I = sup_{u∈U: μ_A(u)=0} μ_{A'}(u),
named the uniform degree of non-determination; it appears when the support of the premise does not contain the support of the observation;

(b) I = sup_{u∈U: μ_{A'}(u)≥μ_A(u)} (μ_{A'}(u) − μ_A(u)).

The propagated uncertainty is expressed with the help of D.I and I, and it corresponds to the value of μ_{B'} on the set {v ∈ V : μ_B(v) = 0}.

Theorem 4.7. If μ_{A'}(u) ≥ μ_A(u) ∀u ∈ U, then the uncertainty propagated during the inference is:
μ_{B'}(v) < I if λ > 0 and the implication is in the set {I_R, I_W, I_KD, I_L, I_F};
μ_{B'}(v) = I if λ = 0 and the implication is in the set {I_R, I_W, I_KD, I_L, I_F};
μ_{B'}(v) > I if λ < 0 and the implication is in the set {I_R, I_W, I_KD, I_L, I_F};
μ_{B'}(v) > D.I if the implication is in the set {I_RG, I_BG, I_G};
μ_{B'}(v) > I if the implication is I_M.
A study of Generalized Modus Ponens reasoning using t-norms with threshold was initiated in (Iancu, 2009c), where the product t-norm with a single threshold k,

T_k(x, y) = xy/k if x ≤ k and y ≤ k, and min(x, y) if x > k or y > k,

and the Fodor implication are used. A comparison with the case when the standard product t-norm is used shows that the t-norm with threshold gives better results.
4.2 UNCERTAIN GENERALIZED MODUS PONENS REASONING
We consider the case when the rules and the facts are uncertain, the uncertainty being expressed by belief degrees or by linguistic variables. The reasoning with belief degrees has the form

if x is A then y is B : α
x is A' : β
_______________________
y is B' : γ

where α, β, γ ∈ [0, 1] represent the belief degrees corresponding to the rule, the fact and the conclusion, respectively. The conclusion is obtained as B' = A' ∘_T (A → B) and its associated belief degree is γ = ∗_T(α, β), where ∘_T and ∗_T are t-norm operators. For multiple premises, the inference schema becomes

An ∧ ... ∧ A1 → C : α
A'n : βn
.....................
A'1 : β1
______________________
C' : γ

where the conclusion C' is obtained with the formula

C' = A'1 ∘_T (... ∘_T (A'n ∘_T (An ∧ ... ∧ A1 → C)) ...)

and the belief degree is

γ = ∗_T(α, ∗_T(β1, ..., βn)).

If the same conclusion is obtained from m different rules, with the belief degrees γ1, γ2, ..., γm, then the global belief degree is computed with the formula γ = ∗_S(γ1, γ2, ..., γm), where ∗_S is a t-conorm.
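The propagation of belief degrees is a fold with a t-norm, and the fusion of parallel conclusions a fold with a t-conorm. A Python sketch, with the product t-norm and its dual chosen as one possible instantiation:

    from functools import reduce

    t_norm   = lambda x, y: x * y           # product t-norm (one possible choice)
    t_conorm = lambda x, y: x + y - x * y   # its dual t-conorm

    def rule_belief(alpha, betas):
        # gamma = T(alpha, T(beta_1, ..., beta_n)), betas assumed non-empty
        return t_norm(alpha, reduce(t_norm, betas))

    def fuse(gammas):
        # global belief from parallel rules: S(gamma_1, ..., gamma_m)
        return reduce(t_conorm, gammas)

    g1 = rule_belief(0.9, [0.8, 0.7])   # 0.504
    g2 = rule_belief(0.6, [0.9])        # 0.54
    print(fuse([g1, g2]))               # 0.77184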
Now, consider the case of linguistic uncertainty (Leung & Lam, 1989; Leung, Wong & Lam, 1989). Suppose there is a rule and a fact:

if X is A then Y is B : FN_R
X is A' : FN_F
____________________________
Y is B' : FN_C
FN_R, FN_F and FN_C are fuzzy numbers denoting the uncertainty of the rule, of the fact and of the conclusion, respectively. If the object X is non-fuzzy, A and A' must be the same atomic symbol in order to apply this rule. Therefore B' = B and the fuzzy uncertainty FN_C is calculated using fuzzy-number multiplication of FN_R and FN_F:

FN_C = FN_R ⊗ FN_F.

If X and Y are fuzzy objects, the conclusion B' is obtained using the Generalized Modus Ponens and the uncertainty FN_C is

FN_C = (FN_R ⊗ FN_F) × D,

where D is the degree of matching between A and A'. If X is fuzzy and Y is non-fuzzy, then B' = B and the uncertainty FN_C is computed as

FN_C = (FN_R ⊗ FN_F) × M,

where M is the "similarity" between the fuzzy sets F and F' of A and A', respectively:

if N(F; F') > 0.5 then M = Π(F; F'), else M = (N(F; F') + 0.5) × Π(F; F'),

with Π(F; F') = max_u min(μ_F(u), μ_{F'}(u)) and N(F; F') = 1 − Π(¬F; F').
Because a rule with multiple consequents can be treated as multiple rules with a single conclusion, only the problem of multiple propositions in the antecedent and a single proposition in the consequent needs to be considered. If the object in the consequent proposition is non-fuzzy, no special treatment is needed. However, if the consequent proposition is fuzzy, the fuzzy set of the value B' in the conclusion is calculated using the following two basic algorithms (Mizumoto, 1985):

a) rule: if A1 and A2 then Y is B
   facts: A'1, A'2
   conclusion: Y is B'
   algorithm: the fuzzy set representing B' in the conclusion is obtained as the union of the fuzzy sets F1 and F2, where F1 is obtained by GMP from the rule "if A1 then Y is B" and the fact A'1, while F2 is obtained from the rule "if A2 then Y is B" and the fact A'2;

b) rule: if A1 or A2 then Y is B
   facts: A'1, A'2
   conclusion: Y is B'
   algorithm: the same as above, except that fuzzy intersection rather than union is applied to the fuzzy sets F1 and F2.
The above two algorithms can be applied repeatedly to handle any combination of antecedent propositions. For instance:

Rule: IF (the productivity is high OR the production-cost is low) AND the sales are high THEN the performance of the company should be good.
Facts: The productivity is very high. The production-cost is low. The sales are rather high.

where "low", "high", "very high" and "rather high" are fuzzy concepts. Let F1 be the fuzzy set obtained by making an inference from the rule

IF the productivity is high THEN the performance of the company should be good

and the fact

The productivity is very high;

let F2 be the fuzzy set obtained by making an inference from the rule

IF the production-cost is low THEN the performance of the company should be good

and the fact

The production-cost is low;

and let F3 be the fuzzy set obtained by making an inference from the rule

IF the sales are high THEN the performance of the company should be good

and the fact

The sales are rather high.

The fuzzy set F representing the fuzzy value of the object "the performance of the company" in the conclusion is determined as follows: F is the fuzzy union of F12 and F3, where F12 is the fuzzy intersection of F1 and F2. As a result, F will indicate the fuzzy concept "good" and the conclusion "the performance of the company is good" is drawn. The fuzzy uncertainty of the conclusion deduced from rules with multiple antecedent propositions is calculated by employing fuzzy-number arithmetic operators in the formulae used by MYCIN's CF model. For example, for:
rule: if A1 and A2 then C : FN_R
facts: A'1, A'2 : FN_F1, FN_F2
conclusion: C' : FN_C'

we have

FN_C' = min_fn(FN_F1, FN_F2) ⊗ FN_R,

where FN_R, FN_F1, FN_F2 and FN_C' represent the uncertainty of the rule, of the facts and of the conclusion, respectively, ⊗ is the fuzzy multiplication and the minimum of two fuzzy numbers is given by

μ_{min_fn(A,B)}(z) = max_{z=min(x,y)} min(μ_A(x), μ_B(y)).

Similarly, the maximum is defined as

μ_{max_fn(A,B)}(z) = max_{z=max(x,y)} min(μ_A(x), μ_B(y)).
If logical OR is used, the calculation is the same except that the
fuzzy maximum is taken rather than the minimum. For the combination of
antecedent propositions, the two calculations can be applied repeatedly to
handle fuzzy uncertainties corresponding to the matched facts and the
rule.
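For instance, on fuzzy numbers discretized as dictionaries mapping support points to membership grades, min_fn and max_fn can be sketched as follows (an illustration under this representation assumption):

    def fn_min(a, b):
        out = {}
        for x, ma in a.items():
            for y, mb in b.items():
                z = min(x, y)                        # constraint z = min(x, y)
                out[z] = max(out.get(z, 0.0), min(ma, mb))
        return out

    def fn_max(a, b):
        out = {}
        for x, ma in a.items():
            for y, mb in b.items():
                z = max(x, y)                        # constraint z = max(x, y)
                out[z] = max(out.get(z, 0.0), min(ma, mb))
        return out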
In some cases, there is more than one rule with the same consequent proposition. Each of these rules can be treated as contributing evidence towards the conclusion. The conclusion C_R is obtained from the evidence contributed by these rules and facts. For instance:

rules:   r1 : if A1 then B
         r2 : if A2 then B
facts:   A'1 , A'2

conclusion: C_R obtained from C1 : FN_C1 and C2 : FN_C2, where C1 and C2 are the conclusions obtained from r1 & A'1 and r2 & A'2, respectively, and FN_C1, FN_C2 represent the uncertainties of the conclusions. If the object involved in the consequent proposition is fuzzy then the fuzzy set corresponding to the combined conclusion C_R is obtained by taking the fuzzy intersection between the fuzzy sets corresponding to C1 and C2. The two uncertainties FN_C1 and FN_C2 can be combined, to obtain the overall uncertainty, using a formula similar to the evidence combination from MYCIN’s CF model:

FN_C = FN_C1 ⊕ FN_C2 ⊖ FN_C1 ⊗ FN_C2 .
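As a hedged sketch of this combination, assume the uncertainties are triangular fuzzy numbers (m, α, β) with positive means and use the usual approximate L-R arithmetic (an assumption; the book does not fix a particular fuzzy arithmetic at this point):

    def fn_add(a, b):
        return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

    def fn_sub(a, b):
        # the left spread of the difference adds the right spread of b, and conversely
        return (a[0] - b[0], a[1] + b[2], a[2] + b[1])

    def fn_mul(a, b):
        # first-order approximation, valid for fuzzy numbers with positive means
        return (a[0]*b[0], a[0]*b[1] + b[0]*a[1], a[0]*b[2] + b[0]*a[2])

    def combine(fn_c1, fn_c2):
        # FN_C = FN_C1 (+) FN_C2 (-) FN_C1 (x) FN_C2
        return fn_sub(fn_add(fn_c1, fn_c2), fn_mul(fn_c1, fn_c2))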
4.3 UNCERTAIN HYBRID REASONING
Further on we present an Uncertain Reasoning System, a generator of expert systems able to process uncertain and imprecise knowledge. Uncertainty is measured in three ways:
1. Using the support logic programming technique of Baldwin (1987), where a conclusion is supported by a certain degree of evidence and its negation is also supported to a certain degree; these dual measures of uncertainty are named a support pair. A support pair (n, p) comprises a necessary and a possible support and is interpreted as an interval in which the probability lies. A voting interpretation is also useful: the lower (necessary) support, n, represents the proportion of a population voting in favor of a proposition, (1 − p) represents the proportion voting against, and (p − n) represents the proportion abstaining. This method is used in the PROSUM system (Iancu, 1997c).
2. Using the method of linguistic variables. For each fact which does not contain a linguistic term as an argument, the uncertainty is expressed either by a linguistic variable or by a probability of occurrence of the respective fact. The system translates the linguistic variables and the probabilities into fuzzy numbers. This technique is used in the RESYFU system (Iancu, 1997d).
3. A mixture of both previous methods, the uncertainty associated with the answers to queries being established by the user, as in the UNRESY system (Iancu, 2000).
The method used for the management of uncertainty is established by a user-system dialogue and it requires techniques for the following:
• propagating imprecision and uncertainty in deductive inferences from the condition part to the conclusion part of a rule
• evaluating an approximate matching between an imprecise rule condition and an imprecise fact
• combining items of information issued from different sources.
The system works with two types of unification, which use generalized belief functions. Knowledge is represented as Prolog clauses with the addition of an uncertainty term

A :- B1, B2, ..., Bn : unc

where A is an atom, B1, B2, ..., Bn are literals and unc is either a linguistic variable var or a support pair [n, p]. The previous representation has the following interpretation: for each assignment of each variable occurring in the clause, if B1, B2, ..., Bn are all true then A is true with degree var, or A is necessarily supported to degree n and ¬A is necessarily supported to degree 1 − p. If the body of the clause is empty we have a unit clause, represented as A : unc, whose interpretation is immediately deduced from that of the rules. The linguistic variable is an element of the set
set
L = { impossible , extremely unlikely , verylowchance , small chance ,
it may , meaningful chance , most likely , extremelylikely , certain }
Each element in the above set represents a statement of linguistic probability and its semantics is provided by a fuzzy number m̃ = (m, α, β) with

μ_m̃(x) = 1 − (m − x)/α   for x ≤ m, α > 0
μ_m̃(x) = 1 − (x − m)/β   for x ≥ m, β > 0

where m is the mean value and α and β are the left and right spreads.
We obtain the following representation for the semantics of the proposed term set L:

impossible          = (0, 0, 0)
extremely unlikely  = (0.015, 0.01, 0.05)
very low chance     = (0.14, 0.06, 0.05)
small chance        = (0.29, 0.05, 0.06)
it may              = (0.495, 0.09, 0.07)
meaningful chance   = (0.715, 0.05, 0.06)
most likely         = (0.85, 0.06, 0.05)
extremely likely    = (0.985, 0.05, 0.01)
certain             = (1, 0, 0).
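These semantics translate directly into a lookup table; the sketch below (illustrative only) stores the terms and evaluates the membership function μ_m̃ defined above, truncated at 0 and with zero spreads treated as crisp points:

    TERMS = {
        "impossible":         (0.0,   0.0,  0.0),
        "extremely unlikely": (0.015, 0.01, 0.05),
        "very low chance":    (0.14,  0.06, 0.05),
        "small chance":       (0.29,  0.05, 0.06),
        "it may":             (0.495, 0.09, 0.07),
        "meaningful chance":  (0.715, 0.05, 0.06),
        "most likely":        (0.85,  0.06, 0.05),
        "extremely likely":   (0.985, 0.05, 0.01),
        "certain":            (1.0,   0.0,  0.0),
    }

    def membership(term, x):
        m, alpha, beta = TERMS[term]
        if x == m:
            return 1.0
        if x < m:
            return max(0.0, 1 - (m - x)/alpha) if alpha > 0 else 0.0
        return max(0.0, 1 - (x - m)/beta) if beta > 0 else 0.0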
If a fact has as uncertainty a probability m, we transform it into a fuzzy number (Singer, 1990): m̃ = (m, m/18, m/18). If the user of the UNRESY system wants the answers to queries to have their uncertainty expressed as a support pair, then
a) each linguistic variable is translated into a support pair as follows:

impossible          = [0, 0]
extremely unlikely  = [0, 0.1]
very low chance     = [0.1, 0.2]
small chance        = [0.2, 0.4]
it may              = [0.4, 0.6]
meaningful chance   = [0.6, 0.8]
most likely         = [0.8, 0.9]
extremely likely    = [0.9, 1]
certain             = [1, 1].
b) if a predicate of a rule contains a linguistic term defined as a fuzzy set A then the uncertainty associated with such a predicate is estimated as a pair [Bel(A), Pls(A)].
c) a probability m is interpreted as a pair [m, m].
If the uncertainty of the answer must be a fuzzy number then:
i) the pair [n, p] is interpreted as the term Li = (mi, αi, βi) from L for which mi is the nearest to (n + p)/2
ii) if a predicate contains a linguistic term defined as a fuzzy set A then the probability of the event represented by this predicate is Bel(A); afterwards, this probability is transformed into a fuzzy number as above.
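Rules a)-c) and i)-ii) amount to simple conversions; a sketch (assuming the TERMS table above, plus a PAIRS table holding the support pairs from a)):

    PAIRS = {
        "impossible": (0.0, 0.0),       "extremely unlikely": (0.0, 0.1),
        "very low chance": (0.1, 0.2),  "small chance": (0.2, 0.4),
        "it may": (0.4, 0.6),           "meaningful chance": (0.6, 0.8),
        "most likely": (0.8, 0.9),      "extremely likely": (0.9, 1.0),
        "certain": (1.0, 1.0),
    }

    def prob_to_fn(m):
        # Singer (1990): a probability m becomes the fuzzy number (m, m/18, m/18)
        return (m, m/18, m/18)

    def pair_to_term(n, p):
        # rule i): the term whose mean m_i is nearest to (n + p)/2
        mid = (n + p)/2
        return min(TERMS, key=lambda t: abs(TERMS[t][0] - mid))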
The system works with two types of unification:
a) syntactic unification, in which two terms match if substitutions can be made to make them equivalent symbol by symbol;
b) semantic unification, which is used if a predicate contains as argument a linguistic term defined as a fuzzy set.
We consider the rule
If (X finished secondary-school and its result in mathematics is good and its result in computer science is very good and its result in English is fair to good) then my recommendation for the choice of a university faculty should be computer science, with belief unc
which is represented in the form
faculty(X, computer-science):-school(X, finished),
math(X, good_),
comp-sci(X,very_good_),
English(X,fair_to_good_): unc
The terms X, computer-science and finished are purely syntactic terms which are matched appropriately by the standard unification procedure of Prolog; very_good_, good_ and fair_to_good_ are semantic terms. The character "_" at the end of a term says that it is a linguistic term. If we obtain a unification X = john in the predicate school(X, finished) then, in order to compute the uncertainty for the conclusion faculty(john, computer-science), it is necessary to determine the uncertainties for math(john, good_), comp-sci(john, very_good_) and english(john, fair_to_good_). The uncertainty for math(john, good_), for instance, is computed in this way:
1. if math(john, good_) is the conclusion of a rule or a fact from the knowledge base, this predicate is used further on with the uncertainty it carries;
2. let Y be the linguistic variable with the name "result" and V(Y) the set of names of the linguistic values of Y. If we obtained math(john, val_): unc, with val_ ∈ V(Y), we use this piece of information in order to compute the uncertainty for math(john, good_), and this calculus depends on the form of unc:
(a) if unc = [a, b], then for the fuzzy set good_ we consider the focal sets of the form val_ with m(val_) = p ∈ [a, b] and ¬val_ with m(¬val_) = 1 − p. Because Bel(good_) and Pls(good_) are monotone functions with respect to p, we compute the support pair [c, d] for math(john, good_) as follows:
- from the focal sets val_ with m(val_) = a and ¬val_ with m(¬val_) = 1 − a we obtain a support pair [c1, d1];
- from the focal sets val_ with m(val_) = b and ¬val_ with m(¬val_) = 1 − b we obtain a support pair [c2, d2];
- c = min{c1, c2} and d = max{d1, d2}.
(b) if unc is a linguistic variable translated into a fuzzy number (n, α, β), then we compute Bel(good_) using for the fuzzy set good_ the focal sets val_ with m(val_) = n and ¬val_ with m(¬val_) = 1 − n; afterwards, the computed value is transformed into a fuzzy number.
3. if the knowledge base does not contain information about math(john, good_), the system asks for the fuzzy set good_ and its focal sets, and afterwards computes Bel(good_) or the pair [Bel(good_), Pls(good_)].
Given a program as a set of program clauses representing facts and rules, the programming system has to calculate the uncertainty associated with the solutions of queries to the system. A proof path for a final solution is determined in the normal Prolog style and the uncertainty is determined for each branch in the proof path. If more than one proof path is available, then the uncertainties from the different proof paths are combined to obtain the overall uncertainty of the conclusion. The uncertainty value of one proof path is determined by computing the uncertainties of the disjunction, conjunction and negation statements; those values are then combined when several proof paths support the same conclusion. Using the dialogue with the system, the user can choose the desired formulas from those implemented, or can propose others in order to manage the uncertainty.
If we present to the UNRESY system the rule

performance(comp_sci,good):-result(math,very_good_),
                            result(English,good):certain

and we want to work with uncertainty expressed as a support pair, then the system generates the following clauses of a Prolog program:

conj(N1,P1,N2,P2,N,P):-N=……, P=……
rule(N1,P1,N2,P2,N,P):-N=…., P=….
performance(comp_sci,good,N,P):-
        go(result,[math,very_good_],N01,P01),
        result(English,good,N02,P02),
        conj(N01,P01,N02,P02,N03,P03),
        rule(N03,P03,1,1,N,P).
……………………………………………………
go(X,L,N,P):-cf(X,L,N,P),!.
go(X,L,N,P):-cf(X,L1,N1,P1),pr_match(X,L,L1,N1,P1,N,P),!.
go(X,L,N,P):-pr_bel([X|L],N,P).
……………………………………………………
include “pr_claus.pro”
The rules with the head go(X, L, N, P) solve the semantic unification; pr_claus.pro contains the clauses for computing the generalized belief functions. The system can process any Prolog statement and can decide whether two rules with the same head are in conflict or not.
4.4 FUZZY LOGIC CONTROL
Conventional controllers are derived from control theory techniques based on mathematical models of the open-loop process, called the system, to be controlled. Fuzzy control provides a formal methodology for representing, manipulating and implementing human heuristic knowledge about how to control a system. Fuzzy logic control is the result of converting the linguistic control strategy based on expert knowledge into control rules and of combining fuzzy logic theory with inference processes. Fuzzy logic control is very useful when the needed models are not known or when they are too complex for analysis with conventional quantitative techniques.
In a fuzzy logic controller (FLC) the dynamic behavior of a fuzzy system is characterized by a set of linguistic description rules based on expert knowledge. The expert knowledge is usually of the form
IF (a set of conditions is satisfied) THEN (a set of consequences can be inferred).
Because the antecedents and the consequents are associated with fuzzy concepts, such a rule is called a fuzzy conditional statement. A fuzzy control rule is a fuzzy conditional statement in which the antecedent is a condition in its application domain and the consequent is a control action for the system under control. Fuzzy logic control systems usually consist of four major parts: fuzzification interface, fuzzy rule base, fuzzy inference machine and defuzzification interface.
Figure 4.1: Fuzzy logic controller (crisp x in U → Fuzzifier → fuzzy set in U → Fuzzy Inference Engine, driven by the Fuzzy Rule Base → fuzzy set in V → Defuzzifier → crisp y in V)
A fuzzification operator has the effect of transforming crisp data into a fuzzy set. In most cases one uses fuzzy singletons as fuzzifiers: fuzzifier(x0) = x̄0, where x0 is a crisp input value from a process and x̄0 is the fuzzy singleton concentrated at x0.

Figure 4.2: Fuzzy singleton as fuzzifier
For a system with two inputs and a single output, the fuzzy rule base has the form

R1: if x is A1 and y is B1 then z is C1
R2: if x is A2 and y is B2 then z is C2
………………………………………………
Rn: if x is An and y is Bn then z is Cn

where Ai ⊂ U, Bi ⊂ V, Ci ⊂ W, ∀i ∈ {1, 2, ..., n}, and the rules are aggregated by union or intersection. If the crisp inputs x0 and y0 are presented then the control action is obtained as follows:
• The firing level of the i-th rule is determined by αi = min(μ_Ai(x0), μ_Bi(y0)); evidently, any t-norm can be used instead of min
• The output of the i-th rule is computed by μ_C'i(w) = αi → μ_Ci(w), ∀w ∈ W
• The overall system output, C, is obtained from the individual rule outputs C'i by an aggregation operation: μ_C(w) = Agg(μ_C'1(w), ..., μ_C'n(w)), ∀w ∈ W.
Defuzzification methods
The output of the inference process is a fuzzy set that specifies the possibility distribution of the control action. In on-line control, a crisp control action is usually required. Consequently, one must defuzzify the inferred fuzzy control action, namely z0 = defuzzifier(C). The most used defuzzification operators are:

• Center of Area. The defuzzified value of a fuzzy set C is defined by

z0 = ∫_W z μ_C(z)dz / ∫_W μ_C(z)dz   or   z0 = Σ_j z_j μ_C(z_j) / Σ_j μ_C(z_j)

depending on the form of the membership function μ_C: continuous or discrete.

• First of Maxima. The defuzzified value of a fuzzy set C is the smallest maximizing element, i.e.

z0 = min{z / μ_C(z) = max_{w∈W} μ_C(w)}.

• Middle of Maxima. The defuzzified value of a discrete fuzzy set C is defined as the mean of all values {z1, ..., zn} of the universe of discourse having maximal membership grades:

z0 = (1/n) Σ_{i=1}^{n} z_i.

If C is not discrete then

z0 = ∫_G z dz / ∫_G dz

where G denotes the set of maximizing elements of C.

• Max Criterion. This method chooses an arbitrary value from the set of maximizing elements of C:

z0 ∈ {z / μ_C(z) = max_{w∈W} μ_C(w)}.
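For discrete fuzzy sets given as parallel lists of points zs and grades mus, the first three operators can be sketched as follows (illustrative code, not from the book):

    def center_of_area(zs, mus):
        # z0 = sum(z * mu(z)) / sum(mu(z))
        return sum(z*m for z, m in zip(zs, mus)) / sum(mus)

    def first_of_maxima(zs, mus):
        peak = max(mus)
        return min(z for z, m in zip(zs, mus) if m == peak)

    def middle_of_maxima(zs, mus):
        peak = max(mus)
        maxima = [z for z, m in zip(zs, mus) if m == peak]
        return sum(maxima) / len(maxima)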
Inference mechanisms
We present the most important inference mechanisms in fuzzy logic control systems and, for simplicity, we consider two fuzzy control rules of the form

R1: if x is A1 and y is B1 then z is C1
R2: if x is A2 and y is B2 then z is C2
Fact: x is x0 and y is y0
_____________________________________
Consequence: z is C
Mamdani’s model. The fuzzy implication is modeled by Mamdani’s minimum operator, the aggregation of rules uses the max operator, the conjunction operator is min, and the t-norm from the Generalized Modus Ponens rule is min.
The firing levels of the rules are

α1 = min(μ_A1(x0), μ_B1(y0)), α2 = min(μ_A2(x0), μ_B2(y0))

Using the GMP rule, the conclusion given by the first rule is

μ_C'1(w) = sup_{(u,v)∈U×V} min(min(μ_x0(u), μ_y0(v)), min(min(μ_A1(u), μ_B1(v)), μ_C1(w)))

Because μ_x0(u) = 0 ∀u ≠ x0 and μ_y0(v) = 0 ∀v ≠ y0, the supremum becomes a minimum and therefore

μ_C'1(w) = min(α1, μ_C1(w)), μ_C'2(w) = min(α2, μ_C2(w)).

The overall system output is computed as

μ_C(w) = max(μ_C'1(w), μ_C'2(w)).

Finally, to obtain a deterministic control action, any defuzzification strategy can be employed.
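A compact sketch of the Mamdani scheme for a two-input rule base (illustrative; membership functions are passed as callables and the output universe as a grid zs):

    def mamdani(rules, x0, y0, zs):
        # rules: list of (mu_A, mu_B, mu_C) triples of membership functions
        out = [0.0]*len(zs)
        for mu_a, mu_b, mu_c in rules:
            alpha = min(mu_a(x0), mu_b(y0))                # firing level
            for k, z in enumerate(zs):
                out[k] = max(out[k], min(alpha, mu_c(z)))  # clip, then max-aggregate
        return out   # defuzzify afterwards, e.g. with center_of_area above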
Tsukamoto’s model. All linguistic terms are supposed to have monotonic membership functions. The firing levels of the rules are computed as in Mamdani’s model. The individual crisp control actions z1 and z2 are computed from the equations

α1 = μ_C1(z1), α2 = μ_C2(z2)

and the overall crisp control action is expressed using the discrete center of area:

z0 = (α1 z1 + α2 z2) / (α1 + α2).
Sugeno’s model. Takagi and Sugeno used the following architecture (Takagi & Sugeno, 1985):

R1: if x is A1 and y is B1 then z1 = a1 x + b1 y
R2: if x is A2 and y is B2 then z2 = a2 x + b2 y
Fact: x is x0 and y is y0
_____________________________________________
Consequence: z0

The firing levels of the rules are computed as in the previous models, the individual rule outputs are

z1* = a1 x0 + b1 y0, z2* = a2 x0 + b2 y0

and the crisp control action is expressed as

z0 = (α1 z1* + α2 z2*) / (α1 + α2).
Larsen’s model. The fuzzy implication is modeled by Larsen’s product operator, rule aggregation is made by union and the conjunction operator is min. The firing levels are computed as

α1 = min(μ_A1(x0), μ_B1(y0)), α2 = min(μ_A2(x0), μ_B2(y0))

and the conclusion is given by

μ_C(w) = max(α1 μ_C1(w), α2 μ_C2(w)).

In order to obtain a deterministic control action, a defuzzification strategy is used.
Example 4.1. As an example, we illustrate Sugeno’s reasoning method.

R1: if x is BIG and y is SMALL then z1 = x + 2y
R2: if x is MEDIUM and y is BIG then z2 = 3x − 2y
Fact: x is 3 and y is 2
_________________________________________________
Consequence: z0

Figure 4.3: Sugeno’s inference mechanism
According to the previous figure, we have

μ_BIG(x0) = μ_BIG(3) = 0.9, μ_SMALL(y0) = μ_SMALL(2) = 0.3

and

μ_MEDIUM(x0) = μ_MEDIUM(3) = 0.5, μ_BIG(y0) = μ_BIG(2) = 0.8.

The firing levels of the rules are

α1 = min{μ_BIG(x0), μ_SMALL(y0)} = min{0.9, 0.3} = 0.3
α2 = min{μ_MEDIUM(x0), μ_BIG(y0)} = min{0.5, 0.8} = 0.5

The individual rule outputs are computed as

z1* = x0 + 2y0 = 3 + 4 = 7, z2* = 3x0 − 2y0 = 9 − 4 = 5

so the crisp control action is

z0 = (7 × 0.3 + 5 × 0.5) / (0.3 + 0.5) = 5.75
4.5 EXTENDED MAMDANI FUZZY LOGIC CONTROLLER
An extension of the Mamdani model in order to work with interval
inputs is presented in (Liu, Geng & Zhang, 2005), where the fuzzy sets
are represented by triangular fuzzy numbers and the firing level of the
conclusion is computed as the product of firing levels from the
antecedent.
In our papers (Iancu, 2009a; 2009b) we proposed a fuzzy reasoning system characterized by:
• the linguistic terms (or values), which are represented by trapezoidal fuzzy numbers
• the inputs, which can be crisp data, intervals and/or linguistic terms
• various implication operators, which are used to represent the rules
• the crisp action of a rule, computed by the Middle-of-Maxima method
• the overall crisp action, computed by the discrete Center-of-Gravity method.
The following implications are used:
Reichenbach: I_R(x, y) = 1 − x + xy
Willmott: I_W(x, y) = max(1 − x, min(x, y))
Mamdani: I_M(x, y) = min(x, y)
Rescher-Gaines: I_RG(x, y) = 1 if x ≤ y, 0 otherwise
Kleene-Dienes: I_KD(x, y) = max(1 − x, y)
Brouwer-Gödel: I_BG(x, y) = 1 if x ≤ y, y otherwise
Goguen: I_G(x, y) = 1 if x ≤ y, y/x otherwise
Lukasiewicz: I_L(x, y) = min(1 − x + y, 1)
Fodor: I_F(x, y) = 1 if x ≤ y, max(1 − x, y) otherwise.
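These operators can be collected in a single table; a direct transcription in Python (straightforward, but still a sketch):

    IMPL = {
        "R":  lambda x, y: 1 - x + x*y,
        "W":  lambda x, y: max(1 - x, min(x, y)),
        "M":  lambda x, y: min(x, y),
        "RG": lambda x, y: 1.0 if x <= y else 0.0,
        "KD": lambda x, y: max(1 - x, y),
        "BG": lambda x, y: 1.0 if x <= y else y,
        "G":  lambda x, y: 1.0 if x <= y else y/x,
        "L":  lambda x, y: min(1 - x + y, 1.0),
        "F":  lambda x, y: 1.0 if x <= y else max(1 - x, y),
    }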
It is sufficient to work with rules having a single conclusion, because a rule with multiple consequents can be treated as a set of such rules.
4.5.1 THE PROPOSED MODEL
We assume that the facts are also given by intervals or linguistic values and a rule is characterized by
• a set of linguistic variables A, each having as domain an interval I_A = [a_A, b_A]
• n_A linguistic values A1, A2, ..., A_{n_A} for each linguistic variable A
• a membership function μ⁰_Ai(x) for each value Ai, where i ∈ {1, 2, ..., n_A} and x ∈ I_A.
According to the structure of a FLC system, the following steps
are necessary in order to work with our system (Iancu, 2009a).
Step 1. Fuzzification
We consider an interval input [a, b] with a_A ≤ a < b ≤ b_A. The membership function of Ai is modified ([16]) by the membership function of [a, b],

μ_[a,b](x) = 1 if x ∈ [a, b], 0 otherwise,

as follows:

μ_Ai(x) = min(μ⁰_Ai(x), μ_[a,b](x)), ∀x ∈ I_A;

it is obvious that any t-norm T can be used instead of min. The firing level, generated by the interval input [a, b], for the linguistic value Ai is

μ_Ai = max{μ_Ai(x) / x ∈ [a, b]}.
According to the previous formula, for a linguistic value A with the membership function represented as a trapezoidal fuzzy number N_A = (m_A, m̄_A, α_A, β_A) and an interval input [a, b], the firing level μ_A is computed as

μ_A = 1                          if [a, b] ∩ [m_A, m̄_A] ≠ ∅
μ_A = (m̄_A + β_A − a)/β_A        if a ∈ [m̄_A, m̄_A + β_A]
μ_A = (b − m_A + α_A)/α_A        if b ∈ [m_A − α_A, m_A]
μ_A = 0                          otherwise.
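A direct transcription of this case analysis (illustrative; the trapezoid is passed as the tuple (m_lo, m_hi, alpha, beta)):

    def interval_firing(a, b, trap):
        m_lo, m_hi, alpha, beta = trap
        if a <= m_hi and b >= m_lo:                    # [a, b] meets the core
            return 1.0
        if beta > 0 and m_hi <= a <= m_hi + beta:      # a lies on the right slope
            return (m_hi + beta - a) / beta
        if alpha > 0 and m_lo - alpha <= b <= m_lo:    # b lies on the left slope
            return (b - m_lo + alpha) / alpha
        return 0.0

    # e.g. interval_firing(400, 600, (700, 800, 200, 0)) gives 0.5,
    # matching the value obtained in the application of Section 4.5.2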
The same technique is used to compute the firing level μ_Ai generated by a linguistic input value A'i; in this case

μ_Ai(x) = min(μ⁰_Ai(x), μ_A'i(x)), ∀x ∈ I_A.

For a crisp input x0 the firing level is μ_Ai = μ⁰_Ai(x0).
Step 2. Fuzzy inference
We consider a set of fuzzy rules

Ri: if X1 is Ai1 and ... and Xr is Air then Y is Ci

where the variables Xj, j ∈ {1, 2, ..., r}, and Y have the domains Uj and V, respectively. The firing levels of the rules, denoted by {αi}, are computed by

αi = T(αi1, ..., αir)

where T is a t-norm and αij is the firing level for Aij, j ∈ {1, 2, ..., r}. The causal link from X1, ..., Xr to Y is represented using an implication operator I. It results that the conclusion inferred from the rule Ri is

C′i(v) = I(αi, Ci(v)), ∀v ∈ V.
The formula
C ′(v) = I (α , C (v))
gives the following results, depending on the implication I :
Reichenbach: C′(v) = I_R(α, C(v)) = 1 − α + αC(v)
Figure 4.4: Conclusion obtained with Reichenbach implication

Willmott: C′(v) = I_W(α, C(v)) = max(1 − α, min(α, C(v)))
Figure 4.5: Conclusion obtained with Willmott implication
Figure 4.6: Conclusion obtained with Willmott implication

Mamdani: C′(v) = I_M(α, C(v)) = min(α, C(v))
Figure 4.7: Conclusion obtained with Mamdani implication

Rescher-Gaines: C′(v) = I_RG(α, C(v)) = 1 if α ≤ C(v), 0 otherwise
Figure 4.8: Conclusion obtained with Rescher-Gaines implication

Kleene-Dienes: C′(v) = I_KD(α, C(v)) = max(1 − α, C(v))
Figure 4.9: Conclusion obtained with Kleene-Dienes implication

Brouwer-Gödel: C′(v) = I_BG(α, C(v)) = 1 if α ≤ C(v), C(v) otherwise
Figure 4.10: Conclusion obtained with Brouwer-Gödel implication

Goguen: C′(v) = I_G(α, C(v)) = 1 if α ≤ C(v), C(v)/α otherwise
Figure 4.11: Conclusion obtained with Goguen implication

Lukasiewicz: C′(v) = I_L(α, C(v)) = min(1 − α + C(v), 1)
Figure 4.12: Conclusion obtained with Lukasiewicz implication

Fodor: C′(v) = I_F(α, C(v)) = 1 if α ≤ C(v), max(1 − α, C(v)) otherwise
Figure 4.13: Conclusion obtained with Fodor implication
Figure 4.14: Conclusion obtained with Fodor implication
Step 3. Defuzzification
The fuzzy output C′i of the rule Ri is transformed into a crisp output zi using the Middle-of-Maxima operator, according to the following algorithm. The crisp value z0 associated with a conclusion C′ inferred from a rule having the firing level α and the conclusion C, represented by the fuzzy number (m_C, m̄_C, α_C, β_C), is:

z0 = (m_C + m̄_C)/2   for implication I ∈ {I_R, I_KD}

z0 = (m_C + m̄_C + (1 − α)(β_C − α_C))/2   for I ∈ {I_M, I_RG, I_BG, I_G, I_L, I_F} or (I = I_W and α ≥ 0.5)

z0 = (a_V + b_V)/2   if I = I_W, α ≤ 0.5 and V = [a_V, b_V].

In the last case, in order to remain inside the support of C, we can choose a value according to the Max Criterion; for instance

z0 = (m_C + m̄_C + α(β_C − α_C))/2.
The overall crisp action corresponding to an implication is computed by the discrete Center-of-Gravity method: if the number of fired rules is N then the final control action is

z0 = (Σ_{i=1}^{N} αi zi) / (Σ_{i=1}^{N} αi)

where αi is the firing level and zi is the crisp output of the i-th rule.
Finally, we combine the results obtained with the various implication operators in order to obtain the overall output of the system. For this we use the "strength" of an implication, given by the ratio λ(I) = N(I)/13, where N(I) is the number of properties (from the list I1 to I13, see Definition 3.2) verified by the implication I. According to [12] we have:

N(I_R) = 11, N(I_W) = 6, N(I_M) = 4, N(I_RG) = 11, N(I_KD) = 11, N(I_BG) = 10, N(I_G) = 10, N(I_L) = 13, N(I_F) = 12.

Then, the overall crisp action of the system is computed as

z0 = (Σ_I λ(I) z0(I)) / (Σ_I λ(I))

where z0(I) is the overall control action given by the implication I.
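The defuzzification rules of Step 3 and this strength-weighted aggregation can be sketched as follows (illustrative; for I_W with α < 0.5 the Max-Criterion variant above is used, so as to remain inside the support of C):

    STRENGTH = {"R": 11, "W": 6, "M": 4, "RG": 11, "KD": 11,
                "BG": 10, "G": 10, "L": 13, "F": 12}   # the N(I) values

    def mom_crisp(imp, alpha, concl):
        m_lo, m_hi, a_c, b_c = concl       # conclusion (m_C, m_C_bar, alpha_C, beta_C)
        if imp in ("R", "KD"):
            return (m_lo + m_hi) / 2
        if imp == "W" and alpha < 0.5:     # Max-Criterion variant
            return (m_lo + m_hi + alpha*(b_c - a_c)) / 2
        return (m_lo + m_hi + (1 - alpha)*(b_c - a_c)) / 2

    def overall_action(z0_by_impl):
        # z0_by_impl: {implication: z0(I)}; weights lambda(I) = N(I)/13
        num = sum(STRENGTH[i]/13 * z for i, z in z0_by_impl.items())
        den = sum(STRENGTH[i]/13 for i in z0_by_impl)
        return num / den

    # e.g. overall_action({"R": 5.416, "L": 5.5}) is about 5.4615 (Section 4.5.2)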
A variant of this model is presented in (Iancu, 2009b), where
a) the firing level generated by the interval input [a, b] for the linguistic value Ai is computed as a ratio: the area defined by μ_Ai divided by the area defined by μ⁰_Ai. This ratio can be written as

μ_Ai = ∫_a^b μ_Ai(x)dx / ∫_a^b μ⁰_Ai(x)dx.

If the input is a fuzzy set B then the firing level corresponding to the linguistic value Ai is computed using the previous technique with

μ_Ai(x) = min(μ⁰_Ai(x), μ_B(x)).

The firing level generated by a crisp input x0 is computed as μ_Ai = μ⁰_Ai(x0).
b) the overall crisp control of the system is computed using an OWA (Ordered Weighted Averaging) operator, given by

Definition 4.2. An OWA operator of dimension n is a mapping F: Rⁿ → R that has an associated n-vector w = (w1, w2, ..., wn)ᵗ such that wi ∈ [0, 1], 1 ≤ i ≤ n, and Σ_{i=1}^{n} wi = 1. The aggregation of the values {a1, a2, ..., an} is

F(a1, a2, ..., an) = Σ_{j=1}^{n} wj bj

where bj is the j-th largest element of {a1, a2, ..., an}.

The weight corresponding to an implication I is computed as N(I)/Np, where N(I) is the number of properties (see Definition 3.2) verified by the implication I and Np is the sum of the numbers of properties verified by all implications used in the system. The values to aggregate are the crisp control values given by the set of implications used by the system.
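A minimal OWA sketch following Definition 4.2 (illustrative):

    def owa(weights, values):
        # F(a_1,...,a_n) = sum_j w_j * b_j, with b_j the j-th largest value
        assert abs(sum(weights) - 1.0) < 1e-9
        return sum(w*b for w, b in zip(weights, sorted(values, reverse=True)))

Here the weights would be the normalized implication strengths N(I)/Np and the values the crisp controls z0(I).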
4.5.2 AN APPLICATION
In order to show how the proposed system works, we consider an example inspired from (Liu, Geng & Zhang, 2005). We consider rules with two inputs and one output. The input variables are quality (Q) and price (P); the output variable is satisfaction score (S). The fuzzy rule base consists of
R1 : if Q is Poor and P is Low then S is Middle
R2 : if Q is Poor and P is Middle then S is Low
R3 : if Q is Poor and P is High then S is Very Low
R4 : if Q is Average and P is Low then S is High
R5 : if Q is Average and P is Middle then S is Middle
R6 : if Q is Average and P is High then S is Low
R7 : if Q is Good and P is Low then S is Very High
R8 : if Q is Good and P is Middle then S is High
R9 : if Q is Good and P is High then S is Middle
There are three linguistic values for the variable price
{Low, Middle, High}
and five linguistic values for the variable quality
{Poor, Below Average, Average, Above Average, Good}.
We consider the universes of discourse [0,800] for price and
[0,10] for quality. The membership functions corresponding to the
linguistic values are represented by the following trapezoidal fuzzy
numbers:
Low = (0,100, 0, 200)
Middle = (300, 500,100,100)
High = (700, 800, 200, 0)
Poor = (0,1, 0, 2)
Below Average = (2, 3,1,1)
Average = (4, 6, 2, 2)
Above Average = (7, 8,1,1)
Good = (9,10, 2, 0).
The satisfaction score has the following linguistic values
{Very Low, Low, Middle, High, Very High}.
For the universe [0,10] we consider the following membership
functions:
Very Low = (0,1,0,1)
Low = (2, 3,1, 1.5)
Middle = (4, 6,1,1)
High = (7, 8,1, 2)
Very High = (9,10,1,0).
These membership functions are presented in the next figures.
Figure 4.15: The membership function of the input variable price
Figure 4.16: The membership function of the input variable quality
Figure 4.17: The membership function of the output variable satisfaction score
We consider a person interested in buying a computer with price = 400-600 EUR and quality = Above Average. The positive firing levels corresponding to the linguistic values of the input variable price are
μ_Middle = 1, μ_High = 0.5
and the positive firing levels corresponding to the linguistic values of the input variable quality are:
μ_Average = 2/3, μ_Good = 2/3.
The fired rules and their firing levels, computed with t-norm Product, are:
R5 with firing level α 5 = 2/3 ,
R6 with firing level α 6 = 1/3 ,
R8 with firing level α 8 = 2/3 and
R9 with firing level α 9 = 1/3 .
Working with the I_L implication, the fired rules give the following crisp values as output:
z5 = 5, z6 = 8/3, z8 = 23/3, z9 = 5;
the overall crisp control action for I_L is
z0(I_L) = 5.5.
Working with the I_R implication, the fired rules give the following crisp values as output:
z5 = 5, z6 = 2.5, z8 = 7.5, z9 = 5;
its overall crisp action is
z0(I_R) = 5.416.
Because λ(I_R) = 11/13 and λ(I_L) = 1, the overall crisp action given by the system is
z0 = 5.4615.
The Mamdani model is characterized by:
• Mamdani's minimum operator is used in order to compute the firing levels of the rules and to model the fuzzy implication
• the maximum operator is used to compute the overall system output from the individual rule outputs.
Applying this model to our example, one obtains the following results:
• the firing levels are: α5 = 2/3, α6 = 0.5, α8 = 2/3, α9 = 0.5
• the crisp rule outputs are: z5 = 5, z6 = 5.25/2, z8 = 23/3, z9 = 5
• the overall crisp action is: z0 = 23/3 = 7.66.
If we use the Center-of-Gravity (instead of the maximum operator) to compute the overall crisp action, we obtain z0 = 5.253.
We observe an important difference between these two results and also between these results and those given by our method. An explanation consists in the small value of the "strength" of Mamdani's implication in comparison with the values associated with the Reichenbach and Lukasiewicz implications; the strength of an implication is a measure of its quality. Different implications will give different results if they are used separately. Our system offers a possibility to avoid this difficulty, by an aggregation operation which achieves a "mediation" between the results given by the various implications.
4.6 FUZZY CLASSIFIER SYSTEM
In a fuzzy classifier system each fuzzy if-then rule is treated as an individual classifier. A heuristic method for generating fuzzy if-then rules from training patterns, and a fuzzy reasoning method that assigns a class label to each unseen pattern using the generated fuzzy rules, are presented in (Nakashima, 2000). We assume that m real vectors x_p = (x_p1, x_p2, ..., x_pn) are given as patterns from c classes (c ≤ m). The pattern space is [0, 1]^n and therefore the attribute values of each pattern are x_pi ∈ [0, 1] for p ∈ {1, ..., m} and i ∈ {1, ..., n}. We use rules of the following form:
R_j: if x1 is A_j1 and ... and xn is A_jn then Class C_j with CF_j

where R_j is the label of the j-th rule, A_j1, ..., A_jn are antecedent fuzzy sets on the unit interval [0, 1], C_j is the consequent class and CF_j is the grade of certainty of the fuzzy rule R_j. In computer simulations one uses a typical set of linguistic values as antecedent fuzzy sets. The membership function of each linguistic value is specified by homogeneously partitioning the domain interval [0, 1] of each attribute into symmetric triangular fuzzy sets. The consequent class C_j and the grade of certainty CF_j are determined by the following heuristic procedure (Ishibuchi, Nozaki & Tanaka, 1992; Ishibuchi, Nozaki, Yamamoto & Tanaka, 1995; Ishibuchi, Murata & Türkşen, 1997).
Determination of C_j and CF_j
Step 1. Calculate the compatibility grade of each training pattern x_p = (x_p1, x_p2, ..., x_pn) with the fuzzy if-then rule R_j by the following operation:

μ_j(x_p) = μ_j1(x_p1) × ... × μ_jn(x_pn)

Step 2. For each class, calculate the sum of the compatibility grades of the training patterns with the fuzzy if-then rule R_j:

β_class h(R_j) = Σ_{x_p ∈ class h} μ_j(x_p), h = 1, 2, ..., c

Step 3. Find the class ĥ_j that has the maximum value of β_class h(R_j):

β_class ĥ_j(R_j) = max{β_class 1(R_j), ..., β_class c(R_j)}.

If two or more classes take the maximum value (i.e. the consequent class cannot be determined uniquely), or if there is no training pattern compatible with the fuzzy rule R_j (i.e. β_class h(R_j) = 0 for h = 1, 2, ..., c), the consequent class C_j will be ∅. If a single class takes the maximum value, let C_j be the class ĥ_j.
Step 4. If the consequent class C_j is ∅ then the grade of certainty of the rule R_j will be CF_j = 0. Otherwise it is determined as follows:

CF_j = (β_class ĥ_j(R_j) − β̄) / Σ_{h=1}^{c} β_class h(R_j),   β̄ = (1/(c − 1)) Σ_{h ≠ ĥ_j} β_class h(R_j)
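The four steps can be sketched as follows (illustrative; rule_mu encapsulates Step 1, the product of the antecedent memberships):

    def determine_consequent(rule_mu, patterns, labels, c):
        beta = [0.0]*c
        for x, h in zip(patterns, labels):            # Step 2: class-wise sums
            beta[h] += rule_mu(x)
        best = max(range(c), key=lambda h: beta[h])   # Step 3
        if beta[best] == 0 or beta.count(beta[best]) > 1:
            return None, 0.0                          # consequent class is empty
        beta_bar = (sum(beta) - beta[best]) / (c - 1)
        cf = (beta[best] - beta_bar) / sum(beta)      # Step 4
        return best, cf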
The fuzzy if-then rules with certainty grades are different from the standard ones usually used in control problems and function approximation problems. The following traditional type of fuzzy if-then rule is also applicable to pattern classification problems:

R_j: if x1 is A_j1 and ... and xn is A_jn then y1 is β_j^1 and ... and yc is β_j^c

where yk is the possibility grade of the occurrence of class k and β_j^k is a consequent fuzzy set. Instead of the consequent fuzzy set β_j^k one can use a singleton fuzzy set (i.e., a real number b_j^k) or a linear function of the input variables (i.e., b_{j,0}^k + b_{j,1}^k x1 + ... + b_{j,n}^k xn).
Fuzzy reasoning
When the antecedent of each fuzzy rule is given, one can determine the consequent class and the grade of certainty by the heuristic procedure presented above. An input pattern x is classified by a fuzzy reasoning method based on a single winner rule. The winner rule R_ĵ is determined as

μ_ĵ(x) · CF_ĵ = max{μ_j(x) · CF_j / R_j ∈ S}

where S is the set of fuzzy if-then rules. If many fuzzy if-then rules have the same maximum product but different consequent classes for the input pattern x, the classification is rejected. The classification is also rejected if no fuzzy rule is compatible with the input pattern x (i.e. μ_j(x) = 0 for ∀R_j ∈ S). When many fuzzy if-then rules with the same consequent class have the same maximum product, we assume that the rule with the smallest index j is the winner.
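A sketch of the single-winner classification (illustrative; rules are (mu_j, class_j, CF_j) triples listed in index order, so that ties within the same class go to the smallest index):

    def classify(x, rules):
        scores = [(mu(x)*cf, cls) for mu, cls, cf in rules]
        best_score, best_cls = max(scores, key=lambda s: s[0])
        if best_score == 0:
            return None                        # no compatible rule: reject
        if len({c for s, c in scores if s == best_score}) > 1:
            return None                        # tie between classes: reject
        return best_cls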
Fuzzy-genetic classifier system
The system is based on the heuristic rule generation procedure and
on genetic operations used for generating a combination of antecedent
fuzzy sets for each fuzzy if-then rule. The outline of the system is as
follows:
Step 1. Generate an initial population of fuzzy if-then rules
Step 2. Evaluate each fuzzy if-then rule in the current population
Step 3. Generate new rules by genetic operations
Step 4. Replace a part of the current population with the newly generated
rules
Step 5. Terminate the algorithm if a pre-specified stopping condition is
satisfied, otherwise return to Step 2.
Coding of fuzzy if-then rules
Because the consequent class and the grade of certainty of each
rule are determined by the heuristic procedure, only the antecedent fuzzy
sets are altered by genetic operations. The system uses five linguistic values and “don’t care”, denoted by the following six symbols (1, 2, 3, 4, 5, #):

S: small          → 1
MS: medium small  → 2
M: medium         → 3
ML: medium large  → 4
L: large          → 5
DC: don’t care    → #
For example, the string “1#3#” denotes the following fuzzy if-then rule
for a four-dimensional pattern classification problem:
If x1 is small and x2 is don’t care and x3 is medium and x4 is
don’t care then class C j with CF j .
Because the conditions with “don’t care” can be omitted, this rule is
rewritten as follows:
If x1 is small and x3 is medium then class C j with CF j .
Initial population
The initial population of N_pop rules is generated by randomly selecting the antecedent fuzzy sets of each rule from the six symbols corresponding to the five linguistic values and “don’t care”. Each symbol is randomly selected with a probability of 1/6. The consequent class C_j and the grade of certainty CF_j of each fuzzy if-then rule are determined by the heuristic procedure.
Evaluation of each rule
A unit reward is assigned to the winner rule when a training pattern is correctly classified by that rule. After all the training patterns are examined, the fitness value of each rule is defined by the total reward assigned to that rule:

fitness(R_j) = NCP(R_j),

where NCP(R_j) is the number of training patterns that are correctly classified by R_j.
Generating a new population of rules
In order to generate new fuzzy if-then rules, a pair of rules is selected from the current population. The selection is made using a probability based on roulette wheel selection with linear scaling:

P(R_j) = (fitness(R_j) − fitness_min(S)) / Σ_{R_i ∈ S} (fitness(R_i) − fitness_min(S))
where fitness_min(S) is the minimum fitness value of the fuzzy rules in the current population S. From the selected pair, two rules are generated by uniform crossover applied to the antecedent fuzzy sets, as in the following example, where the crossover positions are the third and fourth symbols:

1 # 3 2 5        1 # 4 1 5
             ⇒
# 2 4 1 #        # 2 3 2 #

The mutation operator changes a value into another one, randomly selected:

1 2 3 2 # ⇒ 1 2 3 5 #

The consequent class and the grade of certainty of each of the newly generated fuzzy if-then rules are determined by the heuristic procedure. These genetic operations are iterated until a pre-specified number of rules are newly generated; let N_rep be this number. The worst N_rep rules (with the smallest fitness values) are removed from the current population and the newly generated rules are added. In this way a new population of rules is generated.
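The three genetic operations can be sketched as follows (illustrative; a rule is coded as a list over the six symbols 1-5 and #):

    import random

    def select(pop, fitness):
        # roulette wheel with linear scaling against the minimum fitness
        fmin = min(fitness)
        w = [f - fmin for f in fitness]
        return random.choice(pop) if sum(w) == 0 else random.choices(pop, weights=w)[0]

    def uniform_crossover(p1, p2):
        mask = [random.random() < 0.5 for _ in p1]
        return ([a if m else b for a, b, m in zip(p1, p2, mask)],
                [b if m else a for a, b, m in zip(p1, p2, mask)])

    def mutate(rule, p_mut=0.1, symbols="12345#"):
        return [random.choice(symbols) if random.random() < p_mut else g
                for g in rule]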
Final solution
Frequently, the number of generations is used as a stopping condition for terminating the execution of the fuzzy classifier system. The final solution is the rule set with the maximum classification rate for the training patterns over all generations.
This fuzzy classifier system can be written as the following algorithm:
Step 1. Generate an initial population of N_pop fuzzy if-then rules by randomly specifying the antecedent fuzzy sets of each rule. The consequent class and the grade of certainty are determined by the heuristic procedure.
Step 2. Classify all the given training patterns by the fuzzy if-then rules in
the current population, calculating the fitness value of each rule.
Step 3. Generate N rep fuzzy if-then rules from the current population by
selection, crossover and mutation operators. The consequent class and the
grade of certainty are determined by the heuristic procedure.
Step 4. Replace the worst N rep fuzzy if-then rules with the newly
generated rules.
Step 5. Terminate the algorithm if a pre-specified stopping condition is
satisfied, otherwise return to Step 2.
Numerical example
An example for a classification problem with two attributes (i.e.,
x1 and x2 ) and two classes is presented in (Nakashima, 2000).
(figure: training patterns from class 1 and class 2 in the pattern space [0, 1] × [0, 1], axes x1 and x2)
The rules have the form
R_j: If x1 is A_j1 and x2 is A_j2 then class C_j with CF_j
where the antecedent fuzzy sets are selected from the five linguistic values shown in the following figure and “don’t care”:

(figure: the symmetric triangular membership functions S, MS, M, ML and L, homogeneously partitioning [0, 1])

μ_don’t care(x) = 1 if 0 ≤ x ≤ 1, 0 otherwise.
The total number of fuzzy if-then rules is 36 and the problem is to find a compact rule set with high classification performance. From the following parameters:

Population size: N_pop = 1, 2, …
Crossover probability: 1.0
Mutation probability: 0.1
Number of replaced rules in each population: N_rep = 1
Stopping condition: 1000 generations

the solution obtained is
If x1 is medium small then class 1 with CF = 0.90
If x1 is medium and x2 is small then class 1 with CF = 0.85
If x1 is medium and x2 is medium small then class 1 with CF = 0.79
If x1 is large then class 2 with CF = 1.0
If x1 is medium large then class 2 with CF = 0.87
If x1 is medium and x2 is medium large then class 2 with CF = 0.81
REFERENCES
Alsina, C., Trillas, E. & Valverde, L. (1980). On non-distributive logical
connectives for fuzzy sets theory. BUSEFAL, 3, 18-29.
Baldwin, J. F. (1987). Evidential Support Logic Programming. Fuzzy Sets
and Systems, 24, 1-26.
Bauer, M. (1996). Approximations for Decision Making in the Dempster-Shafer Theory of Evidence, UAI, 73-80.
Bellman, R. E. & Giertz, M. (1973). On the analytic formalism of the
theory of fuzzy sets. Information Sciences, 5, 149-156.
Buisson, J. C., Farreny, H. & Prade, H. (1986). Dealing with Imprecision
and Uncertainty in the Expert System DIABETO III. In Proc. of
the 2nd Int. Conf. on Artificial Intelligence (pp. 705-721). Paris,
France: Hermes.
Bourke, M. M. (1995). Self-learning predictive control using relational-based fuzzy logic. Thesis, University of Alberta, Edmonton, Alberta.
De Campos, L. M., Lamata, M. T. & Moral, S. (1990). The concept of
conditional fuzzy measure. International Journal of Intelligent
Systems, 5, 237-246.
Czogala, E. & Leski, L. (2001). On equivalence of approximate reasoning
results using different interpolations of fuzzy if-then rules. Fuzzy
Sets and Systems, 117, 279-296.
Demirly, K. & Turksen, I. B. (1992). Rule Break up with Compositional
Rule of Inference. In IEEE International Conference on Fuzzy
Systems (pp. 949-956). San Diego, CA.
Dempster, A. P. (1967). Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Statist., 38, 325-339.
Desprès, S. (1986). Un apport à la conception des systèmes à base de connaissances: les opérations de déduction floues. Thèse d’Université, Université Paris VI, France.
Dubois, D. & Prade, H. (1979). Fuzzy real algebra: Some results. Fuzzy
Sets and Systems, 2, 327-348.
Dubois, D. & Prade, H. (1980). Fuzzy Sets and Systems: Theory and
Applications. San Diego, USA: Academic Press, Inc.
Dubois, D. & Prade, H. (1982). On several representations of an uncertain
body of evidence. In M. M. Gupta & E. Sanchez (Eds.), Fuzzy
Information and Decision Processes (pp. 167-181), Amsterdam,
Holland: North-Holland.
Dubois, D. (1983). Modèles mathématiques de l’imprécis et de l’incertain en vue d’applications aux techniques d’aide à la décision. Thèse d’Etat, Grenoble.
Dubois, D. & Prade, H. (1985a). Evidence measures based on fuzzy
information. Automatica, 21(5), 547-562.
Dubois, D. & Prade, H. (1985b). The generalized modus ponens under
sup-min composition. A theoretical study. In M. M. Gupta, A.
Kandel, W. Bandler & J. B. Kiszka (Eds.), Approximate
Reasoning in Expert Systems (pp. 217-232). North-Holland.
Dubois, D. & Prade, H. (1986a). On the unicity of Dempster rule of combination. International Journal of Intelligent Systems, 1, 133-142.
Dubois, D. & Prade, H. (1986b). A Set-Theoretic View of Belief Functions. International Journal of General Systems, 12, 193-226.
Dubois, D. & Prade, H. (1987). Théorie des possibilités. Applications à la représentation des connaissances en informatique. Paris: Masson.
Dubois, D. & Prade, H. (1988). Representation and combination of uncertainty with belief functions and possibility measures. Computational Intelligence, 4, 244-264.
Fagin, R. & Halpern, J. Y. (1989). A new approach to updating beliefs. (Research Report No. RJ 7222). San Jose, USA: IBM Research Division, Almaden Research Center.
Fodor, J. C. (1991). On fuzzy implication operators. Fuzzy Sets and
Systems, 42, 293-300.
Fullér, R. (1995). Neural Fuzzy Systems. Åbo, Finland: Åbo Akademi
University.
Fullér, R. (1998). Fuzzy Reasoning and Fuzzy Optimization. Turku,
Finland: TUCS General Publications.
Fullér, R. (2000). Introduction to Neuro-Fuzzy Systems. Berlin, Germany: Springer-Verlag.
Garibba, S. F. & Servida, A. (1988). Evidence Aggregation in Expert Judgements. In Proc. of 2nd Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems – IPMU (pp. 385-400). Urbino, Italy: Springer-Verlag.
Graham, B. P. & Newell, R. B. (1989). Fuzzy Adaptive Control of a First-Order Process. Fuzzy Sets and Systems, 31, 47-65.
Iancu, I. (1997a). T-norms with threshold. Fuzzy Sets and Systems, 85,
83-92.
Iancu, I. (1997b). Introduction of a double threshold in uncertainty
management. In I. Plander (Ed.), Proc. of Seventh Int. Conf. on
Artif. Intelligence and Information-Control Syst. of Robots (pp.
10-14), Smolenice Castle, Slovakia: World Scientific, Singapore.
Iancu, I. (1997c). PROSUM – Prolog System for Uncertainty
Management. Int. Journal of Intelligent Systems, 12(9), 615-627.
Iancu, I. (1997d). Reasoning System with Fuzzy Uncertainty. Fuzzy Sets
and Systems, 92, 51-59.
Iancu, I. (1998a). A method for constructing t-norms. Korean J. Comput.
& Appl. Math., 5(2), 407-414.
Iancu, I. (1998b). Some applications of Pedrycz’s operator. Computers and Artificial Intelligence, 17(1), 83-97.
Iancu, I. (1998c). Propagation of uncertainty and imprecision in knowledge-based systems. Fuzzy Sets and Systems. Int. J. of Soft Computing and Intelligence, 94, 29-43.
Iancu, I. (1999a). Fuzzy connectives with applications in uncertainty
management. In Proc. of the 3-rd annual meeting of the Romanian
Society of Math. Science (pp. 40-47), Craiova: University of
Craiova.
Iancu, I. (1999b). On a family of t-operators. Annals of the Univ. of
Craiova. Mathematics-Computer Science Series, XXVI, 84-92.
Iancu, I. (2000). Uncertain Reasoning System. In A. Kent & J. G. Williams (Eds.), Encyclopedia of Computer Science and Technology, vol. 48(28) (pp. 359-372). New York: Marcel Dekker, Inc.
Iancu, I. (2003). On a Representation of an Uncertain Body of Evidence,
Annals of the Univ. of Craiova, Mathematics-Computer Science
Series, XXX(2), 100-108.
Iancu, I. (2005). Operators with n-thresholds for uncertainty management.
J. Appl. Math. & Computing. An Int. Journal, 19(1-2), 1-17.
Publisher: Springer Berlin / Heidelberg
Iancu, I. (2008a). Generalized Modus Ponens Using Fodor's Implication and a Parametric T-norm. WSEAS Transactions on Systems, 7(6), 738-747.
Iancu, I. (2008b). Generalized Modus Ponens Reasoning for Rules with
Partial Overlapping Between Premise and Observation, European
Computing Conference, Malta, pp. 37-43
Iancu, I. (2008c). An Approximation of Basic Assignment Probability in
Dempster-Shafer Theory, Proc. of 8-th Int. Conf. on Artificial
Intelligence and Digital Communications, Craiova, September
2008, pp. 116-123
Iancu, I. (2009a) Extended Mamdani Fuzzy Logic Controller, The Fourth
IASTED International Conference on Computational Intelligence
~CI 2009~ August 17 - 19, 2009 Honolulu, Hawaii, USA, ACTA
Press, pp. 143-149
Iancu, I. & Colhon, M. (2009b). Mamdani FLC with various implications,
11th International Symposium on Symbolic and Numeric
Algorithms for Scientific Computing - SYNASC 09 - Timisoara,
Romania, September 26-29, 2009, will be published by IEEE
Computer Society's Conference Publishing Service
Iancu, I. (2009c). Generalized Modus Ponens Using Fodor’s Implication
and T-norm Product with Threshold, Int. J. of Computers,
Communications & Control, Vol. IV (2009), No. 4, pp. 330-343
Inuiguchi, M., Fu, K. S. & Ichihashi, H. (1990). Properties of possibility and necessity measures constructed by Gödel implication. In Third Int. Conf. IPMU – Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 358-360), Paris.
Ishibuchi, H., Nozaki, K. & Tanaka, H. (1992). Distributed
representation of fuzzy rules and its application to pattern
classification. Fuzzy Sets and Systems, 52(1), 21–32.
Ishibuchi, H., Nozaki, K., Yamamoto, N. & Tanaka, H. (1995). Selecting
fuzzy if-then rules for classification problems using genetic
algorithms. IEEE Trans. on Fuzzy Systems, 3(3), 260-270.
Ishibuchi, H., Murata, T. & Türkşen, I. B. (1997). Single-objective and
two-objective genetic algorithms for selecting linguistic rules for
pattern classification problems. Fuzzy Sets and Systems, 89(2), 135-149.
Jager, R. (1995). Fuzzy Logic in Control, Thesis Technische Universiteit
Delft
Karr, C. (1991a). Genetic Algorithms for Fuzzy Controllers. AI Expert, 2,
pp. 26-33
Karr, C.(1991b). Applying Genetics to Fuzzy Logic. AI Expert, 3, pp.
38-43.
Klir, C. J. & Yuan, B.(1995). Fuzzy Sets and Fuzzy Logic. Theory and
Applications. New Jersey, USA: Prentice Hall PTR.
Lefevre, E., Colot, O. & Vannoorenberghe, P.(2000). Belief functions
combination and conflict management. Information Fusion
Journal, 3(2), 149-162.
Lebailly, J., Martin-Clouaire, R. & Prade, H. (1987). Use of Fuzzy Logic
in a Rule Based System in Petroleum Geology. In E. Sanchez & L.
A. Zadeh (Eds.), Approximate Reasoning in Intelligent Systems,
Decision and Control (pp. 125-144). Oxford, UK: Pergamon
Press.
Leung, K. S. & Lam, W. (1989). A Fuzzy Expert System Shell Using
Both Exact and Inexact Reasoning. Journal of Automated
Reasoning, 5, 207-233.
Leung, K. S., Wong, W. S. F. & Lam, W. (1989). Applications of a novel
fuzzy expert system shell. Expert Systems, 6(1), 2-10.
Ling, C. H. (1965). Representation of associative functions. Publ. Math.
Debrecen, 12, 189-212.
Liu, F., Geng, H. & Zhang, Y.-Q. (2005). Interactive Fuzzy Interval Reasoning for smart Web shopping. Applied Soft Computing, 5, 433-439.
Lowrance, J., Garvey, T. & Strat, T. (1986). A Framework for Evidential-Reasoning Systems. In Proc. of the 5th Nat. Conf. of the American Association for Artificial Intelligence, 896-903.
Mamdani, E.H. & Assilian, S.(1975). An Experiment in Linguistic
Synthesis with a Fuzzy Logic Controller. International Journal of
Man-Machine Studies, Vol. 7, pp. 1-13.
Martin-Clouaire, R. & Prade, H. (1985). SPII-1: A simple inference
engine capable of accommodating both imprecision and
uncertainty. In G. Mitra (Ed.), Computer-Assisted Decision
Making (pp. 117-131). Amsterdam, Holland: North-Holland.
Mizumoto, M. (1985). Extended fuzzy reasoning. In M. M. Gupta et
al.(Eds.), Approximate Reasoning in Expert Systems (pp. 71-85).
Amsterdam, Holland: North-Holland.
Mizumoto, M. & Zimmermann, H.-J. (1982). Comparison of fuzzy reasoning methods. Fuzzy Sets and Systems, 8, 253-283.
Murphy, C.K.(2000). Combining belief functions when evidence
conflicts. Decision Support Systems, 29, 1-9.
Nakashima, T. (2000). Fuzzy Genetics-Based Machine Learning for
Pattern Classification. Doctoral dissertation, Osaka Prefecture
University, Japan.
Orponen, P. (1990). Dempster's Rule of Combination is #P-complete. Artificial Intelligence, 44, 245-253.
Pacholczyc, D. (1987). Introduction d’un seuil dans le calcul de l’incertitude en logique floue. BUSEFAL, 32, 11-18.
Planchet, B. (1989). Credibility and Conditioning. Journal of Theoretical
Probability, 2, 289-299.
Pedrycz, W. (1983). Some Applicational Aspects of Fuzzy Relational
Equations in Systems Analysis. International Journal of General
Systems, Vol. 9, pp. 125-132.
Prade, H. (1985). A computational approach to approximate and plausible
reasoning with applications to expert systems. IEEE Trans. Pattern
Analysis & Machine Intelligence, 7(3), 260-283.
Schweizer, B. & Sklar, A. (1960). Statistical metric spaces. Pacific J.
Math., 10, 313-334.
Singer, D. (1990). A fuzzy set approach to fault tree and reliability
analysis. Fuzzy Sets and Systems, 34, 145-155.
Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton, USA:
Princeton Univ. Press.
Sugeno, M.(1974). Theory of Fuzzy Integral and its Applications. Ph. D.
Thesis, Inst. of Technology, Tokyo.
Smarandache, F. & Dezert, J. (Eds). (2004). Advances and Applications
of DSmT for Information Fusion. Rehoboth, USA: American
Research Press.
Smets, P. (1981). The degree of belief in a fuzzy event. Information Sci.,
25, 1-19.
Smets, P. (1988). Belief Functions versus Probability Functions, in B.
Bouchon, L. Saitta, R. Yager (Eds.): Uncertainty and Intelligent
Systems, vol. 313 of Lecture Notes in Computer Science,
Springer, 17-24
Smets, P. (1993). Belief functions: the disjunctive rule of combination
and the generalized Bayesian theorem. International Journal of
Approximate Reasoning, 9, 1-35.
Smets, P.(2000). Data Fusion in the Transferable Belief Model. In
Proceedings of the 3rd International Conference on Information
Fusion (pp PS21-PS33). Paris, France.
Smets, P. & Kennes, R.(1994). The transferable belief model. Artificial
Intelligence, 66(2), 191-234.
Suppes, P. & Zanotti, M. (1977). On using random relations to generate upper and lower probabilities. Synthese, 36, 427-440.
Takagi, T. & Sugeno, M. (1985). Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst. Man Cybernet., 15(1), 116-132.
Tessem, B. (1993). Approximations for efficient computation in the
theory of evidence, Artificial Intelligence, 61, 315-329.
Trillas, E. (1979). Sobre funciones de negacion en la teoria de conjuntos
difusos. Stochastica, 3(1), 47-59.
Voorbraak, F. (1989). A Computationally Efficient Approximation of Dempster-Shafer Theory. Int. Journal of Man-Machine Studies, 30, 525-535.
Yager, R. R. (1983a). Some relationships between possibility, truth and
certainty. Fuzzy Sets and Systems, 11, 151-156.
Yager,R. R. (1983b). Hedging in the combination of evidence. Journal of
Information and Optimization Science, 4(1), 73-81.
Yager, R. R.(1985). On the relationships of methods of aggregation of
evidence in expert systems. Cybernetics and Systems, 16, 1-21.
Yager, R.R.(1987). On the Dempster-Shafer framework and new
combination rules. Information Sciences, 41, 93–138.
Yen, J. (1992). Computing generalized belief functions for continuous fuzzy sets. International Journal of Approximate Reasoning, 6, 1-31.
Zadeh, L. A. (1965). Fuzzy Sets. Information and Control, 8, 338-353.
Zadeh, L. A. (1973). Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man and Cybernetics, 3, 28-44.
Zadeh, L. A. (1975a, 1975b, 1975c). The concept of linguistic variable and its application in approximate reasoning. Inform. Sci., 8, 199-249; Inform. Sci., 8, 301-357; Inform. Sci., 9, 43-80.
Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility.
Fuzzy Sets and Systems, 1, 3-28.
Zadeh, L. A. (1979). A theory of approximate reasoning. In J. E. Hayes,
D. Michie & L. I. Mikulich (Eds.). Machine Intelligence, 9 (pp.
149-194). New York: Elsevier.
Zimmermann, H.-J. & Roubens, J. (Eds.) (1990). Progress in Fuzzy Sets and Systems. Boston, USA: Kluwer Academic Publishers.
Zimmermann, H.-J. (1991). Fuzzy Sets Theory – and its Applications.
Boston, USA: Kluwer Academic Publishers.
CALCUL EVOLUTIV
1
INTRODUCERE ÎN CALCULUL
EVOLUTIV
1.1 Specificul calculului evolutiv
În matematică optimizarea este înţeleasă ca însemnând găsirea unei
soluţii optime. În acest scop s-au obţinut rezultate importante în
calculul diferenţial, calculul variaţional, controlul optimal, cercetări
operaţionale. Drumul de la rezultatele teoretice, referitoare la teoreme
de existenţă, unicitate, caracterizare a soluţiei, etc., la optimizarea
efectivă este de multe ori prea lung, fie datorită complexităţii prea
mari a problemelor reale faţă de modelul matematic utilizat, fie
datorită complexităţii (timp, memorie) algoritmilor utilizaţi.
La mijlocul anilor '70, odată cu creşterea performanţelor
calculatoarelor şi, implicit, a complexităţii problemelor reale ce se
puteau rezolva cu ajutorul calculatorului, au devenit frecvente
situaţiile în care modelele clasice de optimizare nu mai conduceau la
soluţii acceptabile pentru probleme modelabile pe calculator. Tot mai
2
frecvent, probleme din biologie, climatologie, chimie, mecanică,
analiza datelor, etc., probleme ale căror modele includ sute sau mii de
variabile, ale căror funcţii de optimizat prezintă multiple optime locale
şi neregularităţi nestudiate din punct de vedere numeric, rămâneau
nerezolvate sau cu soluţii aproximate grosier.
Studiindu-se excelenta adaptare a fiinţelor vii, în ceea ce priveşte
forma, structura, funcţiile şi stilul de viaţă, numeroşi cercetători au
ajuns la concluzia că natura oferă soluţii optimale la problemele sale,
soluţii superioare oricăror performanţe tehnologice. S-a demonstrat,
chiar matematic, optimalitatea unor sisteme biologice: raportul
diametrilor ramificaţiilor arterelor, poziţia punctelor de ramificare a
vaselor de sânge, valoarea hematocritului (procentul volumului
particolelor solide din sânge). În consecinţă, au apărut primele
încercări de imitare a procesului de evoluţie naturală. Încă din
perioada anilor 1950, oameni de ştiinţă precum Turing şi von
Neumann au fost interesaţi în modelarea şi înţelegerea fenomenelor
biologice în termeni de procesări naturale ale informaţiei. Începutul
erei calculatoarelor a promovat tendinţa de simulare a proceselor şi a
modelelor naturale şi a condus la dezvoltarea unor modele evolutive
artificiale.
In 1970, professors Hans-Paul Schwefel (Dortmund) and Ingo Rechenberg (Berlin), faced with a fluid-mechanics problem concerning the optimization of the shape of a body moving through a fluid, looked for a new optimization technique, since the methods known up to that moment did not lead to an acceptable solution. Their idea embodied Rechenberg's conjecture, which to this day remains the fundamental justification for applying evolutionary techniques: "Natural evolution is, or comprises, a very efficient optimization process which, by simulation, can lead to the solution of problems that are difficult to optimize."
The simulation model proposed by Rechenberg and Schwefel [71, 76, 77] is known today under the name of "evolution strategies" and was initially applied only to continuous-variable optimization problems. Candidate solutions $x$ are represented in floating point, and the individual $i$ to which the evolutionary process is applied consists of this representation together with an evolution parameter, denoted $\sigma$, also represented in floating point: $i = (x, \sigma)$. At each step the current solution is modified on each component according to $\sigma$ and, in case of an improvement, is replaced by the newly obtained one. The parameter $\sigma$ plays the role of the step size in classical iterative methods and is used in such a way that the principle of "small mutations" is respected. Schemes for adapting the control parameters (self-adaptation) have been developed for evolution strategies.
A second line of study took shape at the University of San Diego; the starting point was again the simulation of biological evolution, and the data structure chosen was the finite-state machine. Following this approach, Fogel [25, 28] generated simple programs, anticipating "genetic programming". The population is represented by programs that are candidates for solving the problem. There are various representations of the population elements, one of the most widely used being a tree structure. In certain applications, such as symbolic regression, the programs are in fact expressions.
For example, the expression program "$a + b * c$" can be described by $(+\; a\; (*\; b\; c))$. Such a structure can easily be encoded in Lisp, so the first implementations of genetic programming used this language.
After many years of studying the simulation of evolution, John Holland of the University of Michigan proposed in 1975 [44] the concept of the "genetic algorithm". Discrete optimization problems were addressed, and the data structure chosen was the bit string. In a strict sense, the notion of genetic algorithm refers to the model studied by Holland and by his student De Jong. In a broader sense, a genetic algorithm is any population-based model that uses selection and recombination operators to generate new points in the search space.
Another direction is "evolutionary programming". Initially, its objective was to develop automaton-like computing structures through an evolutionary process in which the main operator is mutation. The foundations of the field were laid by Fogel [28]. Later, evolutionary programming was oriented toward solving optimization problems, having the same sphere of applicability as evolution strategies.
Evolutionary computation uses algorithms whose search methods model a few natural phenomena: genetic inheritance and the struggle for survival. The best-known techniques in the class of evolutionary computation are those mentioned above: genetic algorithms, evolution strategies, genetic programming, and evolutionary programming. There are also hybrid systems that incorporate various properties of the above paradigms; moreover, the structure of any evolutionary computation algorithm is, to a large extent, the same.
Evolutionary computation is a field of intelligent computing in which solving a problem is seen as a search process in the space of all possible solutions. This search is carried out by imitating mechanisms specific to evolution in nature. In order to find the solution, a search population is used. The elements of this population represent potential solutions of the problem. To guide the search toward the problem's solution, transformations inspired by natural evolution are applied to the population, such as:
Selection. The population elements that come close to the problem's solution are considered fit and are favored, in the sense that they have greater chances of surviving into the next generation and of participating in the generation of "descendants".
Crossover. Just as in reproduction in nature, starting from two or more population elements (called parents), new elements (called descendants) are generated. Depending on their quality (closeness to the problem's solution), the descendants may replace the parents or other individuals in the population.
Mutation. To ensure the diversity of the population, random transformations are applied to the population elements, just as in nature, allowing the appearance of traits (genes) that would not have arisen in the population through crossover and selection alone.
In what follows, a solving algorithm based on these ideas will be called an evolutionary algorithm. The main characteristics of evolutionary algorithms, compared with traditional ones, are:
• they are probabilistic algorithms that combine directed search with random search;
• they achieve an almost perfect balance between exploring the state space and finding the best solutions;
• while classical search methods act at any given moment on a single point of the search space, evolutionary algorithms maintain a set (called a population) of possible solutions;
• evolutionary algorithms do not act directly on the search space but on an encoding of it;
• they are more robust than classical optimization algorithms and than directed search methods;
• they are simple to use and do not require important properties of the objective function, such as continuity, differentiability, or convexity, as classical algorithms do;
• with high probability, they provide a solution close to the exact one.
1.2 Basic notions
The main notions that allow the analogy between solving search problems and natural evolution are the following:
The chromosome is an ordered set of elements, called genes, whose values determine the characteristics of an individual. In genetics, the positions occupied by the genes within the chromosome are called loci, and the values they can take are called alleles. In evolutionary computation, chromosomes are usually vectors containing the encoding of a potential solution and are called individuals. Thus, genes are nothing but the elements of these vectors.
The population. A population consists of individuals living in an environment to which they must adapt. In evolutionary computation, an individual is most often identified with a chromosome and represents an element of the search space associated with the problem to be solved.
The genotype is the set of all genes of an individual, or even of the whole population. In evolutionary computation the genotype represents the encodings corresponding to all elements of the population.
The phenotype is the set of traits determined by a given genotype. In evolutionary computation the phenotype represents the values obtained by decoding, that is, values from the search space.
The generation is a stage in the evolution of a population. If we view evolution as an iterative process in which one population is transformed into another, then a generation is one iteration of this process.
Selection. The process of natural selection has the effect that individuals with a high degree of fitness to the environment survive. The selection mechanism of evolutionary algorithms has the same purpose, namely to favor the survival of the elements with a high degree of fitness. This ensures progress toward the problem's solution, since the information provided by the best elements of the population is exploited. One of the principles of evolutionary theory is that selection is a random process, not a deterministic one. This is reflected in most of the selection mechanisms used by evolutionary algorithms.
Reproduction is the process by which, starting from the current population, a new population is built. The individuals of the new population (generation) inherit characteristics from their parents but may also acquire new characteristics as a result of mutation processes, which have a random character. When at least two parents take part in the reproduction process, the characteristics inherited by the descendants are obtained by combining (crossing over) the characteristics of the parents. The crossover and mutation mechanisms ensure the exploration of the solution space through the discovery of new configurations.
Fitness. In natural evolution, each individual of the population is more or less adapted to the environment, and one of the principles of evolutionary theory is that only the fittest individuals survive. Fitness is a measure of the degree of adaptation of the individual to the environment. The goal of evolution is for all individuals to reach the best possible fitness to the environment, which suggests the connection between an evolution process and an optimization process. In evolutionary computation, the fitness of a population element is a measure of its quality with respect to the problem to be solved. For a maximization problem, the fitness will be directly proportional to the value of the objective function (an element is better the larger the value of this function).
The notions of fitness and evaluation are generally used with the same meaning; nevertheless, a distinction can be made between them. The evaluation function, or objective function, represents a measure of performance with respect to a set of parameters, while the fitness function transforms this performance measure into an allocation of reproductive opportunities. The evaluation of a string representing a set of parameters is independent of the evaluation of other strings. The fitness of a string is, however, defined relative to the other members of the current population; for example, by $f_i/\bar{f}$, where $f_i$ is the evaluation associated with string $i$ and $\bar{f}$ is the average evaluation of all strings in the population.
When applying an evolutionary algorithm to a concrete problem, the following must be chosen appropriately: the way the elements are encoded, the fitness function, and the selection, crossover, and mutation operators. Some of these elements are closely tied to the problem to be solved, others less so.
The structure of an evolutionary algorithm is the following:

Procedure EA
begin
  t := 0
  initialize P(t)
  evaluate P(t)
  while (not termination condition) do
  begin
    t := t + 1
    select P(t) from P(t-1)
    modify P(t)
    evaluate P(t)
  end
end.
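The same skeleton can be written as a short Python sketch; the helper names (initialize, evaluate, select, modify, done) are illustrative placeholders for the problem-specific pieces, not part of the original text.

import random

def evolutionary_algorithm(initialize, evaluate, select, modify, done, seed=0):
    random.seed(seed)
    t = 0
    population = initialize()                        # P(0)
    scores = [evaluate(x) for x in population]
    while not done(t, scores):                       # termination condition
        t += 1
        population = select(population, scores)      # P(t) from P(t-1)
        population = [modify(x) for x in population] # crossover + mutation
        scores = [evaluate(x) for x in population]
    best = max(range(len(scores)), key=scores.__getitem__)
    return population[best], scores[best]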
1.3. Domains of applicability
Evolutionary systems are used when no other strategy for solving the problem exists and an approximate answer is acceptable. They are used especially when the problem can be formulated as an optimization problem, but not only then. Evolutionary algorithms are used in various domains, such as:
Planning. Most planning problems (for example, choosing optimal routes for vehicles, routing messages in a telecommunications network, scheduling activities, etc.) can be formulated as optimization problems with or without constraints. Many of these are NP problems, for which no solving algorithms of polynomial complexity are known. For such problems, evolutionary algorithms offer the possibility of obtaining, in reasonable time, sub-optimal solutions of acceptable quality.
Design. Evolutionary algorithms have been successfully applied to the design of digital circuits and filters, but also of computing structures such as neural networks. As methods for estimating the parameters of systems that optimize certain criteria, evolutionary algorithms are applied in various engineering domains, such as aircraft design, chemical reactor design, structural design in civil engineering, etc.
Simulation and identification. Simulation means determining the behavior of a system starting from a model of it. Identification is the inverse task: determining the structure of the system starting from its behavior. Evolutionary algorithms are used both in simulating problems from engineering and from economics. Model identification is useful especially for making predictions in various domains (economics, finance, medicine, environmental sciences, etc.).
Control. Evolutionary algorithms can be used to implement on-line controllers associated with dynamical systems (for example, to control mobile robots).
Classification. Classifier systems can also be considered part of evolutionary computation. A classifier system is based on a population of association rules (production rules) that evolves in order to adapt to the problem to be solved (the quality of a rule is established on the basis of examples). The evolution of the rules has the same purpose as learning does in neural networks. Evolutionary algorithms are successfully applied in image classification, in biology (for determining protein structure), and in medicine (for classifying electrocardiograms).
2
EVOLUTION OPERATORS IN GENETIC ALGORITHMS
2.1. Selection
The purpose of selection is to determine the intermediate population containing the parents that will undergo the crossover and mutation operators, as well as to determine the individuals that will be part of the next generation. The selection criterion is based on the individuals' degree of fitness to the environment, expressed by the value of the fitness function. It is not mandatory for both the parents and the survivors to be determined by selection; selection may be used in a single stage.
The roulette-wheel method
The roulette-wheel principle is the simplest selection method; it is a stochastic algorithm that uses the following technique:
• the individuals are viewed as contiguous segments on a line, such that the segment associated with each individual is equal to its fitness;
• a random number is generated, and the individual whose segment contains that number is selected;
• the previous step is repeated until the desired number of individuals has been selected.
This technique is similar to a roulette wheel in which each sector is proportional to the fitness of an individual.
Example 2.1. [98] The following table shows the selection probability for 11 individuals, using linear ranking with selective pressure 2. Individual 1 is the best and occupies the largest interval, while individual 10 is next to last in the ranking and occupies the smallest interval; individual 11, the last in the ranking, has fitness 0 and cannot be selected for reproduction.
[Figure: a roulette wheel whose sectors are proportional to the fitness of individuals 1-6]

The probability in this table is computed as the ratio between an individual's fitness and the sum of the fitness values of all individuals.

Number    Fitness    Selection probability
1         2.0        0.18
2         1.8        0.16
3         1.6        0.15
4         1.4        0.13
5         1.2        0.11
6         1.0        0.09
7         0.8        0.07
8         0.6        0.06
9         0.4        0.03
10        0.2        0.02
Wishing to select 6 individuals, we generate 6 random numbers uniformly distributed in the interval [0, 1]. Let these be 0.81, 0.32, 0.96, 0.01, 0.65, 0.42. Placing the 10 individuals on a line in the order 1, 2, ..., 10, individual 1 occupies the segment between 0.0 and 0.18, individual 2 the segment between 0.18 and 0.34, individual 3 the segment between 0.34 and 0.49, etc. Since individual 6 corresponds to the segment delimited by 0.73 and 0.82, and the first generated number (= 0.81) falls in this segment, individual 6 is selected for reproduction. The individuals 1, 2, 3, 5, 9 are selected similarly; thus, the selected population consists of individuals 1, 2, 3, 5, 6, 9.
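A Python sketch of this roulette-wheel selection, reproducing the example above (the fitness values and random draws are passed in explicitly; the exact cumulative boundaries 0.1818, 0.3454, ... differ slightly from the rounded table values but give the same selections):

import bisect
from itertools import accumulate

def roulette_select(fitness, draws):
    total = sum(fitness)
    cum = list(accumulate(f / total for f in fitness))  # cumulative boundaries
    # each draw selects the first individual whose boundary is not exceeded
    return [bisect.bisect_left(cum, r) + 1 for r in draws]  # 1-based numbering

fitness = [2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 0.8, 0.6, 0.4, 0.2]
draws = [0.81, 0.32, 0.96, 0.01, 0.65, 0.42]
print(sorted(roulette_select(fitness, draws)))          # [1, 2, 3, 5, 6, 9]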
2.2. Crossover
Crossover allows the combination of information coming from two or more parents in order to generate one or more offspring.
2.2.1. Binary crossover
This operator creates descendants by combining alternating parts of the parents.
2.2.1.1. Single-point crossover
If $N$ is the number of binary positions of an individual, the crossover position $k \in \{1, 2, \ldots, N-1\}$ is selected uniformly at random, and the values situated to the right of this point are exchanged between the two individuals, producing two descendants.

[Figure 4.1: the parents and the descendants obtained by exchanging the segments to the right of the crossover point]

As an example [98], consider the following parents, each having 11 bits, and crossover position 5:
p1: 0 1 1 1 0 | 0 1 1 0 1 0
p2: 1 0 1 0 1 | 1 0 0 1 0 1
The resulting descendants are:
d1: 0 1 1 1 0 | 1 0 0 1 0 1
d2: 1 0 1 0 1 | 0 1 1 0 1 0
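In Python, single-point crossover on bit strings is a two-line sketch (k counts bits from the left):

def single_point_crossover(p1, p2, k):
    # exchange the tails situated to the right of position k
    return p1[:k] + p2[k:], p2[:k] + p1[k:]

print(single_point_crossover("01110011010", "10101100101", 5))
# ('01110100101', '10101011010')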
2.2.1.2. Multi-point crossover
In this case, $m > 1$ crossover points $k_i \in \{1, 2, \ldots, N-1\}$ are used, chosen at random, pairwise distinct, and sorted in increasing order. The values lying between consecutive crossover points are exchanged between the two parents to obtain two descendants.

[Figure 4.2: the parents and the descendants obtained by exchanging the segments between consecutive crossover points]

For example [98], the individuals
p1: 0 1 | 1 1 0 0 | 1 1 0 1 | 0
p2: 1 0 | 1 0 1 1 | 0 0 1 0 | 1
with 3 crossover points at positions 2, 6, 10 generate the descendants
d1: 0 1 | 1 0 1 1 | 1 1 0 1 | 1
d2: 1 0 | 1 1 0 0 | 0 0 1 0 | 0
The idea behind multi-point crossover is that the parts of the chromosome that contribute to improving an individual's performance need not be contained in adjacent substrings. Moreover, multi-point crossover seems to encourage exploration of the search space rather than to favor rapid convergence toward elite individuals.
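A Python sketch consistent with this example (segments between consecutive cut points are swapped alternately, starting with the second segment):

def multi_point_crossover(p1, p2, points):
    cuts = [0] + sorted(points) + [len(p1)]
    c1, c2 = [], []
    for i in range(len(cuts) - 1):
        a, b = p1[cuts[i]:cuts[i + 1]], p2[cuts[i]:cuts[i + 1]]
        if i % 2 == 1:                  # every second segment is exchanged
            a, b = b, a
        c1.append(a)
        c2.append(b)
    return "".join(c1), "".join(c2)

print(multi_point_crossover("01110011010", "10101100101", [2, 6, 10]))
# ('01101111011', '10110000100')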
2.2.1.3. The crossover algorithm
Let $P(t)$ be the current population and $P^s$ the population chosen by the selection operation. The crossover operator is applied to the individuals of $P^s$ with probability $p_c$ (the index c comes from the English term crossover).
The algorithm:
P1. For each individual in $P^s$:
• generate a random number $q \in [0, 1)$;
• if $q < p_c$ then the individual is retained for crossover; otherwise it does not take part in this operation.
P2. Let $m$ be the number of individuals retained at step P1:
• if $m$ is even, form $m/2$ pairs at random;
• if $m$ is odd, either delete a randomly selected individual or add a new one to $P^s$, then form the pairs.
P3. The pairs formed above undergo the crossover operation:
• for each pair, the crossover points $k_i$, $1 \le k_i < l$, are chosen at random, where $l$ is the length of a chromosome;
• the crossover is performed for the current pair; the descendants become members of the next generation $P(t+1)$, and the parents are deleted from $P(t)$;
• the individuals remaining in $P(t)$ are added to $P(t+1)$.
Remark
The crossover probability usually takes values in the interval [0.2, 0.95]; $p_c = 0.3$ means that 30% of the individuals will undergo crossover.
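Steps P1 and P2 in Python (of the two options for an odd m, this sketch deletes a randomly chosen individual):

import random

def mating_pairs(selected, pc, rng=random):
    # P1: retain each individual for crossover with probability pc
    retained = [ind for ind in selected if rng.random() < pc]
    # P2: make the count even, then form random pairs
    if len(retained) % 2 == 1:
        retained.pop(rng.randrange(len(retained)))
    rng.shuffle(retained)
    return [(retained[i], retained[i + 1]) for i in range(0, len(retained), 2)]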
2.3. Mutation
Mutation is the simplest genetic operator; it consists of randomly changing some values of the chromosome in order to introduce new solutions. Its purpose is to prevent the irreparable loss of diversity, thus avoiding premature convergence. Diversity allows the exploration of large areas of the search space. Mutation uses as a parameter the mutation probability $p_m$, which takes small values, usually in the interval [0.001, 0.01].
2.3.1. Binary mutation
If $n$ is the population size and $l$ is the length of a chromosome, then the average number of bits that undergo mutation is $N = n \cdot l \cdot p_m$. Binary mutation can be implemented in several forms [16].
2.3.1.1. Strong mutation
For each position of each chromosome, the following steps are performed:
P1: generate a random number $q \in [0, 1)$;
P2: if $q < p_m$ then flip that position; otherwise do nothing.
2.3.1.2. Weak mutation
For each position of each chromosome, the following steps are performed:
P1: generate a random number $q \in [0, 1)$;
P2: if $q < p_m$ then choose at random one of the values 0 or 1 and assign it to the current position; otherwise do nothing.
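Both variants as Python sketches; note that weak mutation actually changes a selected bit only half the time, so its effective flip rate is pm/2:

import random

def strong_mutation(bits, pm, rng=random):
    # a selected position is always flipped
    return "".join(str(1 - int(b)) if rng.random() < pm else b for b in bits)

def weak_mutation(bits, pm, rng=random):
    # a selected position is redrawn uniformly from {0, 1}
    return "".join(rng.choice("01") if rng.random() < pm else b for b in bits)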
2.3.1.3. The operation of genetic algorithms
We explain the operation of genetic algorithms for a maximization problem, since minimizing a function $f$ is equivalent to maximizing the function $g = -f$. In addition, we assume that the objective function $f$ takes positive values; otherwise a positive constant $C$ can be added and $f + C$ maximized instead.
Suppose we wish to maximize a function of $k$ variables, $f : \mathbb{R}^k \to \mathbb{R}$; each variable $x_i$ takes values in a domain $D_i = [a_i, b_i] \subseteq \mathbb{R}$, and $f(x_1, \ldots, x_k) > 0$ for $x_i \in D_i$.
Requiring a precision of $p$ decimal places for the variable values, each interval $D_i$ must be divided into $(b_i - a_i) \cdot 10^p$ equal subintervals. Let $l_i$ be the smallest integer such that
$$(b_i - a_i) \cdot 10^p \le 2^{l_i} - 1.$$
Then a representation of the variable $x_i$ as a binary string of length $l_i$ satisfies the required precision. Moreover, the following formula holds:
$$x_i = a_i + \mathrm{decimal}(string_2) \cdot \frac{b_i - a_i}{2^{l_i} - 1}$$
where $\mathrm{decimal}(string_2)$ denotes the decimal value of the binary string $string$.
Now each chromosome is represented by a binary string of length $l = \sum_{i=1}^{k} l_i$; the first $l_1$ bits represent a value from $[a_1, b_1]$, the next $l_2$ bits represent a value from $[a_2, b_2]$, and so on. The initial population consists of $n$ randomly chosen chromosomes. However, if information about the potential optimum is available, it can be used to generate the initial population. From here on, the algorithm works as follows:
• each chromosome of each generation is evaluated using the function $f$;
• the new population is selected according to the fitness-based probability distribution;
• the chromosomes of the new population are modified by the mutation and crossover operators;
• after a number of generations, when no substantial improvements are observed, the best chromosome is selected as the optimal solution; often the algorithm is simply stopped after a fixed number of iterations.
For the selection process we use the roulette-wheel technique:
• compute the fitness $eval(v_i)$ for each chromosome $v_i$, $i = 1, 2, \ldots, n$;
• find the total fitness
$$F = \sum_{i=1}^{n} eval(v_i);$$
• compute the selection probability $p_i$ for each chromosome $v_i$, $i = 1, 2, \ldots, n$:
$$p_i = \frac{eval(v_i)}{F};$$
• compute the cumulative probability $q_i$ for each chromosome $v_i$, $i = 1, 2, \ldots, n$:
$$q_i = \sum_{j=1}^{i} p_j.$$
The selection process consists of spinning the roulette wheel $n$ times; each time a single chromosome is selected, as follows:
• generate a random number $a \in [0, 1]$;
• if $a \le q_1$, select the first chromosome; otherwise select the chromosome $v_i$, $2 \le i \le n$, such that $q_{i-1} < a \le q_i$.
Obviously, some chromosomes will be selected more than once, the best chromosomes generating more copies.
After selection, the crossover operator is applied. Using the crossover probability $p_c$, the number $p_c \cdot n$ of chromosomes that undergo crossover is determined. For each chromosome of the new population we proceed as follows:
• generate a random number $a \in [0, 1)$;
• if $a < p_c$, the current chromosome is selected for crossover.
Next, the chromosomes are paired at random, and for each pair a random integer $pos \in \{1, \ldots, l-1\}$ is generated, $l$ being the length of a chromosome and $pos$ the crossover position. Two chromosomes
$$b_1 b_2 \ldots b_{pos}\, b_{pos+1} \ldots b_l \quad \text{and} \quad c_1 c_2 \ldots c_{pos}\, c_{pos+1} \ldots c_l$$
are replaced by the descendants
$$b_1 b_2 \ldots b_{pos}\, c_{pos+1} \ldots c_l \quad \text{and} \quad c_1 c_2 \ldots c_{pos}\, b_{pos+1} \ldots b_l.$$
Then the mutation operator is applied. The mutation probability $p_m$ gives the number $p_m \cdot l \cdot n$ of bits that will undergo mutation. For each bit of each chromosome, the following operations take place:
• generate a random number $a \in [0, 1)$;
• if $a < p_m$, the bit undergoes mutation.
After the selection, crossover, and mutation operations, the new population is ready for the next evaluation.
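Assembling the pieces, a compact, self-contained Python sketch of this classical binary GA; the toy objective, the bounds, and the parameter values are illustrative choices, with the constant 13 added to keep f positive, as recommended above:

import math
import random

def run_ga(f, a, b, p=4, n=30, pc=0.6, pm=0.01, generations=100, seed=1):
    rng = random.Random(seed)
    l = math.ceil(math.log2((b - a) * 10**p + 1))
    def decode(bits):
        return a + int(bits, 2) * (b - a) / (2**l - 1)
    pop = ["".join(rng.choice("01") for _ in range(l)) for _ in range(n)]
    for _ in range(generations):
        evals = [f(decode(v)) for v in pop]
        total = sum(evals)
        cum, s = [], 0.0
        for e in evals:                          # cumulative probabilities q_i
            s += e / total
            cum.append(s)
        cum[-1] = 1.0                            # guard against rounding
        newpop = []
        for _ in range(n):                       # spin the wheel n times
            r = rng.random()
            newpop.append(next(v for v, q in zip(pop, cum) if r <= q))
        rng.shuffle(newpop)                      # random pairing
        for i in range(0, n - 1, 2):             # single-point crossover
            if rng.random() < pc:
                pos = rng.randrange(1, l)
                x, y = newpop[i], newpop[i + 1]
                newpop[i] = x[:pos] + y[pos:]
                newpop[i + 1] = y[:pos] + x[pos:]
        pop = ["".join(b if rng.random() >= pm else str(1 - int(b)) for b in v)
               for v in newpop]                  # strong bit mutation
    best = max(pop, key=lambda v: f(decode(v)))
    return decode(best), f(decode(best))

print(run_ga(lambda x: x * math.sin(x) + 13.0, 0.0, 12.56))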
3
EVOLUTION STRATEGIES
3.1 General remarks
Evolution strategies (ES) arose from the need to solve optimization problems of the type: "find $x^* \in D \subseteq \mathbb{R}^n$ with $f(x^*) \le f(x)$ for all $x \in D$, where $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ and $D$ is a bounded region determined by the constraints imposed on $x$". For such a problem, evolution strategies are better suited than genetic algorithms because they do not require binary encoding of the data, which has the disadvantage of limiting precision. Evolution strategies represent individuals as vectors with real components and use mutation as the main evolution operator.
In advanced evolution strategies, the representation of individuals also incorporates the control parameters of the strategy. An individual is represented as a pair $v = (x, \sigma)$, where the vector $x$ is an element of the search space and $\sigma^2$ is the variance vector; $\sigma_i^2$ is the variance of the perturbation undergone by component $x_i$ of the vector $x$ in the mutation process. The vector $\sigma$ represents the control parameter of the strategy.
The first evolution-strategy algorithm was proposed in 1964, at the Technical University of Berlin, by Rechenberg and Schwefel, in order to solve a problem that required placing a flexible pipe inside a region of a given shape so that the cost would be as small as possible. The idea of the solution was: starting from the current approximation, another one was generated by a random perturbation based on a normal distribution, and the better of the two was chosen. This strategy, called $(1+1)$, did not operate with populations but followed the adaptation of a single individual under the action of mutation. The limits of this model led to the search for mechanisms involving several individuals in the evolution [5, 6, 7, 8, 9, 15, 16, 81, 82]. Among these we mention:
• The $(\mu, \lambda)$ strategy: starting from the $\mu$ individuals of the current population, $\lambda > \mu$ new individuals are generated by crossover and mutation, and from these, $\mu$ are chosen to form the new population; if $\mu = 1$, the best individual is chosen (the one for which the objective function is smallest).
• The $(\mu + \lambda)$ strategy: starting from the $\mu$ individuals of the current population, $\lambda$ new individuals are generated by crossover and mutation and added to the current population; from the combined populations, $\mu$ individuals are selected to form the new population. If $\mu = 1$, only mutation is used to generate new individuals, since crossover obviously cannot be applied. If $\lambda = 1$, selection amounts to eliminating the weakest individual (the one giving the largest value of the objective function) from the combined populations. It is obvious that if the newly generated individual is weaker than all elements of the current population, the population remains unchanged.
• The $(\mu, k, \lambda, p)$ strategy: an extension of the $(\mu, \lambda)$ strategy, characterized by:
a) $k \ge 1$ represents the lifespan of individuals, measured in generations. The value of $k$ decreases by 1 at each new generation, and an individual can be selected only if $k > 0$.
b) The mutation and crossover operations are controlled by the probabilities $p_m$ and $p_c$ respectively, as in genetic algorithms.
c) The number of parents used in the crossover operation is $p$.
d) Crossover operators specific to genetic algorithms, such as multi-point crossover, can also be used.
• The fast evolution strategy: a variant, classified by its authors under evolutionary programming, characterized by:
a) It uses a specific mutation based on perturbing the elements with values generated according to the Cauchy distribution, whose density function is
$$\varphi(x) = \frac{1}{\pi} \cdot \frac{t}{t^2 + x^2}, \quad t > 0.$$
This type of perturbation has the advantage that it can generate descendants far from their parents with a higher probability than the normal perturbation (providing a higher probability of escaping local minima and accelerating the process of finding the optimum). The difference between the two density functions (normal and Cauchy) is illustrated in the figure below, and numerically in the sketch that follows it.
b) The perturbations are independent, and the parameters $\sigma_i$ are determined by self-adaptation (with log-normal perturbation), as in classical evolution strategies.
c) Crossover is used neither on the actual components nor on the control parameters.
d) Selection is of tournament type.

[Figure 2.1: The density functions of the normal (dashed line) and Cauchy (solid line) distributions]
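A quick numerical illustration of point a), assuming NumPy is available: at comparable scale, Cauchy draws exceed 3 in absolute value roughly 70 times more often than normal draws.

import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, 100_000)
cauchy = rng.standard_cauchy(100_000)
# fraction of perturbations larger than 3 in absolute value
print((abs(normal) > 3).mean(), (abs(cauchy) > 3).mean())  # about 0.003 vs 0.2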
3.2 Evolution operators
3.2.1 Crossover
The crossover operator selects the $p$ parents with uniform probability. The cases $p = 2$ and $p = \mu$ are used most often. Owing to the similarity between the representation of individuals in evolution strategies and in genetic algorithms, there is a similarity between the types of crossover as well.
3.2.1.1 Discrete crossover
In the case $p = 2$, let $x^1$ and $x^2$ be the randomly selected parents; for each component $i = 1, 2, \ldots, n$ a random number $q_i \in [0, 1)$ is generated, following the uniform distribution. Component $y_i$ of the descendant will be
$$y_i = \begin{cases} x_i^1 & \text{if } q_i < 0.5 \\ x_i^2 & \text{if } q_i \ge 0.5 \end{cases}$$
In the case when $p$ parents $x^1, \ldots, x^p$ are crossed, component $y_i$ of the descendant $y$ is the component $x_i^j$ of a randomly chosen parent $x^j$. In the case $p = \mu$, the parent contributing component $y_i$ of the descendant $y$ is chosen from the entire population; for this reason the crossover is called global.
3.2.1.2 Intermediate crossover
In this case, component $y_i$ of the descendant $y$ is a convex combination of the corresponding components of the $p$ randomly chosen parents $x^1, \ldots, x^p$:
$$y_i = \sum_{j=1}^{p} \alpha_j x_i^j, \quad i = 1, 2, \ldots, n, \qquad \sum_{j=1}^{p} \alpha_j = 1, \quad \alpha_j \ge 0.$$
In general, two parents $x^1$, $x^2$ are used:
$$y_i = \alpha x_i^1 + (1 - \alpha) x_i^2, \quad i = 1, 2, \ldots, n$$
with $\alpha \in [0, 1]$ chosen as the value of a uniformly distributed random variable and kept the same for all components.
Intermediate crossover can be extended by changing the parents and the parameter $\alpha$ for each component of the descendant. All crossover types apply in the same way to the variance or to other control parameters.
Experimentally, it has been observed that good results are obtained if discrete crossover is used for the position vectors and intermediate crossover for the strategy parameters (see the sketch below).
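A Python sketch of the two operators for individuals represented as (x, σ) pairs, following the empirical advice above (discrete recombination for x, intermediate recombination for σ):

import random

def recombine(parent1, parent2, rng=random):
    (x1, s1), (x2, s2) = parent1, parent2
    # discrete recombination for the position vector
    x = [a if rng.random() < 0.5 else b for a, b in zip(x1, x2)]
    # intermediate recombination for the strategy parameters
    alpha = rng.random()
    s = [alpha * a + (1 - alpha) * b for a, b in zip(s1, s2)]
    return x, s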
3.2.2 Mutation
Mutation is the most important operator of evolution strategies. We have seen that an individual is a pair $(x, \sigma)$, where the vector $\sigma$ indicates how the position vector $x$ is transformed by mutation. The control parameter $\sigma$ is itself subject to the mutation operation. The individual $(x, \sigma)$ is transformed by mutation into $(x', \sigma')$:
$$(x', \sigma') = mut(x, \sigma).$$
The vector $x'$ is obtained according to the rule
$$x' = x + \sigma \cdot N(0, 1)$$
or, component-wise,
$$x'_i = x_i + \sigma'_i \cdot N_i(0, 1), \quad i = 1, 2, \ldots, n$$
where $N(0,1)$ and $N_i(0,1)$ are random variables with mean 0 and variance 1.
The mutation acts differently on $\sigma$, according to whether $\mu = 1$ or $\mu > 1$, in the $(\mu, \lambda)$ and $(\mu + \lambda)$ models.
3.3. The operation of evolution strategies
The general structure of an evolution strategy is that of an evolutionary algorithm; depending on the strategy, some differences appear at the level of the operators.
3.3.1 The (1+1) strategy
Rechenberg's initial model considers the population formed of a single individual subjected to the mutation operator. After the descendant is obtained, the two members of the population are compared by means of the fitness function, and the better individual is retained. We use the following notation:
• $k$ = a number of consecutive generations; usually $k = 10n$ is taken, where $n$ is the dimension of the search space;
• $s(k)$ = the number of successful mutations during the last $k$ consecutive generations;
• $p(k) = s(k)/k$ = the frequency of successful mutations during the last $k$ generations;
• $c$ = a constant, chosen (at Schwefel's suggestion [79]) to be 0.817, this value having been obtained for the sphere model.
The ES(1+1) algorithm
begin
  t := 1, P(t) = {(x(t), σ(t))}
  evaluate f(x(t))
  while not cond(P(t)) do
  begin
    compute p(k)
    compute σ(t+1) = mut(σ(t)):
      σ(t+1) = σ(t) / ⁿ√c   if p(k) > 1/5
      σ(t+1) = σ(t) · ⁿ√c   if p(k) < 1/5
      σ(t+1) = σ(t)          if p(k) = 1/5
    compute x(t+1) = mut(x(t)) = x(t) + σ(t+1) · N(0, 1)
    evaluate f(x(t+1))
    if f(x(t+1)) < f(x(t))   {selection}
      then P(t+1) := {(x(t+1), σ(t+1))}
      else P(t+1) := P(t)
    t := t + 1
  end
end

Remarks.
1) cond(P(t)) is the stopping condition, usually given by the number of generations.
2) Schwefel proposed [79] another version of the mutation for the parameter σ:
  σ(t+1) = σ(t) / c   if p(k) > 1/5
  σ(t+1) = c · σ(t)   if p(k) < 1/5
  σ(t+1) = σ(t)       if p(k) = 1/5
3) The (1+1) strategy works with populations formed of a single individual and does not use crossover.
The rules used above for modifying the parameter σ are versions of the "1/5 success rule" proposed by Rechenberg, which states that:
• the ratio between the number of successful mutations and the total number of mutations should be 1/5;
• if this ratio is greater than 1/5, then the value of σ must increase;
• if the ratio is smaller than 1/5, then the value of σ must decrease.
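A minimal Python sketch of the (1+1)-ES with the 1/5 success rule; for brevity the success frequency is measured over windows of n generations rather than the k = 10n suggested above, with c = 0.817 as in the text:

import random

def es_1_plus_1(f, x0, sigma=1.0, c=0.817, generations=5000, seed=0):
    rng = random.Random(seed)
    n = len(x0)
    x, fx = list(x0), f(x0)
    successes = 0
    for t in range(1, generations + 1):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy < fx:                    # selection: keep the better individual
            x, fx, successes = y, fy, successes + 1
        if t % n == 0:                 # apply the 1/5 rule
            p = successes / n
            if p > 0.2:
                sigma /= c             # many successes: take larger steps
            elif p < 0.2:
                sigma *= c             # few successes: take smaller steps
            successes = 0
    return x, fx

print(es_1_plus_1(lambda v: sum(t * t for t in v), [5.0, -3.0]))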
3.3.2 Strategia (   1)
Strategia (1+1) poate fi generalizată prin mărirea numărului
părinţilor fiecărui descendent şi/sau a numărului descendenţilor unui
părinte. În strategia
  1,
  1 părinţi vor genera un singur
descendent folosind încrucişarea şi mutaţia. Încrucişarea se aplică atât
vectorului de poziţie cât şi dispersiei şi se poate folosi oricare din
operatorii prezentaţi anterior. Mutaţia se aplică urmând principiul
strategiei 1 1 , dar nu există o metodă de control a dispersiei, regula
1 5 nemaifiind aplicabilă în acest caz. Acest dezavantaj face ca
strategia   1 să fie puţin folosită.
3.3.3 Multi-descendant strategies
These strategies appeared from the desire to use more robust and more general methods to control the mutation parameters. This category includes the $(\mu + \lambda)$ and $(\mu, \lambda)$ strategies, which work with populations formed of $\mu > 1$ parents and $\lambda > \mu$ descendants, which leads to an increase in the speed of convergence. The $\mu$ individuals of the new generation are selected from:
• the intermediate population obtained by joining the $\mu$ individuals of the current generation with its $\lambda$ descendants, in the case of the $(\mu + \lambda)$ strategy;
• the $\lambda$ descendants of the current population, in the case of the $(\mu, \lambda)$ strategy.
Because the $(\mu, \lambda)$ strategy completely replaces its population at each new generation, it is not elitist. This quality allows it to leave the area of a local optimum and evolve toward the global optimum. By contrast, the $(\mu + \lambda)$ strategy allows weak individuals from the old generation to survive. In this way the search process tends to favor local minima to the detriment of global ones. For this reason the use of the $(\mu, \lambda)$ strategy is recommended.
Crossover. Initially, the $(\mu + \lambda)$ and $(\mu, \lambda)$ strategies were applied using only the mutation operator. It was later found that better results are obtained if crossover is also used, applied before mutation. Empirically, the conclusion was reached that using discrete crossover for the position vectors and convex (intermediate) crossover for the strategy parameters leads to the best results. The crossover operator is applied $\lambda$ times to the population of $\mu$ parents, yielding an intermediate population of $\lambda > \mu$ individuals. A descendant is obtained by crossing $p$ parents, $1 \le p \le \mu$; usually $p = 2$ or $p = \mu$ is taken.
Mutation. We consider individuals represented by pairs of the form $(x, \sigma)$, where $x \in \mathbb{R}^n$ is the position vector and $\sigma^2$ is the variance vector. We denote by $N(0, 1)$ a random number following the normal distribution with mean 0 and variance 1. The standard mutation replaces the individual $(x, \sigma)$ with $(x', \sigma')$ obtained by the rules:
$$\sigma'_i = \sigma_i \cdot e^{p_1 N(0,1) + p_2 N_i(0,1)}$$
$$x'_i = x_i + \sigma'_i \cdot N_i(0,1)$$
with $i = 1, 2, \ldots, n$. The parameters $p_1$ and $p_2$ control the overall mutation step size and the individual changes, respectively. Schwefel [77] proposed the following values for these parameters:
$$p_1 = \frac{c_1}{\sqrt{2n}} \quad \text{and} \quad p_2 = \frac{c_2}{\sqrt{2\sqrt{n}}}$$
where $c_1$ and $c_2$ usually take the value 1.
Mutation, as the formulas above show, acts first on the variance and then on the position vector. One can also work with a variance vector whose components are all equal; let $\sigma$ be their common value. In this case the mutation works by the rules
$$\sigma' = \sigma \cdot e^{p N(0,1)}$$
$$x'_i = x_i + \sigma' \cdot N_i(0,1)$$
that is, all components of the position vector are modified using the same variance; the parameter $p$ takes the value $p = \dfrac{c}{\sqrt{n}}$.
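Schwefel's self-adaptive mutation as a Python sketch (component-wise variances; c1 = c2 = 1 by default, as in the text):

import math
import random

def self_adaptive_mutation(x, sigma, c1=1.0, c2=1.0, rng=random):
    n = len(x)
    p1 = c1 / math.sqrt(2 * n)
    p2 = c2 / math.sqrt(2 * math.sqrt(n))
    shared = p1 * rng.gauss(0.0, 1.0)        # one global draw per individual
    new_sigma = [s * math.exp(shared + p2 * rng.gauss(0.0, 1.0)) for s in sigma]
    # the variances are mutated first, then the position vector
    new_x = [xi + s * rng.gauss(0.0, 1.0) for xi, s in zip(x, new_sigma)]
    return new_x, new_sigma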
Multi-descendant strategies operate according to the following algorithm:
begin
  t := 0
  initialize the population P(t)
  evaluate the individuals of P(t)
  while not cond(P(t)) do
  begin
    P'(t) := crossover(P(t))
    P''(t) := mutation(P'(t))
    evaluate P''(t)
    P(t+1) := selection(P''(t) ∪ M)
    t := t + 1
  end
end
The initial population is built by choosing at random, with uniform probability, $\mu$ points $x^i \in \mathbb{R}^n$, $i = 1, 2, \ldots, \mu$. If a point $x$ in the vicinity of the optimum is known, then $x$ will be one of the individuals, and the other $\mu - 1$ are obtained from it by mutations. Efficient adaptation of the strategy parameters requires sufficiently high diversity of the parent population. Thus, the number $\mu$ must be greater than 1, and the ratio between the number of descendants and that of parents must favor the descendants. A ratio $\lambda/\mu \approx 7$ is recommended and, frequently, the $(15, 100)$ strategy is used.
Kursawe [54] showed that ensuring convergence is influenced by the crossover operator used; this depends on the shape of the objective function, on the dimension of the search space, and on the number of strategy parameters.
The stopping condition cond usually refers to the maximum number of generations, but other criteria can be considered as well, among them those below:
1) the diversity of the population has dropped below a certain limit, a sign that we are in the vicinity of a global optimum; diversity can be measured by the difference between the qualities associated with the best and the worst individual;
2) no more significant improvements of the objective function are obtained.
The set M can take one of the values:
M = P(t) for the (μ + λ) strategy
M = ∅ for the (μ, λ) strategy.
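For illustration, a compact Python sketch of this loop with the recommended (15, 100) configuration, reusing the recombine and self_adaptive_mutation sketches above (assumed to be in scope) and deterministic truncation selection; f is an objective to be minimized:

import random

def evolution_strategy(f, n, mu=15, lam=100, generations=200, plus=False, seed=0):
    rng = random.Random(seed)
    pop = [([rng.uniform(-5.0, 5.0) for _ in range(n)], [1.0] * n)
           for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            p1, p2 = rng.sample(pop, 2)
            child = recombine(p1, p2, rng)           # crossover before mutation
            offspring.append(self_adaptive_mutation(*child, rng=rng))
        pool = offspring + (pop if plus else [])     # M = P(t) or M = empty set
        pool.sort(key=lambda ind: f(ind[0]))
        pop = pool[:mu]                              # keep the best mu
    return pop[0]

x, s = evolution_strategy(lambda v: sum(t * t for t in v), n=5)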
3.3.4 Using correlated mutations
Each individual in the population is characterized not only by the vector $x$ of variables but also by the strategy parameter $\sigma$. The representation can be extended [72] by introducing the variance values $c_{ii} = \sigma_i^2$ ($1 \le i \le n$) and the covariance values $c_{ij}$ ($1 \le i \le n-1$, $i+1 \le j \le n$) of the $n$-dimensional normal distribution with probability density
$$p(z) = \sqrt{\frac{\det(A)}{(2\pi)^n}} \; e^{-\frac{1}{2} z^T A z}.$$
To ensure the positive definiteness of the covariance matrix $A^{-1} = (c_{ij})$, the rotation angles $\theta_j$ are used instead of the coefficients $c_{ij}$. An individual is now represented in the form $a = (x, \sigma, \theta)$, where:
• $x \in \mathbb{R}^n$ is the vector of objective variables;
• $\sigma \in \mathbb{R}^{n_\sigma}$ is the vector of standard deviations of the normal distribution, $1 \le n_\sigma \le n$;
• $\theta = (\theta_1, \ldots, \theta_{n_\theta})$ is the vector of angles defining the correlated mutations of $x$, with $n_\theta = \left(n - \dfrac{n_\sigma}{2}\right)(n_\sigma - 1)$.
The vector $\theta$ allows the search to proceed in any direction; otherwise, the directions parallel to the axes of the coordinate system would be favored. The strategy parameters $\sigma$ and $\theta$ are modified by mutation, and the type of this operation depends on the values of $n_\sigma$ and $n_\theta$. Thus, for:
• $n_\sigma = 1$ and $n_\theta = 0$ we obtain the standard mutation, in which a single standard deviation controls the mutation of all components of $x$;
• $n_\sigma = n$ and $n_\theta = 0$ we obtain the standard mutation, in which the values $\sigma_1, \ldots, \sigma_n$ control the mutation of the corresponding components of the vector $x$;
• $n_\sigma = n$ and $n_\theta = \dfrac{n(n-1)}{2}$ we obtain the correlated mutations;
• $n_\sigma = 2$ and $n_\theta = n - 1$: the value $\sigma_1^2$ is used to perform the search in an arbitrary direction, and $\sigma_2^2$ is used for all directions perpendicular to it.
The mutation of an individual
$$a = (x_1, \ldots, x_n, \sigma_1, \ldots, \sigma_{n_\sigma}, \theta_1, \ldots, \theta_{n_\theta})$$
acts as follows:
$$\sigma'_i = \sigma_i \cdot e^{\tau' N(0,1) + \tau N_i(0,1)}, \quad 1 \le i \le n_\sigma$$
$$\theta'_j = \theta_j + \beta \cdot N_j(0,1), \quad 1 \le j \le n_\theta$$
$$x' = x + \mathrm{cov}(\sigma', \theta')$$
where the vector cov is computed as follows:
$$\mathrm{cov} = T z, \quad z = (z_1, \ldots, z_n), \quad z_i \sim N(0, \sigma_i'^2),$$
$$T = \prod_{p=1}^{n-1} \prod_{q=p+1}^{n} T_{pq}(\theta'_j), \quad j = \frac{1}{2}(2n - p)(p + 1) - 2n + q.$$
The rotation matrix $T_{pq}(\theta'_j)$ is the identity matrix except for $t_{pp} = t_{qq} = \cos\theta_j$ and $t_{pq} = -t_{qp} = -\sin\theta_j$.
For the factors $\tau$, $\tau'$ and $\beta$, Schwefel suggested the following values:
$$\tau = \frac{c_1}{\sqrt{2\sqrt{n}}}, \quad \tau' = \frac{c_2}{\sqrt{2n}}, \quad \beta \approx 0.0873 \;(5°).$$
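A NumPy sketch of one correlated mutation step under these rules, with c1 = c2 = 1, nσ = n, and nθ = n(n-1)/2; the rotations are applied by enumerating the pairs (p, q) in the same order as the product above:

import numpy as np

def correlated_mutation(x, sigma, theta, rng):
    n = len(x)
    tau = 1.0 / np.sqrt(2.0 * np.sqrt(n))
    tau_prime = 1.0 / np.sqrt(2.0 * n)
    beta = 0.0873
    s = sigma * np.exp(tau_prime * rng.normal() + tau * rng.normal(size=n))
    th = theta + beta * rng.normal(size=theta.size)
    z = s * rng.normal(size=n)              # z_i ~ N(0, sigma'_i ** 2)
    T = np.eye(n)
    j = 0
    for p in range(n - 1):                  # product of elementary rotations
        for q in range(p + 1, n):
            R = np.eye(n)
            R[p, p] = R[q, q] = np.cos(th[j])
            R[p, q], R[q, p] = -np.sin(th[j]), np.sin(th[j])
            T = T @ R
            j += 1
    return x + T @ z, s, th

rng = np.random.default_rng(0)
print(correlated_mutation(np.zeros(3), np.ones(3), np.zeros(3), rng)[0])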
3.4. Convergence analysis
Using tools from the theory of Markov chains, sufficient conditions for convergence in the probabilistic sense have been obtained for evolution strategies. Sufficient (but not necessary) conditions that are simple to verify are:
i) the distribution used for mutation has infinite support (this is satisfied both by the normal and by the Cauchy distribution);
ii) selection is elitist (strategies of type (μ + λ) satisfy this property);
iii) recombination is applied with a certain probability $p_r$.
From a practical point of view, convergence in infinite time is not of much use; rather, it tests the algorithm's ability to find better and better elements from one generation to the next (the algorithm makes progress in the search process). An undesirable situation is the one in which this progress stops. There are two manifestations of this fact:
• Premature convergence. The algorithm gets stuck in a local optimum because the population is no longer diverse enough to sustain the exploration process.
• Stagnation. The algorithm gets stuck although the population is still diverse, but the evolutionary mechanisms are not strong enough to sustain exploration.
The remedy for these problems relies on the appropriate choice of the operators and control parameters; there are as yet no theoretical results providing ways of avoiding premature convergence or stagnation.
The theoretical study of the convergence speed is based on estimating progress rates for simple test functions (the sphere function and perturbations of it). By estimating the progress rate, information has been obtained on choosing the control parameters so that the rate can be maximized. There are various approaches and various measures of the progress of evolution strategies. From a practical point of view, it is useful to know that evolution strategies have at most linear convergence speed.
In the absence of a complete theory of the field, many of the properties and design rules are derived from experimental studies. These are performed on optimization problems constructed so as to pose difficulties to solving methods (for example, with many local minima or with a global minimum hard to reach because of the presence of "plateaus"). Many of the test functions used in the analysis of evolution strategies were built to test traditional optimization methods and posed difficulties for them.
Since evolution strategies involve random elements, different runs of the algorithm will produce different results. For this reason, the experimental study can only be a statistical one, characterized by performing several independent runs and determining the frequency of the situations in which the strategy succeeded. The strategy is considered successful when the best element encountered over the generations (or the best element of the last generation) is sufficiently close to the optimum.
Statistical studies are used to analyze the influence of the operators and of the control parameters on the effectiveness of the evolution strategy. Their value is limited by the fact that results obtained on test functions cannot be extrapolated to arbitrary optimization problems. Corroborated with the theoretical results (obtained for simple test functions, such as the sphere model), however, they have led to heuristic criteria that enjoy some success in practice.
From a statistical point of view, the mean and the variance of the optimal value discovered at each run are of interest.
3.5. Domains of applicability
Some of the applications of evolution strategies are:
• Biology and biotechnology: simulating the evolution of proteins, designing optical lenses, optimizing the parameters of a model of genetic signal transmission based on DNA transcription, optimizing fermentation processes.
• Chemistry and chemical engineering: determining the optimal electrolyte composition in electroplating processes, minimizing the cluster energy in noble-gas molecules, estimating the parameters of models for the kinetic analysis of absorption spectra, identifying bands in spectra obtained by nuclear magnetic resonance.
• Computer-aided design: determining the parameters of a pneumatic shock absorber, optimizing the volume of structures in order to minimize instability, determining the optimal shape of devices, adapting the parameters of finite-element models for the optimal design of structures, optimizing the efficiency, sensitivity, and bandwidth of ultrasonic transducers, optimal design of the springs used in vehicle suspension systems.
• Physics and data analysis: determining the optimal configuration of defects in crystalline materials, estimating parameters in fluid dynamics problems, determining stable states in dissipative systems.
• Dynamic processes, modeling, and simulation: optimizing a complex socio-economic system, identifying the parameters of a model of the spread of a viral infection.
• Medicine and medical engineering: optimal control of prostheses, identifying the parameters of models used in pharmacology.
• Artificial intelligence: intelligent control of autonomous vehicles, determining the weights of neural networks.
3
EVOLUTIONARY PROGRAMMING AND GENETIC PROGRAMMING
Evolutionary programming and genetic programming work with populations that are no longer represented by binary or real strings, as in genetic algorithms and evolution strategies, but by more complicated structures: programs, finite automata, etc. From the point of view of the operators used, evolutionary programming is closer to evolution strategies (it uses mutation as the main operator, crossover being used rarely or not at all), while genetic programming is closer to genetic algorithms (the main operator is crossover).
3.1. Evolutionary programming
3.1.1. Generalities
Evolutionary programming was initiated by Fogel [26, 28] with the aim of generating intelligent behavior for an artificial system. Intelligent behavior refers to the system's ability to make predictions about the informational environment in which it finds itself. The systems are modeled by Turing automata, and the informational environment is represented by a sequence of input symbols.
A Turing automaton is a finite automaton endowed with an output tape, operating as follows: being in state $p$ and reading input symbol $x$, it passes into another state $q$ and writes a symbol $y$ on the output tape. Through this mechanism, the Turing automaton transforms a sequence of input symbols into a sequence of output symbols.
The automaton's behavior is considered intelligent if it can predict the next symbol. The population is given by the transition diagrams of the automata, and the fitness of an individual is the higher, the closer the string of symbols produced by the automaton is to a "target" string. The transition diagram of a deterministic finite automaton is represented by a directed, labeled multigraph in which the nodes are labeled with the states of the automaton, and the arcs represent the transitions and are labeled with the corresponding input and output symbols.
As an example, consider the automaton in the figure below, which checks whether a bit string contains an even or an odd number of positions equal to 1. The input alphabet is {0, 1}, and the output of a transition is 0 or 1, according to whether the string contains an even or an odd number of 1s. The set of states is {even, odd}, and the initial state is even.
[Figure 3.1: the parity automaton; the transitions even -(1/1)-> odd and odd -(1/0)-> even, with the self-loops even -(0/0)-> even and odd -(0/1)-> odd]
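A minimal Python sketch of this transducer, with the transition function written as a dictionary (state, input) -> (next state, output):

# transition table of the parity automaton
delta = {
    ("even", "0"): ("even", "0"),
    ("even", "1"): ("odd", "1"),
    ("odd", "0"): ("odd", "1"),
    ("odd", "1"): ("even", "0"),
}

def run(word, state="even"):
    out = []
    for symbol in word:
        state, y = delta[(state, symbol)]
        out.append(y)
    return "".join(out)

print(run("10110"))  # 11011: each output is the parity of the 1s read so far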
3.1.2. The operation of the Turing automaton
The population is formed of $\mu \ge 1$ individuals, each being a Turing automaton. Consider the example in the figure below [9].

[Figure 3.2: a three-state automaton with states A, B, C and transitions labeled 0/c, 0/b, 0/b, 1/a, 1/c, 1/b]
The automaton has the states $S = \{A, B, C\}$, the input alphabet $I = \{0, 1\}$, and the output alphabet $O = \{a, b, c\}$. The transition between two states is given by the function
$$\delta : S \times I \to S \times O$$
defined by a label of the form $i/o$ appearing on an edge between two states $s_k$ and $s_l$, meaning that
$$\delta(s_k, i) = (s_l, o);$$
that is, if the machine is in state $s_k$ and receives input symbol $i \in I$, then it passes into state $s_l$ and produces output symbol $o \in O$. Through this mechanism, the automaton transforms a string formed of input symbols (interpreted as the machine's environment) into a string formed of output symbols. The performance of the automaton with respect to the environment can be measured on the basis of its predictive capacity: each output symbol is compared with the next input symbol, and the value of the prediction is measured by means of a payoff function.
The evolutionary programming paradigm was implemented by Fogel working with a population of $\mu \ge 1$ parents that generate $\mu$ descendants through mutations applied to each parent. Mutation was implemented as a random change of the components of the automaton; a change can be made in five ways: changing an output symbol, modifying a transition, adding a state, removing a state, or changing the initial state.
For each individual of the population, one of the five mutation operators is chosen uniformly at random. It is, however, possible for several mutation operators to be applied to the same individual, the number of mutations being either fixed or chosen according to a probability distribution. After the descendants are evaluated, the best $\mu$ individuals are selected from among parents and descendants; thus a $(\mu + \mu)$ selection is performed.
Fogel did not use crossover, and for this reason many researchers in the field of genetic algorithms criticized his method, considering it not powerful enough. Nevertheless, the theoretical and empirical results of the last 30 years have shown that the role of mutation in genetic algorithms was underestimated, while that of crossover was overestimated [3, 19, 21, 52].
3.1.3. Optimization using evolutionary programming
The current variants of evolutionary programming used in continuous-parameter optimization problems have much in common with evolution strategies, especially regarding the representation of individuals, the way mutation is performed, and the self-adaptation of parameters [97].
Initially, evolutionary programming worked with bounded spaces
$$\prod_{i=1}^{n} [u_i, v_i] \subset \mathbb{R}^n, \quad \text{with } u_i < v_i.$$
Later, the search domain was extended to $I = \mathbb{R}^n$, an individual being a vector $a = x \in I$. In [22] the concept of meta-evolutionary programming is introduced, which involves a self-adaptation mechanism similar to the one in evolution strategies. To incorporate the vector of variances $v \in \mathbb{R}^n$, the space of individuals is extended to $I = \mathbb{R}^n \times \mathbb{R}^n$. The evaluation function $\Phi(a)$ is obtained from the objective function $f(x)$ by scaling to positive values and, possibly, by imposing random modifications $k$ of the parameters; thus $\Phi(a) = \delta(f(x), k)$, where $\delta$ is the scaling function. In standard evolutionary programming, mutation transforms $x$ into $x'$, $x' = mut_{\beta_1, \ldots, \beta_n, \gamma_1, \ldots, \gamma_n}(x)$, according to the rule
$$x'_i = x_i + \sigma_i \cdot N_i(0, 1), \qquad \sigma_i = \sqrt{\beta_i \cdot \Phi(x) + \gamma_i}$$
where the proportionality constants $\beta_i$ and $\gamma_i$ are chosen depending on the problem to be solved; usually, however, one takes $\beta_i = 1$ and $\gamma_i = 0$, so that the mutation becomes
$$x'_i = x_i + \sqrt{\Phi(x)} \cdot N_i(0, 1).$$
In meta-evolutionary programming, the individual $a = (x, v)$ is transformed by mutation into $a' = mut(a) = (x', v')$ as follows:
$$x'_i = x_i + \sqrt{v_i} \cdot N_i(0, 1)$$
$$v'_i = v_i + \sqrt{\zeta \cdot v_i} \cdot N_i(0, 1)$$
where $\zeta$ has the role of ensuring that $v'_i$ receives a positive value. If the variance nevertheless becomes negative or zero, it is assigned a small value $\varepsilon > 0$.
It is considered that evolutionary programming encodes species rather than individuals; and since crossover does not act at the species level, evolutionary programming does not use this type of operator. After creating $\mu$ descendants from $\mu$ parents by applying mutation exactly once to each parent, $\mu$ individuals are selected from the set of parents $P(t)$ joined with that of the descendants $P'(t)$. A stochastic variant of tournament selection with parameter $q > 1$ is used, which consists of the following: for each individual $a_k \in P(t) \cup P'(t)$, $q$ individuals are selected at random from $P(t) \cup P'(t)$ and their evaluations are compared with that of $a_k$. The number $w_k \in \{0, 1, \ldots, q\}$ of individuals less fit than $a_k$ constitutes the score of $a_k$. Formally, this can be written
$$w_k = \sum_{j=1}^{q} \begin{cases} 1 & \text{if } \Phi(a_k) \le \Phi(a_{\chi_j}) \\ 0 & \text{otherwise} \end{cases}$$
where the indices $\chi_j \in \{1, 2, \ldots, 2\mu\}$ are uniform random values, computed for each comparison. After this operation is performed for all $2\mu$ individuals, they are sorted in decreasing order of the score $w_i$, $1 \le i \le 2\mu$, and the best $\mu$ individuals are chosen to form the next generation $P(t+1)$. The following algorithm results:
begin
  t := 0
  initialize P(0) := {a_1(0), ..., a_μ(0)} ∈ I^μ,
    where I = R^n × R^n, a_i = (x_i, v_i), i = 1, ..., μ
  evaluate P(0): compute Φ(a_i(0)) = δ(f(x_i(0)), k_i), i = 1, ..., μ
  while (T(P(t)) ≠ true) do
  begin
    apply mutation: a'_i(t) := mut(a_i(t)), i = 1, ..., μ
    evaluate P'(t) := {a'_1(t), ..., a'_μ(t)} by computing
      Φ(a'_i(t)) = δ(f(x'_i(t)), k_i)
    select P(t+1) := turn_q(P(t) ∪ P'(t))
    t := t + 1
  end
end
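A Python sketch of the stochastic q-tournament selection turn_q (phi returns an individual's evaluation; smaller is better, matching the minimization convention above):

import random

def ep_tournament(parents, offspring, phi, mu, q=10, rng=random):
    pool = parents + offspring                    # the 2*mu candidates
    scores = []
    for a in pool:
        opponents = [rng.choice(pool) for _ in range(q)]
        scores.append(sum(1 for b in opponents if phi(a) <= phi(b)))
    order = sorted(range(len(pool)), key=lambda i: -scores[i])
    return [pool[i] for i in order[:mu]]          # the mu best scores survive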
We emphasized earlier the similarity between evolution strategies and evolutionary programming. There are, however, differences as well, the most obvious being at the level of:
• encoding: evolution strategies encode individuals, while evolutionary programming encodes species;
• selection: evolution strategies choose the best individuals from among parents and descendants, whereas evolutionary programming makes this choice from a number of individuals previously selected from the current population joined with that of the descendants.
Evolutionary programming has numerous applications, among which we mention: continuous numerical optimization, the development of classifier systems, the training of neural networks, the design of control systems that can be modeled by finite automata, and robot motion control.
3.2. Genetic programming
Genetic programming represents a new direction within evolutionary computation, developed by J. Koza [52] around 1990. Genetic programming is, in fact, a variant of genetic algorithms that operates on populations made up of "computing structures"; from this point of view it is similar to evolutionary programming. The structures making up the population are programs which, when executed, are candidate solutions of the problem. Genetic programming was initially developed with the aim of automatically generating programs that (approximately) solve given problems.
Later, the area of applications extended toward evolutionary design, a field located at the intersection of evolutionary computation, computer-aided design, and biomimetics (a subfield of biology that studies imitative processes in nature). Genetic programming follows the general structure of a genetic algorithm, using crossover as the main operator and mutation as a secondary operator. The particular features of genetic programming are related to the way individuals are represented, a fact that also requires an appropriate choice of operators.
3.2.1 Representation of individuals

In genetic programming, individuals are viewed not as a sequence of lines of code but as the derivation trees associated with the "word" they represent in the formal language associated with the programming language used. In practice one works with restricted languages, based on a small set of symbols associated with variables and a set of operators; under these conditions, any program is in fact an expression in the general sense. The choice of symbols and operators is closely tied to the problem to be solved. This choice essentially determines the results that will be obtained; there are, however, no general rules establishing the connection between a problem to be solved and the sets of symbols and operators used, so the important role falls to the programmer.

Koza proposed as representation the prefix writing of expressions, which corresponds to the preorder traversal of the expression's structure tree. To simplify the description, we consider that one operates with "programs" that are expressions containing arithmetic, relational and logical operators, as well as calls to some mathematical functions. In this case, the associated formal language is context-free and each word (expression) can be assigned a description tree. The interior nodes of the tree are labeled with operators or function names, while the terminal ones are labeled with names of variables or constants. For example, the expression max(x*y, x + 5*y) will be represented by
Figure 3.3: the derivation tree of the expression max(x*y, x + 5*y), with max at the root, the operators * and + as interior nodes, and the variables and constants x, y, 5 as leaves.
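In a language with nested structures, the same tree can be written and traversed directly. A small Python sketch follows; the tuple encoding (operator first, then subtrees) is our own convention for illustration:

# a tree is either a terminal (variable name or constant) or a
# tuple (operator, subtree_1, ..., subtree_arity)
tree = ('max', ('*', 'x', 'y'), ('+', 'x', ('*', 5, 'y')))

def to_prefix(t):
    # preorder (prefix) writing of the tree, as proposed by Koza
    if not isinstance(t, tuple):
        return str(t)
    return '(' + ' '.join([t[0]] + [to_prefix(c) for c in t[1:]]) + ')'

print(to_prefix(tree))   # (max (* x y) (+ x (* 5 y)))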
The set of interior nodes is called the function set F = {f_1, f_2, ..., f_nf}; in our example, F = {max, *, +}. Each function f_i ∈ F has arity (number of arguments) at least 1. The functions in F can be of various types:
• arithmetic: +, −, *, /
• mathematical: sin, cos, exp, log
• boolean: AND, OR, NOT
• conditional: if-then-else
• repetitive: for, while, repeat
The set of terminal nodes of the derivation tree is called the terminal set T = {t_1, t_2, ..., t_nt}; in our example, T = {x, y, 5}. The sets F and T can be joined into a uniform group C = F ∪ T if the terminals are regarded as functions of arity zero. For genetic programming to work efficiently, the sets F and T must satisfy two conditions [102]:
• closure, i.e. each function in F is able to accept as argument any value or data type that can be returned by any function in C; this property prevents run-time errors;
• sufficiency, which requires that the functions in C be able to express the solutions of the problem, i.e. at least one solution belongs to the set of all possible compositions of functions from C.
Some examples of closed sets C are:
• C = {AND, OR, NOT, x, y, true}, where x and y are boolean variables
• C = {+, −, *, x, y, 1, 0}, with x and y integer variables
• C = {+, *, sin, cos, exp, x, y}, with x and y real variables.
There are sets for which the closure property does not hold; for example:
• C = {+, −, *, /, x, y, 1, 0}, with x and y real variables, is not closed because divisions by zero can be generated, such as x/0, (x−y)/(x−x), etc.
• C = {+, −, sin, cos, log, x, y}, with x and y integer variables, is not closed because negative arguments can be generated for the logarithm; for example, log(−x).
The set C = {+, −, √, x} may or may not be closed, depending on the domain of the values of x. Closure can be forced by using "protected" functions. Such functions are:
• protected division, denoted div in what follows: x div y = 1 if y = 0, and x/y if y ≠ 0
• protected logarithm: log(x) = 0 if x = 0, and log|x| otherwise
• protected square root: √|x|.
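These protected operators are straightforward to implement; a minimal Python sketch following the definitions above:

import math

def div(x, y):
    # protected division: returns 1 when the denominator is 0
    return 1 if y == 0 else x / y

def plog(x):
    # protected logarithm: 0 for x = 0, log|x| otherwise
    return 0.0 if x == 0 else math.log(abs(x))

def psqrt(x):
    # protected square root: the root of |x|
    return math.sqrt(abs(x))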
If we do not want to use protection functions, then the fitness of invalid expressions must be drastically reduced; the problem is similar to that of penalty functions in optimization problems.
Sufficiency is guaranteed only for some problems, when theory or other methods tell us that a solution can be obtained by combining the elements of C. For example, logic tells us that C = {AND, OR, NOT, x, y} allows the implementation of any boolean function, because it contains a complete set of connectives. If C is not sufficient, genetic programming can only evolve programs that achieve as good an approximation as possible. For example, the set C = {+, −, *, /, x, 0, 1, 2} can only give an approximation of exp(x), since this is a transcendental function (it cannot be expressed exactly by finite algebraic expressions). In this case genetic programming can only produce finite algebraic approximations of the form

exp(x) ≈ 1
exp(x) ≈ 1 + x
exp(x) ≈ 1 + x + x²/2
exp(x) ≈ 1 + x + x²/2 + x³/(2·(1 + 2))
3.2.2 The initial population

The initial trees are generated by randomly choosing functions from C. For example, for C = {+, −, *, /, x, y, 0, 1, 2, 3} we can have
Figure 3.4: examples of randomly generated initial trees over C = {+, −, *, /, x, y, 0, 1, 2, 3}.
The size and shape of the initial programs are controlled by selecting nodes from F and T according to their depth in the tree. Trees can also be represented as lists of lists; for example, the first two trees above can be written as (+ 2 x) and (* (* x 3) 2). For this reason, population initialization in genetic programming is usually based on recursive procedures that return lists.

The "full" method selects nodes from F if their depth does not exceed a maximum value, and nodes from T otherwise. This technique leads to an initial population whose trees have all leaves on the same level (Figure 3.5).
Figure 3.5: trees generated with the "full" method; all leaves lie at the same depth.
The "grow" method selects nodes from C if their depth is smaller than a maximum value, and from T otherwise. Since C also contains terminal elements, this method produces initial trees of various shapes and depths, as in Figure 3.6.
Figure 3.6: trees of various shapes and depths generated with the "grow" method.
The "ramped half and half" method combines the two previous methods in order to give the initial population greater diversity. It works according to the following algorithm:
for i := 2 to max_depth do
begin
    generate (50 / (max_depth − 1))% of the population using the "full" method with maximum depth i
    generate (50 / (max_depth − 1))% of the population using the "grow" method with maximum depth i
end
The "full" and "grow" methods can be implemented with the following recursive procedure [102]:

Generate_expression(F, T, max_depth, method)
begin
    if max_depth = 0
    then
    begin
        select t ∈ T
        insert t into the tree
    end
    else
    begin
        if method = full
        then select f ∈ F
        else select f ∈ F ∪ T
        insert f into the tree
        if f ∈ F
        then
        begin
            n := arity of f
            for i := 1 to n do
                Generate_expression(F, T, max_depth − 1, method)
        end
    end
end
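A compact Python rendering of this procedure, returning the tree as a nested tuple instead of inserting nodes into a global tree, together with a sketch of the "ramped half and half" initialization; the concrete sets F and T and the depth schedule are illustrative assumptions:

import random

F = {'+': 2, '-': 2, '*': 2, '/': 2}          # functions with their arities
T = ['x', 'y', 0, 1, 2, 3]                    # terminals

def generate_expression(F, T, max_depth, method):
    if max_depth == 0:
        return random.choice(T)               # select t in T
    if method == 'full':
        f = random.choice(list(F))            # select f in F
    else:                                     # 'grow': select f in F U T
        f = random.choice(list(F) + T)
    if f not in F:                            # a terminal was drawn
        return f
    # recurse once per argument of f
    return tuple([f] + [generate_expression(F, T, max_depth - 1, method)
                        for _ in range(F[f])])

def ramped_population(size, max_depth=6):
    # "ramped half and half": cycle through depths, alternating full/grow
    pop = []
    while len(pop) < size:
        d = 2 + len(pop) % (max_depth - 1)
        m = 'full' if len(pop) % 2 == 0 else 'grow'
        pop.append(generate_expression(F, T, d, m))
    return pop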
The call

Generate_expression({+, −, *, /}, {x, y, 0, 1, 2, 3}, 3, full)

can generate, for instance, the expression

(* (+ (− x 1) (* 2 0)) (/ (− y 3) (+ 2 x)))

which corresponds to the following tree.
Figure 3.7: the tree of the expression generated with the "full" method (all leaves at depth 3).
The call

Generate_expression({+, −, *, /}, {x, y, 0, 1, 2, 3}, 3, grow)

can generate the expression

(+ (− 3 (* x 2)) (/ 1 y))

which corresponds to the tree in Figure 3.8.
Figure 3.8: the tree of the expression generated with the "grow" method.
3.2.3 Evolution operators

Crossover consists in randomly selecting a crossover point (a node or an arc of the tree) in each parent and exchanging the subtrees that start at the crossover points. It is advisable that the crossover points not be selected uniformly at random, but that interior nodes be favored; for example, non-terminal nodes are chosen in 90% of the cases. The number of crossovers may be controlled by the crossover probability. An example of crossover is given in the following figure.
Figure 3.9: crossover example. The parents (a*b)*sin(c) and a*(b+c) exchange the subtrees (a*b) and a, producing the descendants a*sin(c) and (a*b)*(b+c).
Although it is a secondary operator, mutation allows tree structures to be modified in ways in which crossover cannot. Depending on its effect on the structure, there are three variants of mutation:
• simple mutation: changes the label of a randomly selected node (Figure 3.10)
• expansion mutation: consists in replacing a terminal node with a subtree, built according to the same rules but not necessarily taken from the current population, as happens in the case of crossover (Figure 3.11: (a+b)*sin(c) becomes (a+b)*sin(c+b))
• reduction mutation: consists in replacing a subtree with a terminal node (Figure 3.12: (a*b)*sin(c) becomes a*sin(c))
In general, one can choose a node of the tree and replace the subtree dominated by that node with a randomly generated one. Thus, mutation can be seen as crossover with a randomly generated tree.
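A Python sketch of subtree crossover and of mutation seen as crossover with a random tree, using the nested-tuple representation of the earlier sketches; paths are tuples of child indices, and uniform node selection is used for brevity (the text recommends favoring interior nodes):

import random

def all_paths(t, path=()):
    # all node positions in tree t, as tuples of child indices
    yield path
    if isinstance(t, tuple):
        for i, child in enumerate(t[1:], start=1):
            yield from all_paths(child, path + (i,))

def get(t, path):
    for i in path:
        t = t[i]
    return t

def replace(t, path, sub):
    if not path:
        return sub
    i = path[0]
    return t[:i] + (replace(t[i], path[1:], sub),) + t[i + 1:]

def crossover(p1, p2):
    # swap randomly chosen subtrees of the two parents
    c1 = random.choice(list(all_paths(p1)))
    c2 = random.choice(list(all_paths(p2)))
    return replace(p1, c1, get(p2, c2)), replace(p2, c2, get(p1, c1))

def mutate(t, random_tree):
    # replace a random subtree by a freshly generated one:
    # exactly "crossover with a randomly generated tree"
    path = random.choice(list(all_paths(t)))
    return replace(t, path, random_tree())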
3.2.4 Running programs in genetic programming

In genetic programming, programs are represented by trees or lists; it is always possible to transform such a structure into C, C++, Java, Lisp, etc. code. In most cases, however, such an operation is inefficient because:
• initially, most programs have a very low fitness and will survive only a few generations;
• many programs (good or less good) will be changed by crossover or mutation and would have to be recompiled.
A better approach is to interpret the programs instead of compiling them. Some programming languages, such as Lisp, already have an interpreter in their development environment. To use it we must make sure that the syntax is correct; for example, we must use round brackets instead of square ones. For all other programming languages, such an interpreter has to be built. Interpreting a program means traversing it depth-first and evaluating all nodes starting from the leaves. Depth-first traversal allows a node to be evaluated only after the values of its arguments are known, as in the example of Figure 3.13.
Figure 3.13: depth-first evaluation of an expression tree for x = 1; each node is annotated with its value, computed only after the values of its children are known.
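A minimal Python interpreter of this kind, assuming the nested-tuple representation used in the earlier sketches and a reduced operator set:

import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def eval_tree(t, env):
    # depth-first evaluation: children first, then the node itself
    if isinstance(t, tuple):                  # interior node: an operator
        args = [eval_tree(c, env) for c in t[1:]]
        return OPS[t[0]](*args)
    if isinstance(t, str):                    # leaf: a variable
        return env[t]
    return t                                  # leaf: a constant

expr = ('-', ('*', 2, ('+', 1, 'x')), ('*', ('-', 'x', 2), ('-', 'x', 2)))
print(eval_tree(expr, {'x': 1}))   # 2*(1+1) - (1-2)*(1-2) = 3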
After interpretation, the value of the root node is the value of the program. The possible applications of genetic programming are many and varied, the only difficulty being the definition of an appropriate fitness function. The scaling and penalization techniques used in genetic algorithms are the same, with the difference that, in order to decide whether a program is good or not, it must be executed once or several times, with different input data or in different contexts.

A class of problems in which genetic programming has proved very useful is symbolic regression. This is a technique used very often in data interpretation; it consists in finding a function, obtained as a combination of elementary functions, that approximates as well as possible a function known only through its values at given points. The term "symbolic" signifies that we are not interested in finding optimal parameters (numbers) but optimal functions (expressions, symbolic representations).
To use genetic programming for solving symbolic regression problems it is necessary:
• to have a set of data points, where each point represents the values taken by certain variables at a certain moment;
• to select the variables that we consider dependent on the others;
• to define a fitness function that measures each program's ability to determine the values of the dependent variables when the values of the independent ones are given;
• to select adequate sets of functions and terminals; the terminals must include all the independent variables, and possibly others, while the functions must be selected on the basis of domain knowledge.
As an example [102], let us find the symbolic expression that best approximates the data set

(x_i, y_i) ∈ {(−1.0, 0.0), (−0.9, −0.1629), (−0.8, −0.2624), ..., (1.0, 4.0)}

which was generated using the function

y = f(x) = x + x² + x³ + x⁴, x ∈ [−1, 1].
The parameters of the genetic program are:
• population size = 1000
• function set = {+, −, *, log, exp, sin, cos, div}
• terminal set = {x}
• maximum depth = 4
• initial population generation method = full
• number of generations = 50
• crossover probability = 0.7
• mutation probability = 0
• fitness function = Σ_i |y_i − eval(prog, x_i)|
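With an interpreter like the one sketched earlier, this fitness function takes a few lines of Python; the sampled data set below regenerates the target y = x + x² + x³ + x⁴ on [−1, 1] and is an illustrative stand-in for the values quoted in [102]:

def fitness(prog, data):
    # sum of absolute errors of the program on the sample points (lower is better)
    return sum(abs(y - eval_tree(prog, {'x': x})) for x, y in data)

data = [(x / 10, (x / 10) + (x / 10) ** 2 + (x / 10) ** 3 + (x / 10) ** 4)
        for x in range(-10, 11)]
print(fitness(('+', 'x', ('*', 'x', 'x')), data))  # error of the guess x + x^2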
Some of the best programs obtained at various generations are [102]:
• at generation 1:
[+ [- [log [exp x] ] [+ [sin x] [- x x] ] ] [+ [exp [log x] ] [sin [log x] ] ] ]
with fitness 8.20908
• at generation 2:
[* [+ [+ [+ x x] [div x x] ] x ] [log [exp [* x x] ] ] ]
with fitness 7.0476
• at generation 3:
[* [log [- [sin x] [exp x] ] ] [+ [cos [* x x] ] [+ [+ x x] [cos x] ] ] ]
with fitness 4.74338
• at generation 6:
[* [+ [+ [+ x [exp [log x] ] ] [div x x] ] x ] [log [exp x] ] ]
with fitness 2.6334
One can see that the fitness decreases, which means that the program tends towards finding the optimal combination of functions; for example, at iteration 26 a fitness of 0.841868 is obtained.
Applied Soft Computing 4 (2004) 65–77, doi:10.1016/j.asoc.2003.08.004

Extracting rules from trained neural network using GA for managing E-business

A. Ebrahim Elalfi (a,*), R. Haque (b), M. Esmel Elalami (a)
(a) Department of Computer Instructor Preparation, Faculty of Specific Education, Mansoura University, Mansoura, Egypt
(b) High Tech. International.com, Montreal, Que., Canada
(*) Corresponding author. E-mail address: ael alfi@hotmial.com (A.E. Elalfi).

Received 23 September 2002; received in revised form 13 August 2003; accepted 19 August 2003
Abstract

The ability to intelligently collect, manage and analyze information about customers and sellers is a key source of competitive advantage for an e-business. This ability provides an opportunity to deliver real-time marketing or services that strengthen customer relationships. It also enables an organization to gather business intelligence about a customer that can be used for future planning and programs.

This paper presents a new algorithm for extracting accurate and comprehensible rules from databases via a trained artificial neural network (ANN) using a genetic algorithm (GA). The new algorithm does not depend on the ANN training algorithms, nor does it modify the training results. The GA is used to find the optimal values of the input attributes (chromosome), Xm, which maximize the output function ψk of output node k. The function ψk = f(xi, (WG1)i,j, (WG2)j,k) is a nonlinear exponential function, where (WG1)i,j and (WG2)j,k are the weight groups between input and hidden nodes, and between hidden and output nodes, respectively. The optimal chromosome is decoded to obtain a rule belonging to classk.

© 2003 Elsevier B.V. All rights reserved.

Keywords: E-business; Artificial neural network; Genetic algorithms; Personalization; Online shopping; Rule extraction
1. Introduction

E-commerce has evolved from consumers conducting basic transactions on the Web to a complete retooling of the way partners, suppliers and customers transact. Now one can link dealers and suppliers online, reducing both lag time and paperwork. One can move procurement online by setting up an extranet that links directly to vendors, cutting inventory carrying costs and becoming more responsive to one's customers. Also, one can streamline one's financial relationships with customers and suppliers by Web-enabling billing and payment systems.
Recent literature suggests that the Internet and WWW as a business transaction tool provide both firms and consumers with various benefits, including lower transaction cost, lower search cost, and a greater selection of goods [1].

The ability to provide content and services to individuals on the basis of knowledge about their preferences and behavior has become an important marketing tool [2].
A complete customer profile has two parts: factual and behavioral. The factual profile contains information, such as name, gender, and date of birth, that the personalization system obtained from the customer's factual data. The factual profile can also contain information derived from the transaction data. A behavioral profile models the customer's actions and is usually derived from transactional data.

Personalization begins with collecting customer data from various sources. This data might include histories of customers' web purchasing and browsing activities, as well as demographic and psychological information. After the data is collected, it must be prepared, cleaned, and stored in a data warehouse.

Real-world data is dirty. Data cleaning, including the removal of contradictory and redundant data items and the elimination of irrelevant attributes, has been an important topic in data mining research and development [3].
Extracting rules from a given database via trained neural networks is important [4]. Although several algorithms have been proposed by several researchers [5,6], there is no algorithm which can be applied to any type of network, to any training algorithm, and to both discrete and continuous values [4]. A method for extracting M-of-N rules from trained artificial neural networks (ANN) was presented by Setiono [5]. However, the algorithm was based on standard three-layered feed-forward networks. Also, the attributes of the database are assumed to have binary values −1 or 1. Hiroshi presented a decomposition algorithm that can be applied to multilayer ANN and recurrent networks [6]. The units of the ANN are approximated by Boolean functions. The computational complexity of the approximation is exponential, and so a polynomial algorithm was presented [7]. To reduce the computational complexity, higher-order terms were neglected. Consequently, the extraction of accurate rules is not guaranteed.

An approach for extracting rules from trained ANNs for regression was presented [13]. Each rule in the extracted rule set corresponds to a subregion of the input space, and a linear function involving the relevant input attributes of the data approximates the network output for all data samples in this subregion. However, the method extracts rules from the trained ANN by approximating the hidden activation function h(x) = tanh(x) by either a three-piece or a five-piece linear function. This approximation yields less accuracy and makes the computation burdensome.

This paper presents a new algorithm for extracting rules from a trained neural network using a genetic algorithm. It does not depend on the training algorithms of the ANN and does not modify the training results. Also, the algorithm can be applied to discrete and continuous attributes. The algorithm does not make any approximation to the hidden unit activation function. Additionally, it takes into consideration any number of hidden layers in the trained ANN.

The extracted rules can be used to define a customer profile in order to make online shopping easy.
2. Problem formulation

A supervised ANN uses a set of training examples or records. These records include N attributes. Each attribute, An (n = 1, 2, ..., N), can be encoded into a fixed-length binary sub-string {x1 ... xi ... x_mn}, where mn is the number of possible values of attribute An. The element xi = 1 if its corresponding attribute value exists, while all the other elements are 0. So, the proposed number of input nodes, I, in the input layer of the ANN is given by

I = Σ_{n=1}^{N} m_n    (1)

The input attribute vectors, Xm, presented to the input layer can be rewritten as

Xm = {x1 ... xi ... xI}_m    (2)

where m = 1, 2, ..., M, and M is the total number of input training patterns.

The output class vector, Ck (k = 1, 2, ..., K), can be encoded as a bit vector of fixed length K as follows:

Ck = {ψ1 ... ψk ... ψK}    (3)

where K is the number of different possible classes. If the output vector belongs to classk, then the element ψk is equal to 1 while all the other elements in the vector are zeros. Therefore, the proposed number of output nodes in the output layer of the ANN is K. Accordingly, the input and the output nodes of the ANN are determined, and the structure of the ANN is shown in Fig. 1. The ANN is trained on the encoded vectors of the input attributes and the corresponding vectors of the output classes. The training of the ANN proceeds until convergence between the actual and the desired outputs is achieved. The convergence rate can be improved by changing the number of iterations, the number of hidden nodes (J), the learning rate, and the momentum rate.

Fig. 1. The structure of the ANN.
After training the ANN, two groups of weights can
be obtained. The first group, (WG1)i,j , includes the
weights between the input node i and the hidden node
j. The second group, (WG2)j,k , includes the weights
between the hidden node j and the output node k. The
activation function used in the hidden and output nodes
of the ANN is a sigmoid function.
The total input to the jth hidden node, IHNj, is given by

IHN_j = Σ_{i=1}^{I} x_i (WG1)_{i,j}    (4)

The output of the jth hidden node, OHNj, is given by

OHN_j = 1 / (1 + exp(−Σ_{i=1}^{I} x_i (WG1)_{i,j}))    (5)

The total input to the kth output node, IONk, is given by

ION_k = Σ_{j=1}^{J} (WG2)_{j,k} / (1 + exp(−Σ_{i=1}^{I} x_i (WG1)_{i,j}))    (6)

So, the final value of the kth output node, ψk, is given by

ψ_k = 1 / (1 + exp(−Σ_{j=1}^{J} (WG2)_{j,k} · [1 / (1 + exp(−Σ_{i=1}^{I} x_i (WG1)_{i,j}))]))    (7)

The function ψk = f(xi, (WG1)i,j, (WG2)j,k) is an exponential function in xi, since (WG1)i,j and (WG2)j,k are constants. Its maximum output value is equal to one.

Definition. An input vector Xm belongs to classk iff the element ψk of Cm equals 1 and all other elements of Cm equal 0.

Consequently, for extracting the relation (rule) between the input attributes Xm relating to a specific classk, one must find the input vector which maximizes ψk. This is an optimization problem and can be stated as:
Maximize

ψ_k(x_i) = 1 / (1 + exp(−Σ_{j=1}^{J} (WG2)_{j,k} · [1 / (1 + exp(−Σ_{i=1}^{I} x_i (WG1)_{i,j}))]))    (8)

subject to:

x_i are binary values (0 or 1)    (9)
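A minimal Python sketch of Eqs. (5)-(8), assuming the weight groups are given as nested lists and, as in the paper's formulation, omitting bias terms:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def psi(x, WG1, WG2, k):
    # Eq. (7)/(8): output of node k for the binary input vector x;
    # WG1[i][j]: input i -> hidden j, WG2[j][k]: hidden j -> output k
    I, J = len(WG1), len(WG1[0])
    ohn = [sigmoid(sum(x[i] * WG1[i][j] for i in range(I)))   # Eq. (5)
           for j in range(J)]
    ion = sum(ohn[j] * WG2[j][k] for j in range(J))           # Eq. (6)
    return sigmoid(ion)                                       # Eq. (7)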
Since the objective function ψk(xi) is nonlinear and the constraints are binary, this is a nonlinear integer optimization problem. A genetic algorithm (GA) can be used to solve it. The following algorithm explains how the GA can be used to obtain the best chromosome, i.e. the one which maximizes the objective function ψk(xi):
Begin
{
    Assume the fitness function is ψk(xi)
    Create a chromosome structure as follows:
    {
        Generate a number of slots equal to I, representing the input vector X
        Put a random value 0 or 1 in each slot
    }
    G = 0, where G is the generation number
    Create the initial population, P, of T chromosomes, P(t)_G, where t = 1 to T
    Evaluate the fitness function on P(t)_G
    while termination conditions not satisfied
    Do {
        G = G + 1
        Select a number of chromosomes from P(t)_{G−1} according to the roulette wheel procedure
        Recombine them using crossover and mutation
        Update the population from P(t)_{G−1} to P(t)_G
        Evaluate the fitness function on P(t)_G
    }
    Display the best chromosome that satisfies the conditions
}
End
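A compact Python rendering of this loop, with roulette-wheel selection, one-point crossover and bit-flip mutation; the population size, rates and generation count are illustrative defaults, and psi is the objective sketched above:

import random

def ga_maximize(fitness, I, T=10, generations=100, pc=0.25, pm=0.01):
    pop = [[random.randint(0, 1) for _ in range(I)] for _ in range(T)]
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        total = sum(scores)
        def roulette():
            r, acc = random.uniform(0, total), 0.0
            for c, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return c
            return pop[-1]
        nxt = []
        while len(nxt) < T:
            a, b = roulette()[:], roulette()[:]
            if random.random() < pc:                 # one-point crossover
                p = random.randint(1, I - 1)
                a, b = a[:p] + b[p:], b[:p] + a[p:]
            for c in (a, b):                         # bit-flip mutation
                for i in range(I):
                    if random.random() < pm:
                        c[i] ^= 1
            nxt += [a, b]
        pop = nxt[:T]
    return max(pop, key=fitness)

# e.g. best = ga_maximize(lambda x: psi(x, WG1, WG2, k=0), I=10)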
For extracting a rule belonging to classk, the best chromosome must be decoded as follows:
• The best chromosome is divided into N segments.
• Each segment represents one attribute, An (n = 1, 2, ..., N), and has a corresponding bit length mn representing its values.
• The attribute values exist if the corresponding bits in the best chromosome equal one, and vice versa.
• The operators "OR" and "AND" are used to correlate the existing values of the same attribute and of different attributes, respectively.
• After obtaining the set of rules, perform rule refinement and cancel redundant attributes; e.g. if an attribute has three values A, B and C, and a rule looks like "If attk has value A or B or C then classk", such an attribute can be dropped (redundant).
The overall methodology of rule extraction is shown in Fig. 2; a sketch of the decoding step follows below.
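The decoding step can be sketched in a few lines of Python; the attribute names and value lists mirror the play-tennis example of Section 5 and are assumptions of this illustration:

def decode(chromosome, attributes):
    # attributes: list of (name, values); segments follow Eqs. (1)-(2)
    clauses, pos = [], 0
    for name, values in attributes:
        bits = chromosome[pos:pos + len(values)]
        pos += len(values)
        present = [v for v, b in zip(values, bits) if b == 1]
        # rule refinement: drop redundant attributes (all values or no values set)
        if present and len(present) < len(values):
            clauses.append(name + ' is ' + ' or '.join(present))
    return 'If ' + ' And '.join(clauses) if clauses else 'If (any)'

attrs = [('Outlook', ['Sunny', 'Overcast', 'Rain']),
         ('Temperature', ['Hot', 'Mild', 'Cool']),
         ('Humidity', ['High', 'Normal']),
         ('Wind', ['Weak', 'Strong'])]
print(decode([1,0,0, 1,1,1, 1,0, 0,1], attrs))
# If Outlook is Sunny And Humidity is High And Wind is Strong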
3. Generalization for multiple hidden layers

The objective function obtained in Eq. (8) can be generalized for an ANN which has more than one hidden layer. Fig. 3 shows an ANN that includes three hidden layers.
The function ψk, in its final form for the kth output node, is given by

ψ_k = 1 / (1 + exp(−Σ_{j3=1}^{J} [1 / (1 + e^{−A})] (WG4)_{j3,k}))    (10)

where

A = Σ_{j2=1}^{J} [1 / (1 + exp(−Σ_{j1=1}^{J} [1 / (1 + exp(−Σ_{i=1}^{I} X_i (WG1)_{i,j1}))] (WG2)_{j1,j2}))] (WG3)_{j2,j3}    (11)

Xi are the input values, i = 1, 2, ..., I, where I is the total number of nodes in the input layer; j1 = 1, 2, ..., J for the first hidden layer, j2 = 1, 2, ..., J for the second hidden layer, and j3 = 1, 2, ..., J for the third hidden layer, where J is the total number of nodes in each hidden layer; k = 1, 2, ..., K, where K is the total number of nodes in the output layer.
(WG1)_{i,j1} is the weights group between the input layer, i, and the first hidden layer, j1; (WG2)_{j1,j2} is the weights group between the first hidden layer, j1, and the second hidden layer, j2; (WG3)_{j2,j3} is the weights group between the second hidden layer, j2, and the third hidden layer, j3; and (WG4)_{j3,k} is the weights group between the third hidden layer, j3, and the output layer, k.

Fig. 2. Overall flowchart for the proposed methodology: the database is separated into input vectors (Xm) and corresponding output vectors (Cm) and encoded as bit strings; the ANN is trained (restructuring its random parameters) until the error is satisfactory or the maximum iteration is reached; the weight groups (WG1)i,j and (WG2)j,k are extracted and the output functions ψk(xi) built; for each class k the GA (selection, crossover, mutation) maximizes ψk(xi), and the best chromosomes are decoded into the equivalent rules.

Fig. 3. ANN with three hidden layers.
4. Personalized marketing and customer retention strategies

As organizations attempt to develop marketing and customer retention strategies, they will need to collect visitor statistics and integrate data across systems. Additionally, there is a need to improve data about inventories. Personalization is a relatively new field, and different authors provide various definitions of the concept [11]. Fig. 4 shows the stages of personalization as an iterative process [2].

A framework that identifies individual user behavior in order to make online shopping easy and to maximize user satisfaction has been presented [12]. It is clear that each individual user will act based on his or her preferences, attitude and personality. Each individual's behaviors, such as preferences and attitudes, differ from the others'. An individual's activities or expressions are monitored and captured using sensing devices. The individual user behaviors are recognized by pattern recognition systems (PRSs). Intelligent agents are used to make system strategies or plans based on the individual user behaviors and the product state, so that the system can act according to individual behaviors to make online shopping easy.
• A proposed record for products and inventories can have the following attributes: product name, color, store size, city, month, quantity, quantity sold, profit.
• A record for factual data includes: customer ID, customer name, gender, birth date, nationality.
• A record for transactional data may include the attributes: customer ID, date, time, store, product, coupon used.

Fig. 4. Stages of the personalization process.

5. Illustrative example

A given database (with four attributes and two different output classes) is shown in Table 1 [8]. The encoded values of the given database are shown in Table 2. The ANN is trained on the encoded input attribute vectors, Xm, and the corresponding output class vectors, Cm. The number of input nodes is given by

I = Σ_{n=1}^{N} m_n = m1 + m2 + m3 + m4 = 10

The number of output nodes is K = 2.

Table 1. Example for the target concept play tennis [8]

Day   Outlook    Temperature   Humidity   Wind     Play tennis
D1    Sunny      Hot           High       Weak     No
D2    Sunny      Hot           High       Strong   No
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D8    Sunny      Mild          High       Weak     No
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes
D14   Rain       Mild          High       Strong   No

Table 3. Group of weights (WG1)i,j between input and hidden nodes

Input nodes   H1          H2          H3          H4
x1            −4.09699    3.741246    −1.2106     −1.42853
x2            6.154562    −4.56639    0.349845    1.109533
x3            −0.82675    1.114981    0.153325    −0.47917
x4            −0.42227    0.2961      −0.19704    −0.55404
x5            4.128692    −3.07741    −0.15498    0.651919
x6            −2.73254    2.595217    −0.56767    −0.32539
x7            −4.93463    4.005334    −1.17037    −0.89697
x8            5.282225    −4.36782    0.235355    0.616702
x9            3.060052    −3.11607    1.106763    0.56799
x10           −3.63009    2.284223    −1.36338    −1.02158

Table 4. Group of weights (WG2)j,k between hidden and output nodes

Output nodes   H1          H2          H3         H4
ψ1             −9.20896    9.012731    −1.2113    −0.90564
ψ2             9.22879     −9.00487    0.773881   1.218929
Table 2. Encoded database (Outlook, m1 = 3: Sunny x1, Overcast x2, Rain x3; Temperature, m2 = 3: Hot x4, Mild x5, Cool x6; Humidity, m3 = 2: High x7, Normal x8; Wind, m4 = 2: Weak x9, Strong x10; output: ψ1 = No, ψ2 = Yes)

i/p patt.   x1 x2 x3   x4 x5 x6   x7 x8   x9 x10   O/P patt.   ψ1 ψ2
X1          1  0  0    1  0  0    1  0    1  0     C1          1  0
X2          1  0  0    1  0  0    1  0    0  1     C2          1  0
X3          0  1  0    1  0  0    1  0    1  0     C3          0  1
X4          0  0  1    0  1  0    1  0    1  0     C4          0  1
X5          0  0  1    0  0  1    0  1    1  0     C5          0  1
X6          0  0  1    0  0  1    0  1    0  1     C6          1  0
X7          0  1  0    0  0  1    0  1    0  1     C7          0  1
X8          1  0  0    0  1  0    1  0    1  0     C8          1  0
X9          1  0  0    0  0  1    0  1    1  0     C9          0  1
X10         0  0  1    0  1  0    0  1    1  0     C10         0  1
X11         1  0  0    0  1  0    0  1    0  1     C11         0  1
X12         0  1  0    0  1  0    1  0    0  1     C12         0  1
X13         0  1  0    1  0  0    0  1    1  0     C13         0  1
X14         0  0  1    0  1  0    1  0    0  1     C14         1  0
Table 5. The rule extraction for class no (ψ1 is maximum)

Rule 1 (fitness 0.99988), Xi vector from GA: 1 0 0 1 1 1 1 0 0 1
  Directly extracted rule (don't play): If Outlook is Sunny And Temperature is Hot or Mild or Cool And Humidity is High And Wind is Strong
  Rule refinement (don't play): If Outlook is Sunny And Humidity is High And Wind is Strong

Rule 2 (fitness 0.999874), Xi vector from GA: 1 0 1 0 0 1 1 0 0 1
  Directly extracted rule (don't play): If Outlook is Sunny or Rain And Temperature is Cool And Humidity is High And Wind is Strong
  Rule refinement (don't play): If Outlook is Sunny or Rain And Temperature is Cool And Humidity is High And Wind is Strong

Rule 3 (fitness 0.999867), Xi vector from GA: 1 0 0 1 1 1 1 0 1 1
  Directly extracted rule (don't play): If Outlook is Sunny And Temperature is Hot or Mild or Cool And Humidity is High And Wind is Weak or Strong
  Rule refinement (don't play): If Outlook is Sunny And Humidity is High

Rule 4 (fitness 0.999849), Xi vector from GA: 0 0 1 0 0 1 1 1 0 1
  Directly extracted rule (don't play): If Outlook is Rain And Temperature is Cool And Humidity is High or Normal And Wind is Strong
  Rule refinement (don't play): If Outlook is Rain And Temperature is Cool And Wind is Strong
Table 6. The rule extraction for class yes (ψ2 is maximum)

Rule 1 (fitness 0.99998), Xi vector from GA: 0 1 1 0 0 0 0 0 1 0
  Directly extracted rule (play): If Outlook is Overcast or Rain And Wind is Weak
  Rule refinement (play): If Outlook is Overcast or Rain And Wind is Weak

Rule 2 (fitness 0.999972), Xi vector from GA: 0 1 0 0 0 0 0 0 0 0
  Directly extracted rule (play): If Outlook is Overcast
  Rule refinement (play): If Outlook is Overcast

Rule 3 (fitness 0.999960), Xi vector from GA: 1 1 0 0 0 0 0 1 0 0
  Directly extracted rule (play): If Outlook is Sunny or Overcast And Humidity is Normal
  Rule refinement (play): If Outlook is Sunny or Overcast And Humidity is Normal
Table 7. RITIO-induced rule set from Table 1 [9]

1. If Outlook is Sunny And Humidity is High Then CLASS No
2. If Outlook is Overcast And Humidity is High Then CLASS Yes
3. If Humidity is Normal Then CLASS Yes
4. If Humidity is Normal And Wind is Weak Then CLASS Yes
5. If Outlook is Rain And Humidity is High And Wind is Weak Then CLASS Yes
6. If Outlook is Rain And Humidity is Normal And Wind is Strong Then CLASS No
7. If Outlook is Rain And Humidity is High And Wind is Strong Then CLASS No
Convergence between the actual and the desired output is achieved with 4 hidden nodes, a 0.55 learning coefficient, a 0.65 momentum coefficient and 30,000 iterations. The allowable error equals 0.000001. Table 3 shows the first group of weights, (WG1)i,j, between each input node and the hidden nodes. The second group of weights, (WG2)j,k, between each hidden node and the output nodes is shown in Table 4.

The GA is applied to solve the equation for ψ1 in order to get the input attribute vector which maximizes that function. The GA has a population of 10 individuals evolving for 1300 generations. The crossover and mutation probabilities were 0.25 and 0.01, respectively. The output chromosomes of the play and don't play target classes are sorted in descending order according to their fitness values. The threshold levels of the two target classes are 0.99996 and 0.999849, respectively.

Therefore, both the local and global maxima of the output chromosomes have been determined and will be translated into rules. Tables 5 and 6 present the best sets of rules belonging to the don't play and play targets, respectively.

Table 7 shows the RITIO-induced set of rules for the same database [9]. Although RITIO gives a good indication of algorithm stability over different databases, its rule number 3 is not verified. The algorithm proposed here shows that all rules are verified.
6. Application and results

The MONK'S problems are benchmark binary classification tasks in which robots are described in terms of six characteristics, and a rule is given which specifies the attributes that determine membership of the target class [10]. The six attributes and their values are shown in Table 8.

Table 8. The attributes and their values of the MONK1'S database [10]

Robot characteristics (attributes)   Nominal values
Head shape                           Round, square, octagon
Body shape                           Round, square, octagon
Is smiling                           Yes, no
Holding                              Sword, flag, balloon
Jacket colour                        Red, yellow, green, blue
Has tie                              Yes, no

The two rules that determine membership of the target class in the MONK1'S database are shown in Table 9.
The ANN is trained on 123 input vectors, Xm. The corresponding output class vectors, Cm, are shown in Table 10. The number of input nodes is I = 17, and the number of output nodes is K = 2. Convergence between the actual and desired output is achieved with 6 hidden nodes, a 0.25 learning coefficient, a 0.85 momentum coefficient and 31,999 iterations. The allowable error equals 0.0000001.

Table 11 shows the first group of weights, (WG1)i,j, between each input node and the hidden nodes. The second group of weights, (WG2)j,k, between each hidden node and the output nodes is shown in Table 12.

Table 9. Two rules satisfying the target

Rule 1: If Head Shape Value = Body Shape Value THEN Robot is in Target Class
Rule 2: If Jacket Color = Red THEN Robot is in Target Class
Table 10. The MONK1'S database [10]

Xm    Head shape   Body shape   Is smiling   Holding   Jacket colour   Has tie   Cm    Target
1     Round        Round        Yes          Sword     Green           Yes       1     Yes
2     Round        Round        Yes          Flag      Yellow          Yes       2     Yes
3     Round        Square       Yes          Sword     Green           Yes       3     No
4     Round        Octagon      Yes          Flag      Blue            Yes       4     No
...
55    Square       Round        Yes          Sword     Green           Yes       55    No
56    Square       Square       Yes          Sword     Green           Yes       56    Yes
57    Square       Square       Yes          Flag      Red             No        57    Yes
58    Square       Octagon      No           Balloon   Red             Yes       58    Yes
...
120   Octagon      Round        No           Sword     Red             Yes       120   Yes
121   Octagon      Round        No           Balloon   Yellow          No        121   No
122   Octagon      Octagon      No           Flag      Yellow          No        122   Yes
123   Octagon      Octagon      No           Flag      Green           No        123   Yes
Table 11. Group of weights (WG1)i,j between each input node and the hidden nodes

Input nodes   H1          H2          H3          H4          H5          H6
x1            −5.08851    −6.40872    2.478146    −0.53785    3.331379    −1.01267
x2            4.094656    −0.55311    2.24007     −1.00648    −6.64513    −0.53136
x3            2.711605    7.121283    −2.49793    −0.15809    0.468151    −0.3962
x4            −2.9641     −7.48084    1.351769    −0.69977    −6.00667    −0.18359
x5            0.929943    7.760751    −2.3443     −0.53314    5.33333     −0.2059
x6            3.494829    0.138298    2.217123    −0.63468    −1.24655    −0.4458
x7            0.475753    0.275564    0.829914    −1.09122    −1.47744    −0.8716
x8            0.358807    0.269779    0.623271    −1.23704    −1.61803    −0.92063
x9            −0.10996    0.243966    0.019956    −0.29096    −1.02741    0.006704
x10           0.385337    −0.31376    0.989733    −0.58041    −0.54741    −0.50737
x11           −0.13311    −0.07916    0.539239    −1.02715    −0.74859    −0.77975
x12           −7.31878    12.26899    −4.98723    0.279794    −4.79433    0.471633
x13           2.941625    −3.99095    1.822638    −0.49974    0.666357    −1.03168
x14           2.469945    −4.1919     2.270769    −0.57977    0.686182    −1.01134
x15           2.658616    −3.47783    2.435963    −0.62123    1.15922     −0.59382
x16           0.48247     0.314717    0.777509    −0.83715    −1.61191    −0.56232
x17           0.878135    0.340808    0.315489    −0.77439    −2.04905    −1.21304
Table 12. Group of weights (WG2)j,k between each hidden node and the output nodes

Output nodes   H1          H2          H3         H4         H5          H6
1              13.3740     −14.5207    −6.48067   −0.40159   11.70462    −0.52939
2              −13.37457   14.52426    6.48808    0.07072    −11.7054    0.33697
Table 13. The set of rules belonging to the target class (bit positions: x1-x3 head shape; x4-x6 body shape; x7-x8 is smiling; x9-x11 holding; x12-x15 jacket colour; x16-x17 has tie)

Rule 1 (fitness 0.9999), Xi vector from GA: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
  Directly extracted rule: If Jacket Color is Red
  Rule refinement: If Jacket Color is Red

Rule 2 (fitness 0.99947), Xi vector from GA: 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0
  Directly extracted rule: If Head Shape is Octagon AND Body Shape is Octagon AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon AND Jacket Color is Red OR Yellow OR Green OR Blue
  Rule refinement: If Head Shape is Octagon AND Body Shape is Octagon

Rule 3 (fitness 0.99946), Xi vector from GA: 0 1 0 0 1 0 1 1 1 1 1 0 0 0 0 1 1
  Directly extracted rule: If Head Shape is Square AND Body Shape is Square AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon AND Has Tie is Yes OR No
  Rule refinement: If Head Shape is Square AND Body Shape is Square

Rule 4 (fitness 0.99845), Xi vector from GA: 1 0 0 1 0 0 1 1 1 1 1 0 0 0 0 0 0
  Directly extracted rule: If Head Shape is Round AND Body Shape is Round AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon
  Rule refinement: If Head Shape is Round AND Body Shape is Round
Table 14. Accuracy results for different algorithms [9]

Database   HCV (%)   C4.5 (%)   RITIO (%)   C4.5 rules (%)   Proposed algorithm (%)
MONK1'S    100       83.3       97.37       100              100
The GA has a population of 10 individuals evolving for 1225 generations. The crossover and mutation probabilities are 0.28 and 0.002, respectively. The output chromosomes for the target class are sorted according to their fitness values down to the level 0.99845. Table 13 presents the best set of rules belonging to the target class according to the fitness values.

From Table 13, the rules extracted by the proposed algorithm and the standard rules given in Table 9 are identical. This is a good indication of the algorithm's stability. The accuracy of the proposed algorithm compared with different algorithms on the MONK1'S database is shown in Table 14 [9].
The rules discovered for hypothetical individual person data and products have the following format:
IF PRODUCT = Hat THEN Profit = Medium.
IF Color = Blue THEN Profit = High.
IF MONTH = June THEN Profit = Medium.
IF MONTH = December THEN Profit = High.
7. Conclusions

A novel machine learning algorithm for extracting comprehensible rules has been presented in this paper. It does not require the computational complexity of the deterministic finite state automata (DNF) algorithm. It takes all input attributes into consideration, so it produces accurate rules, whereas other algorithms such as DNF use only the input attributes up to a certain level. Also, it uses only part of the weights to extract the rules belonging to a given class, so it requires less computational time than other algorithms. The proposed methodology does not make any approximation to the activation function.

The user profile information is stored in a database along with a unique user ID and password. A data warehouse repository with such data can be analyzed. This algorithm can help devise rules to govern which messages are offered to an anonymous prospect, how to counter points of resistance, and when to attempt to close a sale.

Future work should consist of more experiments with other data sets, as well as more elaborate experiments to optimize the GA parameters of the proposed algorithm.
References

[1] J. Jhang, H. Jain, K. Ramamurthy, Effective design of electronic commerce environments: a proposed theory of congruence and an illustration, IEEE Trans. Systems Man Cybernet. Part A: Syst. Hum. 30 (4) (2000) 456-471.
[2] G. Adomavicius, A. Tuzbilin, Using data mining methods to build customer profiles, IEEE Comput. 34 (2) (2001) 74-82.
[3] X. Wu, D. Urpani, Induction by attribute elimination, IEEE Trans. Knowl. Data Eng. 11 (5) (1999) 805-812.
[4] H. Tsukimoto, Extracting rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 377-389.
[5] R. Setiono, Extracting M-of-N rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 512-519.
[6] F. Wotawa, G. Wotawa, Deriving qualitative rules from neural networks - a case study for ozone forecasting, AI Communications 14 (2001) 23-33, ISSN 0921-7126, IOS Press.
[7] H. Tsukimoto, Extracting rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 377-389.
[8] T.M. Mitchell, Machine Learning, 1997.
[9] X. Wu, D. Urpani, Induction by attribute elimination, IEEE Trans. Knowl. Data Eng. 11 (5) (1999).
[10] http://www.cse.unsw.edu.au~cs3411/C4.5/Data.
[11] Comm. ACM, Special Issue on Personalization, vol. 43, no. 8, 2000.
[12] A.E. El-Alfy, R. Haque, Y. Al-Ohali, A framework to employ multi AI systems to facilitate easy online shopping, http://www-3.ibm.com/easy/eou ext.nsf/Publish/2049.
[13] R. Setiono, W.K. Leow, J.M. Zurada, Extraction of rules from artificial neural networks for nonlinear regression, IEEE Trans. Neural Networks 13 (3) (2002) 564-577.