Research Center for Artificial Intelligence
Department of Computer Science
University of Craiova, Romania

Rules-Based Reasoning Under Uncertainty and Imprecision

Ion Iancu

Editura Universitaria 2010

Scientific referees:
Prof. dr. George Georgescu, Universitatea din Bucureşti
Prof. dr. Dumitru Buşneag, Universitatea din Craiova

CONTENTS

Preface 5
1 Fuzzy sets 9
1.1 Basic notions 9
1.2 Operations on fuzzy sets 11
1.3 Fuzzy numbers 28
1.4 Fuzzy relations 35
2 Uncertainty 39
2.1 Possibility and necessity measures 39
2.2 Belief and plausibility functions 46
2.3 Dempster's rule of combination 50
2.4 Approximations of the basic probability assignment 54
2.5 Algorithms for generalized belief functions 63
3 Uncertain and imprecise knowledge representation 75
3.1 Linguistic representation 75
3.2 Facts representation 80
3.3 Implications 82
3.4 Rules representation 86
4 Reasoning with imprecise and/or uncertain rules 91
4.1 Generalized Modus Ponens rule 91
4.2 Uncertain Generalized Modus Ponens reasoning 101
4.3 Uncertain hybrid reasoning 106
4.4 Fuzzy logic control 113
4.5 Extended Mamdani fuzzy logic controller 120
4.5.1 The proposed model 121
4.5.2 An application 130
4.6 Fuzzy classifier system 134
References 143

PREFACE

Many rule-based expert systems and other applications of Artificial Intelligence (AI) are based on two-valued logic. However, classifications in the real world often do not have such sharp boundaries. For instance, the property of being intelligent is, in many cases, true only to a degree. Classical two-valued logic is not designed to deal with properties that are a matter of degree. Multi-valued logic, in which an attribute can be possessed to a degree from [0, 1], solved some of these vagueness issues. In the 1930s the philosopher Max Black drew the first fuzzy-set membership functions and called the uncertainty of these structures vagueness. In 1951 Menger coined the term ensemble flou, which has become the French counterpart of fuzzy set, with flou meaning hazy, vague, fuzzy. But multi-valued logic systems were not used extensively because they did not go far enough. A landmark event was Zadeh's paper (1965), which presented enough mathematical theory to work with the concept of fuzzy sets. The main difference between Zadeh's fuzzy logic and multi-valued logic is that fuzzy logic can deal with fuzzy quantifiers such as very, few, most, and in fuzzy logic truth itself can be fuzzy. It is clear that fuzzy logic provides a system flexible enough to serve as a framework for linguistic control. When rule-based AI systems were first conceived, in the mid-1950s, they were supposed to provide the ability to simulate human decisions in an uncertain environment.
In a typical rule-based AI system, knowledge is acquired, stored and processed as facts and rules, not as numerical entities. The collection of rules is called the knowledge base. The framework of if-then rules is chained through, a rule firing only if its premise is true. Knowledge is searched through logical paths of the knowledge tree via an inference process. These systems exploit structured knowledge when it is available, but in most cases the experts are unable to state the propositional rules in the format required to approximate the behavior of an expert. Fuzzy systems were introduced in the mid-1960s; they have the ability to learn the system knowledge, using numerical or linguistic data as input, and to produce an estimation of the input-output relationships. A fuzzy system fires each fuzzy rule in parallel, but to different degrees, and then infers a conclusion from a weighted combination of the consequents of the fired rules. The main purpose of fuzzy logic systems is to deal with systems that are inherently fuzzy: processes too complicated to be fully understood in terms of exact mathematical models and therefore incapable of being controlled in a precise way using classical control techniques. Fuzzy logic systems have demonstrated, since the mid-1970s (Mamdani & Assilian, 1975), the ability to deal with the goals and constraints posed by such ill-defined fuzzy systems. A problem area is a candidate for fuzzy logic if (Pedrycz, 1983):
(1) the considered system is complex or ill-defined;
(2) there are major difficulties in creating an exact mathematical model;
(3) extensive experience and intuition are available from process operators;
(4) lack of measurements, due to costs or noise, makes it impossible to apply conventional statistical and/or control methods.
Fuzzy systems can be of two types: the more common rule-based systems, and relational systems, which permit numerical analysis. In rule-based fuzzy systems a series of rules is developed that relates the fuzzy input membership functions to the fuzzy output membership functions. The rules can be formulated using the same if-then rules typical of expert systems, or by using look-up tables, which consolidate all the rule-base information. Rule-based systems are most commonly used in applications of fuzzy control. Other names for if-then rules are production, premise-action or antecedent-consequent rules. The rules describe in qualitative terms how the controller output behaves when it is subjected to various inputs. The consequent part of a rule assigns a value to the output set, based on the conditional part of the statement; the degree of this assignment modifies the output membership value according to the degree of truth of the conditional expression. Each rule produces a fuzzy output set, and the union of these sets is the overall output. The biggest problem with rule-based systems is to obtain the appropriate rules and then to ensure that the rules are consistent and complete (Graham et al., 1988). Adaptive techniques are available for some types of applications, allowing rule-based systems to learn and to self-modify (Graham et al., 1989). Also, genetic algorithm techniques allow the process to self-generate the rules (Karr, 1991a, 1991b; Nakashima, 2000).
Professor Lotfi Zadeh introduced possibility theory in 1978 as an alternative to probability theory and an extension of his theory of fuzzy sets and fuzzy logic. D. Dubois and H. Prade further contributed to its development in a series of papers. Approximate reasoning, based on fuzzy set theory and possibility theory, provides several techniques for reasoning with fuzzy and uncertain concepts in knowledge-based systems. Applying fuzzy techniques in knowledge-based systems provides a knowledge representation and inference that is closer to human reasoning than that of conventional knowledge-based systems based on classical logic; this comes very close to the field of natural language understanding and processing. Fuzzy logic and approximate reasoning enable us to model human reasoning by means of computer implementations. Because uncertainty and imprecision cannot be ignored when modeling human expertise, we present some possibilities for working with imprecise and/or uncertain knowledge in inferential processes. The book is organized as follows:
• The chapter "Fuzzy Sets" contains the basics of fuzzy set theory necessary for a correct understanding of the rest of this book. We present the fuzzy set definition and representation, and operations with fuzzy sets using Zadeh's formulas and t-norm, t-conorm and negation operators. Also, a special type of fuzzy set, referred to as a fuzzy number, is described, restricted to the LR representation. Finally, some notions about fuzzy relations are detailed: definition, operations, and the composition of a fuzzy relation with another fuzzy relation or with a fuzzy set.
• The chapter "Uncertainty" describes two possibilities for quantifying uncertainty: the possibility and necessity measures, denoted Π and N, and the belief and plausibility functions, denoted Bel and Pls, respectively.
• The chapter "Uncertain and Imprecise Knowledge Representation" is dedicated to the knowledge representation used in the inferential methods from the last part of the book: linguistic variables, linguistic hedges, the canonical form of an elementary proposition, and rules representation by fuzzy implications.
• The last chapter presents some methods for reasoning and their applications in fuzzy logic control and fuzzy classification.
Because the domain analyzed in this book is very large, the proofs of the theorems are omitted in order to present a large amount of theoretical material. The corresponding proofs can be found in the works included in the references. This volume is addressed to graduate students and to everyone interested in approximate reasoning and its applications.

1 FUZZY SETS

1.1 BASIC NOTIONS

A classical (crisp) set is defined as a collection of elements x ∈ X. An element can either belong or not belong to a set A, A ⊆ X. Such a classical set can be described in different ways: one can enumerate its elements, one can use an analytical representation (for instance, A = {x / x ≤ 5}), or one can use the membership (characteristic) function. The characteristic function χ_A of a subset A ⊆ X is a mapping χ_A : X → {0, 1}, where the value zero represents non-membership and the value one represents membership. The truth or falsity of the statement "x is in A" is determined by the ordered pair (x, χ_A(x)): the statement is true if the second element of the pair is 1 and false if it is 0.
Fuzzy sets were introduced by Zadeh (1965) in order to represent and manipulate data that is not precise, but rather fuzzy. Similarly to the crisp case, a fuzzy subset A of a set X is defined as a collection of ordered pairs with the first element from X and the second element from the interval [0, 1]; the set X is referred to as the universe of discourse for the fuzzy subset A.

Definition 1.1. (Zadeh, 1965) If X is a nonempty set, then a fuzzy set A in X is defined by its membership function μ_A : X → [0, 1], where μ_A(x) represents the membership degree of the element x in the fuzzy set A; then A is represented as

A = {(x, μ_A(x)) / x ∈ X}.

Example 1.1. We can define the set of integers "close to 1" by

A = {(−2, 0.25), (−1, 0.5), (0, 0.75), (1, 1), (2, 0.75), (3, 0.5), (4, 0.25)}

or by

μ_A(x) = 1 / (1 + (x − 1)²).

If A is a fuzzy set in X then we often use the notation

A = μ_A(x1)/x1 + μ_A(x2)/x2 + … + μ_A(xn)/xn = Σ_{i=1}^{n} μ_A(xi)/xi

and

A = μ_A(x1)/x1 + μ_A(x2)/x2 + … = ∫_X μ_A(x)/x

for the discrete case and the continuous case, respectively.

Definition 1.2. A fuzzy subset A of a classical set X is called normal if there exists x ∈ X such that μ_A(x) = 1; otherwise A is subnormal. A nonempty fuzzy set A can always be normalized by dividing μ_A(x) by sup_x μ_A(x).

Definition 1.3. Let A be a fuzzy subset of X; the support of A, denoted supp(A), is the crisp subset of X given by

supp(A) = {x ∈ X / μ_A(x) > 0}.

Definition 1.4. An α-level set or α-cut of a fuzzy set A of X is the non-fuzzy set A_α defined by

A_α = {x ∈ X / μ_A(x) ≥ α}   if α > 0
      cl(supp(A))            if α = 0

where cl(supp(A)) is the closure of the support of A; the 1-level set of A is called the core of A.

Example 1.2. (Fullér, 1995, 1998) For X = {−2, −1, 0, 1, 2, 3, 4} and

A = 0.0/−2 + 0.3/−1 + 0.6/0 + 1.0/1 + 0.6/2 + 0.3/3 + 0.0/4

we have

A_α = {−1, 0, 1, 2, 3}   if 0 ≤ α ≤ 0.3
      {0, 1, 2}          if 0.3 < α ≤ 0.6
      {1}                if 0.6 < α ≤ 1.

1.2 OPERATIONS ON FUZZY SETS

The classical operations from ordinary set theory can be extended, in different ways, to fuzzy sets via their membership functions. The basic operations were suggested by Zadeh (1965). Let A and B be fuzzy subsets of a nonempty (crisp) set X.

Definition 1.5. The intersection of A and B is defined as

μ_{A∩B}(x) = min{μ_A(x), μ_B(x)}, ∀x ∈ X.

Definition 1.6. The union of A and B is defined as

μ_{A∪B}(x) = max{μ_A(x), μ_B(x)}, ∀x ∈ X.

Definition 1.7. The complement ¬A of a fuzzy set A is defined as

μ_{¬A}(x) = 1 − μ_A(x).

Example 1.3. (Zimmermann, 1991) Let A and B be two fuzzy subsets of X = {1, 2, …, 10}:

A = {(1, 0.2), (2, 0.5), (3, 0.8), (4, 1), (5, 0.7), (6, 0.3)}
B = {(3, 0.2), (4, 0.4), (5, 0.6), (6, 0.8), (7, 1), (8, 1)}.

Then

C = A ∩ B = {(3, 0.2), (4, 0.4), (5, 0.6), (6, 0.3)}
D = A ∪ B = {(1, 0.2), (2, 0.5), (3, 0.8), (4, 1), (5, 0.7), (6, 0.8), (7, 1), (8, 1)}
¬B = {(1, 1), (2, 1), (3, 0.8), (4, 0.6), (5, 0.4), (6, 0.2), (9, 1), (10, 1)}.

Example 1.4. (Zimmermann, 1991) We consider the fuzzy sets

A = "x is considerably larger than 10"
B = "x is approximately 11",

given by their membership functions

μ_A(x) = 0                        if x ≤ 10
         (1 + (x − 10)^−2)^−1     if x > 10

and

μ_B(x) = (1 + (x − 11)^4)^−1.

Then

μ_{A∩B}(x) = min{(1 + (x − 10)^−2)^−1, (1 + (x − 11)^4)^−1}   if x > 10
             0                                                 if x ≤ 10

μ_{A∪B}(x) = max{μ_A(x), μ_B(x)}   for x ∈ X.
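To make the discrete notions above concrete, the following sketch (an illustration of ours, not code from the book; the helper names are of our choosing) represents a finite fuzzy set as a Python dictionary and computes its support, a normalization and the α-cuts of Example 1.2.

```python
# The fuzzy subset of X = {-2, ..., 4} from Example 1.2
A = {-2: 0.0, -1: 0.3, 0: 0.6, 1: 1.0, 2: 0.6, 3: 0.3, 4: 0.0}

def support(A):
    # supp(A) = {x in X / mu_A(x) > 0}
    return {x for x, mu in A.items() if mu > 0}

def normalize(A):
    # divide every degree by sup_x mu_A(x); A must be nonempty
    h = max(A.values())
    return {x: mu / h for x, mu in A.items()}

def alpha_cut(A, alpha):
    # {x in X / mu_A(x) >= alpha}, for alpha > 0
    return {x for x, mu in A.items() if mu >= alpha}

print(support(A))          # {-1, 0, 1, 2, 3}
print(alpha_cut(A, 0.5))   # {0, 1, 2}
print(alpha_cut(A, 0.7))   # {1}  -- the core is alpha_cut(A, 1)
```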
Figure 1.1 Intersection and union of two fuzzy sets

Other operations with fuzzy sets are the algebraic operations. Some of these are presented below.

Definition 1.8. Let A1, …, An be fuzzy sets in X1, …, Xn; the Cartesian product is a fuzzy set in the product space X1 × … × Xn with the membership function

μ_{A1×…×An}(x) = min{μ_{Ai}(xi) / x = (x1, …, xn), xi ∈ Xi}.

Definition 1.9. The m-th power of a fuzzy set A is defined by

μ_{A^m}(x) = [μ_A(x)]^m, x ∈ X.

Definition 1.10. The algebraic (probabilistic) sum C = A + B is defined as C = {(x, μ_{A+B}(x)) / x ∈ X}, where

μ_{A+B}(x) = μ_A(x) + μ_B(x) − μ_A(x) × μ_B(x).

Definition 1.11. The bounded sum C = A ⊕ B is defined as C = {(x, μ_{A⊕B}(x)) / x ∈ X}, where

μ_{A⊕B}(x) = min{1, μ_A(x) + μ_B(x)}.

Definition 1.12. The bounded difference C = A Θ B is defined as C = {(x, μ_{AΘB}(x)) / x ∈ X}, where

μ_{AΘB}(x) = max{0, μ_A(x) + μ_B(x) − 1}.

Definition 1.13. The algebraic product C = A ∗ B is defined as C = {(x, μ_{A∗B}(x)) / x ∈ X}, where

μ_{A∗B}(x) = μ_A(x) × μ_B(x).

Example 1.5. (Zimmermann, 1991) Let A = {(3, 0.5), (5, 1), (7, 0.6)} and B = {(3, 1), (5, 0.6)}. Then, according to the above definitions, we obtain

A × B = {[(3, 3), 0.5], [(5, 3), 1], [(7, 3), 0.6], [(3, 5), 0.5], [(5, 5), 0.6], [(7, 5), 0.6]}
A² = {(3, 0.25), (5, 1), (7, 0.36)}
A + B = {(3, 1), (5, 1), (7, 0.6)}
A ⊕ B = {(3, 1), (5, 1), (7, 0.6)}
A Θ B = {(3, 0.5), (5, 0.6)}
A ∗ B = {(3, 0.5), (5, 0.6)}.

Definition 1.14. The inclusion and equality operations are defined by

A = B ⇔ μ_A(x) = μ_B(x) ∀x ∈ X
A ⊆ B ⇔ μ_A(x) ≤ μ_B(x) ∀x ∈ X.

Another way to define the intersection and union of two fuzzy sets was proposed by Bellman and Giertz in 1973, by interpreting the intersection as "logical and" and the union as "logical or". Triangular norms and triangular conorms are used in order to model the logical connectives "and" and "or", respectively. Triangular norms and conorms were introduced by Schweizer and Sklar (1960) in order to model distances in probabilistic metric spaces.

Definition 1.15. A function T : [0, 1] × [0, 1] → [0, 1] is a t-norm iff it is commutative, associative, non-decreasing and T(x, 1) = x ∀x ∈ [0, 1]. A continuous t-norm T is called Archimedean if T(x, x) < x ∀x ∈ (0, 1).

The most important t-norms are:
• Minimum: T_m(x, y) = min{x, y}
• Lukasiewicz: T_L(x, y) = max{x + y − 1, 0}
• Probabilistic: T_p(x, y) = xy
• Weak: T_w(x, y) = min{x, y} if max{x, y} = 1; 0 otherwise.

Definition 1.16. A function S : [0, 1] × [0, 1] → [0, 1] is a t-conorm iff it is commutative, associative, non-decreasing and S(x, 0) = x ∀x ∈ [0, 1]. A continuous t-conorm S is called Archimedean if S(x, x) > x ∀x ∈ (0, 1).

The basic t-conorms are:
• Maximum: S_m(x, y) = max{x, y}
• Lukasiewicz: S_L(x, y) = min{x + y, 1}
• Probabilistic: S_p(x, y) = x + y − xy
• Strong: S_s(x, y) = max{x, y} if min{x, y} = 0; 1 otherwise.

The names weak t-norm and strong t-conorm result from the following inequalities, valid for every t-norm T and t-conorm S:

T_w(x, y) ≤ T(x, y) ≤ min{x, y}
max{x, y} ≤ S(x, y) ≤ S_s(x, y).

In addition, for every t-norm T and t-conorm S, T(0, 0) = S(0, 0) = 0 and T(1, 1) = S(1, 1) = 1.

Definition 1.17. A strong negation is an involutive, decreasing function from [0, 1] into itself.
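As a small illustration of Definitions 1.15-1.17 (a sketch in our own notation, not part of the original text), the basic t-norms and t-conorms can be written directly from their formulas, together with the standard strong negation C(x) = 1 − x; each pair listed here is dual with respect to C, in the sense of the theorem that follows.

```python
t_norms = {
    'minimum':       lambda x, y: min(x, y),
    'lukasiewicz':   lambda x, y: max(x + y - 1, 0),
    'probabilistic': lambda x, y: x * y,
}
t_conorms = {
    'maximum':       lambda x, y: max(x, y),
    'lukasiewicz':   lambda x, y: min(x + y, 1),
    'probabilistic': lambda x, y: x + y - x * y,
}
C = lambda x: 1 - x   # standard strong negation: involutive and decreasing

# duality check: S(x, y) = C(T(C(x), C(y))) for each listed pair
x, y = 0.7, 0.4
for name in t_norms:
    T, S = t_norms[name], t_conorms[name]
    assert abs(S(x, y) - C(T(C(x), C(y)))) < 1e-12
```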
The relation between a t-norm T and a t-conorm S, via a strong negation, is given by the next theorem.

Theorem 1.1. (Alsina, Trillas & Valverde, 1980) If T is a t-norm and C is a strong negation then

S(x, y) = C(T(C(x), C(y)))

is a t-conorm and, reciprocally,

T(x, y) = C(S(C(x), C(y)));

namely, T and S are C-dual.

Ling (1965) proved that any continuous Archimedean t-norm can be written in the form

T(x, y) = f^(−1)(f(x) + f(y))

where f : [0, 1] → [0, ∞) is a strictly decreasing continuous function and f^(−1) is the pseudo-inverse of f, defined by

f^(−1)(x) = 1           if x ∈ [0, f(1)]
            f^{−1}(x)   if x ∈ [f(1), f(0)]
            0           if x ∈ [f(0), ∞).

A representation for strong negations was given by Trillas (1979) in the form

C(x) = t^{−1}(t(1) − t(x))

where t : [0, 1] → [0, ∞) is a continuous and strictly increasing (or decreasing) function with t(0) = 0 (or t(0) = 1) and t(1) a finite number. Instead of "+" and "−" in Ling's and Trillas's formulas we can use general operations, obtaining in this way t-norms and negations in an easier manner.

Theorem 1.2. (Iancu, 1997) Let f : [0, 1] → I ⊆ [0, ∞) be a continuous strictly decreasing function and Δ : I × I → I with the following properties, for all x, y, z ∈ I and e = f(1):
(2.1) Δ(x, y) = Δ(y, x)
(2.2) Δ(x, Δ(y, z)) = Δ(Δ(x, y), z)
(2.3) Δ(x, y) ≤ Δ(x, z) if y ≤ z, with equality iff y = z
(2.4) Δ is continuous
(2.5) Δ(x, e) = x.
Then

T(x, y) = f^(−1)(Δ(f(x), f(y))), ∀x, y ∈ [0, 1],

is a t-norm, where f^(−1) is the pseudo-inverse of f.

Example 1.6. For I = [0, ∞), Δ(x, y) = x + y + xy and f(x) = 1 − x we obtain the t-norm T(x, y) = max(2x + 2y − xy − 2, 0).

Theorem 1.3. (Iancu, 1997) Let I ⊆ R and Δ : I × I → I be an application satisfying the following conditions, for all x, y, z ∈ I:
(3.1)-(3.4) identical to (2.1)-(2.4)
(3.5) there is e ∈ I such that Δ(x, e) = x ∀x ∈ I
(3.6) ∀x ∈ I there is x′ ∈ I such that Δ(x, x′) = e, and φ : I → I, φ(x) = x′, is a continuous strictly decreasing function
(3.7) let J = [e, ∞) ⊆ I and t : [0, 1] → J be a continuous strictly increasing function with t(0) = e and t(1) a finite number.
Then

C(x) = t^{−1}(Δ(t(1), φ(t(x))))

is a strong negation for x ∈ [0, 1].

Example 1.7. For I = R, Δ(x, y) = x + y − 1, e = 1, J = [1, ∞), t(x) = (2x + 1)/(x + 1) and φ(x) = 2 − x one obtains C(x) = (1 − x)/(1 + 3x).

Observation 1.1. The simultaneous use of the functions Δ and f (respectively t) in the previous theorems allows us to obtain t-norms (respectively negations) more easily than in the case Δ = +. For instance, if Δ(x, y) = x + y then we cannot work with functions of the type f(x) = ax + b in order to obtain the t-norm from Example 1.6; more complicated forms are necessary.

If ordinary t-norms (t-conorms) are used for combining information with great (small) belief degrees, the obtained results are often contrary to reality (Iancu, 1997). We consider the combination of n facts having the same belief degree d; this observation is illustrated in Tables 1 and 2.
Table 1

t-norm                                d     n     result
xy                                    0.9   5     0.59049000000
                                            10    0.34867844010
                                            20    0.12157665459
                                      0.8   5     0.32768000000
                                            10    0.10737418240
                                            20    0.01152921504
max(0, x + y − 1)                     0.9   5     0.5
                                            ≥10   0
                                      0.8   ≥5    0
xy / (x + y − xy)                     0.9   5     0.64285714286
                                            10    0.47368421053
                                            20    0.31034482758
                                      0.8   5     0.44444444444
                                            10    0.28571428571
                                            20    0.16666666667
λxy / (1 − (1 − λ)(x + y − xy)),      0.9   5     0.53656519764
λ = 1/2                                     10    0.23700106268
                                            20    0.03550161915
                                      0.8   5     0.23272727273
                                            10    0.03409185491
                                            20    0.00060127649

Table 2

t-conorm                                       d     n     result
x + y − xy                                     0.1   5     0.40951000000
                                                     10    0.65132155990
                                                     20    0.87842334541
                                               0.2   5     0.67232000000
                                                     10    0.89262581760
                                                     20    0.98847078495
min(1, x + y)                                  0.1   5     0.5
                                                     ≥10   1
                                               0.2   ≥5    1
(x + y − 2xy) / (1 − xy)                       0.1   5     0.35714285714
                                                     10    0.52631578947
                                                     20    0.68965517242
                                               0.2   5     0.55555555556
                                                     10    0.71428571428
                                                     20    0.83333333333
(λ(x + y) + xy(1 − 2λ)) / (λ + xy(1 − λ)),     0.1   5     0.46343480236
λ = 1/2                                              10    0.76299893732
                                                     20    0.96449838084
                                               0.2   5     0.76727272727
                                                     10    0.96590814509
                                                     20    0.99939872350

Starting from a given t-norm (t-conorm) we can obtain a new t-norm (t-conorm) that gives results in accordance with reality when it is used for combining information with great (small) belief degrees. The first result with this property is given in (Pacholczyk, 1987), where the following operators with threshold a ∈ (0, 1) are presented:

- negation:

C_a(x) = 1 − ((1 − a)/a) x     if x ≤ a
         a(1 − x)/(1 − a)      if x ≥ a

- t-norm, corresponding to the t-norm T(x, y):

T_a(x, y) = (a/(1 − a)) T(((1 − a)/a) x, ((1 − a)/a) y)   if x ≤ a and y ≤ a
            min(x, y)                                      if x > a or y > a

- t-conorm, corresponding to the t-conorm S(x, y):

S_a(x, y) = S(x, y)     if x ≥ a and y ≥ a
            max(x, y)   if x < a or y < a.

The t-norm T_a and the t-conorm S_a obtained from T(x, y) = xy and S(x, y) = x + y − xy, respectively, have been used with very satisfactory results (Pacholczyk, 1987) for constructing an expert system that processes uncertain questions: the SEQUI system. In a set of papers (Iancu, 1997a; 1997b; 1998a; 1999a; 1999b; 2003; 2005) this result was generalized, working in an ordered field (I, ⊕, ⊗), I ⊂ R, and with an arbitrary number n ≥ 1 of thresholds; in this way, various classes of operators with threshold are obtained. We remain under the conditions of Theorem 1.3 and write Δ(x, y) = x ⊕ y. We take ⊗ : I × I → I with the following properties:
i) x ⊗ y < x ⊗ z iff y < z, ∀x, y, z ∈ I with x > e;
ii) (I, ⊕, ⊗) is a field.
We denote by Θx and 1/x the inverse elements of x corresponding to ⊕ and ⊗, respectively. To simplify the writing we put x ⊗ (1/y) = x/y and x ⊗ x = x².
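The effect shown in Tables 1 and 2, and the way a threshold repairs it, can be checked with a few lines of code (an illustrative sketch of ours; the helper names are not from the original):

```python
# Iterated combination of n facts of equal belief degree d, and
# Pacholczyk's threshold t-norm T_a built from a given t-norm.
from functools import reduce

def t_product(x, y):
    return x * y  # probabilistic t-norm

def combine(tnorm, d, n):
    # combine n facts having the same belief degree d
    return reduce(tnorm, [d] * n)

def t_threshold(tnorm, a):
    # Pacholczyk's t-norm with threshold a (1987)
    def Ta(x, y):
        if x <= a and y <= a:
            return a / (1 - a) * tnorm((1 - a) / a * x, (1 - a) / a * y)
        return min(x, y)
    return Ta

print(combine(t_product, 0.9, 20))                    # ~0.1216, as in Table 1
print(combine(t_threshold(t_product, 0.5), 0.9, 20))  # 0.9: min branch applies
```

With the product t-norm, twenty facts of degree 0.9 combine to about 0.12, while the threshold operator with a = 0.5 keeps the combined degree at 0.9, because every argument exceeds the threshold and the min branch applies.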
Theorem 1.4. (Iancu, 2003; 2005) If ⊕, Θ, ⊗ and t have the previous meaning, n ∈ N, n ≥ 1, 0 < a1 < a2 < … < an < 1,

δ(i) = (t(a_{n−i}) Θ t(a_{n−i+1})) / (t(a_{i+1}) Θ t(a_i))

and

θ(i) = (t(a_{i+1}) ⊗ t(a_{n−i+1}) Θ t(a_i) ⊗ t(a_{n−i})) / (t(a_{i+1}) Θ t(a_i)),

then

C_{a1,…,an}(x) =
  t^{−1}( t(1) Θ ((t(1) Θ t(an)) ⊗ t(x) / t(a1)) )       if x ≤ a1
  t^{−1}( t(x) ⊗ δ(i) ⊕ θ(i) )                           if ai ≤ x ≤ a_{i+1} and 1 ≤ i < n
  t^{−1}( (t(1) Θ t(x)) ⊗ t(a1) / (t(1) Θ t(an)) )       if x ≥ an,

C*_{a1,…,an}(x) =
  t^{−1}( t(a1) ⊗ t(an) ⊗ (t(1) ⊕ t(x)) / ((t(1) ⊕ t(a1) Θ t(an)) ⊗ t(x) ⊕ t(a1) ⊗ t(an)) )   if x ≤ a1
  t^{−1}( t(x) ⊗ δ(i) ⊕ θ(i) )                                                                 if ai ≤ x ≤ a_{i+1} and 1 ≤ i < n
  t^{−1}( t(a1) ⊗ t(an) ⊗ (t(1) Θ t(x)) / ((t(1) ⊕ t(a1) Θ t(an)) ⊗ t(x) Θ t(a1) ⊗ t(an)) )   if x ≥ an

and

C**_{a1,…,an}(x) =
  t^{−1}( t(a1) ⊗ t(an) ⊗ t(1) / (t(a1) ⊗ t(an) ⊕ (t(1) Θ t(an)) ⊗ t(x)) )   if x ≤ a1
  t^{−1}( t(x) ⊗ δ(i) ⊕ θ(i) )                                               if ai ≤ x ≤ a_{i+1} and 1 ≤ i < n
  t^{−1}( t(a1) ⊗ t(an) ⊗ (t(1) Θ t(x)) / ((t(1) Θ t(an)) ⊗ t(x)) )          if x ≥ an

are strong negations and they have

t^{−1}( (t(a_{[(n+1)/2]}) ⊕ t(a_{[n/2]+1})) / 2 )

as fixed point, where [x] is the greatest integer smaller than or equal to x.

Observation 1.2. It is easy to verify the following properties:
(i) x ≤ a1 ⇔ N_{a1,…,an}(x) ≥ an
(ii) x ≥ an ⇔ N_{a1,…,an}(x) ≤ a1
(iii) x ∈ [ai, a_{i+1}] ⇔ N_{a1,…,an}(x) ∈ [a_{n−i}, a_{n−i+1}] ∀i ∈ {1, 2, …, n − 1}
where N_{a1,…,an} ∈ {C_{a1,…,an}, C*_{a1,…,an}, C**_{a1,…,an}}.

Observation 1.3. Relation (ii) says that if the confidence of a proposition p is greater than or equal to the threshold an, then the confidence of non p is smaller than or equal to the threshold a1. This observation can be used for handling the confidences associated with the information from a knowledge base.

Example 1.8. (Iancu, 2003; 2005) For ⊕ = +, ⊗ = ×, n = 2 and t(x) = 2x/(x + 1) one obtains

C_{a1,a2}(x) =
  ((2a1a2 + a2 − 1)x + a1a2 + a1) / ((2a1 − a2 + 1)x + a1a2 + a1)   if x ≤ a1
  ((a1a2 − 1)x + a1 + a2 + 2a1a2) / ((a1 + a2 + 2)x + 1 − a1a2)     if a1 ≤ x ≤ a2
  (1 − x)a1(1 + a2) / ((1 + 2a1 − a2)x + 1 − 2a1a2 − a2)            if x ≥ a2,

C*_{a1,a2}(x) =
  a1a2(1 + 3x) / ((1 + 3a1 − a2)x + a1a2)                           if x ≤ a1
  ((a1a2 − 1)x + a1 + a2 + 2a1a2) / ((a1 + a2 + 2)x + 1 − a1a2)     if a1 ≤ x ≤ a2
  (1 − x)a1a2 / ((1 + 3a1 − a2)x − 3a1a2)                           if x ≥ a2

and

C**_{a1,a2}(x) =
  a1a2(1 + x) / ((1 + a1 − a2)x + a1a2)                             if x ≤ a1
  ((a1a2 − 1)x + a1 + a2 + 2a1a2) / ((a1 + a2 + 2)x + 1 − a1a2)     if a1 ≤ x ≤ a2
  (1 − x)a1a2 / ((1 + a1 − a2)x − a1a2)                             if x ≥ a2.

Theorem 1.5. (Iancu, 2005) Let S be a t-conorm and S′ ∈ {S_M, S}. For 0 < a1 < a2 < … < an < 1,

S_{a1,…,an;S′}(x, y) = max(x, y)   if x < a1 or y < a1
                       S(x, y)     if x ≥ an and y ≥ an
                       S′(x, y)    otherwise

is a t-conorm.

Theorem 1.6. (Iancu, 2005) Let (T, S) be a pair (t-norm, t-conorm) dual with respect to C(x) = t^{−1}(t(1) Θ t(x)), S′ ∈ {S_M, S}, and let T′ be the dual of S′ with respect to the same negation, i.e. T′ ∈ {T_M, T}.
We use the following notations:

k = t(a1) / (t(1) Θ t(an)),
ς = t(1) ⊕ t(a1) Θ t(an),
α(z) = t^{−1}( (1/k) ⊗ t(z) ),
γ(z) = t^{−1}( t(z) ⊗ (t(1) ⊕ t(a1)) ⊗ (t(1) Θ t(an)) / ((t(1) ⊕ t(a1) Θ t(an)) ⊗ t(z) ⊕ t(a1) ⊗ t(an)) ),
β(z, i) = t^{−1}( (t(1) Θ t(z)) ⊗ δ(i) Θ θ(i) ),

where δ(i) and θ(i) have the same meaning as in Theorem 1.4. For 0 < a1 < a2 < … < an < 1 we define

T_{a1,…,an;T′}(x, y) =
  t^{−1}( k ⊗ t(T(α(x), α(y))) )        if x ≤ a1 and y ≤ a1
  t^{−1}( k ⊗ t(T′(α(x), β(y, i))) )    if x ≤ a1 and y ∈ (ai, a_{i+1}], 1 ≤ i ≤ n − 1
  t^{−1}( k ⊗ t(T′(β(x, i), α(y))) )    if y ≤ a1 and x ∈ (ai, a_{i+1}], 1 ≤ i ≤ n − 1
  t^{−1}( (t(1) Θ t(T′(β(x, i), β(y, j)))) ⊗ δ(p) ⊕ θ(p) )
        if x ∈ (ai, a_{i+1}], y ∈ (aj, a_{j+1}], l = max(i, j) and there is an integer
        p ∈ [n − l, n − 1] such that T′(β(x, i), β(y, j)) ∈ [C(a_{p+1}), C(a_p))
  t^{−1}( k ⊗ t(T′(β(x, i), β(y, j))) )
        if x ∈ (ai, a_{i+1}], y ∈ (aj, a_{j+1}] and T′(β(x, i), β(y, j)) < C(an)
  min(x, y)                             if x > an or y > an

and

T*_{a1,…,an;T′}(x, y) =
  t^{−1}( t(a1) ⊗ t(an) ⊗ t(T(γ(x), γ(y))) / (ς ⊗ (t(1) Θ t(T(γ(x), γ(y)))) Θ t(a1) ⊗ t(an)) )
        if x ≤ a1 and y ≤ a1
  t^{−1}( t(a1) ⊗ t(an) ⊗ t(T′(γ(x), β(y, i))) / (ς ⊗ (t(1) Θ t(T′(γ(x), β(y, i)))) Θ t(a1) ⊗ t(an)) )
        if x ≤ a1 and y ∈ (ai, a_{i+1}], 1 ≤ i ≤ n − 1
  t^{−1}( t(a1) ⊗ t(an) ⊗ t(T′(β(x, i), γ(y))) / (ς ⊗ (t(1) Θ t(T′(β(x, i), γ(y)))) Θ t(a1) ⊗ t(an)) )
        if x ∈ (ai, a_{i+1}], 1 ≤ i ≤ n − 1 and y ≤ a1
  t^{−1}( (t(1) Θ t(T′(β(x, i), β(y, j)))) ⊗ δ(p) ⊕ θ(p) )
        if x ∈ (ai, a_{i+1}], y ∈ (aj, a_{j+1}], l = max(i, j) and there is an integer
        p ∈ [n − l, n − 1] such that T′(β(x, i), β(y, j)) ∈ [C(a_{p+1}), C(a_p))
  t^{−1}( t(a1) ⊗ t(an) ⊗ t(T′(β(x, i), β(y, j))) / (ς ⊗ (t(1) Θ t(T′(β(x, i), β(y, j)))) Θ t(a1) ⊗ t(an)) )
        if x ∈ (ai, a_{i+1}], y ∈ (aj, a_{j+1}] and T′(β(x, i), β(y, j)) < C(an)
  min(x, y)                             if x > an or y > an.

Then
(i) T_{a1,…,an;T′} is a t-norm C_{a1,…,an}-dual with the t-conorm S_{a1,…,an;S′};
(ii) T*_{a1,…,an;T′} is a t-norm C*_{a1,…,an}-dual with the t-conorm S_{a1,…,an;S′}.

Example 1.9. (Iancu, 2005) By particularization one can obtain new extensions of other known t-operators. For instance, for n = 2, a = a1, b = a2, ⊕ = +, ⊗ = ×, t(x) = x and T′ = T_M = min, Theorem 1.6 yields

T_{a/b}(x, y) = (a/(1 − b)) T(((1 − b)/a) x, ((1 − b)/a) y)   if x ≤ a and y ≤ b
                min(x, y)                                      otherwise,

which is an extension of Pacholczyk's (1987) t-norm, and

T*_{a/b}(x, y) = abT(γ(x), γ(y)) / ((1 + a − b)(1 − T(γ(x), γ(y))) − ab)   if x ≤ a and y ≤ b
                 min(x, y)                                                  otherwise,

where γ(z) = (1 + a)(1 − b)z / ((1 + a − b)z + ab), which is a t-norm with 1-threshold a and parameter b.

Theorem 1.7. (Iancu, 2003) Let (T, S) be a pair (t-norm, t-conorm) dual with respect to C(x) = t^{−1}(t(1) Θ t(x)), S′ ∈ {S_M, S}, and let T′ be the dual of S′ with respect to the same negation, i.e. T′ ∈ {T_M, T}.
We use the following notations:

k = t(a1) ⊗ t(an) / (t(1) Θ t(an)),
α(z) = t^{−1}( (t(1) Θ t(an)) ⊗ t(z) ⊗ t(1) / (t(a1) ⊗ t(an) ⊕ (t(1) Θ t(an)) ⊗ t(z)) ),
β(z, i) = t^{−1}( (t(1) Θ t(z)) ⊗ δ(i) Θ θ(i) ),

where δ(i) and θ(i) have the same meaning as in Theorem 1.4. For 0 < a1 < a2 < … < an < 1 we define

T**_{a1,…,an;T′}(x, y) =
  t^{−1}( k ⊗ t(T(α(x), α(y))) / (t(1) Θ t(T(α(x), α(y)))) )
        if x ≤ a1 and y ≤ a1
  t^{−1}( k ⊗ t(T′(α(x), β(y, i))) / (t(1) Θ t(T′(α(x), β(y, i)))) )
        if x ≤ a1 and y ∈ (ai, a_{i+1}], 1 ≤ i ≤ n − 1
  t^{−1}( k ⊗ t(T′(β(x, i), α(y))) / (t(1) Θ t(T′(β(x, i), α(y)))) )
        if x ∈ (ai, a_{i+1}], 1 ≤ i ≤ n − 1 and y ≤ a1
  t^{−1}( (t(1) Θ t(T′(β(x, i), β(y, j)))) ⊗ δ(p) ⊕ θ(p) )
        if x ∈ (ai, a_{i+1}], y ∈ (aj, a_{j+1}], l = max(i, j) and there is an integer
        p ∈ [n − l, n − 1] such that T′(β(x, i), β(y, j)) ∈ [C(a_{p+1}), C(a_p))
  t^{−1}( k ⊗ t(T′(β(x, i), β(y, j))) / (t(1) Θ t(T′(β(x, i), β(y, j)))) )
        if x ∈ (ai, a_{i+1}], y ∈ (aj, a_{j+1}] and T′(β(x, i), β(y, j)) < C(an)
  min(x, y)                             if x > an or y > an.

Then T**_{a1,…,an;T′} is a t-norm C**_{a1,…,an}-dual with the t-conorm S_{a1,…,an;S′}.

Example 1.10. (Iancu, 2003) For n = 2, a = a1, b = a2, ⊕ = +, ⊗ = ×, t(x) = x and T′ = T_M = min, Theorem 1.7 gives a new parametric extension for t-norms with 1-threshold:

T_{a/b}(x, y) = (ab/(1 − b)) T(α(x), α(y)) / (1 − T(α(x), α(y)))   if x ≤ a and y ≤ b
                min(x, y)                                           otherwise,

where α(z) = (1 − b)z / ((1 − b)z + ab), a is the threshold and b is the parameter.

As presented in some of our papers (for instance, (Iancu, 1997a)), the operators with thresholds can be used with good results for combining the belief degrees attached to the information from a knowledge base. The choice of the thresholds allows us to obtain results in accordance with reality even if the operators used for constructing the operators with threshold do not have this property.

1.3 FUZZY NUMBERS

An important concept in fuzzy set theory is the extension principle, used to extend any pointwise operation to operations involving fuzzy sets.

Definition 1.18. Let X be the Cartesian product X = X1 × X2 × … × Xn, let A1, …, An be fuzzy sets in X1, …, Xn, respectively, and let f be a mapping from X to a universe Y, y = f(x1, …, xn). Then the extension principle allows us to define a fuzzy set B in Y by

B = {(y, μ_B(y)) / y = f(x1, …, xn), (x1, …, xn) ∈ X}

where

μ_B(y) = sup_{(x1,…,xn) ∈ f^{−1}(y)} min{μ_{A1}(x1), …, μ_{An}(xn)}   if f^{−1}(y) ≠ ∅
         0                                                             otherwise.

For n = 1 the extension principle reduces to

B = f(A) = {(y, μ_B(y)) / y = f(x), x ∈ X}

where

μ_B(y) = sup_{x ∈ f^{−1}(y)} μ_A(x)   if f^{−1}(y) ≠ ∅
         0                             otherwise.

Example 1.11. For A = {(−1, 0.5), (0, 0.8), (1, 1), (2, 0.4)} and f(x) = x² it results

B = {(0, 0.8), (1, 1), (4, 0.4)}.

Definition 1.19. A fuzzy number A is a fuzzy set of the real line with a normal, fuzzy convex and continuous membership function of bounded support. The set of fuzzy numbers will be denoted by F(R).

Definition 1.20. A quasi fuzzy number A is a fuzzy set of the real line with a normal, fuzzy convex and continuous membership function satisfying the limit conditions

lim_{t→∞} μ_A(t) = 0, lim_{t→−∞} μ_A(t) = 0.
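For a discrete fuzzy set, the extension principle of Definition 1.18 in the one-dimensional case amounts to taking, for each image point, the largest membership degree over its preimage. A minimal sketch (the names are ours) that reproduces Example 1.11:

```python
def extend(f, A):
    """A: dict mapping x to its membership degree; returns f(A) as a dict."""
    B = {}
    for x, mu in A.items():
        y = f(x)
        # sup over the preimage f^{-1}(y): keep the largest degree seen so far
        B[y] = max(B.get(y, 0.0), mu)
    return B

A = {-1: 0.5, 0: 0.8, 1: 1.0, 2: 0.4}
print(extend(lambda x: x * x, A))   # {1: 1.0, 0: 0.8, 4: 0.4}
```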
Definition 1.21. A fuzzy number A is called positive (negative) if its membership function satisfies μ_A(x) = 0 ∀x < 0 (∀x > 0).

Definition 1.22. A binary operation ∗ is called increasing (decreasing) if x1 > y1 and x2 > y2 imply x1 ∗ x2 > y1 ∗ y2 (x1 ∗ x2 < y1 ∗ y2).

From the extension principle it results:
a) for f(x) = −x, the opposite of a fuzzy number A is −A = {(x, μ_{−A}(x)) / x ∈ X}, where μ_{−A}(x) = μ_A(−x);
b) for f(x) = 1/x, the inverse of a fuzzy number A is A^{−1} = {(x, μ_{A^{−1}}(x)) / x ∈ X}, where μ_{A^{−1}}(x) = μ_A(1/x);
c) for λ ∈ R − {0} and f(x) = λx, the scalar multiplication of a fuzzy number is λA = {(x, μ_{λA}(x)) / x ∈ X}, where μ_{λA}(x) = μ_A(x/λ).

The extensions of the algebraic operations +, −, ×, : to operations on fuzzy numbers will be denoted by ~+, ~−, ~× and ~:, respectively. Some important properties of the ~+ and ~× operations, for real fuzzy numbers, are:

for the operation ~+
1. ~−(A ~+ B) = (~−A) ~+ (~−B)
2. ~+ is commutative
3. ~+ is associative
4. 0 ∈ R ⊆ F(R) is the neutral element for ~+, that is, A ~+ 0 = A ∀A ∈ F(R)
5. for ~+ there does not exist an inverse element, that is, ∃A ∈ F(R) − R : A ~+ (~−A) ≠ 0 ∈ R;

for the operation ~×
1. (~−A) ~× B = ~−(A ~× B)
2. ~× is commutative
3. ~× is associative
4. 1 ∈ R ⊆ F(R) is the neutral element for ~×, that is, A ~× 1 = A ∀A ∈ F(R)
5. for ~× there does not exist an inverse element, that is, ∃A ∈ F(R) − R : A ~× A^{−1} ≠ 1.

Theorem 1.8. (Dubois & Prade, 1980) If A and B are fuzzy numbers with continuous and surjective membership functions from R to [0, 1] and ∗ is a continuous increasing (decreasing) binary operation, then A ~∗ B is a fuzzy number whose membership function is continuous and surjective from R to [0, 1].

The membership function of A ~∗ B can be determined from the membership functions of A and B according to the following theorem.

Theorem 1.9. (Dubois & Prade, 1980) If A, B ∈ F(R) have continuous membership functions, then the extension principle for the binary operation ∗ : R × R → R gives the fuzzy number A ~∗ B with

μ_{A~∗B}(z) = sup_{z = x∗y} min{μ_A(x), μ_B(y)}.

Example 1.12. Let A = {(1, 0.3), (2, 1), (3, 0.4)} and B = {(2, 0.7), (3, 1), (4, 0.2)} be fuzzy numbers. Then

A ~× B = {(2, 0.3), (3, 0.3), (4, 0.7), (6, 1), (8, 0.2), (9, 0.4), (12, 0.2)}.

Frequently, applications work with triangular or trapezoidal fuzzy numbers, given by the following definitions.

Definition 1.23. A fuzzy set A is called a triangular fuzzy number with center a, left width α > 0 and right width β > 0, denoted A = (a, α, β), if its membership function is

μ_A(x) = 1 − (a − x)/α   if x ∈ [a − α, a]
         1 − (x − a)/β   if x ∈ [a, a + β]
         0               otherwise.

A triangular fuzzy number with center a is seen as the fuzzy quantity "x is approximately equal to a".

Definition 1.24. A fuzzy set A is called a trapezoidal fuzzy number with tolerance interval [a, b], left width α > 0 and right width β > 0, denoted A = (a, b, α, β), if its membership function is

μ_A(x) = 1 − (a − x)/α   if x ∈ [a − α, a]
         1               if x ∈ [a, b]
         1 − (x − b)/β   if x ∈ [b, b + β]
         0               otherwise.

A trapezoidal fuzzy number with tolerance interval [a, b] is seen as the fuzzy quantity "x is approximately in the interval [a, b]". Computational efficiency has particular importance when fuzzy sets are used to solve real problems.
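Triangular and trapezoidal membership functions are straightforward to implement from Definitions 1.23 and 1.24; the following sketch (an illustration with names of our choosing) builds them as closures:

```python
def triangular(a, alpha, beta):
    # membership function of the triangular fuzzy number (a, alpha, beta)
    def mu(x):
        if a - alpha <= x <= a:
            return 1 - (a - x) / alpha
        if a <= x <= a + beta:
            return 1 - (x - a) / beta
        return 0.0
    return mu

def trapezoidal(a, b, alpha, beta):
    # membership function of the trapezoidal fuzzy number (a, b, alpha, beta)
    def mu(x):
        if a - alpha <= x < a:
            return 1 - (a - x) / alpha
        if a <= x <= b:
            return 1.0
        if b < x <= b + beta:
            return 1 - (x - b) / beta
        return 0.0
    return mu

approx_5 = triangular(5, 2, 3)      # "x is approximately equal to 5"
print(approx_5(4), approx_5(6.5))   # 0.5 0.5
```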
The LR-representation of fuzzy numbers, suggested by Dubois and Prade (1979), increases computational efficiency without limiting the generality.

Definition 1.25. Any fuzzy number A can be represented as

μ_A(x) = L((a − x)/α)   if x ∈ [a − α, a]
         1              if x ∈ [a, b]
         R((x − b)/β)   if x ∈ [b, b + β]
         0              otherwise

where [a, b] is the core of A and L, R : [0, 1] → [0, 1] are continuous and non-increasing shape functions with L(0) = R(0) = 1 and L(1) = R(1) = 0; such a representation is referred to as A = (a, b, α, β)_LR.

Definition 1.26. Any quasi fuzzy number A can be represented as

μ_A(x) = L((a − x)/α)   if x ≤ a
         1              if x ∈ [a, b]
         R((x − b)/β)   if x ≥ b

where [a, b] is the core of A and L, R : [0, ∞) → [0, 1] are continuous and non-increasing shape functions with L(0) = R(0) = 1 and lim_{x→∞} L(x) = lim_{x→∞} R(x) = 0; such a representation is also referred to as A = (a, b, α, β)_LR. For a = b we use the notation A = (a, α, β)_LR and A is called a quasi-triangular fuzzy number. If L(x) = R(x) = 1 − x then, instead of A = (a, b, α, β)_LR, the notation A = (a, b, α, β) is used. In what follows we will, for short, call the quasi-fuzzy numbers fuzzy numbers.

Example 1.13. (Zimmermann, 1991) Let L(x) = 1/(1 + x²), R(x) = 1/(1 + 2x), α = 2, β = 3, a = 5. Then A = (5, 2, 3)_LR is given by

μ_A(x) = L((5 − x)/2) = 1 / (1 + ((5 − x)/2)²)   if x ≤ 5
         R((x − 5)/3) = 1 / (1 + 2(x − 5)/3)     if x ≥ 5.

The operations with LR fuzzy numbers are considerably simplified: Dubois and Prade gave exact formulas for ~+ and ~− and approximate expressions for ~× and ~:.

Theorem 1.10. (Dubois & Prade, 1980) Let A = (a, α, β)_LR and B = (b, γ, δ)_LR be LR fuzzy numbers. Then
1) (a, α, β)_LR ~+ (b, γ, δ)_LR = (a + b, α + γ, β + δ)_LR
2) ~−(a, α, β)_LR = (−a, β, α)_LR
3) (a, α, β)_LR ~− (b, γ, δ)_LR = (a − b, α + δ, β + γ)_LR.

Example 1.14. For L(x) = R(x) = 1/(1 + x²), A = (1, 0.5, 0.8)_LR and B = (2, 0.6, 0.2)_LR it results A ~+ B = (3, 1.1, 1)_LR and A ~− B = (−1, 0.7, 1.4)_LR.

Theorem 1.11. (Dubois & Prade, 1980) If A and B are LR fuzzy numbers, then

(a, α, β)_LR ~× (b, γ, δ)_LR ≅ (ab, aγ + bα, aδ + bβ)_LR    if A and B are positive;
(a, α, β)_LR ~× (b, γ, δ)_LR ≅ (ab, bα − aδ, bβ − aγ)_LR    if A < 0 and B > 0;
(a, α, β)_LR ~× (b, γ, δ)_LR ≅ (ab, −bβ − aδ, −bα − aγ)_LR  if A and B are negative.

Example 1.15. (Zimmermann, 1991) For A = (2, 0.2, 0.1)_LR, B = (3, 0.1, 0.3)_LR and

L(x) = R(x) = 1   if −1 ≤ x ≤ 1
              0   otherwise

we obtain

μ_A(x) = 1 if 1.8 ≤ x ≤ 2.2; 0 otherwise
μ_B(x) = 1 if 2.7 ≤ x ≤ 3.3; 0 otherwise;

therefore A and B are positive numbers. According to the last theorem, it results

A ~× B ≅ (2 × 3, 2 × 0.1 + 3 × 0.2, 2 × 0.3 + 3 × 0.1)_LR = (6, 0.8, 0.9)_LR.
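Representing an LR fuzzy number (a, α, β)_LR by its parameter triple, the exact addition and subtraction of Theorem 1.10 and the positive-case product approximation of Theorem 1.11 become one-line functions. A sketch of ours, using the data of Examples 1.14 and 1.15:

```python
def lr_add(A, B):
    # exact: (a, alpha, beta) + (b, gamma, delta) = (a+b, alpha+gamma, beta+delta)
    a, alpha, beta = A; b, gamma, delta = B
    return (a + b, alpha + gamma, beta + delta)

def lr_sub(A, B):
    # exact: (a, alpha, beta) - (b, gamma, delta) = (a-b, alpha+delta, beta+gamma)
    a, alpha, beta = A; b, gamma, delta = B
    return (a - b, alpha + delta, beta + gamma)

def lr_mul_pos(A, B):
    # approximation, valid when A and B are positive fuzzy numbers
    a, alpha, beta = A; b, gamma, delta = B
    return (a * b, a * gamma + b * alpha, a * delta + b * beta)

A, B = (1, 0.5, 0.8), (2, 0.6, 0.2)
print(lr_add(A, B))   # (3, 1.1, 1.0)   as in Example 1.14
print(lr_sub(A, B))   # (-1, 0.7, 1.4)
print(lr_mul_pos((2, 0.2, 0.1), (3, 0.1, 0.3)))  # (6, 0.8, 0.9), Example 1.15
```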
1.4 FUZZY RELATIONS

Fuzzy relations are natural extensions of classical relations, and they are important because they describe interactions between variables. We consider only binary relations, because the extension to n-ary relations is straightforward.

Definition 1.27. Let X and Y be nonempty sets. A fuzzy relation R is a fuzzy subset of X × Y:

R = {((x, y), μ_R(x, y)) / (x, y) ∈ X × Y}

where μ_R(x, y) is the degree of membership of (x, y) in R.

Example 1.16. Let X = Y = R and R = "considerably larger than". This relation can be defined by

μ_R(x, y) = 0                       if x ≤ y
            (1 + (y − x)^−2)^−1     if x > y.

The same relation can be defined using a table:

        y1    y2    y3    y4
x1      0.8   1     0.1   0.7
x2      0     0.8   0     0
x3      0.9   1     0.7   0.8

where X = {x1, x2, x3} and Y = {y1, y2, y3, y4}.

The operations with fuzzy relations can be defined by analogy with the operations on fuzzy sets.

Definition 1.28. Let R and S be two fuzzy relations in the same product space X × Y; their union and intersection are defined by

μ_{R∪S}(x, y) = max{μ_R(x, y), μ_S(x, y)}
μ_{R∩S}(x, y) = min{μ_R(x, y), μ_S(x, y)}, ∀(x, y) ∈ X × Y.

We can use any t-norm T and t-conorm S instead of the min and max operations, respectively. Some other operations, such as the projection and the cylindrical extension of fuzzy relations, can also be useful.

Definition 1.29. Let R be a fuzzy relation on X × Y. The projection of R on X is a fuzzy subset of X defined by

Π_X(R) = {(x, sup_y μ_R(x, y)) / (x, y) ∈ X × Y}.

The total projection is given by

Π_T(R) = sup_x sup_y {μ_R(x, y) / (x, y) ∈ X × Y}.

Definition 1.30. The largest relation whose projection is R is called the cylindrical extension of R.

Example 1.17. An example of a fuzzy relation and its projections is

        y1    y2    y3    y4    y5    y6    Π_X
x1      0.1   0.2   0.3   0.6   1     0.2   1
x2      0.3   0.5   0.4   1     0.5   0.6   1
x3      0.4   0.8   1     0.7   0.8   0.8   1
Π_Y     0.4   0.8   1     1     1     0.8   Π_T = 1

The cylindrical extension of Π_Y is

        y1    y2    y3    y4    y5    y6
x1      0.4   0.8   1     1     1     0.8
x2      0.4   0.8   1     1     1     0.8
x3      0.4   0.8   1     1     1     0.8

Fuzzy relations in different product spaces can be combined by the operation of composition. Different versions of composition have been suggested, but the sup-min composition is the best known and the most frequently used.

Definition 1.31. Let R and S be two fuzzy relations on X × Y and Y × Z, respectively. Their sup-min composition is defined as

R ∘ S = {[(x, z), sup_{y∈Y} min{R(x, y), S(y, z)}] / x ∈ X, y ∈ Y, z ∈ Z}.

Example 1.18. (Fullér, 1995) The sup-min composition of the relations

        y1    y2    y3    y4              z1    z2    z3
x1      0.5   0.1   0.1   0.7       y1    0.4   0.9   0.3
x2      0     0.8   0     0         y2    0     0.4   0
x3      0.9   1     0.7   0.8       y3    0.9   0.5   0.8
                                    y4    0.6   0.7   0.5

is

        z1    z2    z3
x1      0.6   0.7   0.5
x2      0     0.4   0
x3      0.7   0.9   0.7

Changing the min operation from the last definition into a t-norm T, one obtains the sup-T composition. Following (Zadeh, 1973), the sup-min composition of a fuzzy set and a fuzzy relation can be obtained as follows.

Definition 1.32. Let T be a t-norm. The membership function of the composition of a fuzzy set A in X and a fuzzy relation R in X × Y is defined by

μ_{A∘R}(y) = sup_{x∈X} T(μ_A(x), μ_R(x, y)), ∀y ∈ Y.
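The sup-min composition of Definition 1.31 can be computed directly on the membership matrices; this sketch (our own, using plain nested lists) reproduces Example 1.18:

```python
def sup_min_compose(R, S):
    # (R o S)(x, z) = max over y of min(R(x, y), S(y, z))
    rows, inner, cols = len(R), len(S), len(S[0])
    return [[max(min(R[i][k], S[k][j]) for k in range(inner))
             for j in range(cols)] for i in range(rows)]

R = [[0.5, 0.1, 0.1, 0.7],
     [0.0, 0.8, 0.0, 0.0],
     [0.9, 1.0, 0.7, 0.8]]
S = [[0.4, 0.9, 0.3],
     [0.0, 0.4, 0.0],
     [0.9, 0.5, 0.8],
     [0.6, 0.7, 0.5]]
for row in sup_min_compose(R, S):
    print(row)   # [0.6, 0.7, 0.5], [0.0, 0.4, 0.0], [0.7, 0.9, 0.7]
```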
2 UNCERTAINTY

2.1 POSSIBILITY AND NECESSITY MEASURES

In 1974 Sugeno introduced the concept of fuzzy measure, as in the next definition.

Definition 2.1. Given the universe X (supposed to be finite, for the sake of simplicity), a fuzzy measure is a set function g from the set 2^X of subsets of X to the interval [0, 1] such that
i) g(∅) = 0,
ii) g(X) = 1,
iii) ∀A, B ∈ 2^X, if A ⊆ B then g(A) ≤ g(B).

The following inequalities hold:

∀A, B ∈ 2^X, g(A ∩ B) ≤ min(g(A), g(B))
∀A, B ∈ 2^X, g(A ∪ B) ≥ max(g(A), g(B)).

In order to combine the uncertainties of two events A and B we use the relations
1) A ∩ B = ∅ ⇒ g(A ∪ B) = g(A) ∗ g(B), where ∗ is a t-conorm;
2) A ∪ B = X ⇒ g(A ∩ B) = g(A) ⊥ g(B), where ⊥ is a t-norm.

The most important fuzzy measures are:
• the possibility measure Π: ∀A, B ∈ 2^X, Π(A ∪ B) = max(Π(A), Π(B));
• the necessity measure N: ∀A, B ∈ 2^X, N(A ∩ B) = min(N(A), N(B)).

It is easy to verify that N is a necessity measure if and only if Π, defined as Π(A) = 1 − N(¬A) ∀A ⊆ X, is a possibility measure.

Let x1, x2, …, xn be the elementary events of X; the values π_i = Π({x_i}), i ∈ {1, 2, …, n}, define the possibility distribution of Π. The possibility measure Π can be defined using a possibility distribution π : X → [0, 1] with

Π(A) = sup_{x∈A} π(x).

It results that the necessity measure is defined by the relation

N(A) = inf{1 − π(x) / x ∉ A}.

Some important properties of these measures are:

max(Π(A), Π(¬A)) = 1, min(N(A), N(¬A)) = 0
N(A) + N(¬A) ≤ 1, Π(A) + Π(¬A) ≥ 1
Π(A) ≥ N(A)

or, more generally,

g(A) ∗ g(¬A) = 1, g′(A) ⊥ g′(¬A) = 0,

where ∗ is a t-conorm, ⊥ is a t-norm, g is a fuzzy measure and g′(A) = 1 − g(¬A).

Example 2.1. If E ⊆ X is a certain event then we can define a possibility measure and a necessity measure as follows:

1) Π_E(A) = 1 if A ∩ E ≠ ∅; 0 if A ∩ E = ∅, where Π_E(A) = 1 has the meaning: the event A is possible;

2) N_E(A) = 1 if E ⊆ A; 0 otherwise, where N_E(A) = 1 says: the event A is necessarily true.

Having a basic fuzzy event defined by a fuzzy set A, the possibility and necessity measures for an event defined by a fuzzy set F are (Dubois, 1983):

Π_{T;A}(F) = sup_{x∈X} T(μ_F(x), μ_A(x))
N_{S,C;A}(F) = inf_{x∈X} S(μ_F(x), C(μ_A(x))).

If the t-norm T and the t-conorm S are C-dual then

Π_{T;A}(F) = C(N_{S,C;A}(¬F)).

The inequality between Π and N from the crisp case is, in general, no longer true in the fuzzy case; such a case is (Yager, 1983a): C(x) = 1 − x, T(x, y) = min(x, y), S(x, y) = max(x, y) and max_{x∈X} μ_A(x) ≤ 0.5, when

N_{S,C;A}(F) > Π_{T;A}(F) ∀F ∈ 2^X.

Following (Iancu, 1988b) we present a general method for constructing possibility and necessity measures.

Definition 2.2. We call a maximization operator (or Pedrycz's operator) associated with a t-norm T the application τ_T : [0, 1] × [0, 1] → [0, 1] with the properties:

xτ_Ty ≤ zτ_Ty if x ≤ z, T(x, y)τ_Ty ≥ x, T(xτ_Ty, y) ≤ x.

Theorem 2.1. (Iancu, 1988b) Let f : [0, 1] → I ⊆ J ⊆ R be a continuous and strictly decreasing function and Δ : J × J → J an application which satisfies the conditions (3.1)-(3.6) from Theorem 1.3 for x, y, z ∈ J and e = f(1). Then the maximization operator associated with the t-norm T(x, y) = f^(−1)(Δ(f(x), f(y))) exists and is given by

xτ_Ty = 1                           if x ≥ y
        f^{−1}(Δ(f(x), φ(f(y))))    if x < y.
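A small sketch (with an invented possibility distribution, not data from the original text) of the distribution-based formulas Π(A) = sup_{x∈A} π(x) and N(A) = inf{1 − π(x) / x ∉ A}:

```python
# possibility distribution pi over a finite universe; sup over X must be 1
pi = {'x1': 0.2, 'x2': 1.0, 'x3': 0.7, 'x4': 0.0}

def possibility(A):
    return max(pi[x] for x in A)

def necessity(A):
    outside = set(pi) - set(A)
    return min((1 - pi[x] for x in outside), default=1.0)

A = {'x2', 'x3'}
print(possibility(A), necessity(A))                    # 1.0 0.8
# max(Pi(A), Pi(not A)) = 1, as the properties above require
print(max(possibility(A), possibility(set(pi) - A)))   # 1.0
```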
An R-implication associated with a t-norm T is defined by

I_T(x, y) = sup{z ∈ [0, 1] / T(x, z) ≤ y}, x, y ∈ [0, 1].

We consider the following R-implications:

a →_{R;T} b = sup{x ∈ [0, 1] / T(a, x) ≤ b}
a →_{R;T,C} b = C(inf{x ∈ [0, 1] / S(b, x) ≥ a}) = C(b) →_{R;T} C(a) = sup{x ∈ [0, 1] / T(C(b), x) ≤ C(a)}

where T is a t-norm, C is a strict negation and S is the t-conorm C-dual with T. Using Pedrycz's operator we obtain

a →_{R;T} b = bτ_Ta = 1 if a ≤ b; f^{−1}(Δ(f(b), φ(f(a)))) if a > b
a →_{R;T,C} b = C(a)τ_TC(b) = 1 if a ≤ b; f^{−1}(Δ(f(C(a)), φ(f(C(b))))) if a > b.

We define

Γ_{T;A}(B) = inf_{x∈X} {μ_A(x) →_{R;T} μ_B(x)}
L_{T,C;A}(B) = inf_{x∈X} {μ_A(x) →_{R;T,C} μ_B(x)}
Λ_{T,C;A}(B) = C(Γ_{T;A}(¬B))
V_{T,C;A}(B) = C(L_{T,C;A}(¬B)).

Theorem 2.2. (Iancu, 1988b) Γ_{T;A} and L_{T,C;A} are necessity measures and Λ_{T,C;A} and V_{T,C;A} are possibility measures.

Example 2.2. (Iancu, 1988b) For f(x) = 2 − x, Δ(x, y) = 2x + 2y − xy − 2 and φ(x) = (3 − 2x)/(2 − x) we have T(x, y) = xy; then, for C(x) = 1 − x, we obtain

Γ_{T;A}(B) = inf_{x∈X} { 1 if μ_A(x) ≤ μ_B(x); μ_B(x)/μ_A(x) if μ_A(x) > μ_B(x) }

L_{T,C;A}(B) = inf_{x∈X} { 1 if μ_A(x) ≤ μ_B(x); (1 − μ_A(x))/(1 − μ_B(x)) if μ_A(x) > μ_B(x) }

Λ_{T,C;A}(B) = sup_{x∈X} { 0 if μ_A(x) ≤ 1 − μ_B(x); (μ_A(x) + μ_B(x) − 1)/μ_A(x) if μ_A(x) > 1 − μ_B(x) }

V_{T,C;A}(B) = sup_{x∈X} { 0 if μ_A(x) ≤ 1 − μ_B(x); (μ_A(x) + μ_B(x) − 1)/μ_B(x) if μ_A(x) > 1 − μ_B(x) }.

Observation 2.1. We call R-measures the measures constructed by R-implications.

Observation 2.2. Using the previous method, the implication a → b = S(C(a), b) (called an S-implication) generates the measures Π and N.

Observation 2.3. There are some implications which cannot generate necessity and possibility measures using the previous method. For instance, we consider the implication a → b = S(C(a), T(a, b)); for T(x, y) = xy, S(x, y) = x + y − xy, C(x) = 1 − x and the universe Ω we obtain

N_{T,C;A}(Ω) = inf_{ω∈Ω} S(C(μ_A(ω)), T(μ_A(ω), μ_Ω(ω))) = inf_{ω∈Ω} (μ_A²(ω) − μ_A(ω) + 1) ∈ [3/4, 1],

therefore, generally, N_{T,C;A}(Ω) ≠ 1.

Some properties of R-measures are:

(P1) M(Ω) = 1 and M(∅) = 0 for M ∈ {Γ_{T;A}, L_{T,C;A}, Λ_{T,C;A}, V_{T,C;A}}
(P2) M(B1 ∩ B2) = min(M(B1), M(B2)) for M ∈ {Γ_{T;A}, L_{T,C;A}}
(P3) M(B1 ∪ B2) = max(M(B1), M(B2)) for M ∈ {Λ_{T,C;A}, V_{T,C;A}}
(P4) B1 ⊆ B2 ⇒ M(B1) ≤ M(B2) for M ∈ {Γ_{T;A}, L_{T,C;A}, Λ_{T,C;A}, V_{T,C;A}}
(P5) A1 ⊆ A2 ⇒ M1(B) ≥ M2(B) for M_i ∈ {Γ_{T;Ai}, L_{T,C;Ai}}, i ∈ {1, 2}
(P6) A1 ⊆ A2 ⇒ M1(B) ≤ M2(B) for M_i ∈ {Λ_{T,C;Ai}, V_{T,C;Ai}}, i ∈ {1, 2}
(P7) Γ_{T;A}(B) = 1 ⇔ L_{T,C;A}(B) = 1
(P8) Λ_{T,C;A}(B) = 0 ⇔ V_{T,C;A}(B) = 0
(P9) Π_{T;A}(B) = 1 ⇒ Λ_{T,C;A}(B) = 1
(P10) N_{T,C;A}(B) = 0 ⇒ Γ_{T;A}(B) = 0
(P11) L_{T,C;A}(B) < 1 ⇔ Γ_{T;A}(B) < 1
(P12) Λ_{T,C;A}(B) > 0 ⇔ V_{T,C;A}(B) > 0
(P13) Γ_{T;A}(B) = L_{T,C;A}(B) = 1 ⇒ N_{T,C;A}(B) ≥ γ
(P14) Λ_{T,C;A}(B) = V_{T,C;A}(B) = 0 ⇒ Π_{T;A}(B) ≤ γ

where Ω is the universe, A, B, A1, A2, B1, B2 ∈ 2^Ω and γ is the fixed point of the negation. Their justification is immediate.
For instance, for (P13) we have: Γ_{T;A}(B) = L_{T,C;A}(B) = 1 ⇒ μ_A(ω) ≤ μ_B(ω) ∀ω ∈ Ω; since μ_B(ω) ≤ γ or C(μ_B(ω)) ≤ γ, we have

T(μ_A(ω), C(μ_B(ω))) ≤ T(μ_B(ω), C(μ_B(ω))) ≤ T(γ, 1) = γ

and therefore

N_{T,C;A}(B) = C( sup_{ω∈Ω} T(μ_A(ω), C(μ_B(ω))) ) ≥ C(γ) = γ.

For particular cases one can obtain many other properties. For instance, for the measures given in Example 2.2 we have:

(P15) N_{T,C;A}(B) = 1 ⇒ Γ_{T;A}(B) = L_{T,C;A}(B) = 1
(P16) Π_{T;A}(B) = 0 ⇒ V_{T,C;A}(B) = Λ_{T,C;A}(B) = 0
(P17) Γ_{T;A}(B) ≤ 0.5 ⇒ Γ_{T;A}(B) ≤ N_{T,C;A}(B)
(P18) N_{T,C;A}(B) < 0.5 ⇒ Γ_{T;A}(B) < 0.5
(P19) Π_{T;A}(B) > 0.5 ⇒ Λ_{T,C;A}(B) > 0.5
(P20) Γ_{T;A}(B) > 0 ⇒ N_{T,C;A}(B) > 0
(P21) Λ_{T,C;A}(B) < 1 ⇒ Π_{T;A}(B) < 1
(P22) Π_{T;A}(B) < 1 ⇒ L_{T,C;A}(B) = 0.

2.2 BELIEF AND PLAUSIBILITY FUNCTIONS

Independently from the development of fuzzy sets and possibility theory, Shafer (1976) proposed a theory of evidence in which he introduced the mathematical concept of belief function. Most of the measures in this theory can be defined in terms of the basic probability assignment (bpa) m, which satisfies the following conditions:

m : 2^Ω → [0, 1], m(∅) = 0, Σ_{A⊆Ω} m(A) = 1,

where Ω is the frame of discernment (or universe). Subsets of Ω with nonzero basic probabilities are called focal elements. The basic probability assignment m determines the lower probability and the upper probability of a subset A of Ω, called the belief function, denoted Bel(A), and the plausibility function, denoted Pls(A), respectively. These functions are defined as follows.

Definition 2.3. Given the universe Ω, a belief function is an application Bel : 2^Ω → [0, 1] with the properties
i) Bel(∅) = 0,
ii) Bel(Ω) = 1,
iii) ∀n ∈ N, ∀A_i ⊆ Ω, i ∈ {1, 2, …, n},

Bel(∪_{i=1}^{n} A_i) ≥ Σ_{i=1}^{n} Bel(A_i) − Σ_{i<j} Bel(A_i ∩ A_j) + … + (−1)^{n+1} Bel(∩_{i=1}^{n} A_i).

The plausibility function is given by

∀A ⊆ Ω, Pls(A) = 1 − Bel(¬A).

These two quantities are obtained from the bpa as follows (Shafer, 1976):

Bel(A) = Σ_{B⊆A} m(B)
Pls(A) = Σ_{A∩B≠∅} m(B).

For computing the belief degree in a fuzzy set A due to a non-fuzzy bpa (i.e. all focal elements B_j are crisp sets) one can use the formulas (Smets, 1981)

Bel(A) = Σ_{Bj⊆Ω} m(Bj) × inf_{x∈Bj} μ_A(x)
Pls(A) = Σ_{Bj⊆Ω} m(Bj) × sup_{x∈Bj} μ_A(x).

The probabilistic constraint of a fuzzy focal element A is expressed by decomposing it into the level sets of A, which form a group of consonant crisp focals. The decomposition of a fuzzy focal element A is a collection of nonfuzzy subsets such that (Dubois & Prade, 1982, 1985a; Yen, 1992):
- they are the level sets of A:

A_{α1} ⊃ A_{α2} ⊃ … ⊃ A_{αn}, α1 < α2 < … < αn;

- their basic probabilities are

m(A_{αi}) = (αi − α_{i−1}) × m(A), i ∈ {1, 2, …, n}, α0 = 0, αn = 1.

Now, the previous formulas become (Yen, 1992)

Bel(A) = Σ_B m(B) Σ_{αi} (αi − α_{i−1}) f_{B,A}(αi)
Pls(A) = Σ_B m(B) Σ_{αi} (αi − α_{i−1}) g_{B,A}(αi)

where f_{B,A}(α) = inf_{x∈B_α} μ_A(x) and g_{B,A}(α) = sup_{x∈B_α} μ_A(x).

The following example (Iancu, 1997c) illustrates how one applies these formulas for computing the generalized belief functions. We want to determine the uncertainty in the assertion "George's result in mathematics is very good", denoted as the fuzzy set A.
For this we compute the support pair [Bel(A), Pls(A)], using the following focal elements: i) George obtained marks between 60 and 70 (denoted as the fuzzy set B) with 0.3 basic probability, and ii) George obtained marks about 90 (denoted as the fuzzy set C) with 0.7 basic probability, i.e., m(B) = 0.3, m(C) = 0.7; the frame of discernment is Ω = [0, 100]. The fuzzy sets A, B and C are characterized below by lists in the form μ_A(xi)/xi:

A = {0.25/40, 0.5/50, 0.75/60, 0.8/70, 0.9/80, 1/90, 1/100}
B = {0.25/30, 0.5/40, 0.75/50, 1/60, 1/70, 0.75/80, 0.5/90, 0.25/100}
C = {0.25/50, 0.5/60, 0.8/70, 1/80, 1/90, 1/100}.

We decompose the fuzzy focal B into the following non-fuzzy focal elements:

B_{0.25} = {30, 40, 50, 60, 70, 80, 90, 100} with mass 0.25 × m(B)
B_{0.5} = {40, 50, 60, 70, 80, 90} with mass 0.25 × m(B)
B_{0.75} = {50, 60, 70, 80} with mass 0.25 × m(B)
B_1 = {60, 70} with mass 0.25 × m(B)

and the fuzzy focal C into the following non-fuzzy focal elements:

C_{0.25} = {50, 60, 70, 80, 90, 100} with mass 0.25 × m(C)
C_{0.5} = {60, 70, 80, 90, 100} with mass 0.25 × m(C)
C_{0.8} = {70, 80, 90, 100} with mass 0.3 × m(C)
C_1 = {80, 90, 100} with mass 0.2 × m(C).

Then

m(B) Σ_{αi} (αi − α_{i−1}) f_{B,A}(αi)
= m(B) × 0.25 × [f_{B,A}(0.25) + f_{B,A}(0.5) + f_{B,A}(0.75) + f_{B,A}(1)]
= 0.3 × 0.25 × [0 + 0.25 + 0.5 + 0.75] = 0.1125

and

m(C) Σ_{αi} (αi − α_{i−1}) f_{C,A}(αi)
= m(C) × [0.25 × f_{C,A}(0.25) + 0.25 × f_{C,A}(0.5) + 0.3 × f_{C,A}(0.8) + 0.2 × f_{C,A}(1)]
= 0.7 × [0.25 × 0.5 + 0.25 × 0.75 + 0.3 × 0.8 + 0.2 × 0.9] = 0.51275,

therefore

Bel(A) = 0.1125 + 0.51275 = 0.62525.

Similarly,

Pls(A) = 0.9775.

The basic operator used to update information in Probability Theory is conditioning. In the Theory of Evidence there is a generalization of this concept, given by Dempster (1967). Other formulas have been proposed by various authors:

- Dempster's rule:

Pls_D(B|A) = Pls(A ∩ B) / Pls(A), Bel_D(B|A) = 1 − Pls_D(¬B|A)

- the geometrical rule (Suppes & Zanotti, 1977):

Bel_g(B|A) = Bel(A ∩ B) / Bel(A), Pls_g(B|A) = 1 − Bel_g(¬B|A)

- Shafer's strong conditioning rule (Shafer, 1976; Dubois & Prade, 1986a):

Bel_S(B|A) = Bel(A ∩ B) / Bel(A), Pls_S(B|A) = 1 − Bel(A − B) / Bel(A)

- Planchet's weak conditioning rule (Planchet, 1989):

Bel_P(B|A) = (Bel(B) − Bel(B − A)) / Pls(A)
Pls_P(B|A) = (Pls(A) + Pls(B) − Pls(A ∪ B)) / Pls(A)

- the De Campos, Lamata & Moral (1990) and Fagin & Halpern (1989) rule:

P*(B|A) = Pls(A ∩ B) / (Pls(A ∩ B) + Bel(A ∩ ¬B))
P_*(B|A) = Bel(A ∩ B) / (Bel(A ∩ B) + Pls(A ∩ ¬B)),

which are also called the upper and lower probabilities.

2.3 DEMPSTER'S RULE OF COMBINATION

We consider two independent evidential sources with the same frame of discernment Ω. Let Bel1 and Bel2 be the belief functions determined by the basic probability assignments m1 and m2, having the focal elements A1, …, As and B1, …, Bt, respectively. According to Dempster's rule (Dempster, 1967), a new belief function Bel = Bel1 ⊕ Bel2 can be obtained from

m12(∅) = 0
(m1 ∩ m2)(A) = Σ_{i,j / Ai∩Bj=A} m1(Ai) × m2(Bj)
m12(A) = (m1 ⊕ m2)(A) = (m1 ∩ m2)(A) / (1 − k), ∀A ∈ 2^Ω, A ≠ ∅,

where

k = Σ_{i,j / Ai∩Bj=∅} m1(Ai) × m2(Bj)

is the degree of conflict between the two sources. The effect of the normalizing factor 1 − k consists in eliminating the conflicting pieces of information between the two sources to combine. When k = 1 the combined bpa m12 does not exist and the bodies of evidence are said to be in full contradiction.
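Dempster's rule can be sketched directly from the formulas above. In the following illustration (ours; the focal elements are encoded as frozensets and the example bpa's are invented) the conflict k is accumulated while the conjunctive consensus is built:

```python
def dempster(m1, m2):
    # m1, m2: dicts mapping frozenset focal elements to their masses
    combined, k = {}, 0.0
    for A, p in m1.items():
        for B, q in m2.items():
            inter = A & B
            if inter:
                combined[inter] = combined.get(inter, 0.0) + p * q
            else:
                k += p * q   # mass of the conflicting (empty) intersections
    if k == 1.0:
        raise ValueError("full contradiction: combined bpa does not exist")
    return {A: v / (1.0 - k) for A, v in combined.items()}

m1 = {frozenset({'a', 'b'}): 0.6, frozenset({'c'}): 0.4}
m2 = {frozenset({'b'}): 0.5, frozenset({'b', 'c'}): 0.5}
print(dempster(m1, m2))   # {'b'}: 0.75, {'c'}: 0.25, with conflict k = 0.2
```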
2.3 DEMPSTER'S RULE OF COMBINATION

We consider two independent evidential sources with the same frame of discernment $\Omega$. Let $Bel_1$ and $Bel_2$ be the belief functions determined by the basic probability assignments $m_1$ and $m_2$, having the focal elements $A_1, \dots, A_s$ and $B_1, \dots, B_t$, respectively. According to Dempster's rule (Dempster, 1967), a new belief function $Bel = Bel_1 \oplus Bel_2$ is obtained from

$$m_{12}(\emptyset) = 0,$$
$$\forall A \in 2^{\Omega}, A \ne \emptyset: \quad (m_1 \cap m_2)(A) = \sum_{i,j \,/\, A_i \cap B_j = A} m_1(A_i) \times m_2(B_j),$$
$$m_{1,2}(A) = (m_1 \oplus m_2)(A) = \frac{(m_1 \cap m_2)(A)}{1 - k},$$

where

$$k = (m_1 \cap m_2)(\emptyset) = \sum_{i,j \,/\, A_i \cap B_j = \emptyset} m_1(A_i) \times m_2(B_j)$$

is the degree of conflict between the two sources. The effect of the normalizing factor $1 - k$ consists in eliminating the conflicting pieces of information between the two sources to be combined. When $k = 1$ the combined bpa $m_{12}$ does not exist and the bodies of evidence are said to be in full contradiction. Several interesting and valuable alternative rules have been proposed in the literature to circumvent the limitations of Dempster's rule of combination.

Because, generally, there is a dependence between the sources, Garibba and Servida (1988) gave a new aggregation method, extending Dempster's rule. The intersection from the last formula is extended to (Garibba & Servida, 1988)

$$(m_1 * m_2)(A) = \bigvee_{i,j \,/\, A_i * B_j = A} \big(m_1(A_i) \wedge m_2(B_j)\big),$$

where $*$ represents a set operator, such as $\cap$, $\cup$, $\subset$, and $\vee$ and $\wedge$ are generalized union and intersection operators. Usually the operators $* \equiv \cap$, $\vee \equiv \max$, $\wedge \equiv \min$ are used. In this case the last formula becomes

$$(m_1 \cap m_2)(A) = \max_{i,j \,/\, A_i \cap B_j = A} \min\big(m_1(A_i), m_2(B_j)\big)$$

and its normalized form is

$$m_{12}(A) = \frac{(m_1 \cap m_2)(A)}{\sum_{B \in 2^{\Omega}, B \ne \emptyset} (m_1 \cap m_2)(B)}.$$

Working with $n$ consonant bpa's, the previous rule becomes

$$m_{1,\dots,n}(A) = \frac{\max_{i_1,\dots,i_n \,/\, A_{i_1} \cap \dots \cap A_{i_n} = A} \min\big(m_1(A_{i_1}), \dots, m_n(A_{i_n})\big)}{\sum_{B \in 2^{\Omega}} \max_{i_1,\dots,i_n \,/\, A_{i_1} \cap \dots \cap A_{i_n} = B} \min\big(m_1(A_{i_1}), \dots, m_n(A_{i_n})\big)}.$$

Dubois and Prade (1986b; 1988) and Smets (1993) proposed the formula

$$m_{DP1}(\emptyset) = 0, \qquad \forall A \in 2^{\Omega}, A \ne \emptyset: \quad m_{DP1}(A) = \sum_{i,j \,/\, A_i \cup B_j = A} m_1(A_i) \times m_2(B_j),$$

which reflects the disjunctive consensus and is usually preferred when one knows that one of the sources $S_1$ or $S_2$ is mistaken, but without knowing which one.

Murphy's rule of combination (Murphy, 2000; Yager, 1985; Dubois & Prade, 1988) consists in averaging the belief functions associated with $m_1$ and $m_2$:

$$Bel_M(A) = \frac{Bel_1(A) + Bel_2(A)}{2}, \qquad \forall A \in 2^{\Omega}.$$

Smets's rule of combination (Smets & Kennes, 1994) of two independent sources of evidence eliminates the division by $1 - k$ involved in Dempster's rule:

$$m_S(\emptyset) = k = \sum_{i,j \,/\, A_i \cap B_j = \emptyset} m_1(A_i) \times m_2(B_j),$$
$$\forall A \in 2^{\Omega}, A \ne \emptyset: \quad m_S(A) = \sum_{i,j \,/\, A_i \cap B_j = A} m_1(A_i) \times m_2(B_j).$$

Yager's rule of combination (Yager, 1983b; 1985; 1987) admits that in case of conflict the result is not reliable, so that $k$ plays the role of an absolute discounting term added to the weight of ignorance. This rule is given by

$$m_Y(\emptyset) = 0,$$
$$\forall A \in 2^{\Omega}, A \notin \{\emptyset, \Omega\}: \quad m_Y(A) = \sum_{i,j \,/\, A_i \cap B_j = A} m_1(A_i) \times m_2(B_j),$$
$$m_Y(\Omega) = m_1(\Omega)\, m_2(\Omega) + \sum_{i,j \,/\, A_i \cap B_j = \emptyset} m_1(A_i) \times m_2(B_j).$$

Dubois and Prade (1988) defined a new rule of combination, according to the following principle: if one source observes a value in a set $X$ while the other observes this value in a set $Y$, then the truth lies in $X \cap Y$ as long as $X \cap Y \ne \emptyset$, and in $X \cup Y$ otherwise. This rule is defined by

$$m_{DP2}(\emptyset) = 0,$$
$$\forall A \in 2^{\Omega}, A \ne \emptyset: \quad m_{DP2}(A) = \sum_{i,j \,/\, A_i \cap B_j = A,\ A_i \cap B_j \ne \emptyset} m_1(A_i) \times m_2(B_j) + \sum_{i,j \,/\, A_i \cup B_j = A,\ A_i \cap B_j = \emptyset} m_1(A_i) \times m_2(B_j).$$

Lefevre, Colot and Vanoorenberghe (2000) presented a unified framework embedding all the existing combination rules involving conjunctive consensus in the same general mechanism of construction:

Step 1. Computation of the total conflicting mass, based on the conjunctive consensus:
$$k = \sum_{i,j \,/\, A_i \cap B_j = \emptyset} m_1(A_i) \times m_2(B_j).$$

Step 2. Reallocation of the conflicting mass to the sets $A \subseteq \Omega$ with some given coefficients $w_m(A) \in [0, 1]$ such that $\sum_{A \subseteq \Omega} w_m(A) = 1$, according to
$$m(\emptyset) = w_m(\emptyset) \cdot k,$$
$$\forall A \in 2^{\Omega}, A \ne \emptyset: \quad m(A) = \sum_{i,j \,/\, A_i \cap B_j = A} m_1(A_i) \times m_2(B_j) + w_m(A) \cdot k.$$

A particular choice of the set of coefficients $w_m(\cdot)$ provides a particular rule of combination.
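As an illustration of the conjunctive mechanism that all these rules share, the following Python sketch (an illustration under the assumption that a bpa is stored as a dictionary mapping frozensets to masses; it is not the book's implementation) combines two bpa's by Dempster's rule and by the Smets and Yager variants:

    # Conjunctive consensus with three ways of handling the conflict k.
    from itertools import product

    def combine(m1, m2, rule="dempster", omega=None):
        joint, k = {}, 0.0
        for (A, a), (B, b) in product(m1.items(), m2.items()):
            C = A & B
            if C:
                joint[C] = joint.get(C, 0.0) + a * b
            else:
                k += a * b                       # conflicting mass
        if rule == "dempster":                   # normalize by 1 - k
            if k == 1.0:
                raise ValueError("total conflict: no combined bpa")
            return {A: v / (1 - k) for A, v in joint.items()}
        if rule == "smets":                      # keep the conflict on the empty set
            joint[frozenset()] = k
            return joint
        if rule == "yager":                      # transfer the conflict to Omega
            joint[omega] = joint.get(omega, 0.0) + k
            return joint

    m1 = {frozenset("ab"): 0.6, frozenset("abc"): 0.4}
    m2 = {frozenset("bc"): 0.7, frozenset("c"): 0.3}
    print(combine(m1, m2, "dempster"))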
This mechanism recovers all existing rules involving conjunctive consensus developed in the literature based on Shafer's model. For instance (Lefevre, Colot & Vanoorenberghe, 2000):

- Dempster's rule is obtained by choosing $w_m(\emptyset) = 0$ and
$$w_m(A) = \frac{1}{1 - k} \sum_{i,j \,/\, A_i \cap B_j = A} m_1(A_i)\, m_2(B_j);$$
- Yager's rule is obtained by choosing $w_m(\Omega) = 1$ and $w_m(A) = 0$ for $A \ne \Omega$;
- Smets' rule is obtained by choosing $w_m(\emptyset) = 1$ and $w_m(A) = 0$ for $A \ne \emptyset$;
- the second Dubois and Prade rule is obtained by choosing, for all $A \in 2^{\Omega}$,
$$w_m(A) = \frac{1}{k} \sum_{i,j \,/\, A_i \cup B_j = A,\ A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j).$$

Various examples of using these rules of combination can be found in (Smarandache & Dezert, 2004).

2.4 APPROXIMATIONS OF THE BASIC PROBABILITY ASSIGNMENT

Given a frame of discernment of size $|\Omega| = N$, a bpa $m$ can have up to $2^N$ focal elements, all of which have to be represented explicitly to capture the complete information encoded in $m$. It follows that the combination of two bpa's requires the computation of up to $2^N \times 2^N$ intersections. Orponen (1990) showed that the combination of various pieces of evidence using Dempster's rule has #P complexity. Reducing the number of focal elements of the bpa's under consideration while retaining the essence of the information is therefore an important problem for Dempster-Shafer theory. The most important algorithms known in the literature for this problem are the following.

The Bayesian approximation

This approximation (Voorbraak, 1989) reduces a given bpa $m$ to a probability distribution $m_B$:

$$m_B(A) = \begin{cases} \dfrac{\sum_{S \,/\, A \subseteq S} m(S)}{\sum_{C \subseteq \Omega} |C| \, m(C)} & \text{if } |A| = 1 \\ 0 & \text{otherwise.} \end{cases}$$

Example 2.3. Let $m$ be a bpa over the frame of discernment $\Omega = \{a, b, c, d, e\}$ with the values

$$m(A) = \begin{cases} 0.33 & \text{if } A = \{a, b\} \\ 0.3 & \text{if } A = \{a, b, c\} \\ 0.27 & \text{if } A = \{b, c, d\} \\ 0.06 & \text{if } A = \{c, d\} \\ 0.04 & \text{if } A = \{d, e\}. \end{cases}$$

Applying the Bayesian approximation to $m$ yields

$$m_B(A) \cong \begin{cases} 0.245 & \text{if } A = \{a\} \\ 0.35 & \text{if } A = \{b\} \\ 0.245 & \text{if } A = \{c\} \\ 0.143 & \text{if } A = \{d\} \\ 0.015 & \text{if } A = \{e\}. \end{cases}$$

This example shows that the Bayesian approximation is not reasonable when the number of focal elements of the input bpa is at most $|\Omega|$.

The k-l-x method

The basic idea of this approximation (Tessem, 1993) is to keep in the approximation $m_{klx}$ at least $k$ and at most $l$ focal elements with the highest values in the original bpa, such that the sum of their $m$-values is at least $1 - x$, where $x \in [0, 1)$. Finally, the values of the approximation are normalized in order to guarantee the basic properties of a bpa.

Example 2.4. For the bpa $m$ given in the previous example and the values $k = 2$, $l = 3$ and $x = 0.1$, the following result is obtained:

$$m_{klx}(A) = \begin{cases} 11/30 \cong 0.366 & \text{if } A = \{a, b\} \\ 1/3 \cong 0.333 & \text{if } A = \{a, b, c\} \\ 3/10 = 0.3 & \text{if } A = \{b, c, d\}. \end{cases}$$

Summarization

This method (Lowrance, Garvey & Strat, 1986) works similarly to k-l-x. Let $k$ be the number of focal elements to be contained in the approximation $m_S$ of a given bpa $m$, and let $M$ denote the set of the $k - 1$ subsets of $\Omega$ with the highest values in $m$. Then $m_S$ is given by

$$m_S(A) = \begin{cases} m(A) & \text{if } A \in M \\ \sum_{A' \subseteq A,\ A' \notin M} m(A') & \text{if } A = A_0 \\ 0 & \text{otherwise,} \end{cases}$$

where $A_0$ is defined as

$$A_0 = \bigcup_{A' \notin M,\ m(A') > 0} A'.$$

Example 2.5. For the bpa $m$ from Example 2.3 and $k = 3$, $m_S$ has the values

$$m_S(A) = \begin{cases} 0.33 & \text{if } A = \{a, b\} \\ 0.3 & \text{if } A = \{a, b, c\} \\ 0.37 & \text{if } A = \{b, c, d, e\}. \end{cases}$$
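A minimal Python sketch of the Bayesian approximation (using the same assumed dictionary representation as in the earlier sketches), checked against Example 2.3:

    def bayesian_approx(m):
        """Reduce a bpa to a probability distribution over singletons."""
        denom = sum(mass * len(S) for S, mass in m.items())
        out = {}
        for S, mass in m.items():
            for x in S:        # numerator of m_B({x}): sum of m(S) over S containing x
                out[x] = out.get(x, 0.0) + mass
        return {x: v / denom for x, v in out.items()}

    m = {frozenset("ab"): 0.33, frozenset("abc"): 0.3, frozenset("bcd"): 0.27,
         frozenset("cd"): 0.06, frozenset("de"): 0.04}
    print(bayesian_approx(m))   # a, c ~ 0.245; b ~ 0.35; d ~ 0.144; e ~ 0.016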
The D1 approximation

Let $m$ be a bpa to be approximated and $k$ the desired number of focal elements of the approximated bpa $m_D$. The following notations are useful:
a) $M^+$ is the set of the $k - 1$ focal elements of $m$ with the highest values:
$$M^+ = \{A_1, \dots, A_{k-1} \subseteq \Omega \,/\, \forall A \notin M^+: m(A_i) \ge m(A),\ i = 1, 2, \dots, k-1\};$$
b) $M^-$ is the set containing all the other focal elements of $m$:
$$M^- = \{A \subseteq \Omega \,/\, m(A) > 0,\ A \notin M^+\}.$$

Given a focal element $A \in M^-$ of $m$, the collection $M_A$ of its supersets in $M^+$ is computed; if $M_A$ is empty (i.e. $M^+$ contains no supersets of $A$) then the set

$$M'_A = \{B \in M^+ \,/\, |B| \ge |A|,\ B \cap A \ne \emptyset\}$$

is computed, where $|A|$ denotes the cardinality of the set $A$. The ideas of the D1 algorithm are (Bauer, 1996):
i) all the members of $M^+$ are kept as focal elements of $m_D$;
ii) for every $A \in M^-$, the value $m(A)$ is distributed uniformly among the members of $M_A$ with the smallest cardinality;
iii) if $M_A$ is empty then the value $m(A)$ is shared among the smallest members of $M'_A$, and the value assigned to a focal element depends on the size of its intersection with $A$;
iv) the procedure of distributing masses is invoked recursively until all values $m(A)$ are assigned to members of $M^+$ or the set $M'_A$ becomes empty; in the latter case, the remaining value is assigned to $\Omega$, which thus becomes a focal element of $m_D$.

The approximation $m_D$ of a bpa with $n$ focal elements can be computed in time $O(k(n - k))$.

Example 2.6. For the bpa $m$ from Example 2.3 and $k = 3$, the algorithm D1 yields

$$m_D(A) = \begin{cases} 0.33 & \text{if } A = \{a, b\} \\ 0.51 & \text{if } A = \{a, b, c\} \\ 0.16 & \text{if } A = \{a, b, c, d, e\}. \end{cases}$$

A mixed algorithm

An analysis of the approximations of an original bpa is made in (Bauer, 1996), with the conclusion that the "best" approximation algorithm with respect to decision making does not exist. However, the k-l-x, D1 and Bayesian approximations yield definitely better results than summarization does. A new algorithm, obtained as a combination of the k-l-x, summarization and D1 algorithms, is proposed in (Iancu, 2008c). Let $m$ be the bpa to be approximated; the combination is constructed in three steps, to obtain a new approximation $m_M$:

S1) Given the parameters $k$, $l$, $x$, having the same meaning as in the k-l-x method, we keep from the original bpa at least $k$ and at most $l$ focal elements, with the sum of their $m$-values at least $1 - x$; let $M$ be the set of these focal elements.
S2) The set $M$ is considered as the set $M^+$ from the D1 algorithm. The focal elements not included in the set $M$ at step S1 play the role of the set $M^-$ in the D1 algorithm.
S3) The elements of the focal sets $A \in M^-$ whose masses were not distributed among the members of $M_A$ or $M'_A$ are included in a new focal element; the $m_M$-value of this set is computed as the sum of the corresponding undistributed $m$-values. This idea is used by the summarization method to construct the set $A_0$.

Example 2.7. For the bpa $m$ from Example 2.3 and $k = 2$, $l = 3$ and $x = 0.1$, the algorithm proceeds as follows.

Step S1): Removing $\{c, d\}$ and $\{d, e\}$ from $m$, the constraints concerning the number of focal elements and the total mass removed are satisfied.
Thus, the following intermediate approximation is obtained:

$$m_{M1}(A) = \begin{cases} 0.33 & \text{if } A = \{a, b\} \\ 0.3 & \text{if } A = \{a, b, c\} \\ 0.27 & \text{if } A = \{b, c, d\}. \end{cases}$$

Steps S2) and S3): Step S2 is applied with the following sets and values:

$M^+ = \{A_1, A_2, A_3\}$, $M^- = \{A_4, A_5\}$,
$m_{M2}(A_1) = m_{M2}(\{a, b\}) = m_{M1}(\{a, b\}) = 0.33$,
$m_{M2}(A_2) = m_{M2}(\{a, b, c\}) = m_{M1}(\{a, b, c\}) = 0.3$,
$m_{M2}(A_3) = m_{M2}(\{b, c, d\}) = m_{M1}(\{b, c, d\}) = 0.27$,
$m_{M2}(A_4) = m_{M2}(\{c, d\}) = m(\{c, d\}) = 0.06$,
$m_{M2}(A_5) = m_{M2}(\{d, e\}) = m(\{d, e\}) = 0.04$.

The set $A_3 \in M^+$ is the unique superset of $A_4 \in M^-$, so the value of $A_3$ is increased by 0.06. Furthermore, $A_3$ covers half of the elements of $A_5$, which adds another $0.04 / 2 = 0.02$ to the $m_M$-value of $A_3$. The rest is assigned to $\{e\}$, the set constructed in step S3. The approximation $m_M$ of the original bpa $m$ is:

$$m_M(A) = \begin{cases} 0.33 & \text{if } A = \{a, b\} \\ 0.3 & \text{if } A = \{a, b, c\} \\ 0.35 & \text{if } A = \{b, c, d\} \\ 0.02 & \text{if } A = \{e\}. \end{cases}$$

An analysis of the error associated with an approximation algorithm can be made using the pignistic probability $P$ induced by a bpa, which can be considered the standard function for decision making in Dempster-Shafer theory (Smets, 1988). It is given by

$$P(\{x\}) = \sum_{A \,/\, x \in A \subseteq \Omega} \frac{m(A)}{|A|}.$$

The error quantifies the maximal deviation in the pignistic probability induced by an approximated bpa. Let $P_0$ be the pignistic probability induced by the original version of a bpa $m$ and $P_{m'}$ the one induced by the approximation $m'$. Then the error measure is defined as

$$Error(m') = \sum_{A \subseteq \Omega} \big| P_0(A) - P_{m'}(A) \big|.$$

For the approximations from the previous examples we obtain

$$Error(m_S) = 0.3225, \quad Error(m_{klx}) = 0.19, \quad Error(m_D) = 0.466, \quad Error(m_M) = 0.186.$$

One can observe that the best result is given by our approximation $m_M$. We notice that in all experiments the mixed algorithm gave better results than the D1 algorithm from which it is derived. If we increase the number of focal elements of the approximation algorithms, the error decreases, because a greater number of focal elements of the original bpa and of the approximated bpa coincide.

The mixed algorithm can be implemented as follows:

    input: m, k, l, x;  output: m_M

    P1) S := focal elements of the bpa m, sorted in decreasing order w.r.t. m-values
    P2) keep the focal elements of m that satisfy the condition
            (nf < l) and ((nf < k) or (tmass < 1 - x)),
        where nf and tmass are the number and the total mass of the elements kept so far
    P3) M+ := the sets given by P2
        M- := S - M+
        m_M(A) := m(A) for all A in M+
        R := {};  m_M(R) := 0
        for all A in M- do
            M_A := {B in M+ / A ⊂ B}
            if M_A ≠ {} then
                M'_A := {B in M_A / B is minimal in M_A}
                for all B in M'_A do
                    m_M(B) := m_M(B) + m(A) / |M'_A|
                end do
            else
                N_A := {B in M+ / |B| ≥ |A|, A ∩ B ≠ {}}
                if N_A = {} then
                    R := R ∪ A;  m_M(R) := m_M(R) + m(A)
                else
                    N'_A := {B in N_A / B is minimal in N_A} = {B_1, ..., B_n}
                    for all a in A do
                        let n1 be the number of sets B_i in N'_A with a in A ∩ B_i
                        m_M(B_i) := m_M(B_i) + m(A) / (|A| · n1)
                    end do
                    for all b in A with b not in A ∩ (B_1 ∪ ... ∪ B_n) do
                        R := R ∪ {b};  m_M(R) := m_M(R) + m(A) / |A|
                    end do
                end if
            end if
        end do
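The error measure is easy to experiment with. The sketch below is an assumption-laden simplification: for brevity it sums the pignistic deviations over singletons only, whereas the definition above sums over all subsets of $\Omega$, so its values are smaller than those reported for the examples.

    def pignistic(m):
        """BetP({x}) = sum of m(A)/|A| over focal elements A containing x."""
        p = {}
        for A, mass in m.items():
            for x in A:
                p[x] = p.get(x, 0.0) + mass / len(A)
        return p

    def error(m_orig, m_approx):
        """Sum over singletons of |P0({x}) - Pm'({x})| (a simplification)."""
        p0, p1 = pignistic(m_orig), pignistic(m_approx)
        return sum(abs(p0.get(x, 0.0) - p1.get(x, 0.0))
                   for x in set(p0) | set(p1))

    m  = {frozenset("ab"): 0.33, frozenset("abc"): 0.3, frozenset("bcd"): 0.27,
          frozenset("cd"): 0.06, frozenset("de"): 0.04}
    mM = {frozenset("ab"): 0.33, frozenset("abc"): 0.3, frozenset("bcd"): 0.35,
          frozenset("e"): 0.02}
    print(error(m, mM))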
The functions $f_{A,B}$ and $g_{A,B}$ have the following properties:
1) $f_{A,B}(\alpha)$ and $g_{A,B}(\alpha)$ are defined on the interval $[0, 1]$;
2) $0 \le f_{A,B}(\alpha) \le h(B)$ and $0 \le g_{A,B}(\alpha) \le h(B)$, where $h(B) = \sup_x \mu_B(x)$ is the height of $B$;
3) $f_{A,B}$ is monotonically nondecreasing;
4) $g_{A,B}$ is monotonically nonincreasing.

Definition 2.3. A take-off point $\alpha_t$ of a function $f_{A,B}$ is the value where the function becomes positive:
$$f_{A,B}(\alpha) = 0 \ \text{ for } 0 \le \alpha \le \alpha_t, \qquad f_{A,B}(\alpha) > 0 \ \text{ for } \alpha > \alpha_t.$$

Definition 2.4. A saturation point $\alpha_s$ of a function $f_{A,B}$ is the value where the function reaches $B$'s height:
$$f_{A,B}(\alpha) < \sup_x \mu_B(x) \ \text{ for } \alpha < \alpha_s, \qquad f_{A,B}(\alpha) = \sup_x \mu_B(x) \ \text{ for } \alpha_s \le \alpha \le 1.$$

Definition 2.5. A falloff point $\alpha_f$ of a function $g_{A,B}$ is the value where the function becomes lower than $B$'s height:
$$g_{A,B}(\alpha) = \sup_x \mu_B(x) \ \text{ for } 0 \le \alpha \le \alpha_f, \qquad g_{A,B}(\alpha) < \sup_x \mu_B(x) \ \text{ for } \alpha > \alpha_f.$$

Definition 2.6. A reset point $\alpha_r$ of a function $g_{A,B}$ is the value where the function reaches zero:
$$g_{A,B}(\alpha) > 0 \ \text{ for } \alpha < \alpha_r, \qquad g_{A,B}(\alpha) = 0 \ \text{ for } \alpha_r \le \alpha \le 1.$$

The following theorems facilitate the computation of $f_{A,B}$ and $g_{A,B}$.

Theorem 2.3. (Yen, 1992) Suppose $A$ is a continuous convex fuzzy subset. Then all its level sets $A_\alpha$, $0 < \alpha \le 1$, are intervals.

Theorem 2.4. (Yen, 1992) A convex fuzzy subset $B$ of the frame of discernment $X$ has its smallest membership value on an interval $I = [a, b] \subset X$ at one of $I$'s endpoints, that is, $\inf_{x \in [a,b]} \mu_B(x) = \mu_B(c)$ where $c \in \{a, b\}$.

Theorem 2.5. (Yen, 1992) Suppose $B$ is a continuous convex fuzzy subset of the frame $X$, and $I$ is an interval of $X$. The highest membership value of $B$ within the interval $I$ is either $B$'s height or the membership value at one of $I$'s endpoints, that is,
$$\sup_{x \in I} \mu_B(x) = \begin{cases} h(B) & \text{if } I \cap B_{h(B)} \ne \emptyset \\ \mu_B(x_e) & \text{otherwise,} \end{cases}$$
where $x_e$ is an endpoint of $I$.

From the previous theorems we get the formulas

$$f_{A,B}(\alpha) = \min\{\mu_B(x_L), \mu_B(x_R)\},$$
$$g_{A,B}(\alpha) = \begin{cases} h(B) & \text{if } A_\alpha \cap B_{h(B)} \ne \emptyset \\ \max\{\mu_B(x_L), \mu_B(x_R)\} & \text{otherwise,} \end{cases}$$

where $A_\alpha = [x_L, x_R]$.

In order to reduce the effort in computing $f_{A,B}$ and $g_{A,B}$, a subclass of convex fuzzy sets will be used.

Definition 2.7. A continuous fuzzy subset $A$ of the frame of discernment $X$ is a strong convex fuzzy set if it satisfies the following conditions:
1) $A$ is convex;
2) for all $x_1, x_2 \in X$ with $\mu_A(x_1) \notin \{0, \sup_x \mu_A(x)\}$ and $\mu_A(x_2) \notin \{0, \sup_x \mu_A(x)\}$, and for all $\lambda \in (0, 1)$,
$$\mu_A(\lambda x_1 + (1 - \lambda) x_2) > \min\{\mu_A(x_1), \mu_A(x_2)\}.$$

Theorem 2.6. (Yen, 1992) A continuous fuzzy set $A$ is strongly convex if its membership function is of the form

$$\mu_A(x) = \begin{cases} 0 & x \le a \\ LF_A(x) & a \le x \le b,\ a \ne b \\ h & b \le x \le c \\ RF_A(x) & c \le x \le d,\ c \ne d \\ 0 & x \ge d, \end{cases}$$

where $LF_A(x)$ is a continuous monotonically increasing function, $h$ is the height of $A$ and $RF_A(x)$ is a continuous monotonically decreasing function.

Computing the function $f_{A,B}$ (Yen, 1992)

We consider that $A$ and $B$ are continuous strong fuzzy sets characterized by $a_A, b_A, c_A, d_A, LF_A, RF_A$ and $a_B, b_B, c_B, d_B, LF_B, RF_B$, respectively.

General algorithm

1. Compute $f_{A,B}(0) = \min\{\mu_B(LF_A^{-1}(0)), \mu_B(RF_A^{-1}(0))\}$. If $f_{A,B}(0) \ne 0$ then go to step 3, else go to the next step.
2. Find the take-off point $\alpha_t$. First compute the intersection of $A$'s support and $B$'s support: $(x_1, x_2) = \mathrm{support}(A) \cap \mathrm{support}(B)$. Then the take-off point is $\alpha_t = \max\{\mu_A(x_1), \mu_A(x_2)\}$, and $f_{A,B}(\alpha) = 0$ for $0 \le \alpha \le \alpha_t$. If $\alpha_t = 1$ then stop, else $\beta \leftarrow \alpha_t$ and continue with the next step.
3. If $LF_A^{-1}(\beta) < b_B$ and $RF_A^{-1}(\beta) > c_B$ then find the smallest crossing point $\alpha_c$ by solving the equation
$$LF_B(LF_A^{-1}(\alpha_c)) = RF_B(RF_A^{-1}(\alpha_c)).$$
If $\alpha_c \in [\beta, 1]$ then go to step 5, else continue with the next step.
4. Compute the saturation point $\alpha_s$. If $\alpha_s \in [\beta, 1]$ and $f_{A,B}(\alpha_s) = 1$ then $f_{A,B}(\alpha) = 1$ for $\alpha \in [\alpha_s, 1]$ and $\alpha_c \leftarrow \alpha_s$; else $\alpha_c \leftarrow 1$.
5. Define the function on the interval $[\beta, \alpha_c]$:
   - if $\mu_B(LF_A^{-1}(\beta)) < \mu_B(RF_A^{-1}(\beta))$ then $f_{A,B}(\alpha) = \mu_B(LF_A^{-1}(\alpha))$ for $\alpha \in [\beta, \alpha_c]$;
   - else if $\mu_B(LF_A^{-1}(\beta)) > \mu_B(RF_A^{-1}(\beta))$ then $f_{A,B}(\alpha) = \mu_B(RF_A^{-1}(\alpha))$ for $\alpha \in [\beta, \alpha_c]$;
   - else, if
$$\frac{d\,\mu_B(LF_A^{-1}(\alpha))}{d\alpha}\Big|_{\alpha=\beta} < \frac{d\,\mu_B(RF_A^{-1}(\alpha))}{d\alpha}\Big|_{\alpha=\beta}$$
then $f_{A,B}(\alpha) = \mu_B(LF_A^{-1}(\alpha))$, else $f_{A,B}(\alpha) = \mu_B(RF_A^{-1}(\alpha))$, for $\alpha \in [\beta, \alpha_c]$.
6. If $\alpha_c = 1$ then stop, else $\beta \leftarrow \alpha_c$ and go to step 3.

Algorithm for linear strong convex fuzzy sets

A linear strong convex fuzzy set is characterized by the L function defined below, with the four parameters $a$, $b$, $c$ and $d$:

$$L[a, b, c, d](x) = \begin{cases} 0 & x \le a \\ \dfrac{x - a}{b - a} & a \le x \le b \\ 1 & b \le x \le c \\ \dfrac{d - x}{d - c} & c \le x \le d \\ 0 & x \ge d. \end{cases}$$

In this case the algorithm is simplified, because there is at most one crossing point:

1. Compute $f_{A,B}(0)$. If $f_{A,B}(0) \ne 0$ then $\alpha_t \leftarrow 0$ and go to step 3, else continue.
2. Find the take-off point $\alpha_t$. Define $f_{A,B}(\alpha) = 0$ for $\alpha \in [0, \alpha_t]$.
3. Find the crossing point $\alpha_c$ by solving the equation $LF_B(LF_A^{-1}(\alpha_c)) = RF_B(RF_A^{-1}(\alpha_c))$. If $\alpha_c$ is found then define $f_{A,B}(\alpha)$ for $\alpha \in [\alpha_t, \alpha_c]$ as in step 5 of the general algorithm, else $\alpha_c \leftarrow \alpha_t$.
4. Find the saturation point $\alpha_s$. If $\alpha_s$ is found then define $f_{A,B}(\alpha)$ for $\alpha \in [\alpha_c, \alpha_s]$ as in step 5 of the general algorithm and define $f_{A,B}(\alpha) = 1$ for $\alpha \in [\alpha_s, 1]$; else define $f_{A,B}(\alpha)$ for $\alpha \in [\alpha_c, 1]$ as in step 5 of the general algorithm.

Example 2.8. Suppose $A$ and $B$ are two linear strong convex continuous fuzzy subsets of a frame of discernment $[-100, 100]$ whose membership functions are

$$\mu_A(x) = L[0, 5, 10, 15](x), \qquad \mu_B(x) = L[1, 3, 11, 16](x).$$

Assume that $A$ is a focal element; we compute the contribution of $A$ to the degree of belief in $B$. We first compute the functions

$$LF_B(LF_A^{-1}(\alpha)) = LF_B(5\alpha) = \frac{5\alpha - 1}{2}, \qquad RF_B(RF_A^{-1}(\alpha)) = RF_B(15 - 5\alpha) = \frac{1 + 5\alpha}{5}.$$

Step 1. Compute $f_{A,B}(0)$: since $\mathrm{support}(A) = [0, 15]$ we have
$$f_{A,B}(0) = \min\{\mu_B(0), \mu_B(15)\} = \mu_B(0) = 0.$$

Step 2. Compute the take-off point $\alpha_t$:
$$(x_1, x_2) = \mathrm{support}(A) \cap \mathrm{support}(B) = (1, 15), \qquad \alpha_t = \max\{\mu_A(1), \mu_A(15)\} = \mu_A(1) = 0.2,$$
and define $f_{A,B}(\alpha) = 0$ for $0 \le \alpha \le 0.2$.

Step 3. Solving
$$LF_B(LF_A^{-1}(\alpha_c)) = \frac{5\alpha_c - 1}{2} = \frac{1 + 5\alpha_c}{5} = RF_B(RF_A^{-1}(\alpha_c))$$
we get the crossing point $\alpha_c = \frac{7}{15}$ and
$$f_{A,B}(\alpha) = \mu_B(LF_A^{-1}(\alpha)) = \frac{5\alpha - 1}{2} \quad \text{for } 0.2 \le \alpha \le \frac{7}{15}.$$

Step 4. Solving the equation
$$f_{A,B}(\alpha_s) = \mu_B(RF_A^{-1}(\alpha_s)) = \frac{1 + 5\alpha_s}{5} = 1$$
we get the saturation point $\alpha_s = \frac{4}{5}$, and thus
$$f_{A,B}(\alpha) = \mu_B(RF_A^{-1}(\alpha)) = \frac{1 + 5\alpha}{5} \quad \text{for } \frac{7}{15} \le \alpha \le \frac{4}{5}, \qquad f_{A,B}(\alpha) = 1 \quad \text{for } \frac{4}{5} \le \alpha \le 1.$$
Summarizing, we get

$$f_{A,B}(\alpha) = \begin{cases} 0 & 0 \le \alpha \le 0.2 \\ \dfrac{5\alpha - 1}{2} & 0.2 \le \alpha \le \dfrac{7}{15} \\ \dfrac{1 + 5\alpha}{5} & \dfrac{7}{15} \le \alpha \le \dfrac{4}{5} \\ 1 & \dfrac{4}{5} \le \alpha \le 1. \end{cases}$$

Computing $g_{A,B}$ for strong convex fuzzy sets (Yen, 1992)

Similar to the strategy for computing $f_{A,B}$, the algorithm for computing the function $g_{A,B}$ involves finding a few critical points: falloff points and reset points. Unlike $f_{A,B}$, $g_{A,B}$ has no crossing points.

Theorem 2.7. (Yen, 1992) Suppose $A$ and $B$ are continuous strong convex fuzzy sets. Then $g_{A,B}(\alpha) < h(B)$ if and only if
$$[LF_A^{-1}(\alpha), RF_A^{-1}(\alpha)] \cap [LF_B^{-1}(h(B)), RF_B^{-1}(h(B))] = \emptyset.$$

Theorem 2.8. (Yen, 1992) Suppose $A$ and $B$ are continuous strong convex fuzzy sets. If $g_{A,B}(\alpha_1) = \mu_B(LF_A^{-1}(\alpha_1)) < h(B)$ then
$$g_{A,B}(\alpha) = \mu_B(LF_A^{-1}(\alpha)) \quad \text{for } \alpha_1 \le \alpha \le 1.$$
Similarly, if $g_{A,B}(\alpha_1) = \mu_B(RF_A^{-1}(\alpha_1)) < h(B)$ then
$$g_{A,B}(\alpha) = \mu_B(RF_A^{-1}(\alpha)) \quad \text{for } \alpha_1 \le \alpha \le 1.$$

Using the previous two theorems, the function $g_{A,B}$ is computed with the following algorithm:

1. Compute $g_{A,B}(0)$: if $B_{h(B)} \cap [LF_A^{-1}(0), RF_A^{-1}(0)] \ne \emptyset$ then $g_{A,B}(0) = h(B)$ and continue with the next step; else $g_{A,B}(0) = \max\{\mu_B(LF_A^{-1}(0)), \mu_B(RF_A^{-1}(0))\}$, $\alpha_f \leftarrow 0$, and go to step 3.
2. Find the falloff point $\alpha_f$: if $[LF_A^{-1}(1), RF_A^{-1}(1)] \cap [b_B, c_B] \ne \emptyset$ then $g_{A,B}(\alpha) = h(B)$ for $0 \le \alpha \le 1$ and stop; else $\alpha_f = \max\{\mu_A(b_B), \mu_A(c_B)\}$ and $g_{A,B}(\alpha) = h(B)$ for $0 \le \alpha \le \alpha_f$.
3. Compute the reset point $\alpha_r$: if $\alpha_r \in [\alpha_f, 1]$ and $g_{A,B}(\alpha_r) = 0$ then $g_{A,B}(\alpha) = 0$ for $\alpha_r \le \alpha \le 1$; else $\alpha_r \leftarrow 1$.
4. Define the function on the interval $[\alpha_f, \alpha_r]$: if $\mu_B(LF_A^{-1}(\alpha_f)) < \mu_B(RF_A^{-1}(\alpha_f))$ then $g_{A,B}(\alpha) = \mu_B(RF_A^{-1}(\alpha))$ for $\alpha_f \le \alpha \le \alpha_r$; else $g_{A,B}(\alpha) = \mu_B(LF_A^{-1}(\alpha))$ for $\alpha_f \le \alpha \le \alpha_r$.

Example 2.9. Suppose $A$ and $B$ are two continuous fuzzy subsets of a frame of discernment $[-100, 100]$ whose membership functions are

$$\mu_A(x) = L[8, 18, 19, 21](x), \qquad \mu_B(x) = L[1, 3, 11, 16](x).$$

Assume that $A$ is a focal element. The function $g_{A,B}$ is computed as follows.

Step 1. Compute $g_{A,B}(0)$:
$$\mathrm{support}(A) \cap B_1 = [8, 21] \cap [3, 11] = [8, 11] \ne \emptyset,$$
thus $g_{A,B}(0) = 1$.

Step 2. Compute the falloff point $\alpha_f$:
$$B_{h(B)} = [3, 11], \quad [LF_A^{-1}(1), RF_A^{-1}(1)] = [18, 19], \quad [3, 11] \cap [18, 19] = \emptyset,$$
$$\mu_A(LF_B^{-1}(1)) = \mu_A(3) = 0, \quad \mu_A(RF_B^{-1}(1)) = \mu_A(11) = 0.3, \quad \alpha_f = \max\{0, 0.3\} = 0.3.$$
It results that $g_{A,B}(\alpha) = 1$ for $0 \le \alpha \le 0.3$.

Step 3. Solving the equation
$$g_{A,B}(\alpha_r) = \mu_B(LF_A^{-1}(\alpha_r)) = \frac{8 - 10\alpha_r}{5} = 0$$
we get $\alpha_r = 0.8$, and thus $g_{A,B}(\alpha) = 0$ for $0.8 \le \alpha \le 1$.

Step 4. Define $g_{A,B}$ on the interval $[\alpha_f, \alpha_r]$: because $\mu_B(LF_A^{-1}(\alpha_f)) = 1 > \mu_B(RF_A^{-1}(\alpha_f)) = 0$, we have
$$g_{A,B}(\alpha) = \mu_B(LF_A^{-1}(\alpha)) = \mu_B(10\alpha + 8) = \frac{8 - 10\alpha}{5} \quad \text{for } 0.3 \le \alpha \le 0.8.$$

Summarizing, we have

$$g_{A,B}(\alpha) = \begin{cases} 1 & 0 \le \alpha \le 0.3 \\ \dfrac{8 - 10\alpha}{5} & 0.3 \le \alpha \le 0.8 \\ 0 & 0.8 \le \alpha \le 1. \end{cases}$$
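Instead of the symbolic algorithms, $f_{A,B}$ and $g_{A,B}$ can also be checked numerically. The following Python sketch is a brute-force grid approximation (an alternative to, not an implementation of, Yen's algorithms), verified against Example 2.8:

    def trap(a, b, c, d):
        """Membership function L[a, b, c, d] of a linear strong convex fuzzy set."""
        def mu(x):
            if x <= a or x >= d:
                return 0.0
            if x < b:
                return (x - a) / (b - a)
            if x <= c:
                return 1.0
            return (d - x) / (d - c)
        return mu

    def f_g(mu_A, mu_B, alpha, lo=-100.0, hi=100.0, n=20001):
        """inf and sup of mu_B over the alpha-cut of A, on a grid."""
        xs = (lo + i * (hi - lo) / (n - 1) for i in range(n))
        cut = [mu_B(x) for x in xs if mu_A(x) >= alpha]
        return min(cut), max(cut)

    A, B = trap(0, 5, 10, 15), trap(1, 3, 11, 16)
    print(f_g(A, B, 0.5))   # ~ (0.7, 1.0): f = (1 + 5*0.5)/5, g = h(B)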
3 UNCERTAIN AND IMPRECISE KNOWLEDGE REPRESENTATION

3.1 LINGUISTIC VARIABLES

This concept was introduced by Zadeh (1975a, 1975b, 1975c) to provide a means of approximate characterization of phenomena that are too complex or too ill-defined to be amenable to description in conventional quantitative terms. Just as numerical variables take numerical values, in fuzzy logic linguistic variables take on linguistic values, which are words (linguistic terms) with associated degrees of membership. Thus, instead of the variable height assuming a numerical value of 1.75 meters, it is treated as a linguistic variable that may assume, for example, the linguistic value "tall" with a degree of membership of 0.92, "very short" with a degree of 0.06, or "very tall" with a degree of 0.7.

Definition 3.1. A linguistic variable $V$ is characterized by: its name $x$, a universe $U$, a term set $T(x)$, a syntactic rule $G$ for generating names of values of $x$, and a semantic rule $M$ for associating meanings with values.

Example 3.1. If the speed of a car is interpreted as a linguistic variable, then its term set could be
$$T(x) = \{slow,\ moderate,\ fast,\ very\ slow,\ more\ or\ less\ fast\},$$
where each term is characterized by a fuzzy set in a universe of discourse $U = [0, 100]$. We might interpret slow as "a speed below about 40 mph", moderate as "a speed close to 55 mph", and fast as "a speed above about 70 mph".

The meaning of a term or, more generally, of a fuzzy set can be modified by a linguistic hedge or modifier. A modifier is a word such as very, more or less, slightly, etc. If $T$ is a linguistic term and $m$ is a modifier, then the membership function of the modified term is defined as $(m\,T)(x) = M(T(x))$, where $M$ is a transformation associated with the modifier $m$. According to the modification of the support of the fuzzy set associated with the linguistic term, there are two classes of modifiers: intensive modifiers, which restrict the support, and extensive modifiers, which dilate the support.

Examples of intensive modifiers are:
$$(little\ A)(x) = (\mu_A(x))^{1.3}, \quad (slightly\ A)(x) = (\mu_A(x))^{1.7}, \quad (very\ A)(x) = (\mu_A(x))^2,$$
$$(extremely\ A)(x) = (\mu_A(x))^3, \quad (very\ very\ A)(x) = (\mu_A(x))^4.$$

Figure 3.1: The modifier "very".

Examples of extensive modifiers are:
$$(more\ or\ less\ A)(x) = (\mu_A(x))^{1/2},$$
$$(indeed\ A)(x) = \begin{cases} 2(\mu_A(x))^2 & \text{if } 0 \le \mu_A(x) \le 0.5 \\ 1 - 2(1 - \mu_A(x))^2 & \text{if } 0.5 < \mu_A(x) \le 1. \end{cases}$$

Figure 3.2: The modifier "more or less".

We now present some modifiers of trapezoidal distributions. Depending on the transformation $M$, one can obtain various ways of modifying a primary term $T$. For instance, if
$$M = \max(0, \min(1, \alpha \times \varphi_{intensive}(x) + \beta)), \qquad \varphi_{intensive}(x) = \begin{cases} \Psi_1(x) & x \le A \\ 1 & A \le x \le B \\ \Psi_2(x) & x \ge B, \end{cases}$$
where $\Psi_1$ is a non-decreasing function and $\Psi_2$ is a non-increasing function, one obtains various classes of intensive operators (Desprès, 1986):

i1) for $\alpha > 1$ and $\alpha + \beta = 1$ one obtains the class of "$\lambda$-precise" operators, defined as
$$(\lambda\text{-precise}\ T)(x) = \max(0, \min(1, \lambda \times \varphi_{intensive}(x) + 1 - \lambda)),$$
which restrict the support of $T$;

Figure 3.3: The modifier "$\lambda$-precise".

i2) the operators obtained for $\beta = 0$ restrict the core; they are named "$\mu$-very" and are defined by
$$(\mu\text{-very}\ T)(x) = \max(0, \min(1, \mu \times \varphi_{intensive}(x)));$$

Figure 3.4: The modifier "$\mu$-very".

i3) for $\alpha = 1$ and $\beta = -\nu$ with $0 < \nu \le 1$ one obtains the class "$\nu$-very exact"; these operators restrict both the support and the core and are defined by
$$(\nu\text{-very exact}\ T)(x) = \max(0, \min(1, \varphi_{intensive}(x) - \nu)).$$

Figure 3.5: The modifier "$\nu$-very exact".
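The powered hedges listed above are straightforward to apply; here is a minimal Python sketch (the triangular set "tall" and its parameters are hypothetical):

    def triangular(a, b, c):
        """Triangular membership function with peak at b."""
        def mu(x):
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
        return mu

    def hedge(mu, p):
        """(very A)(x) = mu(x)**2, (more or less A)(x) = mu(x)**0.5, etc."""
        return lambda x: mu(x) ** p

    tall = triangular(160, 180, 200)        # hypothetical set on heights in cm
    very_tall = hedge(tall, 2)              # intensive: restricts
    more_or_less_tall = hedge(tall, 0.5)    # extensive: dilates
    print(tall(175), very_tall(175), more_or_less_tall(175))  # 0.75 0.5625 ~0.866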
Extensive modifiers

These modifiers are obtained for
$$M = \min(1, \max(0, \alpha \times \varphi_{extensive}(x) + \beta)),$$
with $\varphi_{extensive}$ having the same form as $\varphi_{intensive}$. Depending on $\alpha$ and $\beta$, there are the following classes of extensive operators (Desprès, 1986):

ii1) "$\pi$-rather", obtained for $0 < \alpha < 1$ and $\alpha + \beta = 1$ and defined by
$$(\pi\text{-rather}\ T)(x) = \min(1, \max(0, \pi \times \varphi_{extensive}(x) + 1 - \pi));$$
this class spreads the support of the primary term.

Figure 3.6: The modifier "$\pi$-rather".

ii2) the second class, named "$\rho$-approximative", is obtained for $\alpha > 1$ and $\beta = 0$; it spreads the core and is defined by
$$(\rho\text{-approximative}\ T)(x) = \min(1, \max(0, \rho \times \varphi_{extensive}(x))).$$

Figure 3.7: The modifier "$\rho$-approximative".

ii3) another class spreads both the core and the support and is obtained for $\alpha = 1$ and $0 < \beta < 1$; it is named "$\sigma$-around" and is defined by
$$(\sigma\text{-around}\ T)(x) = \min(1, \max(0, \varphi_{extensive}(x) + \sigma)).$$

Figure 3.8: The modifier "$\sigma$-around".

3.2 FACTS REPRESENTATION

An elementary piece of information can be expressed as a proposition of the form: an attribute of an entity takes a particular value. An elementary proposition can be symbolically expressed by a triple (attribute, object, value). This triple can be reduced to the canonical form "$X$ is $A$", where $X$ is a variable representing the attribute of the entity and $A$ is the value. The proposition "$X$ is $A$" can be understood as "the quantity $X$ satisfies the predicate $A$".

As Zadeh pointed out (1973, 1975a, 1975b, 1975c, 1978, 1979), the semantic content of the proposition "$X$ is $A$" can be represented by
$$\pi_X(u) = \mu_A(u), \quad \forall u \in U,$$
where $U$ is the frame of discernment, $\pi_X$ is the possibility distribution restricting the possible values of $X$ and $\mu_A$ is the membership function of the set $A$. This relation means: the possibility that $X$ may take $u$ as its value is nothing but the membership degree of $u$ in $A$. Note that if there exists $u_0 \in U$ with $\pi_X(u_0) = 1$ and $\pi_X(u) = 0$ for all $u \ne u_0$, the fact is precise; it is imprecise but non-fuzzy if $\pi_X(u) \in \{0, 1\}$ for all $u \in U$; and it is fuzzy if $\pi_X(u) \in [0, 1]$, $\forall u \in U$. The complete absence of information about the value of $X$ is represented by $\pi_X(u) = 1$ for all $u \in U$.

Any fact (precise, imprecise or fuzzy) may be uncertain. To represent our confidence in the truth of a fact, it can be qualified by a numerical degree $\eta \in [0, 1]$, which is a degree of necessity or certainty, i.e. an estimation of the extent to which it is necessary that $X$ is $A$. The information "$X$ is $A$" with certainty $\eta$ is represented by
$$\pi_X(u) = \max(\mu_A(u), 1 - \eta),$$
which means that there is a possibility equal to $1 - \eta$ that the value of $X$ lies outside $\mathrm{supp}(A)$ and a certainty equal to $\eta$ that $X$ takes its value in $\mathrm{supp}(A)$. Note that "$X$ is $A$" with certainty 0 is equivalent to "$X$ is $U$".

Let $p$ be a proposition; provided that $p$ is non-fuzzy (i.e. it does not contain any vague predicate), the excluded-middle and non-contradiction laws hold, and $p$ and $\neg p$ can be regarded as mutually exclusive alternatives. Then a possibility distribution (Zadeh, 1978) $\pi$ can be attached to the set $\{p, \neg p\}$ by two numbers $\pi(p), \pi(\neg p) \in [0, 1]$, which represent the possibility that $p$ is true and the possibility that $\neg p$ is true, respectively. The normalization condition $\max(\pi(p), \pi(\neg p)) = 1$ must hold; it departs from probability theory by expressing that at least one of the alternatives must be completely possible, since the alternatives are mutually exclusive and cover all the possibilities. The quantity $n(p) = 1 - \pi(\neg p)$ can be viewed as a measure of necessity, since it expresses that the necessity of $p$ corresponds to the impossibility of $\neg p$. As recalled in (Prade, 1985), possibility and necessity measures are particular cases of the plausibility and belief functions studied by Shafer (1976).
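A one-line Python sketch of this certainty qualification (the membership function used for $A$ is hypothetical):

    def certainty_qualified(mu_A, eta):
        """pi_X(u) = max(mu_A(u), 1 - eta) for 'X is A' held with certainty eta."""
        return lambda u: max(mu_A(u), 1.0 - eta)

    mu_A = lambda u: min(1.0, max(0.0, (u - 50) / 30))   # hypothetical fuzzy set
    pi = certainty_qualified(mu_A, 0.8)
    print(pi(40), pi(80))   # 0.2 (values outside A stay possible to 1 - eta), 1.0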
The uncertainty of a fuzzy fact $A$ can also be expressed as a pair $(Bel(A), Pls(A))$; if we have a basic fuzzy set $F$, we can measure the uncertainty of a fuzzy event $A$ as the pair defined by its necessity and possibility degrees $(Nec(A), Pos(A))$, taken from the set
$$\{(N_{S,C;F}(A), \Pi_{T;F}(A)),\ (\Gamma_{T;F}(A), \Lambda_{T,C;F}(A)),\ (L_{T,C;F}(A), V_{T,C;F}(A))\}.$$

Another possibility is to define the uncertainty as a linguistic variable modeled as a fuzzy number. It may happen that we know the probability of occurrence of a fuzzy event; in this case we can transform the probability $m$ into a fuzzy number $N = (m, \alpha, \beta)$ (Singer, 1990). We assume that the values of $\mu_N$ are equal to or less than 0.1 if the deviation from the middle value is $x = \pm 0.05m$; in other words, we assume that a probability differing from the middle value $m$ by $\pm 5\%$ has a possibility value of only 0.1. With this assumption we have
$$1 - \frac{0.05m}{\alpha} = 0.1, \qquad \alpha = \beta = m/18.$$
Therefore, the probability of occurrence $m$ corresponds to the uncertainty given by the fuzzy number $\left(m, \dfrac{m}{18}, \dfrac{m}{18}\right)$.

The uncertainty can also be expressed using the theory of support logic programming (Baldwin, 1987). In this case, the uncertainty of an assertion $A$ is a support pair $[n, p]$, with $n \le p$, and has the following interpretation: $A$ is necessarily supported to degree $n$, not $A$ is necessarily supported to degree $1 - p$, and $p - n$ measures the unsureness associated with the support for the pair $(A, \text{not } A)$.

3.3 IMPLICATIONS

The notion of fuzzy implication plays a major role in the representation of rules. Let $p$ = "$x$ is in $A$" and $q$ = "$y$ is in $B$", where $A$ and $B$ are crisp sets. The interpretation of the material implication $p \to q$ is that the degree of truth of $p \to q$ quantifies to what extent $q$ is at least as true as $p$, i.e.
$$\tau(p \to q) = \begin{cases} 1 & \text{if } \tau(p) \le \tau(q) \\ 0 & \text{otherwise,} \end{cases}$$
where $\tau(\cdot)$ denotes the truth value of a proposition.

Further on, we consider that $A$ and $B$ are fuzzy sets in $U$ and $V$, respectively. The membership function of the implication "$X$ is $A \to Y$ is $B$" should be a function of $\mu_A(x)$ and $\mu_B(y)$:
$$\mu_{A \to B}(u, v) = I(\mu_A(u), \mu_B(v)).$$
We shall use the notation $\mu_{A \to B}(u, v) = \mu_A(u) \to \mu_B(v)$. One possible extension of the material implication is
$$\mu_A(u) \to \mu_B(v) = \begin{cases} 1 & \text{if } \mu_A(u) \le \mu_B(v) \\ 0 & \text{otherwise.} \end{cases}$$
However, it is easy to see that this fuzzy implication operator is not appropriate for real-life applications. Namely, let $\mu_A(u) = \mu_B(v) = 0.6$; then $\mu_A(u) \to \mu_B(v) = 1$. But for $\mu_A(u) = 0.6$ and $\mu_B(v) = 0.599$ we obtain $\mu_A(u) \to \mu_B(v) = 0$. This example shows that small changes in the input can cause a big deviation in the output.

In order to define an implication, the following definition is very important.

Definition 3.2. A fuzzy implication is a function $I: [0, 1]^2 \to [0, 1]$ satisfying the following conditions:
I1: if $x \le z$ then $I(x, y) \ge I(z, y)$ for all $x, y, z \in [0, 1]$;
I2: if $y \le z$ then $I(x, y) \le I(x, z)$ for all $x, y, z \in [0, 1]$;
I3: $I(0, y) = 1$ (falsity implies anything) for all $y \in [0, 1]$;
I4: $I(x, 1) = 1$ (anything implies tautology) for all $x \in [0, 1]$;
I5: $I(1, 0) = 0$ (Booleanity).
The following properties can be important in some applications:
I6: $I(1, x) = x$ (tautology cannot justify anything) for all $x \in [0, 1]$;
I7: $I(x, I(y, z)) = I(y, I(x, z))$ (exchange principle) for all $x, y, z \in [0, 1]$;
I8: $x \le y$ if and only if $I(x, y) = 1$ (implication defines an ordering) for all $x, y \in [0, 1]$;
I9: $I(x, 0) = N(x)$ for all $x \in [0, 1]$, where $N$ is a strong negation;
I10: $I(x, y) \ge y$ for all $x, y \in [0, 1]$;
I11: $I(x, x) = 1$ (identity principle) for all $x \in [0, 1]$;
I12: $I(x, y) = I(N(y), N(x))$ for all $x, y \in [0, 1]$ and a strong negation $N$;
I13: $I$ is a continuous function.

The most important families of implications are given (Czogala & Leski, 2001) by

Definition 3.3. An S-implication associated with a t-conorm $S$ and a strong negation $N$ is defined by
$$I_{S,N}(x, y) = S(N(x), y).$$
An R-implication associated with a t-norm $T$ is defined by
$$I_T(x, y) = \sup\{z \in [0, 1] \,/\, T(x, z) \le y\}, \quad \forall x, y \in [0, 1].$$
A QL-implication is defined by
$$I_{T,S,N}(x, y) = S(N(x), T(x, y)).$$
A t-norm implication associated with a t-norm $T$ is defined by
$$I_T(x, y) = T(x, y).$$

Generally, QL-implications violate property I1; conditions under which I1 is satisfied by a QL-implication can be found in (Fodor, 1991). Although the t-norm implications do not verify the properties of the material implication, they are used as models of implication in many applications of fuzzy logic. The most important implications, obtained for $N(x) = 1 - x$, are the following (Czogala & Leski, 2001):

- Kleene-Dienes: $I(x, y) = \max(1 - x, y)$, which is an S-implication for $S(x, y) = \max(x, y)$ and a QL-implication for $T(x, y) = \max(0, x + y - 1)$ and $S(x, y) = \min(1, x + y)$;
- Reichenbach: $I(x, y) = 1 - x + xy$, which is an S-implication for $S(x, y) = x + y - xy$;
- Lukasiewicz: $I(x, y) = \min(1 - x + y, 1)$, which is an S-implication for $S(x, y) = \min(1, x + y)$, an R-implication for $T(x, y) = \max(0, x + y - 1)$ and a QL-implication for $T(x, y) = \min(x, y)$ and $S(x, y) = \min(1, x + y)$;
- Rescher-Gaines: $I(x, y) = 1$ if $x \le y$, and $0$ otherwise; it is an R-implication for $T(x, y) = \min(x, y)$;
- Godel: $I(x, y) = 1$ if $x \le y$, and $y$ otherwise; it is an R-implication for $T(x, y) = \min(x, y)$;
- Goguen: $I(x, y) = \min(y/x, 1)$, which is an R-implication for $T(x, y) = xy$;
- Zadeh: $I(x, y) = \max(1 - x, \min(x, y))$, which is a QL-implication for $T(x, y) = \min(x, y)$ and $S(x, y) = \max(x, y)$;
- Fodor:
$$I(x, y) = \begin{cases} 1 & \text{if } x \le y \\ \max(1 - x, y) & \text{otherwise,} \end{cases}$$
which is an R-implication for $T = \min_0$, an S-implication for $S = \max_0$ and a QL-implication for $T = \min$ and $S = \max_0$, where
$$\min_0(x, y) = \begin{cases} 0 & \text{if } x + y \le 1 \\ \min(x, y) & \text{if } x + y > 1, \end{cases} \qquad \max_0(x, y) = \begin{cases} 1 & \text{if } x + y \ge 1 \\ \max(x, y) & \text{if } x + y < 1. \end{cases}$$

The Lukasiewicz implication verifies all the properties I1-I13 and the Fodor implication verifies the properties I1-I12. Typical examples of t-norm implications are Mamdani: $I(x, y) = \min(x, y)$ and Larsen: $I(x, y) = xy$.
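For reference, the implications listed above as Python functions of $(x, y) \in [0, 1]^2$ (a sketch; the guard for Goguen at $x = 0$ follows the R-implication definition and is our assumption about how to handle that point):

    implications = {
        "Kleene-Dienes":  lambda x, y: max(1 - x, y),
        "Reichenbach":    lambda x, y: 1 - x + x * y,
        "Lukasiewicz":    lambda x, y: min(1 - x + y, 1),
        "Rescher-Gaines": lambda x, y: 1.0 if x <= y else 0.0,
        "Godel":          lambda x, y: 1.0 if x <= y else y,
        "Goguen":         lambda x, y: 1.0 if x == 0 else min(y / x, 1),
        "Zadeh":          lambda x, y: max(1 - x, min(x, y)),
        "Fodor":          lambda x, y: 1.0 if x <= y else max(1 - x, y),
        "Mamdani":        lambda x, y: min(x, y),
        "Larsen":         lambda x, y: x * y,
    }
    for name, I in implications.items():
        print(f"{name:15s} I(0.6, 0.4) = {I(0.6, 0.4):.2f}")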
3.4 RULES REPRESENTATION

Let $X$ and $Y$ be two variables whose domains are $U$ and $V$, respectively. A causal link from $X$ to $Y$ is represented as a conditional possibility distribution (Zadeh, 1978; 1979) $\pi_{Y/X}$, which restricts the possible values of $Y$ for a given value of $X$. For the rule

if $X$ is $A$ then $Y$ is $B$

we have
$$\forall u \in U, \forall v \in V: \quad \pi_{Y/X}(v, u) = I(\mu_A(u), \mu_B(v)) = \mu_A(u) \to \mu_B(v).$$

Frequently, the causal link from $X$ to $Y$ is described by a set of rules of the form

if $X$ is $A_i$ then $Y$ is $B_i$, $\quad i \in \{1, 2, \dots, m\}$.

The information given by these rules is combined as follows (Dubois & Prade, 1985b; 1987):
$$\forall u \in U, \forall v \in V: \quad \pi_{Y/X}(v, u) = \min_{i \in \{1, \dots, m\}} \mu_{A_i \to B_i}(u, v).$$

A natural consistency condition for these rules is (Lebailly, Martin-Clouaire & Prade, 1987):
$$A_1 \cap \dots \cap A_m \ne \emptyset \ \Rightarrow\ h(B_1 \cap B_2 \cap \dots \cap B_m) = \sup_{v \in V} \min(\mu_{B_1}(v), \dots, \mu_{B_m}(v)) = 1.$$

A rule with multiple consequents can be treated as a set of rules with a single conclusion; for instance, the rule

if antecedent then $C_1$ and $C_2$ and ... and $C_n$

is equivalent to the rules

if antecedent then $C_1$
if antecedent then $C_2$
...
if antecedent then $C_n$.

A rule with a multiple premise can be broken up into simple rules (Demirli & Turksen, 1992) when the rules are represented with any S-implication or any R-implication and the observations are normalized fuzzy sets.

There are various forms for representing a rule "if $X$ is $A$ then $Y$ is $B$", depending on the types of the sets $A$ and $B$. For instance (Lebailly, Martin-Clouaire & Prade, 1987):

- for a non-fuzzy condition (represented by a rectangle or by a point) and an arbitrary conclusion (represented by a $\lambda$-trapezium):
$$\forall u \in U, \forall v \in V: \quad \pi_{Y/X}(v, u) = \begin{cases} 1 & \text{if } u \notin A \\ \mu_B(v) & \text{if } u \in A; \end{cases}$$

- for a fuzzy and certain condition (represented by a trapezoidal distribution) and a non-fuzzy, but possibly uncertain, conclusion (represented by a $\lambda$-rectangle):
$$\forall u \in U, \forall v \in V: \quad \pi_{Y/X}(v, u) = \max(\mu_B(v), 1 - \min(\lambda, \mu_A(u))),$$
where $\lambda$ is the certainty degree associated with the rule;

- for a fuzzy condition (represented as a trapezoidal distribution) and a fuzzy conclusion (represented by a trapezoidal or $\lambda$-trapezoidal distribution):
$$\forall u \in U, \forall v \in V: \quad \pi_{Y/X}(v, u) = \begin{cases} 1 & \text{if } \mu_A(u) \le \mu_B(v) \\ \mu_B(v) & \text{otherwise.} \end{cases}$$

If $p$ and $q$ are non-fuzzy propositions, then the necessity measure $Nec(p \to q)$ corresponds to the degree to which it is sufficient to have $p$ in order to infer that $q$ is true. Likewise, the necessity measure $Nec(\neg p \to \neg q) = Nec(q \to p)$ evaluates to what extent it is necessary that $p$ be true in order to have $q$ true. Thus, from $Nec(p \to q) \ge a \in [0, 1]$ and $Nec(\neg p \to \neg q) \ge a' \in [0, 1]$, one can represent the rule "if $p$ then $q$", where $p$ = "$X$ is $A$" and $q$ = "$Y$ is $B$", $A$ and $B$ being non-fuzzy, by a conditional possibility distribution (Martin-Clouaire & Prade, 1985):

$$\forall u \in U, \forall v \in V: \quad \pi_{Y/X}(v, u) = \begin{cases} 1 & \text{if } u \in A, v \in B \\ 1 - a & \text{if } u \in A, v \notin B \\ 1 - a' & \text{if } u \notin A, v \in B \\ 1 & \text{if } u \notin A, v \notin B. \end{cases}$$

A rule

if $X$ is $A$ then $Y$ is $B$

with $A$ a non-fuzzy set, for which the degree of sufficiency $s$ is known, is equivalent to

if $X$ is $A$ then $Y$ is $B'$

with $\mu_{B'}(v) = \max(\mu_B(v), 1 - s)$. Similarly, if the degree of necessity $n$ is known, then $\mu_{B'}(v) = \max(\mu_B(v), 1 - n)$. When $A$ is fuzzy, the rule "if $X$ is $A$ then $Y$ is $B$ with degree of certainty $s$" can be represented by
$$\pi_{Y/X}(v, u) = \max(\mu_B(v), 1 - \min(s, \mu_A(u))).$$

In some cases the expert expresses the uncertainty by a number from $[0, 1]$; for instance, in DIABETO (Buisson, Farreny & Prade, 1986) the number 0.8 is used in order to express the epithet "possible". In other systems, the uncertainty is given by linguistic terms.
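On finite universes, a rule and the min-combination of several rules can be sketched in a few lines of Python (the discretized fuzzy sets are hypothetical; the Gödel implication is used, as in the third representation above):

    godel = lambda x, y: 1.0 if x <= y else y

    def rule_distribution(mu_A, mu_B, U, V, I=godel):
        """pi_{Y/X}(v, u) = I(mu_A(u), mu_B(v)) as a nested dict."""
        return {u: {v: I(mu_A(u), mu_B(v)) for v in V} for u in U}

    def combine_rules(dists, U, V):
        """Conjunctive combination: pointwise min over all rule distributions."""
        return {u: {v: min(d[u][v] for d in dists) for v in V} for u in U}

    U = V = [0, 1, 2, 3]
    pi1 = rule_distribution({0: 0.0, 1: 0.5, 2: 1.0, 3: 0.5}.get,
                            {0: 0.0, 1: 1.0, 2: 0.5, 3: 0.0}.get, U, V)
    pi2 = rule_distribution({0: 0.0, 1: 0.5, 2: 1.0, 3: 0.5}.get,
                            {0: 1.0, 1: 0.5, 2: 0.0, 3: 0.0}.get, U, V)
    print(combine_rules([pi1, pi2], U, V)[2])   # {0: 0.0, 1: 0.5, 2: 0.0, 3: 0.0}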
Using support pairs (Baldwin, 1987), the uncertainty is expressed by two numbers $n, p \in [0, 1]$, $n \le p$, in the form

if $X$ is $A$ then $Y$ is $B$ : $[n, p]$

with the following interpretation: if the premise is true then the conclusion is necessarily supported to degree $n$ and the negation of the conclusion is necessarily supported to degree $1 - p$ (or the conclusion is possibly supported to degree $p$).

4 REASONING WITH IMPRECISE AND/OR UNCERTAIN RULES

4.1 GENERALIZED MODUS PONENS RULE

Zadeh (1979) introduced the theory of approximate reasoning in order to work with imprecise and uncertain information. The basic problem of approximate reasoning is to find the membership function of the consequence $C$ from the rule base $\{R_1, \dots, R_n\}$ and the fact $A$:

$R_1$: if $x$ is $A_1$ then $y$ is $C_1$
$R_2$: if $x$ is $A_2$ then $y$ is $C_2$
...
$R_n$: if $x$ is $A_n$ then $y$ is $C_n$
Fact: $x$ is $A$
Consequence: $y$ is $C$

In fuzzy logic and approximate reasoning, the most important fuzzy implication inference rule is the Generalized Modus Ponens (GMP), based on the compositional rule of inference suggested by Zadeh (1973).

Definition 4.1. (compositional rule of inference)

Rule: if $x$ is $A$ then $y$ is $B$
Fact: $x$ is $A'$
Consequence: $y$ is $B'$

where the consequence $B'$ is determined as the composition of the fact and the fuzzy implication operator, $B' = A' \circ (A \to B)$, that is,
$$\mu_{B'}(v) = \sup_{u \in U} \min(\mu_{A'}(u), \mu_{A \to B}(u, v)),$$
where $A$ and $A'$ are fuzzy subsets of the universe $U$ and $B$ and $B'$ are fuzzy subsets of the universe $V$. In many practical cases, instead of the sup-min composition one uses the sup-$T$ composition, where $T$ is a t-norm: $B' = A' \circ_T (A \to B)$, that is,
$$\mu_{B'}(v) = \sup_{u \in U} T(\mu_{A'}(u), \mu_{A \to B}(u, v)).$$

Suppose that $A$, $B$ and $A'$ are fuzzy numbers. The Generalized Modus Ponens should satisfy some rational properties:

Property 4.1. Basic property: if $A' = A$ then $B' = B$.
Property 4.2. Total indeterminance: if $A' = \neg A$ then $B'$ is unknown.
Property 4.3. Subset: if $A' \subset A$ then $B' = B$.
Property 4.4. Superset: the inferred conclusion satisfies $B' \supset B$.

Not every combination (t-norm, implication) satisfies all four properties listed above; for instance, the combination (min, Mamdani) does not verify the total indeterminance and superset properties.

Suppose we are given a set of fuzzy rules:

$R_1$: if $x$ is $A_1$ then $y$ is $B_1$
$R_2$: if $x$ is $A_2$ then $y$ is $B_2$
...
$R_n$: if $x$ is $A_n$ then $y$ is $B_n$
Fact: $x$ is $A'$
Consequence: $y$ is $C$

The $i$-th fuzzy rule $R_i$: "if $x$ is $A_i$ then $y$ is $B_i$" is implemented by a fuzzy implication $I_i$ defined as
$$I_i(u, v) = \mu_{A_i \to B_i}(u, v) = \mu_{A_i}(u) \to \mu_{B_i}(v).$$

There are two main approaches to determine the membership function of the consequence $C$. If the combination operator is denoted by $\oplus \in \{\min, \max\}$ (or, more generally, $\oplus \in \{T, S\}$), we can:

- combine the rules first:
$$\mu_R(u, v) = \mu_{A_1 \to B_1}(u, v) \oplus \dots \oplus \mu_{A_n \to B_n}(u, v), \qquad \mu_C(v) = \sup_{u \in U} T(\mu_{A'}(u), \mu_R(u, v));$$

- fire the rules first:
$$\mu_{B'_k}(v) = \sup_{u \in U} T(\mu_{A'}(u), \mu_{A_k \to B_k}(u, v)), \quad k \in \{1, 2, \dots, n\}, \qquad \mu_C(v) = \mu_{B'_1}(v) \oplus \mu_{B'_2}(v) \oplus \dots \oplus \mu_{B'_n}(v).$$

A question arises: do the two methods give the same result? The answer is given by the following theorems (Fullér, 1995).

Theorem 4.1. If $T = \min$ or $T(x, y) = xy$ and $\oplus = \max$, the two methods give the same result.
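A minimal Python sketch of the compositional rule of inference on finite universes (sup-min, with the Gödel implication; the discretized sets are hypothetical). With $A' = A$ it recovers $B$, in agreement with the basic property:

    godel = lambda x, y: 1.0 if x <= y else y

    def gmp(A_obs, A, B, I=godel):
        """mu_B'(v) = sup_u min(mu_A'(u), I(mu_A(u), mu_B(v)))."""
        return {v: max(min(A_obs[u], I(A[u], bv)) for u in A)
                for v, bv in B.items()}

    A = {1: 0.0, 2: 0.5, 3: 1.0, 4: 0.5, 5: 0.0}   # premise "x is A"
    B = {1: 0.0, 2: 1.0, 3: 0.0}                   # conclusion "y is B"
    print(gmp(A, A, B))   # observation = premise: recovers B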
Theorem 4.2. If $\oplus = \min$ and $T$ is an arbitrary t-norm, then the conclusion inferred by combining the rules first is included in the conclusion inferred by firing the rules first.

An analysis of the conclusion inferred by GMP reasoning, when between the premise "$X$ is $A$" and the observation "$X$ is $A'$" there is one of the relations $A \subseteq A'$, $A' = A$, $A \supseteq A'$, or $A$ and $A'$ have a partial overlapping, is presented in (Iancu, 1998c; 2008a; 2008b). There one works with fuzzy if-then rules with a single input and a single output, the t-norm
$$t(x, y) = \max(0, (1 + \lambda)(x + y - 1) - \lambda xy), \quad \lambda \ge -1,$$
as composition operation, and the following set of implication operators:

Reichenbach: $I_R(u, v) = 1 - \mu_A(u) + \mu_A(u)\mu_B(v)$
Willmott: $I_W(u, v) = \max(1 - \mu_A(u), \min(\mu_A(u), \mu_B(v)))$
Mamdani: $I_M(u, v) = \min(\mu_A(u), \mu_B(v))$
Rescher-Gaines: $I_{RG}(u, v) = 1$ if $\mu_A(u) \le \mu_B(v)$, and $0$ otherwise
Kleene-Dienes: $I_{KD}(u, v) = \max(1 - \mu_A(u), \mu_B(v))$
Brouwer-Gödel: $I_{BG}(u, v) = 1$ if $\mu_A(u) \le \mu_B(v)$, and $\mu_B(v)$ otherwise
Goguen: $I_G(u, v) = 1$ if $\mu_A(u) \le \mu_B(v)$, and $\mu_B(v)/\mu_A(u)$ otherwise
Lukasiewicz: $I_L(u, v) = \min(1 - \mu_A(u) + \mu_B(v), 1)$
Fodor: $I_F(u, v) = 1$ if $\mu_A(u) \le \mu_B(v)$, and $\max(1 - \mu_A(u), \mu_B(v))$ otherwise.

Working with the rule "if $X$ is $A$ then $Y$ is $B$" and the observation "$X$ is $A'$", where $A$ and $A'$ are fuzzy subsets of the universe $U$ and $B$ is a fuzzy subset of the universe $V$, the following results are obtained.

Theorem 4.3. If the premise contains the observation (i.e. $\mu_{A'}(u) \le \mu_A(u)$ for all $u \in U$) then:
1) $\mu_{B'}(v) = \mu_B(v)$ in each of the cases
  1.1 $I = I_R$ and $\lambda \ge 0$;
  1.2 $I = I_R$, $\lambda < 0$ and $\mu_B(v) \ge \frac{\lambda}{\lambda - 1}$;
  1.3 $I = I_W$ and $\lambda \ge 0$;
  1.4 $I = I_W$, $\lambda < 0$ and $\mu_B(v) \ge -\frac{\lambda}{4}$;
  1.5 $I = I_M$;
  1.6 $I \in \{I_{KD}, I_L\}$ and $\lambda \ge 0$;
  1.7 $I = I_{KD}$, $\lambda < 0$ and $\mu_B(v) \ge -\frac{\lambda}{4}$;
  1.8 $I \in \{I_{BG}, I_G\}$;
  1.9 $I = I_F$, with $\lambda \ge 0$ or $\mu_B(v) \ge -\frac{\lambda}{4}$;
2) $\mu_{B'}(v) \le \mu_B(v)$ for $I = I_{RG}$;
3) $\mu_{B'}(v) < -\frac{\lambda}{4}$ for $I \in \{I_W, I_{KD}, I_F\}$, $\lambda < 0$ and $\mu_B(v) < -\frac{\lambda}{4}$;
4) $\mu_{B'}(v) \le -\dfrac{(\mu_B(v)(1 + \lambda) - \lambda)^2}{4\lambda(1 - \mu_B(v))}$ for $I = I_R$, $\lambda < 0$ and $\mu_B(v) \le \frac{\lambda}{\lambda - 1}$;
5) $\mu_{B'}(v) < \dfrac{-\lambda(1 + \mu_B(v))^2 + 4(1 + \lambda)\mu_B(v)}{4}$ for $I = I_L$ and $\lambda < 0$.

Theorem 4.4. If the premise and the observation are identical then:
1) $\mu_{B'}(v) = \mu_B(v)$ in the cases
  1.1 $I \in \{I_R, I_W, I_M, I_{RG}, I_{KD}, I_{BG}, I_G, I_L\}$ and $\lambda \ge 0$;
  1.2 $I = I_F$, with $\lambda \ge 0$ or $\mu_B(v) \ge 0.5$;
2) for $\lambda < 0$ we have:
  a) $\mu_{B'}(v) = \mu_B(v)$ for
    a1) $I \in \{I_M, I_{RG}, I_{BG}, I_G\}$;
    a2) $I = I_R$ and $\frac{\lambda}{\lambda - 1} \le \mu_B(v)$;
  b) $\mu_{B'}(v) = \max\left(-\frac{\lambda}{4}, \mu_B(v)\right)$ in the cases
    b1) $I \in \{I_W, I_{KD}\}$;
    b2) $I = I_F$ and $\mu_B(v) < 0.5$;
  c) $\mu_{B'}(v) = -\dfrac{((1 + \lambda)\mu_B(v) - \lambda)^2}{4\lambda(1 - \mu_B(v))}$ for $I = I_R$ and $\mu_B(v) \le \frac{\lambda}{\lambda - 1}$;
  d) $\mu_{B'}(v) = \dfrac{-\lambda\mu_B^2(v) + 2\mu_B(v)(\lambda + 2) - \lambda}{4}$ for $I = I_L$.
Theorem 4.5. If the observation contains the premise (i.e. $\mu_A(u) \le \mu_{A'}(u)$ for all $u \in U$) then:
a) $\mu_{B'}(v) = \mu_B(v)$ if $I = I_M$;
b) $\mu_{B'}(v) \ge \mu_B(v)$ if $I \in \{I_R, I_W, I_{RG}, I_{KD}, I_{BG}, I_G, I_L, I_F\}$.

Theorem 4.6. If $A$ and $A'$ have a partial overlapping then:
1) $\mu_{B'}(v) = 1$ if $\mathrm{core}(A') \not\subseteq \mathrm{support}(A)$ and $I \in \{I_R, I_W, I_{RG}, I_{KD}, I_{BG}, I_G, I_L\}$;
2) $\mu_{B'}(v) \ge \mu_B(v)$ if $\mathrm{core}(A') \subseteq \mathrm{support}(A)$ and $I \in \{I_{KD}, I_{BG}, I_G, I_L\}$, or $I = I_W$ and $\mu_B(v) \le 0.5$, or $I = I_R$ and $\lambda \ge 0$;
3) $\mu_{B'}(v) \le \mu_B(v)$ if $I = I_M$;
4) $\mu_{B'}(v) \in [\mu_B(v), -\lambda + (1 + \lambda)\mu_B(v))$ if $\mathrm{core}(A') \subseteq \mathrm{support}(A)$, $I = I_R$ and $\lambda < 0$;
5) $\mu_{B'}(v) \in [0, 1]$ if $\mathrm{core}(A') \subseteq \mathrm{support}(A)$ and $I = I_{RG}$;
6) $\mu_{B'}(v) \ge 0.5$ if $\mathrm{core}(A') \subseteq \mathrm{support}(A)$, $I = I_W$ and $\mu_B(v) > 0.5$;
7) $\mu_{B'}(v) = 1$ if $\mathrm{core}(A') \not\subseteq A_{\mu_B(v)}$ and $I = I_F$, and $\mu_{B'}(v) \ge \mu_B(v)$ if $\mathrm{core}(A') \subset A_{\mu_B(v)}$ and $I = I_F$,
where $A_\alpha$ denotes the $\alpha$-cut of $A$.

Remark 4.1. If the observation is more precise than the premise of the rule, then it gives more information than the premise. However, it does not seem reasonable to think that the Generalized Modus Ponens allows one to obtain a conclusion more precise than that of the rule. The result of the inference is valid if $\mu_{B'}(v) = \mu_B(v)$ for all $v \in V$. Sometimes the deduction operation allows a reinforcement of the conclusion, as in the example (Mizumoto & Zimmerman, 1982):

Rule: If the tomato is red then the tomato is ripe.
Observation: This tomato is very red.

If we know that the maturity degree increases with respect to color, we can infer that this tomato is very ripe. On the other hand, in the example

Rule: If the melon is ripe then it is sweet.
Observation: The melon is very ripe.

we do not infer that the melon is very sweet, because it can be so ripe that it is rotten.

Remark 4.2. These examples show that the expert must choose the deduction operation depending on the knowledge base. If he has no supplementary information about the connection between the variation of the premise and that of the conclusion, he must be satisfied with the conclusion $\mu_{B'}(v) = \mu_B(v)$, $\forall v \in V$. Theorem 4.3 says that for this purpose we can choose $\lambda \ge 0$.

Remark 4.3. When the observation and the premise of the rule coincide, the convenient behavior of the fuzzy deduction is to obtain an identical conclusion. But Theorem 4.4, in the case $\lambda < 0$, can give a different conclusion. This indicates the appearance of an uncertainty in the conclusion, which is totally unreasonable. In order to avoid this possibility we suggest using a value $\lambda \ge 0$.

Remark 4.4. The result given by Theorem 4.5 is very general and does not offer enough information about the inferred conclusion. The result of the inference depends on the compatibility between the observation and the premise of the rule. To express this compatibility, the following quantities (Desprès, 1986) are frequently used:
(a) $D.I = \sup_{\{u \in U / \mu_A(u) = 0\}} \mu_{A'}(u)$, named the uniform degree of non-determination; it appears when the support of the premise does not contain the support of the observation;
(b) $I = \sup_{\{u \in U / \mu_{A'}(u) \ge \mu_A(u)\}} (\mu_{A'}(u) - \mu_A(u))$.

The uncertainty propagated is expressed with the help of $D.I$ and $I$ and corresponds to the value of $\mu_{B'}$ on the set $\{v \in V / \mu_B(v) = 0\}$.
Theorem 4.7. If $\mu_{A'}(u) \ge \mu_A(u)$ for all $u \in U$, then the uncertainty propagated during the inference is:
$\mu_{B'}(v) < I$ if $\lambda > 0$ and the implication is in the set $\{I_R, I_W, I_{KD}, I_L, I_F\}$;
$\mu_{B'}(v) = I$ if $\lambda = 0$ and the implication is in the set $\{I_R, I_W, I_{KD}, I_L, I_F\}$;
$\mu_{B'}(v) > I$ if $\lambda < 0$ and the implication is in the set $\{I_R, I_W, I_{KD}, I_L, I_F\}$;
$\mu_{B'}(v) > D.I$ if the implication is in the set $\{I_{RG}, I_{BG}, I_G\}$;
$\mu_{B'}(v) > I$ if the implication is $I_M$.

A study of Generalized Modus Ponens reasoning using t-norms with threshold was initiated in (Iancu, 2009c), where the product t-norm with a single threshold $k$,
$$T_k(x, y) = \begin{cases} \dfrac{xy}{k} & \text{if } x \le k \text{ and } y \le k \\ \min(x, y) & \text{if } x > k \text{ or } y > k, \end{cases}$$
and the Fodor implication are used. A comparison with the case when the standard product t-norm is used shows that the t-norm with threshold gives better results.

4.2 UNCERTAIN GENERALIZED MODUS PONENS REASONING

We consider the case when the rules and the facts are uncertain, the uncertainty being expressed by belief degrees or by linguistic variables. The reasoning with belief degrees has the form

if $x$ is $A$ then $y$ is $B$ : $\alpha$
$x$ is $A'$ : $\beta$
$y$ is $B'$ : $\gamma$

where $\alpha, \beta, \gamma \in [0, 1]$ represent the belief degrees corresponding to the rule, the fact and the conclusion, respectively. The conclusion is obtained as $B' = A' \circ_T (A \to B)$ and its associated belief degree is $\gamma = *_T(\alpha, \beta)$, where $\circ_T$ and $*_T$ are t-norm operators. For multiple premises, the inference schema becomes

$A_n \wedge \dots \wedge A_1 \to C$ : $\alpha$
$A'_n$ : $\beta_n$
...
$A'_1$ : $\beta_1$
$C'$ : $\gamma$

where the conclusion $C'$ is obtained with the formula
$$C' = A'_1 \circ_T (\dots \circ_T (A'_n \circ_T (A_n \wedge \dots \wedge A_1 \to C)) \dots)$$
and the belief degree is $\gamma = *_T(\alpha, *_T(\beta_1, \dots, \beta_n))$. If the same conclusion is obtained from $m$ different rules, with the belief degrees $\gamma_1, \gamma_2, \dots, \gamma_m$, then the global belief degree is computed with the formula
$$\gamma = *_S(\gamma_1, \gamma_2, \dots, \gamma_m),$$
where $*_S$ is a t-conorm.

Now consider the case of linguistic uncertainty (Leung & Lam, 1989; Leung, Wong & Lam, 1989). Suppose there is a rule and a fact:

if $X$ is $A$ then $Y$ is $B$ : $FN_R$
$X$ is $A'$ : $FN_F$
$Y$ is $B'$ : $FN_C$

where $FN_R$, $FN_F$ and $FN_C$ are fuzzy numbers denoting the uncertainty of the rule, of the fact and of the conclusion, respectively. If the object $X$ is non-fuzzy, $A$ and $A'$ must be the same atomic symbol in order to apply this rule. Therefore $B' = B$ and the fuzzy uncertainty $FN_C$ is calculated using fuzzy number multiplication of $FN_R$ and $FN_F$:
$$FN_C = FN_R \otimes FN_F.$$
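A small Python sketch of the numerical belief-degree propagation described at the beginning of this section (the product t-norm and its dual t-conorm are one possible, assumed, choice for $*_T$ and $*_S$):

    from functools import reduce

    t_norm   = lambda x, y: x * y           # one choice for *_T
    t_conorm = lambda x, y: x + y - x * y   # one choice for *_S

    def rule_belief(alpha, betas):
        """gamma = T(alpha, T(beta_1, ..., beta_n)) for one fired rule."""
        return t_norm(alpha, reduce(t_norm, betas))

    gammas = [rule_belief(0.9, [0.8, 0.7]), rule_belief(0.6, [0.9])]
    print(reduce(t_conorm, gammas))         # global belief in the conclusion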
If $X$ and $Y$ are fuzzy objects, the conclusion $B'$ is obtained using the Generalized Modus Ponens and the uncertainty $FN_C$ is
$$FN_C = (FN_R \otimes FN_F) \times D,$$
where $D$ is the degree of matching between $A$ and $A'$. If $X$ is fuzzy and $Y$ is non-fuzzy, then $B' = B$ and the uncertainty $FN_C$ is computed as
$$FN_C = (FN_R \otimes FN_F) \times M,$$
where $M$ is the "similarity" between the fuzzy sets $F$ and $F'$ of $A$ and $A'$, respectively:

if $N(F; F') > 0.5$ then $M = \Pi(F; F')$, else $M = (N(F; F') + 0.5) \times \Pi(F; F')$,

with $\Pi(F; F') = \sup_u \min(\mu_F(u), \mu_{F'}(u))$ and $N(F; F') = 1 - \Pi(\neg F; F')$.

Because a rule with multiple consequents can be treated as multiple rules with a single conclusion, only the problem of multiple propositions in the antecedent and a single proposition in the consequent needs to be considered. If the object in the consequent proposition is non-fuzzy, no special treatment is needed. However, if the consequent proposition is fuzzy, the fuzzy set of the value $B'$ in the conclusion is calculated using the following two basic algorithms (Mizumoto, 1985):

a) rule: if $A_1$ and $A_2$ then $Y$ is $B$; facts: $A'_1$, $A'_2$; conclusion: $Y$ is $B'$.
Algorithm: the fuzzy set representing $B'$ in the conclusion is obtained as the union of the fuzzy sets $F_1$ and $F_2$, where $F_1$ is obtained by GMP from the rule "if $A_1$ then $Y$ is $B$" and the fact $A'_1$, while $F_2$ is obtained from the rule "if $A_2$ then $Y$ is $B$" and the fact $A'_2$.

b) rule: if $A_1$ or $A_2$ then $Y$ is $B$; facts: $A'_1$, $A'_2$; conclusion: $Y$ is $B'$.
Algorithm: the same as above, except that fuzzy intersection rather than union is applied to the fuzzy sets $F_1$ and $F_2$.

The above two algorithms can be applied repeatedly to handle any combination of antecedent propositions. For instance:

Rule: IF (the productivity is high OR the production-cost is low) AND the sales are high THEN the performance of the company should be good.
Facts: The productivity is very high. The production-cost is low. The sales are rather high.

Here "low", "high", "very high" and "rather high" are fuzzy concepts. Let $F_1$ be the fuzzy set obtained by making an inference from the rule "IF the productivity is high THEN the performance of the company should be good" and the fact "The productivity is very high"; let $F_2$ be the fuzzy set obtained from the rule "IF the production-cost is low THEN the performance of the company should be good" and the fact "The production-cost is low"; and let $F_3$ be the fuzzy set obtained from the rule "IF the sales are high THEN the performance of the company should be good" and the fact "The sales are rather high". The fuzzy set $F$ representing the fuzzy value of the object "the performance of the company" in the conclusion is determined as follows: $F$ is the fuzzy union of $F_{12}$ and $F_3$, where $F_{12}$ is the fuzzy intersection of $F_1$ and $F_2$. As a result, $F$ will indicate the fuzzy concept "good" and the conclusion "the performance of the company is good" is drawn.

The fuzzy uncertainty of the conclusion deduced from rules with multiple antecedent propositions is calculated by employing fuzzy number arithmetic operators in the formulas used by MYCIN's CF model. For example, for

rule: if $A_1$ and $A_2$ then $C$ : $FN_R$
facts: $A'_1$ : $FN_{F_1}$, $A'_2$ : $FN_{F_2}$
conclusion: $C'$ : $FN_{C'}$

we have
$$FN_{C'} = \min\_fn(FN_{F_1}, FN_{F_2}) \otimes FN_R,$$
where $FN_R$, $FN_{F_1}$, $FN_{F_2}$ and $FN_{C'}$ represent the uncertainty of the rule, of the facts and of the conclusion, respectively, $\otimes$ is the fuzzy multiplication, and the minimum of two fuzzy numbers is given by
$$\mu_{\min\_fn(A,B)}(z) = \max_{z = \min(x,y)} \min(\mu_A(x), \mu_B(y)).$$
Similarly, the maximum is defined as
$$\mu_{\max\_fn(A,B)}(z) = \max_{z = \max(x,y)} \min(\mu_A(x), \mu_B(y)).$$

If logical OR is used, the calculation is the same except that the fuzzy maximum is taken rather than the minimum. For combinations of antecedent propositions, the two calculations can be applied repeatedly to handle the fuzzy uncertainties corresponding to the matched facts and the rule.

In some cases there is more than one rule with the same consequent proposition. Each of these rules can be treated as giving contributed evidence towards the conclusion. The conclusion $C_R$ is obtained from the evidence contributed by these rules and facts.
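The fuzzy minimum and maximum above are instances of the extension principle; for discretized fuzzy numbers they can be sketched as follows (the two fuzzy numbers in the demonstration are hypothetical):

    from itertools import product

    def extend(op, A, B):
        """mu_{op_fn(A,B)}(z) = max over z = op(x, y) of min(mu_A(x), mu_B(y))."""
        out = {}
        for (x, ma), (y, mb) in product(A.items(), B.items()):
            z = op(x, y)
            out[z] = max(out.get(z, 0.0), min(ma, mb))
        return out

    A = {0.6: 0.5, 0.7: 1.0, 0.8: 0.5}   # hypothetical uncertainty of fact 1
    B = {0.5: 1.0, 0.6: 0.6}             # hypothetical uncertainty of fact 2
    print(extend(min, A, B))             # min_fn(A, B) = {0.5: 1.0, 0.6: 0.6}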
In some cases, there is more than one rule with the same consequent proposition. Each of these rules can be treated as giving contributing evidence towards the conclusion. The conclusion C_R is obtained from the evidence contributed by these rules and facts. For instance:

rules: r_1: if A_1 then B; r_2: if A_2 then B
facts: A'_1, A'_2
conclusion: C_R, obtained from C_1 : FN_C1 and C_2 : FN_C2

where C_1 and C_2 are the conclusions obtained from r_1 & A'_1 and r_2 & A'_2, respectively, and FN_C1, FN_C2 represent the uncertainties of these conclusions. If the object involved in the consequent proposition is fuzzy, then the fuzzy set corresponding to the combined conclusion C_R is obtained by taking the fuzzy intersection of the fuzzy sets corresponding to C_1 and C_2. The two uncertainties FN_C1 and FN_C2 can be combined, to obtain the overall uncertainty, using a formula similar to the evidence combination from MYCIN's CF model:

FN_C = FN_C1 ⊕ FN_C2 ⊖ FN_C1 ⊗ FN_C2.

4.3 UNCERTAIN HYBRID REASONING

In the following we present an Uncertain Reasoning System, a generator of expert systems able to process uncertain and imprecise knowledge. Uncertainty is measured in three ways:

1. Using the support logic programming technique of Baldwin (1987), where a conclusion is supported to a certain degree by the evidence and its negation is also supported to a certain degree; these dual measures of uncertainty are called a support pair. A support pair (n, p) comprises a necessary and a possible support and is interpreted as an interval in which the probability lies. A voting interpretation is also useful: the lower (necessary) support n represents the proportion of a population voting in favor of a proposition, 1 − p represents the proportion voting against, and p − n represents the proportion abstaining. This method is used in the PROSUM system (Iancu, 1997c).

2. Using the method of linguistic variables. For each fact that does not contain a linguistic term as an argument, the uncertainty is expressed either by a linguistic variable or by a probability of occurrence of the respective fact. The system translates the linguistic variables and the probabilities into fuzzy numbers. This technique is used in the RESYFU system (Iancu, 1997d).

3. A mixture of both previous methods, the uncertainty associated with the answers to queries being established by the user, as in the UNRESY system (Iancu, 2000).

The method used for the management of uncertainty is established through a user-system dialogue, and it requires techniques for the following:

• propagating imprecision and uncertainty in deductive inferences from the condition part to the conclusion part of a rule
• evaluating an approximate matching between an imprecise rule condition and an imprecise fact
• combining items of information issued from different sources.

The system works with two types of unification, which use generalized belief functions. Knowledge is represented as Prolog clauses with the addition of an uncertainty:

A :- B1, B2, ..., Bn : unc

where A is an atom, B1, B2, ..., Bn are literals and unc is either a linguistic variable var or a support pair [n, p]. This representation has the following interpretation: for each assignment of each variable occurring in the clause, if B1, B2, ..., Bn are all true, then A is true with degree var, or A is necessarily supported to degree n and ¬A is necessarily supported to degree 1 − p. If the body of the clause is empty we have a unit clause, represented as A : unc, whose interpretation follows immediately from that of the rules.
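As a small illustration of the support-pair reading, here is a hypothetical helper (not part of PROSUM or UNRESY) that checks a pair and reports its voting interpretation:

    def voting_interpretation(n: float, p: float) -> dict:
        # n = proportion in favor, 1 - p = proportion against,
        # p - n = proportion abstaining
        if not (0.0 <= n <= p <= 1.0):
            raise ValueError("a support pair requires 0 <= n <= p <= 1")
        return {"for": n, "against": 1.0 - p, "abstain": p - n}

    print(voting_interpretation(0.6, 0.8))
    # {'for': 0.6, 'against': 0.19999..., 'abstain': 0.20000...}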
The linguistic variable is an element of the set

L = {impossible, extremely unlikely, very low chance, small chance, it may, meaningful chance, most likely, extremely likely, certain}.

Each element of this set represents a statement of linguistic probability, and its semantics is given by a fuzzy number m̃ = (m, α, β) with

μ_m̃(x) = 1 − (m − x)/α for x ≤ m, α > 0
μ_m̃(x) = 1 − (x − m)/β for x ≥ m, β > 0

where m is the mean value and α and β are the left and right spreads. We obtain the following representation for the semantics of the proposed term set L:

impossible = (0, 0, 0)
extremely unlikely = (0.015, 0.01, 0.05)
very low chance = (0.14, 0.06, 0.05)
small chance = (0.29, 0.05, 0.06)
it may = (0.495, 0.09, 0.07)
meaningful chance = (0.715, 0.05, 0.06)
most likely = (0.85, 0.06, 0.05)
extremely likely = (0.985, 0.05, 0.01)
certain = (1, 0, 0).

If a fact has as uncertainty a probability m, we transform it into a fuzzy number (Singer, 1990): m̃ = (m, m/18, m/18). If the user of the UNRESY system wants the answers to queries to have their uncertainty expressed as a support pair, then:

a) each linguistic variable is translated into a support pair as follows:

impossible = [0, 0]
extremely unlikely = [0, 0.1]
very low chance = [0.1, 0.2]
small chance = [0.2, 0.4]
it may = [0.4, 0.6]
meaningful chance = [0.6, 0.8]
most likely = [0.8, 0.9]
extremely likely = [0.9, 1]
certain = [1, 1];

b) if a predicate of a rule contains a linguistic term defined as a fuzzy set A, then the uncertainty associated with such a predicate is estimated as the pair [Bel(A), Pls(A)];

c) a probability m is interpreted as the pair [m, m].

If the uncertainty of the answer must be a fuzzy number, then:

i) the pair [n, p] is interpreted as the term L_i = (m_i, α_i, β_i) from L for which m_i is nearest to (n + p)/2;

ii) if a predicate contains a linguistic term defined as a fuzzy set A, then the probability of the event represented by this predicate is Bel(A); afterwards this probability is transformed into a fuzzy number as above.

The system works with two types of unification:

a) syntactic unification, in which two terms match if substitutions can be made to make them equivalent symbol by symbol;

b) semantic unification, which is used if a predicate contains as argument a linguistic term defined as a fuzzy set.

We consider the rule "If (X finished secondary school and his result in mathematics is good and his result in computer science is very good and his result in English is fair to good) then my recommendation for the choice of a university faculty should be computer science", with belief unc, which is represented in the form

faculty(X, computer-science) :- school(X, finished), math(X, good_), comp-sci(X, very_good_), english(X, fair_to_good_) : unc

The terms X, computer-science and finished are purely syntactic terms, matched appropriately by the standard unification procedure of Prolog; very_good_, good_ and fair_to_good_ are semantic terms. The character "_" at the end of a term indicates that it is a linguistic term. If we obtain the unification X = john in the predicate school(X, finished), then, in order to compute the uncertainty of the conclusion faculty(john, computer-science), it is necessary to determine the uncertainties of math(john, good_), comp-sci(john, very_good_) and english(john, fair_to_good_).
The uncertainty for math(john, good_), for instance, is computed in this way:

1. If math(john, good_) is the conclusion of a rule or a fact from the knowledge base, this predicate is used further on with the uncertainty it carries.

2. Let Y be the linguistic variable named "result" and V(Y) the set of names of the linguistic values of Y. If we obtained math(john, val_) : unc, with val_ ∈ V(Y), we use this piece of information in order to compute the uncertainty for math(john, good_); this calculus depends on the form of unc:

(a) If unc = [a, b], then for the fuzzy set good_ we consider the focal sets of the form val_ with m(val_) = p ∈ [a, b] and ¬val_ with m(¬val_) = 1 − p. Because Bel(good_) and Pls(good_) are monotonous functions with respect to p, we compute the support pair [c, d] for math(john, good_) as follows:
- from the focal sets val_ with m(val_) = a and ¬val_ with m(¬val_) = 1 − a we obtain a support pair [c_1, d_1];
- from the focal sets val_ with m(val_) = b and ¬val_ with m(¬val_) = 1 − b we obtain a support pair [c_2, d_2];
- c = min{c_1, c_2} and d = max{d_1, d_2}.

(b) If unc is a linguistic variable translated into a fuzzy number (n, α, β), then we compute Bel(good_) using for the fuzzy set good_ the focal sets val_ with m(val_) = n and ¬val_ with m(¬val_) = 1 − n; afterwards, the computed value is transformed into a fuzzy number.

3. If the knowledge base does not contain information about math(john, good_), the system asks for the fuzzy set good_ and its focal sets and afterwards computes Bel(good_) or the pair [Bel(good_), Pls(good_)].

Given a program as a set of program clauses representing facts and rules, the programming system has to calculate the uncertainty associated with the solutions of queries to the system. A proof path for a final solution is determined in the normal Prolog style and the uncertainty is determined for each branch in the proof path. If more than one proof path is available, then the uncertainties from the different proof paths are combined to obtain the overall uncertainty of the conclusion. The determination of the uncertainty value for one proof path is done by determining the uncertainties for disjunction, conjunction and negation statements, as well as by combining those values when merging values from different proof paths supporting the same conclusion. Through the dialogue with the system, the user can choose the desired formulas from those implemented, or can propose others, in order to manage the uncertainty. If we present to the UNRESY system the rule

performance(comp_sci, good) :- result(math, very_good_), result(english, good) : certain

and we want to work with uncertainty expressed as a pair, then the system generates the following clauses of a Prolog program:

    conj(N1,P1,N2,P2,N,P) :- N = ..., P = ...
    rule(N1,P1,N2,P2,N,P) :- N = ..., P = ...
    performance(comp_sci,good,N,P) :-
        go(result,[math,very_good_],N01,P01),
        result(english,good,N02,P02),
        conj(N01,P01,N02,P02,N03,P03),
        rule(N03,P03,1,1,N,P).
    ...
    go(X,L,N,P) :- cf(X,L,N,P), !.
    go(X,L,N,P) :- cf(X,L1,N1,P1), pr_match(X,L,L1,N1,P1,N,P), !.
    go(X,L,N,P) :- pr_bel([X|L],N,P).
    ...
    include "pr_claus.pro"

The rules with head go(X, L, N, P) solve the semantic unification; pr_claus.pro contains the clauses for computing the generalized belief functions. The system can process any Prolog statement and can decide whether two rules with the same head are in conflict or not.
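The book deliberately leaves the concrete conj/rule/combination formulas to the user's dialogue with the system, so the following is only a hypothetical instantiation: support pairs propagated with min as the conjunction and merged across proof paths with the probabilistic sum, both plugged in as parameters.

    from functools import reduce

    # hypothetical choices; UNRESY lets the user pick these via dialogue
    T = min                                     # t-norm for conjunction
    S = lambda x, y: x + y - x * y              # t-conorm for merging proof paths

    def conj(pair1, pair2):
        # support pair of a conjunction, component-wise via the chosen t-norm
        (n1, p1), (n2, p2) = pair1, pair2
        return (T(n1, n2), T(p1, p2))

    def combine_paths(pairs):
        # merge support pairs coming from different proof paths via the t-conorm
        return reduce(lambda a, b: (S(a[0], b[0]), S(a[1], b[1])), pairs)

    body = conj((0.8, 0.9), (0.6, 0.8))         # two subgoals of one clause
    print(body)                                 # (0.6, 0.8)
    print(combine_paths([body, (0.5, 0.7)]))    # evidence from a second proof path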
4.4 FUZZY LOGIC CONTROL

Conventional controllers are derived from control theory techniques based on mathematical models of the open-loop process, called the system, to be controlled. Fuzzy control provides a formal methodology for representing, manipulating and implementing human heuristic knowledge about how to control a system. Fuzzy logic control is the result of converting a linguistic control strategy based on expert knowledge into control rules and of combining fuzzy logic theory with inference processes. Fuzzy logic control is very useful when the needed models are not known or when they are too complex for analysis with conventional quantitative techniques. In a fuzzy logic controller (FLC) the dynamic behavior of a fuzzy system is characterized by a set of linguistic description rules based on expert knowledge. The expert knowledge is usually of the form

IF (a set of conditions is satisfied) THEN (a set of consequences can be inferred).

Because the antecedents and the consequents are associated with fuzzy concepts, such a rule is called a fuzzy conditional statement. A fuzzy control rule is a fuzzy conditional statement in which the antecedent is a condition in its application domain and the consequent is a control action for the system under control. Fuzzy logic control systems usually consist of four major parts: fuzzification interface, fuzzy rule base, fuzzy inference machine and defuzzification interface.

Figure 4.1: Fuzzy logic controller (a crisp x in U passes through the fuzzifier to a fuzzy set in U; the fuzzy inference engine, using the fuzzy rule base, produces a fuzzy set in V, which the defuzzifier turns into a crisp y in V)

A fuzzification operator has the effect of transforming crisp data into fuzzy sets. In most cases one uses fuzzy singletons as fuzzifiers: fuzzifier(x_0) = x̄_0, the fuzzy singleton at the crisp input value x_0 from the process.

Figure 4.2: Fuzzy singleton as fuzzifier

For a system with two inputs and a single output, the fuzzy rule base has the form

R_1: if x is A_1 and y is B_1 then z is C_1
R_2: if x is A_2 and y is B_2 then z is C_2
…
R_n: if x is A_n and y is B_n then z is C_n

where A_i ⊂ U, B_i ⊂ V, C_i ⊂ W, ∀i ∈ {1, 2, …, n}, and the rules are aggregated by union or intersection. If the crisp inputs x_0 and y_0 are presented, then the control action is obtained as follows:

• The firing level of the i-th rule is computed as α_i = min(μ_Ai(x_0), μ_Bi(y_0)); evidently, any t-norm can be used instead of min.
• The output of the i-th rule is μ_C'i(w) = α_i → μ_Ci(w), ∀w ∈ W.
• The overall system output C is obtained from the individual rule outputs C'_i by an aggregation operation: μ_C(w) = Agg(μ_C'1(w), …, μ_C'n(w)), ∀w ∈ W.
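A minimal sketch of this inference loop, assuming singleton fuzzification, min as firing operator, the Mamdani-style min implication and max aggregation (one common choice among those the text allows); the universes and membership functions are illustrative:

    import numpy as np

    W = np.linspace(0.0, 10.0, 101)            # discretized output universe

    def tri(u, a, b, c):
        # triangular membership function, works on scalars and arrays
        return np.maximum(0.0, np.minimum((u - a) / (b - a), (c - u) / (c - b)))

    def fuzzy_output(rules, x0, y0):
        agg = np.zeros_like(W)
        for mu_A, mu_B, C in rules:
            alpha = min(mu_A(x0), mu_B(y0))              # firing level
            agg = np.maximum(agg, np.minimum(alpha, C))  # implication + aggregation
        return agg

    rules = [
        (lambda x: tri(x, 0, 2, 4), lambda y: tri(y, 0, 2, 4), tri(W, 0, 2, 4)),
        (lambda x: tri(x, 2, 5, 8), lambda y: tri(y, 2, 5, 8), tri(W, 4, 6, 8)),
    ]
    C = fuzzy_output(rules, 3.0, 4.0)          # fuzzy set on W, still to be defuzzified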
Defuzzification methods. The output of the inference process is a fuzzy set that specifies the possibility distribution of the control action. In on-line control a crisp control action is usually required; consequently, one must defuzzify the inferred fuzzy control action: z_0 = defuzzifier(C). The most used defuzzification operators are:

• Center of Area. The defuzzified value of a fuzzy set C is defined by

z_0 = ∫_W z μ_C(z) dz / ∫_W μ_C(z) dz    or    z_0 = Σ_j z_j μ_C(z_j) / Σ_j μ_C(z_j)

depending on whether the membership function μ_C is continuous or discrete.

• First of Maxima. The defuzzified value of a fuzzy set C is the smallest maximizing element, i.e.

z_0 = min{ z | μ_C(z) = max_{w ∈ W} μ_C(w) }.

• Middle of Maxima. The defuzzified value of a discrete fuzzy set C is defined as the mean of all values {z_1, …, z_n} of the universe of discourse having maximal membership grade:

z_0 = (1/n) Σ_{i=1}^n z_i.

If C is not discrete, then

z_0 = ∫_G z dz / ∫_G dz

where G denotes the set of maximizing elements of C.

• Max Criterion. This method chooses an arbitrary value from the set of maximizing elements of C:

z_0 ∈ { z | μ_C(z) = max_{w ∈ W} μ_C(w) }.

Inference mechanisms. We present the most important inference mechanisms in fuzzy logic control systems; for simplicity, we consider two fuzzy control rules of the form

R_1: if x is A_1 and y is B_1 then z is C_1
R_2: if x is A_2 and y is B_2 then z is C_2
Fact: x is x_0 and y is y_0
Consequence: z is C

Mamdani's model. The fuzzy implication is modeled by Mamdani's minimum operator, the aggregation of the rules uses the max operator, the conjunction operator is min, and the t-norm from the Generalized Modus Ponens rule is min. The firing levels of the rules are

α_1 = min(μ_A1(x_0), μ_B1(y_0)), α_2 = min(μ_A2(x_0), μ_B2(y_0)).

Using the GMP rule, the conclusion given by the first rule is

μ_C'1(w) = sup_{(u,v) ∈ U×V} min(min(μ_x̄0(u), μ_ȳ0(v)), min(min(μ_A1(u), μ_B1(v)), μ_C1(w))).

Because μ_x̄0(u) = 0 for all u ≠ x_0 and μ_ȳ0(v) = 0 for all v ≠ y_0, the supremum is attained at (x_0, y_0), and therefore

μ_C'1(w) = min(α_1, μ_C1(w)), μ_C'2(w) = min(α_2, μ_C2(w)).

The overall system output is computed as μ_C(w) = max(μ_C'1(w), μ_C'2(w)). Finally, to obtain a deterministic control action, any defuzzification strategy can be employed.

Tsukamoto's model. All linguistic terms are supposed to have monotonic membership functions. The firing levels of the rules are computed as in Mamdani's model. The individual crisp control actions z_1 and z_2 are computed from the equations α_1 = μ_C1(z_1), α_2 = μ_C2(z_2), and the overall crisp control action is expressed using the discrete center of area:

z_0 = (α_1 z_1 + α_2 z_2) / (α_1 + α_2).

Sugeno's model. Sugeno and Takagi used the following architecture (Takagi & Sugeno, 1985):

R_1: if x is A_1 and y is B_1 then z_1 = a_1 x + b_1 y
R_2: if x is A_2 and y is B_2 then z_2 = a_2 x + b_2 y
Fact: x is x_0 and y is y_0
Consequence: z_0

The firing levels of the rules are computed as in the previous models, the individual rule outputs are z_1* = a_1 x_0 + b_1 y_0 and z_2* = a_2 x_0 + b_2 y_0, and the crisp control action is expressed as

z_0 = (α_1 z_1* + α_2 z_2*) / (α_1 + α_2).

Larsen's model. The fuzzy implication is modeled by Larsen's product operator, the rules are aggregated by union, and the conjunction operator is min. The firing levels are computed as

α_1 = min(μ_A1(x_0), μ_B1(y_0)), α_2 = min(μ_A2(x_0), μ_B2(y_0))

and the conclusion is given by

μ_C(w) = max(α_1 μ_C1(w), α_2 μ_C2(w)).

In order to obtain a deterministic control action, a defuzzification strategy is used.
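A minimal sketch of the Sugeno model, assuming the membership degrees at the crisp inputs are given directly; the worked example that follows can be reproduced with it:

    def sugeno(firing_pairs, linear_outputs):
        # firing_pairs: per rule, (μ_A(x0), μ_B(y0)); linear_outputs: per rule, z_i*
        alphas = [min(a, b) for a, b in firing_pairs]
        return sum(a * z for a, z in zip(alphas, linear_outputs)) / sum(alphas)

    # data of Example 4.1 below: memberships 0.9/0.3 and 0.5/0.8,
    # rule outputs z1* = x0 + 2*y0 and z2* = 3*x0 - 2*y0 at x0 = 3, y0 = 2
    x0, y0 = 3, 2
    z0 = sugeno([(0.9, 0.3), (0.5, 0.8)], [x0 + 2 * y0, 3 * x0 - 2 * y0])
    print(z0)   # 5.75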
Example 4.1. As an example, we illustrate Sugeno's reasoning method:

R_1: if x is BIG and y is SMALL then z_1 = x + 2y
R_2: if x is MEDIUM and y is BIG then z_2 = 3x − 2y
Fact: x is 3 and y is 2
Consequence: z_0

Figure 4.3: Sugeno's inference mechanism (the membership readings giving α_1 = 0.3 and α_2 = 0.5, and the rule outputs x + 2y = 7 and 3x − 2y = 5)

According to the figure, we have μ_BIG(x_0) = μ_BIG(3) = 0.9, μ_SMALL(y_0) = μ_SMALL(2) = 0.3 and μ_MEDIUM(x_0) = μ_MEDIUM(3) = 0.5, μ_BIG(y_0) = μ_BIG(2) = 0.8. The firing levels of the rules are

α_1 = min{μ_BIG(x_0), μ_SMALL(y_0)} = min{0.9, 0.3} = 0.3
α_2 = min{μ_MEDIUM(x_0), μ_BIG(y_0)} = min{0.5, 0.8} = 0.5.

The individual rule outputs are computed as z_1* = x_0 + 2y_0 = 3 + 4 = 7 and z_2* = 3x_0 − 2y_0 = 9 − 4 = 5, so the crisp control action is

z_0 = (7 × 0.3 + 5 × 0.5) / (0.3 + 0.5) = 5.75.

4.5 EXTENDED MAMDANI FUZZY LOGIC CONTROLLER

An extension of the Mamdani model that works with interval inputs is presented in (Liu, Geng & Zhang, 2005), where the fuzzy sets are represented by triangular fuzzy numbers and the firing level of the conclusion is computed as the product of the firing levels from the antecedent. In our papers (Iancu, 2009a; 2009b) we proposed a fuzzy reasoning system characterized by:

• linguistic terms (or values) represented by trapezoidal fuzzy numbers
• inputs that can be crisp data, intervals and/or linguistic terms
• various implication operators used to represent the rules
• the crisp action of a rule, computed by the Middle-of-Maxima method
• the overall crisp action, computed by the discrete Center-of-Gravity method.

The following implications are used:

Reichenbach: I_R(x, y) = 1 − x + xy
Willmott: I_W(x, y) = max(1 − x, min(x, y))
Mamdani: I_M(x, y) = min(x, y)
Rescher-Gaines: I_RG(x, y) = 1 if x ≤ y, 0 otherwise
Kleene-Dienes: I_KD(x, y) = max(1 − x, y)
Brouwer-Gödel: I_BG(x, y) = 1 if x ≤ y, y otherwise
Goguen: I_G(x, y) = 1 if x ≤ y, y/x otherwise
Lukasiewicz: I_L(x, y) = min(1 − x + y, 1)
Fodor: I_F(x, y) = 1 if x ≤ y, max(1 − x, y) otherwise.

It is sufficient to work with rules having a single conclusion, because a rule with multiple consequents can be treated as a set of such rules.

4.5.1 THE PROPOSED MODEL

We assume that the facts are also given by intervals or linguistic values, and a rule is characterized by:

• a set of linguistic variables; each variable A has as domain an interval I_A = [a_A, b_A]
• n_A linguistic values A_1, A_2, …, A_{n_A} for each linguistic variable A
• a membership function μ⁰_Ai(x) for each value A_i, where i ∈ {1, 2, …, n_A} and x ∈ I_A.

According to the structure of an FLC system, the following steps are necessary in order to work with our system (Iancu, 2009a).

Step 1. Fuzzification. We consider an interval input [a, b] with a_A ≤ a < b ≤ b_A. The membership function of A_i is modified [16] by the membership function of [a, b],

μ_[a,b](x) = 1 if x ∈ [a, b], 0 otherwise,

as follows:

μ_Ai(x) = min(μ⁰_Ai(x), μ_[a,b](x)), ∀x ∈ I_A;

obviously, any t-norm T can be used instead of min. The firing level generated by the interval input [a, b] for the linguistic value A_i is

μ_Ai = max{ μ_Ai(x) | x ∈ [a, b] }.

According to the previous formula, for a linguistic value A with the membership function represented by the trapezoidal fuzzy number N_A = (m_A, m̄_A, α_A, β_A) and an interval input [a, b], the firing level μ_A is computed as

μ_A = 1 if [a, b] ∩ [m_A, m̄_A] ≠ ∅
μ_A = (m̄_A + β_A − a)/β_A if a ∈ [m̄_A, m̄_A + β_A]
μ_A = (b − m_A + α_A)/α_A if b ∈ [m_A − α_A, m_A]
μ_A = 0 otherwise.

The same technique is used to compute the firing level μ_Ai generated by a linguistic input value A'_i; in this case

μ_Ai(x) = min(μ⁰_Ai(x), μ_A'i(x)), ∀x ∈ I_A.

For a crisp input x_0 the firing level is μ_Ai = μ⁰_Ai(x_0).
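A direct transcription of the interval-input firing level, a minimal sketch assuming the trapezoid (m, m̄, α, β) encodes the core [m, m̄] with left and right spreads α and β:

    def firing_level(m, m_bar, alpha, beta, a, b):
        # firing level of a trapezoidal value for the interval input [a, b],
        # per the four cases above
        if a <= m_bar and b >= m:                    # [a,b] meets the core [m, m_bar]
            return 1.0
        if beta > 0 and m_bar <= a <= m_bar + beta:  # input to the right of the core
            return (m_bar + beta - a) / beta
        if alpha > 0 and m - alpha <= b <= m:        # input to the left of the core
            return (b - m + alpha) / alpha
        return 0.0

    # the application of Section 4.5.2: price interval [400, 600]
    print(firing_level(300, 500, 100, 100, 400, 600))  # Middle -> 1.0
    print(firing_level(700, 800, 200, 0,   400, 600))  # High   -> 0.5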
Step 2. Fuzzy inference. We consider a set of fuzzy rules

R_i: if X_1 is A_i1 and … and X_r is A_ir then Y is C_i

where the variables X_j, j ∈ {1, 2, …, r}, and Y have the domains U_j and V, respectively. The firing levels of the rules, denoted by α_i, are computed by

α_i = T(α_i1, …, α_ir)

where T is a t-norm and α_ij is the firing level for A_ij, j ∈ {1, 2, …, r}. The causal link from X_1, …, X_r to Y is represented by an implication operator I. It results that the conclusion inferred from the rule R_i is

C'_i(v) = I(α_i, C_i(v)), ∀v ∈ V.

The formula C'(v) = I(α, C(v)) gives the following results, depending on the implication I:

Reichenbach: C'(v) = I_R(α, C(v)) = 1 − α + αC(v)
Figure 4.4: Conclusion obtained with the Reichenbach implication

Willmott: C'(v) = I_W(α, C(v)) = max(1 − α, min(α, C(v)))
Figures 4.5 and 4.6: Conclusions obtained with the Willmott implication (the cases α ≥ 0.5 and α ≤ 0.5)

Mamdani: C'(v) = I_M(α, C(v)) = min(α, C(v))
Figure 4.7: Conclusion obtained with the Mamdani implication

Rescher-Gaines: C'(v) = I_RG(α, C(v)) = 1 if α ≤ C(v), 0 otherwise
Figure 4.8: Conclusion obtained with the Rescher-Gaines implication

Kleene-Dienes: C'(v) = I_KD(α, C(v)) = max(1 − α, C(v))
Figure 4.9: Conclusion obtained with the Kleene-Dienes implication

Brouwer-Gödel: C'(v) = I_BG(α, C(v)) = 1 if α ≤ C(v), C(v) otherwise
Figure 4.10: Conclusion obtained with the Brouwer-Gödel implication

Goguen: C'(v) = I_G(α, C(v)) = 1 if α ≤ C(v), C(v)/α otherwise
Figure 4.11: Conclusion obtained with the Goguen implication

Lukasiewicz: C'(v) = I_L(α, C(v)) = min(1 − α + C(v), 1)
Figure 4.12: Conclusion obtained with the Lukasiewicz implication

Fodor: C'(v) = I_F(α, C(v)) = 1 if α ≤ C(v), max(1 − α, C(v)) otherwise
Figures 4.13 and 4.14: Conclusions obtained with the Fodor implication (the cases α ≥ 0.5 and α ≤ 0.5)

Step 3. Defuzzification. The fuzzy output C'_i of the rule R_i is transformed into a crisp output z_i using the Middle-of-Maxima operator, according to the following algorithm. The crisp value z_0 associated with a conclusion C' inferred from a rule having the firing level α and the conclusion C, represented by the trapezoidal fuzzy number (m_C, m̄_C, α_C, β_C), is:

z_0 = (m_C + m̄_C)/2 for the implications I ∈ {I_R, I_KD}
z_0 = (m_C + m̄_C + (1 − α)(β_C − α_C))/2 for I ∈ {I_M, I_RG, I_BG, I_G, I_L, I_F}, or I = I_W and α ≥ 0.5
z_0 = (a_V + b_V)/2 if I = I_W, α ≤ 0.5 and V = [a_V, b_V].

In the last case, in order to remain inside the support of C, we can instead choose a value according to the Max Criterion, for instance

z_0 = (m_C + m̄_C + α(β_C − α_C))/2.

The overall crisp action corresponding to an implication is computed by the discrete Center-of-Gravity method: if the number of fired rules is N, then the final control action is

z_0 = Σ_{i=1}^N α_i z_i / Σ_{i=1}^N α_i

where α_i is the firing level and z_i is the crisp output of the i-th rule.
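A minimal transcription of this Middle-of-Maxima rule output, assuming the same trapezoid encoding as before; the implication is passed by name:

    def mom_output(mC, mC_bar, alphaC, betaC, alpha, impl, V=None):
        # crisp MoM value of the conclusion (mC, mC_bar, alphaC, betaC)
        # for firing level alpha under the named implication
        if impl in {"R", "KD"}:
            return (mC + mC_bar) / 2
        if impl == "W" and alpha <= 0.5:
            aV, bV = V                       # middle of the whole output domain
            return (aV + bV) / 2
        # I_M, I_RG, I_BG, I_G, I_L, I_F, or I_W with alpha >= 0.5
        return (mC + mC_bar + (1 - alpha) * (betaC - alphaC)) / 2

    # rule R8 of the application in 4.5.2: conclusion High = (7, 8, 1, 2), alpha = 2/3
    print(mom_output(7, 8, 1, 2, 2/3, "L"))  # 23/3, i.e. about 7.667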
Finally, we combine the results obtained with the various implication operators in order to obtain the overall output of the system. For this we use the "strength" of an implication, given by the ratio λ(I) = N(I)/13, where N(I) is the number of properties (from the list I1 to I13, see Definition 3.2) verified by the implication I. According to [12] we have:

N(I_R) = 11, N(I_W) = 6, N(I_M) = 4, N(I_RG) = 11, N(I_KD) = 11, N(I_BG) = 10, N(I_G) = 10, N(I_L) = 13, N(I_F) = 12.

Then, the overall crisp action of the system is computed as

z_0 = Σ_I λ(I) z_0(I) / Σ_I λ(I)

where z_0(I) is the overall control action given by the implication I.

A variant of this model is presented in (Iancu, 2009b), where:

a) The firing level generated by the interval input [a, b] for the linguistic value A_i is computed as a ratio: the area defined by μ_Ai divided by the area defined by μ⁰_Ai,

μ_Ai = ∫_a^b μ_Ai(x) dx / ∫_a^b μ⁰_Ai(x) dx.

If the input is a fuzzy set B, then the firing level corresponding to the linguistic value A_i is computed using the previous technique with

μ_Ai(x) = min(μ⁰_Ai(x), μ_B(x)).

The firing level generated by a crisp input x_0 is computed as μ_Ai = μ⁰_Ai(x_0).

b) The overall crisp control of the system is computed using an OWA (Ordered Weighted Averaging) operator, given by

Definition 4.2. An OWA operator of dimension n is a mapping F: R^n → R that has an associated n-vector w = (w_1, w_2, …, w_n)^t such that w_i ∈ [0, 1], 1 ≤ i ≤ n, and Σ_{i=1}^n w_i = 1. The aggregation of the values {a_1, a_2, …, a_n} is

F(a_1, a_2, …, a_n) = Σ_{j=1}^n w_j b_j

where b_j is the j-th largest element of {a_1, a_2, …, a_n}.

The weight corresponding to an implication I is computed as N(I)/Np, where N(I) is the number of properties (see Definition 3.2) verified by the implication I and Np is the sum of the numbers of properties verified by all the implications used in the system. The values to aggregate are the crisp control values given by the set of implications used by the system.
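A minimal OWA sketch matching Definition 4.2. The weights shown are derived from the N(I) values above for an assumed two-implication system {I_R, I_L} (so Np = 11 + 13 = 24); how the per-implication weights map to the ordered positions is not detailed in the text, so here they are simply listed in descending order of N(I):

    def owa(weights, values):
        # F(a_1,...,a_n) = sum of w_j * b_j, with b_j the j-th largest value
        assert abs(sum(weights) - 1.0) < 1e-9
        ordered = sorted(values, reverse=True)
        return sum(w * b for w, b in zip(weights, ordered))

    # crisp actions of I_L and I_R from the application below
    print(owa([13/24, 11/24], [5.5, 5.416]))   # about 5.4615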
4.5.2 AN APPLICATION

In order to show how the proposed system works, we consider an example inspired from (Liu, Geng & Zhang, 2005), with rules having two inputs and one output. The input variables are quality (Q) and price (P); the output variable is the satisfaction score (S). The fuzzy rule base consists of:

R_1: if Q is Poor and P is Low then S is Middle
R_2: if Q is Poor and P is Middle then S is Low
R_3: if Q is Poor and P is High then S is Very Low
R_4: if Q is Average and P is Low then S is High
R_5: if Q is Average and P is Middle then S is Middle
R_6: if Q is Average and P is High then S is Low
R_7: if Q is Good and P is Low then S is Very High
R_8: if Q is Good and P is Middle then S is High
R_9: if Q is Good and P is High then S is Middle

There are three linguistic values for the variable price, {Low, Middle, High}, and five linguistic values for the variable quality, {Poor, Below Average, Average, Above Average, Good}. We consider the universes of discourse [0, 800] for price and [0, 10] for quality. The membership functions corresponding to the linguistic values are represented by the following trapezoidal fuzzy numbers:

Low = (0, 100, 0, 200)
Middle = (300, 500, 100, 100)
High = (700, 800, 200, 0)

Poor = (0, 1, 0, 2)
Below Average = (2, 3, 1, 1)
Average = (4, 6, 2, 2)
Above Average = (7, 8, 1, 1)
Good = (9, 10, 2, 0).

The satisfaction score has the linguistic values {Very Low, Low, Middle, High, Very High}; for the universe [0, 10] we consider the membership functions:

Very Low = (0, 1, 0, 1)
Low = (2, 3, 1, 1.5)
Middle = (4, 6, 1, 1)
High = (7, 8, 1, 2)
Very High = (9, 10, 1, 0).

These membership functions are presented in the next figures.

Figure 4.15: The membership functions of the input variable price
Figure 4.16: The membership functions of the input variable quality
Figure 4.17: The membership functions of the output variable satisfaction score

We consider a person interested in buying a computer with price = 400-600 EUR and quality = Above Average. The positive firing levels corresponding to the linguistic values of the input variable price are μ_Middle = 1 and μ_High = 0.5, and the positive firing levels corresponding to the linguistic values of the input variable quality are μ_Average = 2/3 and μ_Good = 2/3. The fired rules and their firing levels, computed with the t-norm Product, are: R_5 with α_5 = 2/3, R_6 with α_6 = 1/3, R_8 with α_8 = 2/3 and R_9 with α_9 = 1/3.

Working with the I_L implication, the fired rules give the following crisp output values: z_5 = 5, z_6 = 8/3, z_8 = 23/3, z_9 = 5; the overall crisp control action for I_L is z_0(I_L) = 5.5. Working with the I_R implication, the fired rules give the crisp output values z_5 = 5, z_6 = 2.5, z_8 = 7.5, z_9 = 5; its overall crisp action is z_0(I_R) = 5.416. Because λ(I_R) = 11/13 and λ(I_L) = 1, the overall crisp action given by the system is z_0 = 5.4615.

The Mamdani model is characterized by:
• Mamdani's minimum operator is used to compute the firing levels of the rules and to model the fuzzy implication;
• the maximum operator is used to compute the overall system output from the individual rule outputs.

Applying this model to our example, one obtains the following results:
• the firing levels are α_5 = 2/3, α_6 = 0.5, α_8 = 2/3, α_9 = 0.5;
• the crisp rule outputs are z_5 = 5, z_6 = 5.25/2, z_8 = 23/3, z_9 = 5;
• the overall crisp action is z_0 = 23/3 = 7.66.

If we use the Center-of-Gravity method (instead of the maximum operator) to compute the overall crisp action, we obtain z_0 = 5.253. We observe an important difference between these two results, and also between them and the result given by our method. An explanation lies in the small value of the "strength" of Mamdani's implication in comparison with the values associated with the Reichenbach and Lukasiewicz implications; the strength of an implication is a measure of its quality. Different implications, used separately, yield different results. Our system offers a possibility to avoid this difficulty through the aggregation operation, which achieves a "mediation" between the results given by the various implications.
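The numbers above can be checked end to end with the mom_output helper sketched after Step 3; a small verification script under the same encoding assumptions:

    # fired rules: (conclusion trapezoid, firing level with the product t-norm)
    fired = {
        "R5": ((4, 6, 1, 1),   (2/3) * 1.0),   # S is Middle
        "R6": ((2, 3, 1, 1.5), (2/3) * 0.5),   # S is Low
        "R8": ((7, 8, 1, 2),   (2/3) * 1.0),   # S is High
        "R9": ((4, 6, 1, 1),   (2/3) * 0.5),   # S is Middle
    }

    def overall(impl):
        # discrete Center-of-Gravity over the fired rules; mom_output as above
        zs = [(a, mom_output(*trap, a, impl)) for trap, a in fired.values()]
        return sum(a * z for a, z in zs) / sum(a for a, _ in zs)

    zL, zR = overall("L"), overall("R")
    print(zL, zR)                                    # 5.5 and 5.4166...
    lamL, lamR = 13/13, 11/13
    print((lamL * zL + lamR * zR) / (lamL + lamR))   # about 5.4615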
4.6 FUZZY CLASSIFIER SYSTEM

In a fuzzy classifier system each fuzzy if-then rule is treated as an individual classifier. A heuristic method for generating fuzzy if-then rules from training patterns, and a fuzzy reasoning method that assigns a class label to each unseen pattern using the generated fuzzy rules, are presented in (Nakashima, 2000). We assume that m real vectors x_p = (x_p1, x_p2, …, x_pn) are given as training patterns from c classes (c ≤ m). The pattern space is [0, 1]^n, and therefore the attribute values of each pattern are x_pi ∈ [0, 1] for p ∈ {1, …, m} and i ∈ {1, …, n}. We use rules of the following form:

R_j: if x_1 is A_j1 and … and x_n is A_jn then Class C_j with CF_j

where R_j is the label of the j-th rule, A_j1, …, A_jn are antecedent fuzzy sets on the unit interval [0, 1], C_j is the consequent class and CF_j is the grade of certainty of the fuzzy rule R_j. In computer simulations one uses a typical set of linguistic values as antecedent fuzzy sets. The membership function of each linguistic value is specified by homogeneously partitioning the domain interval [0, 1] of each attribute into symmetric triangular fuzzy sets. The consequent class C_j and the grade of certainty CF_j are determined by the following heuristic procedure (Ishibuchi, Nozaki & Tanaka, 1992; Ishibuchi, Nozaki, Yamamoto & Tanaka, 1995; Ishibuchi, Murata & Türkşen, 1997).

Determination of C_j and CF_j

Step 1. Calculate the compatibility grade of each training pattern x_p = (x_p1, x_p2, …, x_pn) with the fuzzy if-then rule R_j by the following operation:

μ_j(x_p) = μ_j1(x_p1) × … × μ_jn(x_pn).

Step 2. For each class, calculate the sum of the compatibility grades of the training patterns with the fuzzy if-then rule R_j:

β_class h(R_j) = Σ_{x_p ∈ class h} μ_j(x_p), h = 1, 2, …, c.

Step 3. Find the class ĥ_j that has the maximum value of β_class h(R_j):

β_class ĥ_j(R_j) = max{β_class 1(R_j), …, β_class c(R_j)}.

If two or more classes take the maximum value (i.e., the consequent class cannot be determined uniquely), or if there is no training pattern compatible with the fuzzy rule R_j (i.e., β_class h(R_j) = 0 for h = 1, 2, …, c), the consequent class C_j will be ∅. If a single class takes the maximum value, let C_j be the class ĥ_j.

Step 4. If the consequent class C_j is ∅, then the grade of certainty of the rule R_j is CF_j = 0. Otherwise it is determined as follows:

CF_j = (β_class ĥ_j(R_j) − β̄) / Σ_{h=1}^c β_class h(R_j),   β̄ = (1/(c − 1)) Σ_{h ≠ ĥ_j} β_class h(R_j).

The fuzzy if-then rules with certainty grades are different from the standard ones usually used in control problems and function approximation problems. The following traditional type of fuzzy if-then rule is also applicable to pattern classification problems:

R_j: if x_1 is A_j1 and … and x_n is A_jn then y_1 is β_j1 and … and y_c is β_jc

where y_k is the possibility grade of the occurrence of class k and β_jk is a consequent fuzzy set. Instead of the consequent fuzzy set β_jk one can use a singleton fuzzy set (i.e., a real number b_jk) or a linear function of the input variables (i.e., b_j,0^k + b_j,1^k x_1 + … + b_j,n^k x_n).

Fuzzy reasoning. Once the antecedent of each fuzzy rule is given, one can determine the consequent class and the grade of certainty by the heuristic procedure presented above. An input pattern x is classified by a fuzzy reasoning method based on a single winner rule. The winner rule R_ĵ is determined as

μ_ĵ(x) · CF_ĵ = max{ μ_j(x) · CF_j | R_j ∈ S }

where S is the set of fuzzy if-then rules. If several fuzzy if-then rules have the same maximum product but different consequent classes for the input pattern x, the classification is rejected. The classification is also rejected if no fuzzy rule is compatible with the input pattern x (i.e., μ_j(x) = 0 for all R_j ∈ S). When several fuzzy if-then rules with the same consequent class have the same value, we assume that the rule with the smallest index j is the winner.
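A minimal sketch of the heuristic determination of C_j and CF_j, assuming the rule's antecedent membership functions are supplied as callables and patterns come with integer class labels (the data at the bottom is illustrative):

    import numpy as np

    def determine_consequent(antecedents, X, labels, c):
        # antecedents: list of n membership functions; X: (m, n) patterns in [0,1];
        # labels: class index in {0,...,c-1} per pattern; returns (C_j, CF_j)
        # Step 1: compatibility of each pattern with the rule (product)
        mu = np.prod([[f(x) for f, x in zip(antecedents, row)] for row in X], axis=1)
        # Step 2: per-class sums of compatibilities
        beta = np.array([mu[labels == h].sum() for h in range(c)])
        # Step 3: unique maximizing class, otherwise the consequent is empty
        best = int(np.argmax(beta))
        if beta[best] == 0 or (beta == beta[best]).sum() > 1:
            return None, 0.0                    # C_j = empty, CF_j = 0
        # Step 4: certainty grade
        beta_bar = (beta.sum() - beta[best]) / (c - 1)
        return best, (beta[best] - beta_bar) / beta.sum()

    tri = lambda a, b, c_: lambda x: max(0.0, min((x - a)/(b - a), (c_ - x)/(c_ - b)))
    X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8]])
    print(determine_consequent([tri(-0.5, 0, 0.5), tri(-0.5, 0, 0.5)],
                               X, np.array([0, 0, 1]), 2))   # (0, 1.0)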
Fuzzy-genetic classifier system. The system is based on the heuristic rule generation procedure and on genetic operations used for generating a combination of antecedent fuzzy sets for each fuzzy if-then rule. The outline of the system is as follows:

Step 1. Generate an initial population of fuzzy if-then rules.
Step 2. Evaluate each fuzzy if-then rule in the current population.
Step 3. Generate new rules by genetic operations.
Step 4. Replace a part of the current population with the newly generated rules.
Step 5. Terminate the algorithm if a pre-specified stopping condition is satisfied; otherwise return to Step 2.

Coding of fuzzy if-then rules. Because the consequent class and the grade of certainty of each rule are determined by the heuristic procedure, only the antecedent fuzzy sets are altered by the genetic operations. The system uses five linguistic values and "don't care", denoted by the following six symbols (1, 2, 3, 4, 5, #):

S: small → 1
MS: medium small → 2
M: medium → 3
ML: medium large → 4
L: large → 5
DC: don't care → #

For example, the string "1#3#" denotes the following fuzzy if-then rule for a four-dimensional pattern classification problem: "If x_1 is small and x_2 is don't care and x_3 is medium and x_4 is don't care then class C_j with CF_j". Because the conditions with "don't care" can be omitted, this rule is rewritten as follows: "If x_1 is small and x_3 is medium then class C_j with CF_j".

Initial population. The initial population of N_pop rules is generated by randomly selecting their antecedent fuzzy sets from the six symbols corresponding to the five linguistic values and "don't care". Each symbol is selected with probability 1/6. The consequent class C_j and the grade of certainty CF_j of each fuzzy if-then rule are determined by the heuristic procedure.

Evaluation of each rule. A unit reward is assigned to the winner rule when a training pattern is correctly classified by that rule. After all the training patterns have been examined, the fitness value of each rule is defined by the total reward assigned to that rule:

fitness(R_j) = NCP(R_j)

where NCP(R_j) is the number of training patterns correctly classified by R_j.

Generating a new population of rules. In order to generate new fuzzy if-then rules, a pair of rules is selected from the current population. The selection probability is based on roulette wheel selection with linear scaling:

P(R_j) = (fitness(R_j) − fitness_min(S)) / Σ_{R_i ∈ S} (fitness(R_i) − fitness_min(S))

where fitness_min(S) is the minimum fitness value of the fuzzy rules in the current population S. From the selected pair, two rules are generated by uniform crossover on the antecedent fuzzy sets, as in the following example, where the marked positions (the third and the fourth) are exchanged:

1 # 4 1 5        1 # 3 2 5
            ⇒
# 2 3 2 #        # 2 4 1 #

The mutation operator replaces a randomly selected value by another one, here at the fourth position:

1 2 3 2 # ⇒ 1 2 3 5 #

The consequent class and the grade of certainty of each newly generated fuzzy if-then rule are determined by the heuristic procedure. These genetic operations are iterated until a pre-specified number of rules, say N_rep, are newly generated. The worst N_rep rules (those with the smallest fitness values) are removed from the current population and the newly generated rules are added. In this way a new population of rules is generated.

Final solution. Frequently, the number of generations is used as the stopping condition for terminating the execution of the fuzzy classifier system. The final solution is the rule set with the maximum classification rate on the training patterns over all generations.
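A minimal sketch of these genetic operations on the symbol strings; the scaled-roulette, uniform-crossover and mutation choices follow the text, while the population and fitness values below are illustrative:

    import random

    SYMBOLS = "12345#"

    def roulette(population, fitness):
        # roulette wheel selection with linear scaling: P proportional to f - f_min
        fmin = min(fitness.values())
        weights = [fitness[r] - fmin for r in population]
        if sum(weights) == 0:                   # all fitnesses equal: pick uniformly
            return random.choice(population)
        return random.choices(population, weights=weights, k=1)[0]

    def uniform_crossover(p1, p2):
        # exchange genes at randomly chosen positions of the two parents
        c1, c2 = list(p1), list(p2)
        for i in range(len(c1)):
            if random.random() < 0.5:
                c1[i], c2[i] = c2[i], c1[i]
        return "".join(c1), "".join(c2)

    def mutate(rule, p_mut=0.1):
        # replace a gene by another randomly selected symbol with probability p_mut
        return "".join(random.choice(SYMBOLS.replace(g, "")) if random.random() < p_mut
                       else g for g in rule)

    pop = ["1#415", "#232#", "12345", "#####"]
    fit = {"1#415": 3, "#232#": 1, "12345": 0, "#####": 2}
    parents = roulette(pop, fit), roulette(pop, fit)
    print(mutate(uniform_crossover(*parents)[0]))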
This fuzzy classifier system can be written as the following algorithm:

Step 1. Generate an initial population of N_pop fuzzy if-then rules by randomly specifying the antecedent fuzzy sets of each rule; the consequent class and the grade of certainty are determined by the heuristic procedure.
Step 2. Classify all the given training patterns by the fuzzy if-then rules in the current population, calculating the fitness value of each rule.
Step 3. Generate N_rep fuzzy if-then rules from the current population by the selection, crossover and mutation operators; the consequent class and the grade of certainty are determined by the heuristic procedure.
Step 4. Replace the worst N_rep fuzzy if-then rules with the newly generated rules.
Step 5. Terminate the algorithm if a pre-specified stopping condition is satisfied; otherwise return to Step 2.

Numerical example. An example for a classification problem with two attributes (x_1 and x_2) and two classes is presented in (Nakashima, 2000).

(Figure: the training patterns of class 1 and class 2 in the pattern space [0, 1]², axes x_1 and x_2.)

The rules have the form

R_j: If x_1 is A_j1 and x_2 is A_j2 then class C_j with CF_j

where the antecedent fuzzy sets are selected from the five linguistic values S, MS, M, ML, L (symmetric triangular fuzzy sets homogeneously partitioning [0, 1]) and "don't care", with

μ_don't care(x) = 1 if 0 ≤ x ≤ 1, 0 otherwise.

The total number of fuzzy if-then rules is 36, and the problem is to find a compact rule set with high classification performance. With the parameters

Population size: N_pop = 1, 2, …
Crossover probability: 1.0
Mutation probability: 0.1
Number of replaced rules in each population: N_rep = 1
Stopping condition: 1000 generations

the solution obtained is:

If x_1 is medium small then class 1 with CF = 0.90
If x_1 is medium and x_2 is small then class 1 with CF = 0.85
If x_1 is medium and x_2 is medium small then class 1 with CF = 0.79
If x_1 is large then class 2 with CF = 1.0
If x_1 is medium large then class 2 with CF = 0.87
If x_1 is medium and x_2 is medium large then class 2 with CF = 0.81

REFERENCES

Alsina, C., Trillas, E. & Valverde, L. (1980). On non-distributive logical connectives for fuzzy sets theory. BUSEFAL, 3, 18-29.
Baldwin, J. F. (1987). Evidential Support Logic Programming. Fuzzy Sets and Systems, 24, 1-26.
Bauer, M. (1996). Approximations for Decision Making in the Dempster-Shafer Theory of Evidence. UAI, 73-80.
Bellman, R. E. & Giertz, M. (1973). On the analytic formalism of the theory of fuzzy sets. Information Sciences, 5, 149-156.
Buisson, J. C., Farreny, H. & Prade, H. (1986). Dealing with Imprecision and Uncertainty in the Expert System DIABETO III. In Proc. of the 2nd Int. Conf. on Artificial Intelligence (pp. 705-721). Paris, France: Hermes.
Bourke, M. M. (1995). Self-learning predictive control using relational-based fuzzy logic. Thesis, University of Alberta, Edmonton, Alberta.
De Campos, L. M., Lamata, M. T. & Moral, S. (1990). The concept of conditional fuzzy measure. International Journal of Intelligent Systems, 5, 237-246.
Czogala, E. & Leski, L. (2001). On equivalence of approximate reasoning results using different interpolations of fuzzy if-then rules. Fuzzy Sets and Systems, 117, 279-296.
Demirli, K. & Turksen, I. B. (1992). Rule Break up with Compositional Rule of Inference. In IEEE International Conference on Fuzzy Systems (pp. 949-956). San Diego, CA.
Dempster, A. P. (1967). Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Statist., 38, 325-339.
Desprès, S. (1986). Un apport à la conception des systèmes à base de connaissances: Les opérations de déduction floues. Thèse d'Université, Université Paris VI, France.
Dubois, D. & Prade, H. (1979). Fuzzy real algebra: Some results. Fuzzy Sets and Systems, 2, 327-348.
Dubois, D. & Prade, H. (1980). Fuzzy Sets and Systems: Theory and Applications. San Diego, USA: Academic Press, Inc.
Dubois, D. & Prade, H. (1982). On several representations of an uncertain body of evidence. In M. M. Gupta & E. Sanchez (Eds.), Fuzzy Information and Decision Processes (pp. 167-181). Amsterdam, Holland: North-Holland.
Dubois, D. (1983). Modèles mathématiques de l'imprécis et de l'incertain en vue d'applications aux techniques d'aide à la décision. Thèse d'Etat, Grenoble.
Dubois, D. & Prade, H. (1985a). Evidence measures based on fuzzy information. Automatica, 21(5), 547-562.
Dubois, D. & Prade, H. (1985b). The generalized modus ponens under sup-min composition. A theoretical study. In M. M. Gupta, A. Kandel, W. Bandler & J. B. Kiszka (Eds.), Approximate Reasoning in Expert Systems (pp. 217-232). North-Holland.
Dubois, D. & Prade, H. (1986a). On the unicity of Dempster's rule of combination. International Journal of Intelligent Systems, 1, 133-142.
Dubois, D. & Prade, H. (1986b). A Set-Theoretic View of Belief Functions. International Journal of General Systems, 12, 193-226.
Dubois, D. & Prade, H. (1987). Théorie des possibilités. Applications à la représentation des connaissances en informatique. Paris: Masson.
Dubois, D. & Prade, H. (1988). Representation and combination of uncertainty with belief functions and possibility measures. Computational Intelligence, 4, 244-264.
Fagin, R. & Halpern, J. Y. (1989). A new approach to updating beliefs. (Research report no. RJ 7222). San Jose, USA: IBM Research Division, Almaden Research Center.
Fodor, J. C. (1991). On fuzzy implication operators. Fuzzy Sets and Systems, 42, 293-300.
Fullér, R. (1995). Neural Fuzzy Systems. Åbo, Finland: Åbo Akademi University.
Fullér, R. (1998). Fuzzy Reasoning and Fuzzy Optimization. Turku, Finland: TUCS General Publications.
Fullér, R. (2000). Introduction to Neuro-Fuzzy Systems. Berlin, Germany: Springer-Verlag.
Garibba, S. F. & Servida, A. (1988). Evidence Aggregation in Expert Judgements. In Proc. of the 2nd Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems - IPMU (pp. 385-400). Urbino, Italy: Springer-Verlag.
Graham, B. P. & Newell, R. B. (1989). Fuzzy Adaptive Control of a First-Order Process. Fuzzy Sets and Systems, 31, 47-65.
Iancu, I. (1997a). T-norms with threshold. Fuzzy Sets and Systems, 85, 83-92.
Iancu, I. (1997b). Introduction of a double threshold in uncertainty management. In I. Plander (Ed.), Proc. of the Seventh Int. Conf. on Artif. Intelligence and Information-Control Syst. of Robots (pp. 10-14). Smolenice Castle, Slovakia: World Scientific, Singapore.
Iancu, I. (1997c). PROSUM - Prolog System for Uncertainty Management. Int. Journal of Intelligent Systems, 12(9), 615-627.
Iancu, I. (1997d). Reasoning System with Fuzzy Uncertainty. Fuzzy Sets and Systems, 92, 51-59.
Iancu, I. (1998a). A method for constructing t-norms. Korean J. Comput. & Appl. Math., 5(2), 407-414.
Iancu, I. (1998b). Some applications of Pedrycz's operator. Computers and Artificial Intelligence, 17(1), 83-97.
Iancu, I. (1998c). Propagation of uncertainty and imprecision in knowledge-based systems. Fuzzy Sets and Systems, 94, 29-43.
Iancu, I. (1999a). Fuzzy connectives with applications in uncertainty management. In Proc. of the 3rd annual meeting of the Romanian Society of Math. Science (pp. 40-47). Craiova: University of Craiova.
Iancu, I. (1999b). On a family of t-operators. Annals of the Univ. of Craiova, Mathematics-Computer Science Series, XXVI, 84-92.
Iancu, I. (2000). Uncertain Reasoning System. In A. Kent & J. G. Williams (Eds.),
Encyclopedia of Computer Science and Technology, vol. 48 (28) (pp. 359-372). New York: Marcel Dekker, Inc.
Iancu, I. (2003). On a Representation of an Uncertain Body of Evidence. Annals of the Univ. of Craiova, Mathematics-Computer Science Series, XXX(2), 100-108.
Iancu, I. (2005). Operators with n-thresholds for uncertainty management. J. Appl. Math. & Computing. An Int. Journal, 19(1-2), 1-17. Berlin/Heidelberg: Springer.
Iancu, I. (2008a). Generalized Modus Ponens Using Fodor's Implication and a Parametric T-norm. WSEAS Transactions on Systems, 7(6), 738-747.
Iancu, I. (2008b). Generalized Modus Ponens Reasoning for Rules with Partial Overlapping Between Premise and Observation. In European Computing Conference (pp. 37-43). Malta.
Iancu, I. (2008c). An Approximation of Basic Assignment Probability in Dempster-Shafer Theory. In Proc. of the 8th Int. Conf. on Artificial Intelligence and Digital Communications (pp. 116-123). Craiova, September 2008.
Iancu, I. (2009a). Extended Mamdani Fuzzy Logic Controller. In The Fourth IASTED International Conference on Computational Intelligence - CI 2009, August 17-19, 2009, Honolulu, Hawaii, USA (pp. 143-149). ACTA Press.
Iancu, I. & Colhon, M. (2009b). Mamdani FLC with various implications. In 11th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing - SYNASC 09, Timisoara, Romania, September 26-29, 2009. IEEE Computer Society Conference Publishing Services.
Iancu, I. (2009c). Generalized Modus Ponens Using Fodor's Implication and T-norm Product with Threshold. Int. J. of Computers, Communications & Control, IV(4), 330-343.
Inuiguchi, M., Fu, K. S. & Ichihashi, H. (1990). Properties of possibility and necessity measures constructed by Gödel implication. In Third Int. Conf. IPMU - Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 358-360). Paris.
Ishibuchi, H., Nozaki, K. & Tanaka, H. (1992). Distributed representation of fuzzy rules and its application to pattern classification. Fuzzy Sets and Systems, 52(1), 21-32.
Ishibuchi, H., Nozaki, K., Yamamoto, N. & Tanaka, H. (1995). Selecting fuzzy if-then rules for classification problems using genetic algorithms. IEEE Trans. on Fuzzy Systems, 3(3), 260-270.
Ishibuchi, H., Murata, T. & Türkşen, I. B. (1997). Single-objective and two-objective genetic algorithms for selecting linguistic rules for pattern classification problems. Fuzzy Sets and Systems, 89(2), 135-149.
Jager, R. (1995). Fuzzy Logic in Control. Thesis, Technische Universiteit Delft.
Karr, C. (1991a). Genetic Algorithms for Fuzzy Controllers. AI Expert, 2, 26-33.
Karr, C. (1991b). Applying Genetics to Fuzzy Logic. AI Expert, 3, 38-43.
Klir, G. J. & Yuan, B. (1995). Fuzzy Sets and Fuzzy Logic. Theory and Applications. New Jersey, USA: Prentice Hall PTR.
Lefevre, E., Colot, O. & Vannoorenberghe, P. (2000). Belief functions combination and conflict management. Information Fusion Journal, 3(2), 149-162.
Lebailly, J., Martin-Clouaire, R. & Prade, H. (1987). Use of Fuzzy Logic in a Rule-Based System in Petroleum Geology. In E. Sanchez & L. A. Zadeh (Eds.), Approximate Reasoning in Intelligent Systems, Decision and Control (pp. 125-144). Oxford, UK: Pergamon Press.
Leung, K. S. & Lam, W. (1989). A Fuzzy Expert System Shell Using Both Exact and Inexact Reasoning. Journal of Automated Reasoning, 5, 207-233.
Leung, K. S., Wong, W. S. F. & Lam, W. (1989). Applications of a novel fuzzy expert system shell. Expert Systems, 6(1), 2-10.
Ling, C. H. (1965). Representation of associative functions. Publ. Math. Debrecen, 12, 189-212.
Liu, F., Geng, H. & Zhang, Y.-Q. (2005). Interactive Fuzzy Interval Reasoning for smart Web shopping. Applied Soft Computing, 5, 433-439.
Lowrance, J., Garvey, T. & Strat, T. (1986). A Framework for Evidential-Reasoning Systems. In Proc. of the 5th Nat. Conf. of the American Association for Artificial Intelligence (pp. 896-903).
Mamdani, E. H. & Assilian, S. (1975). An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller. International Journal of Man-Machine Studies, 7, 1-13.
Martin-Clouaire, R. & Prade, H. (1985). SPII-1: A simple inference engine capable of accommodating both imprecision and uncertainty. In G. Mitra (Ed.), Computer-Assisted Decision Making (pp. 117-131). Amsterdam, Holland: North-Holland.
Mizumoto, M. (1985). Extended fuzzy reasoning. In M. M. Gupta et al. (Eds.), Approximate Reasoning in Expert Systems (pp. 71-85). Amsterdam, Holland: North-Holland.
Mizumoto, M. & Zimmermann, H.-J. (1982). Comparison of fuzzy reasoning methods. Fuzzy Sets and Systems, 8, 253-283.
Murphy, C. K. (2000). Combining belief functions when evidence conflicts. Decision Support Systems, 29, 1-9.
Nakashima, T. (2000). Fuzzy Genetics-Based Machine Learning for Pattern Classification. Doctoral dissertation, Osaka Prefecture University, Japan.
Orponen, P. (1990). Dempster's Rule of Combination is #P-complete. Artificial Intelligence, 44, 245-253.
Pacholczyk, D. (1987). Introduction d'un seuil dans le calcul de l'incertitude en logique floue. BUSEFAL, 32, 11-18.
Planchet, B. (1989). Credibility and Conditioning. Journal of Theoretical Probability, 2, 289-299.
Pedrycz, W. (1983). Some Applicational Aspects of Fuzzy Relational Equations in Systems Analysis. International Journal of General Systems, 9, 125-132.
Prade, H. (1985). A computational approach to approximate and plausible reasoning with applications to expert systems. IEEE Trans. Pattern Analysis & Machine Intelligence, 7(3), 260-283.
Schweizer, B. & Sklar, A. (1960). Statistical metric spaces. Pacific J. Math., 10, 313-334.
Singer, D. (1990). A fuzzy set approach to fault tree and reliability analysis. Fuzzy Sets and Systems, 34, 145-155.
Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton, USA: Princeton Univ. Press.
Sugeno, M. (1974). Theory of Fuzzy Integral and its Applications. Ph.D. Thesis, Inst. of Technology, Tokyo.
Smarandache, F. & Dezert, J. (Eds.). (2004). Advances and Applications of DSmT for Information Fusion. Rehoboth, USA: American Research Press.
Smets, P. (1981). The degree of belief in a fuzzy event. Information Sci., 25, 1-19.
Smets, P. (1988). Belief Functions versus Probability Functions. In B. Buchon, L. Saitta & R. Yager (Eds.), Uncertainty and Intelligent Systems, vol. 313 of Lecture Notes in Computer Science (pp. 17-24). Springer.
Smets, P. (1993). Belief functions: the disjunctive rule of combination and the generalized Bayesian theorem. International Journal of Approximate Reasoning, 9, 1-35.
Smets, P. (2000). Data Fusion in the Transferable Belief Model. In Proceedings of the 3rd International Conference on Information Fusion (pp. PS21-PS33). Paris, France.
Smets, P. & Kennes, R. (1994). The transferable belief model. Artificial Intelligence, 66(2), 191-234.
Suppes, P. & Zanotti, M. (1977). On using random relations to generate upper and lower probabilities. Synthese, 36, 427-440.
Takagi, T. & Sugeno, M. (1985). Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst.
Man Cybernet., 116-132.
Tessem, B. (1993). Approximations for efficient computation in the theory of evidence. Artificial Intelligence, 61, 315-329.
Trillas, E. (1979). Sobre funciones de negacion en la teoria de conjuntos difusos. Stochastica, 3(1), 47-59.
Voorbraak, F. (1989). A Computationally Efficient Approximation of Dempster-Shafer Theory. Int. Journal of Man-Machine Studies, 30, 525-535.
Yager, R. R. (1983a). Some relationships between possibility, truth and certainty. Fuzzy Sets and Systems, 11, 151-156.
Yager, R. R. (1983b). Hedging in the combination of evidence. Journal of Information and Optimization Science, 4(1), 73-81.
Yager, R. R. (1985). On the relationships of methods of aggregation of evidence in expert systems. Cybernetics and Systems, 16, 1-21.
Yager, R. R. (1987). On the Dempster-Shafer framework and new combination rules. Information Sciences, 41, 93-138.
Yen, J. (1992). Computing generalized belief functions for continuous fuzzy sets. International Journal of Approximate Reasoning, 6, 1-31.
Zadeh, L. A. (1965). Fuzzy Sets. Information and Control, 8, 338-353.
Zadeh, L. A. (1973). Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man and Cybernetics, 3, 28-44.
Zadeh, L. A. (1975a, 1975b, 1975c). The concept of linguistic variable and its application in approximate reasoning. Inform. Sci., 8, 199-249; Inform. Sci., 8, 301-357; Inform. Sci., 9, 43-80.
Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1, 3-28.
Zadeh, L. A. (1979). A theory of approximate reasoning. In J. E. Hayes, D. Michie & L. I. Mikulich (Eds.), Machine Intelligence, 9 (pp. 149-194). New York: Elsevier.
Zimmermann, H.-J. & Roubens, J. (Eds.) (1990). Progress in Fuzzy Sets and Systems. Boston, USA: Kluwer Academic Publishers.
Zimmermann, H.-J. (1991). Fuzzy Set Theory and its Applications. Boston, USA: Kluwer Academic Publishers.

EVOLUTIONARY COMPUTATION

1 INTRODUCTION TO EVOLUTIONARY COMPUTATION

1.1 The specifics of evolutionary computation

In mathematics, optimization is understood as finding an optimal solution, and important results have been obtained to this end in differential calculus, the calculus of variations, optimal control and operations research. The road from the theoretical results (existence, uniqueness and solution-characterization theorems, etc.) to effective optimization is often too long, either because the real problems are far more complex than the mathematical model used, or because of the (time, memory) complexity of the algorithms employed. In the mid-1970s, as computer performance grew and, with it, the complexity of the real problems that could be solved by computer, it became common for the classical optimization models to no longer yield acceptable solutions for problems that could be modeled on a computer. More and more often, problems from biology, climatology, chemistry, mechanics, data analysis, etc., whose models include hundreds or thousands of variables and whose objective functions exhibit multiple local optima and irregularities unstudied from a numerical point of view, remained unsolved or only coarsely approximated. Studying the excellent adaptation of living beings, in terms of form, structure, functions and way of life, many researchers reached the conclusion that nature offers optimal solutions to its problems, solutions superior to any technological achievement.
The optimality of some biological systems has even been demonstrated mathematically: the ratio of the diameters of arterial branches, the position of the branching points of blood vessels, the value of the hematocrit (the percentage of the blood volume occupied by solid particles). Consequently, the first attempts to imitate the process of natural evolution appeared. As early as the 1950s, scientists such as Turing and von Neumann were interested in modeling and understanding biological phenomena in terms of natural information processing. The beginning of the computer era promoted the tendency to simulate natural processes and models and led to the development of artificial evolutionary models.

In 1970, professors Hans-Paul Schwefel (Dortmund) and Ingo Rechenberg (Berlin), having to solve a fluid mechanics problem concerning the optimization of the shape of a body moving through a fluid, looked for a new optimization technique, because the methods known at that time did not lead to an acceptable solution. Their idea embodied Rechenberg's conjecture, which has remained to this day the fundamental justification for applying evolutionary techniques: "Natural evolution is, or comprises, a very efficient optimization process which, by simulation, can lead to the solving of problems that are difficult to optimize." The simulation model proposed by Rechenberg and Schwefel [71, 76, 77] is known today under the name of "evolution strategies" and was initially applied only to continuous-variable optimization problems. The candidate solutions x are represented in floating point, and the individual i, to which the evolutionary process is applied, consists of this representation together with an evolution parameter, denoted σ, also represented in floating point: i = (x, σ). At each step the current solution is modified on each component according to σ and, in case of an improvement, is replaced by the newly obtained one. The parameter σ plays the role of the step size from the classical iterative methods and is used in such a way that the principle of "small mutations" is respected. For evolution strategies, schemes for the adaptation of the control parameters (self-adaptation) have been developed.

A second direction of study took shape at the University of San Diego; the starting point was again the simulation of biological evolution, and the chosen data structure was the finite-state machine. Following this approach, Fogel [25, 28] generated simple programs, anticipating "genetic programming". The population is represented by programs that are candidates for solving the problem. There are various representations of the elements of the population, one of the most used being based on a tree structure. In certain applications, such as symbolic regression, the programs are in fact expressions; for example, the expression program "a + b * c" can be described by a tree with the operators as internal nodes and a, b, c as leaves. Such a structure can easily be encoded in Lisp, so the first implementations in genetic programming used this language.

After many years of studying the simulation of evolution, John Holland of the University of Michigan proposed in 1975 [44] the concept of "genetic algorithm". Discrete optimization problems were addressed, and the chosen data structure was the bit string. In a strict sense, the notion of genetic algorithm refers to the model studied by Holland and by his student De Jong. In a broader sense, a genetic algorithm is any model based on the idea of a population and which uses selection and recombination operators to generate new points in the search space.
Another direction is "evolutionary programming". Initially its objective was to develop automata-like computing structures through an evolutionary process in which the main operator is mutation. The foundations of the field were laid by Fogel [28]. Later, evolutionary programming was directed toward solving optimization problems, having the same scope of applicability as evolution strategies.

Evolutionary computation uses algorithms whose search methods are modeled on a few natural phenomena: genetic inheritance and the struggle for survival. The best-known techniques in the class of evolutionary computation are those mentioned above: genetic algorithms, evolution strategies, genetic programming, and evolutionary programming. There are also hybrid systems that incorporate various properties of the above paradigms; moreover, the structure of any evolutionary computation algorithm is, to a large extent, the same.

Evolutionary computation is a branch of intelligent computing in which solving a problem is seen as a search process through the space of all possible solutions. This search is carried out by imitating mechanisms specific to natural evolution. In order to find the solution, a search population is used. The elements of this population represent potential solutions of the problem. To guide the search toward the problem's solution, transformations inspired by natural evolution are applied to the population, such as:

Selection. The elements of the population that come close to the problem's solution are considered fit and are favored, in the sense that they have a better chance of surviving into the next generation and of taking part in producing "descendants".

Crossover. As in reproduction in nature, starting from two or more elements of the population (called parents), new elements (called descendants) are generated. Depending on their quality (closeness to the problem's solution), the descendants may replace the parents or other individuals of the population.

Mutation. To ensure the diversity of the population, random transformations are applied to its elements, just as in nature, allowing the appearance of traits (genes) that could not have arisen in the population through crossover and selection alone.

In what follows, a solution algorithm based on these ideas will be called an evolutionary algorithm. The main characteristics of evolutionary algorithms, compared with traditional ones, are:
• they are probabilistic algorithms that combine directed search with random search;
• they achieve an almost perfect balance between exploring the state space and finding the best solutions;
• while classical search methods act at any given moment on a single point of the search space, evolutionary algorithms maintain a set (called a population) of candidate solutions;
• evolutionary algorithms do not act directly on the search space but on an encoding of it;
• they are more robust than classical optimization algorithms and than directed search methods;
• they are easy to use and do not require strong properties of the objective function such as continuity, differentiability, or convexity, as classical algorithms do;
• with high probability, they provide a solution close to the exact one.
1.2 Basic notions

The main notions that support the analogy between solving search problems and natural evolution are the following:

The chromosome is an ordered set of elements, called genes, whose values determine the characteristics of an individual. In genetics, the positions occupied by the genes within the chromosome are called loci, and the values they can take are called alleles. In evolutionary computation, chromosomes are usually vectors containing the encoding of a potential solution and are called individuals. The genes are then simply the elements of these vectors.

The population. A population consists of individuals living in an environment to which they must adapt. In evolutionary computation an individual is most often identified with a chromosome and represents an element of the search space associated with the problem to be solved.

The genotype is the ensemble of all the genes of an individual or even of the whole population. In evolutionary computation the genotype represents the encodings corresponding to all the elements of the population.

The phenotype is the ensemble of traits determined by a given genotype. In evolutionary computation the phenotype represents the values obtained by decoding, that is, values from the search space.

The generation is a stage in the evolution of a population. If we see evolution as an iterative process in which one population is transformed into another, then a generation is one iteration of this process.

Selection. Natural selection has the effect that individuals with a high degree of fitness to the environment survive. The selection mechanism of evolutionary algorithms has the same purpose, namely to favor the survival of the elements with high fitness. This drives the search toward the problem's solution, since the information carried by the best elements of the population is exploited. One of the principles of evolutionary theory is that selection is a random process, not a deterministic one; this is reflected in most of the selection mechanisms used by evolutionary algorithms.

Reproduction is the process by which a new population is built starting from the current one. The individuals of the new population (generation) inherit characteristics from their parents but may also acquire new characteristics through mutation processes, which are random in nature. When at least two parents take part in reproduction, the characteristics inherited by the descendants are obtained by combining (crossing over) the characteristics of the parents. The crossover and mutation mechanisms ensure the exploration of the solution space by discovering new configurations.

Fitness. In natural evolution each individual of the population is more or less adapted to the environment, and one of the principles of evolutionary theory is that only the fittest individuals survive. Fitness is a measure of the degree of adaptation of the individual to the environment. The goal of evolution is for all individuals to reach the best possible fitness to the environment, which suggests the link between a process of evolution and one of optimization. In evolutionary computation, the fitness of an element of the population is a measure of its quality with respect to the problem to be solved. For a maximization problem the fitness is directly proportional to the value of the objective function (an element is the better, the larger the value of this function).
The notions of fitness and evaluation are generally used with the same meaning; nevertheless, a distinction can be made between them. The evaluation function, or objective function, provides a measure of performance with respect to a set of parameters, whereas the fitness function transforms this performance measure into an allocation of reproductive opportunities. The evaluation of a string representing a set of parameters is independent of the evaluation of other strings. The fitness of a string is, however, defined relative to the other members of the current population; for instance, by the ratio f_i / f_avg, where f_i is the evaluation associated with string i and f_avg is the average evaluation over all the strings of the population.

When an evolutionary algorithm is applied to a concrete problem, the following must be chosen appropriately: the encoding of the elements, the fitness function, and the selection, crossover, and mutation operators. Some of these elements are closely tied to the problem to be solved, others less so.

The structure of an evolutionary algorithm is the following:

    Procedure EA
    begin
        t := 0
        initialize P(t)
        evaluate P(t)
        while (not termination condition) do
        begin
            t := t + 1
            select P(t) from P(t-1)
            alter P(t)
            evaluate P(t)
        end
    end
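To make the procedure concrete, here is a minimal Python sketch of the loop above. It is only an illustration, not part of the original algorithm: the helper names initialize, evaluate, select, alter and terminated are placeholders that a concrete instantiation must supply.

    def evolutionary_algorithm(initialize, evaluate, select, alter,
                               terminated, pop_size=50):
        # t := 0; initialize and evaluate P(t)
        population = [initialize() for _ in range(pop_size)]
        fitness = [evaluate(ind) for ind in population]
        t = 0
        while not terminated(t, fitness):
            t += 1
            # select P(t) from P(t-1), then alter it by crossover/mutation
            population = select(population, fitness, pop_size)
            population = [alter(ind) for ind in population]
            fitness = [evaluate(ind) for ind in population]
        best = max(range(pop_size), key=lambda i: fitness[i])
        return population[best], fitness[best]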
1.3 Domains of applicability

Evolutionary systems are used when no other strategy for solving the problem is available and an approximate answer is acceptable. They are used especially when the problem can be formulated as an optimization problem, but not only then. Evolutionary algorithms are used in various fields, such as:

Planning. Most planning problems (for example, choosing optimal routes for vehicles, routing messages in a telecommunications network, scheduling activities, and so on) can be formulated as optimization problems with or without constraints. Many of them are NP problems, for which no polynomial-time algorithms are known. For such problems, evolutionary algorithms offer the possibility of obtaining, in reasonable time, sub-optimal solutions of acceptable quality.

Design. Evolutionary algorithms have been successfully applied to the design of digital circuits and filters, as well as of computing structures such as neural networks. As methods for estimating the parameters of systems that optimize given criteria, evolutionary algorithms are applied in various fields of engineering, such as aircraft design, the design of chemical reactors, and structural design in civil engineering.

Simulation and identification. Simulation means determining the behavior of a system starting from a model of it. Identification is the inverse task of determining the structure of the system starting from its behavior. Evolutionary algorithms are used in simulating problems from engineering as well as from economics. Model identification is useful especially for making predictions in various fields (economics, finance, medicine, environmental sciences, and so on).

Control. Evolutionary algorithms can be used to implement on-line controllers for dynamical systems (for example, to control mobile robots).

Classification. Classifier systems can also be considered part of evolutionary computation. A classifier system is based on a population of association rules (production rules) that evolves in order to adapt to the problem to be solved (the quality of a rule is established on the basis of examples). The evolution of the rules has the same purpose as learning in neural networks. Evolutionary algorithms are successfully applied to image classification, in biology (for determining protein structure), and in medicine (for classifying electrocardiograms).

2 EVOLUTION OPERATORS IN GENETIC ALGORITHMS

2.1 Selection

The purpose of selection is to determine the intermediate population containing the parents that will undergo the crossover and mutation operators, as well as to determine the individuals that will be part of the next generation. The selection criterion is based on the individuals' degree of adaptation to the environment, expressed by the value of the fitness function. It is not mandatory for both the parents and the survivors to be determined by selection; selection may be used in a single stage only.

The roulette-wheel method

The roulette-wheel principle is the simplest selection method; it is a stochastic algorithm that uses the following technique:
• the individuals are seen as contiguous segments on a line, such that the segment associated with each individual has a length equal to its fitness;
• a random number is generated, and the individual whose segment contains that number is selected;
• the previous step is repeated until the desired number of individuals has been selected.
This technique is similar to a roulette wheel on which each sector is proportional to the fitness of an individual.

Example 2.1. [98] The following table shows the selection probabilities for 11 individuals, using linear ranking with selective pressure 2. Individual 1 is the best and occupies the largest interval, whereas individual 10 is next to last and occupies the smallest interval; individual 11, the last in the ranking, has fitness 0 and cannot be chosen for reproduction. The probability in this table is computed as the ratio between the fitness of an individual and the sum of the fitness values of all individuals.

    Individual             1     2     3     4     5     6     7     8     9     10
    Fitness                2.0   1.8   1.6   1.4   1.2   1.0   0.8   0.6   0.4   0.2
    Selection probability  0.18  0.16  0.15  0.13  0.11  0.09  0.07  0.06  0.03  0.02

To select 6 individuals, we generate 6 random numbers uniformly distributed in the interval [0, 1]; let them be 0.81, 0.32, 0.96, 0.01, 0.65, 0.42. Placing the 10 individuals on a line in the order 1, 2, ..., 10, individual 1 occupies the segment between 0.0 and 0.18, individual 2 the segment between 0.18 and 0.34, individual 3 the segment between 0.34 and 0.49, and so on. Since individual 6 corresponds to the segment delimited by 0.73 and 0.82 and the first generated number (0.81) falls inside this segment, individual 6 is selected for reproduction. In the same way the individuals 1, 2, 3, 5, and 9 are selected; the selected population therefore consists of the individuals 1, 2, 3, 5, 6, 9.

2.2 Crossover

Crossover combines the information coming from two or more parents to generate one or more descendants.

2.2.1 Binary crossover

This operator creates descendants by combining alternate parts of the parents.

2.2.1.1 Simple (one-point) crossover

If N is the number of binary positions of an individual, a crossover point k in {1, 2, ..., N-1} is selected uniformly at random, and the values situated to the right of this point are exchanged between the two individuals, producing two descendants (Figure 2.1: parents and descendants of one-point crossover).

As an example [98], consider the following parents, each with 11 bits, and crossover point 5:

    p1: 0 1 1 1 0 | 0 1 1 0 1 0
    p2: 1 0 1 0 1 | 1 0 0 1 0 1

The resulting descendants are:

    d1: 0 1 1 1 0 | 1 0 0 1 0 1
    d2: 1 0 1 0 1 | 0 1 1 0 1 0
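As an illustration of the operator just described, here is a short Python sketch (the function name is ours; bitstrings are taken as Python strings or lists):

    import random

    def one_point_crossover(p1, p2, rng=random):
        # crossover point k in {1, ..., N-1}; the parts situated to the
        # right of k are exchanged between the two parents
        k = rng.randrange(1, len(p1))
        return p1[:k] + p2[k:], p2[:k] + p1[k:]

    # the example above, with the crossover point fixed at k = 5:
    p1, p2 = "01110011010", "10101100101"
    d1, d2 = p1[:5] + p2[5:], p2[:5] + p1[5:]
    print(d1, d2)   # 01110100101 10101011010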
2.2.1.2 Multi-point crossover

In this case, m > 1 crossover points k_i in {1, 2, ..., N-1} are used, chosen at random, distinct, and sorted in increasing order. The values lying between consecutive crossover points are exchanged between the two parents to obtain two descendants (Figure 2.2: parents and descendants of multi-point crossover).

For example [98], the individuals

    p1: 0 1 1 1 0 0 1 1 0 1 0
    p2: 1 0 1 0 1 1 0 0 1 0 1

with 3 crossover points at positions 2, 6, 10 generate the descendants

    d1: 0 1 | 1 0 1 1 | 1 1 0 1 | 1
    d2: 1 0 | 1 1 0 0 | 0 0 1 0 | 0

The idea behind multi-point crossover is that the parts of the chromosomes that contribute to the improved performance of an individual do not necessarily lie in adjacent substrings. Moreover, multi-point crossover seems to encourage exploration of the search space rather than favoring rapid convergence toward elite individuals.

2.2.1.3 The crossover algorithm

Let P(t) be the current population and P_s the population chosen by the selection operation. The crossover operator is applied to the individuals of P_s with probability p_c (the index c comes from the English term crossover).

The algorithm:

P1. For each individual in P_s:
• generate a random number q in [0, 1];
• if q < p_c then the individual is retained for crossover; otherwise it does not take part in this operation.

P2. Let m be the number of individuals retained at step P1.
• if m is even, m/2 pairs are formed at random;
• if m is odd, either a randomly selected individual is removed or a new one is added to P_s, and then the pairs are formed.

P3. The pairs formed above undergo the crossover operation:
• for each pair, the crossover points k_i, 1 <= k_i < l, are chosen at random, where l is the length of a chromosome;
• crossover is performed on the current pair, the descendants become members of the next generation P(t+1), and the parents are removed from P(t);
• the individuals remaining in P(t) are added to P(t+1).

Remark. The crossover probability usually takes values in the interval [0.2, 0.95]; p_c = 0.3 means that 30% of the individuals will undergo crossover.

2.3 Mutation

Mutation is the simplest genetic operator; it consists in randomly changing some values of the chromosome in order to introduce new solutions. Its purpose is to prevent the irreparable loss of diversity, thus avoiding premature convergence. Diversity allows large areas of the search space to be explored. Mutation is governed by the mutation probability p_m, which takes small values, usually in the interval [0.001, 0.01].

2.3.1 Binary mutation

If n is the population size and l is the length of a chromosome, then the average number of bits undergoing mutation is N = n * l * p_m. Binary mutation can be implemented in several forms [16].

2.3.1.1 Strong mutation

For each position of each chromosome, the following steps are executed:
P1: generate a random number q in [0, 1];
P2: if q < p_m then the bit at that position is flipped; otherwise nothing is done.

2.3.1.2 Weak mutation

For each position of each chromosome, the following steps are executed:
P1: generate a random number q in [0, 1];
P2: if q < p_m then one of the values 0 or 1 is chosen at random and assigned to the current position; otherwise nothing is done.
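The two mutation variants can be sketched in Python as follows (an illustration under the conventions above; chromosomes are lists of 0/1 integers, function names are ours):

    import random

    def strong_mutation(bits, pm, rng=random):
        # each bit is flipped independently with probability pm
        return [1 - b if rng.random() < pm else b for b in bits]

    def weak_mutation(bits, pm, rng=random):
        # a selected position receives a random value (0 or 1), so it
        # actually changes only in about half of the selected cases
        return [rng.randrange(2) if rng.random() < pm else b for b in bits]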
2.4 How genetic algorithms work

We explain the functioning of genetic algorithms for a maximization problem, since minimizing a function f is equivalent to maximizing the function g = -f. In addition, we assume that the objective function f takes positive values; otherwise a positive constant C can be added and f + C maximized instead.

Suppose we want to maximize a function of k variables, f : R^k -> R; each variable x_i takes values in a domain D_i = [a_i, b_i] included in R, and f(x_1, ..., x_k) > 0 for x_i in D_i. If a precision of p decimal places is required for the values of the variables, each interval D_i must be divided into (b_i - a_i) * 10^p equal subintervals. Let l_i be the smallest integer such that

    (b_i - a_i) * 10^p <= 2^(l_i) - 1.

Then a representation of the variable x_i as a binary string of length l_i satisfies the required precision. Moreover, the decoding formula

    x_i = a_i + decimal(string_2) * (b_i - a_i) / (2^(l_i) - 1)

holds, where decimal(string_2) is the decimal value of the binary string. Each chromosome is now represented by a binary string of length l = l_1 + l_2 + ... + l_k; the first l_1 bits represent a value in [a_1, b_1], the next l_2 bits a value in [a_2, b_2], and so on.

The initial population consists of n randomly chosen chromosomes. However, if information about the potential optimum is available, it can be used to generate the initial population. The algorithm then works as follows:
• each chromosome of each generation is evaluated using the function f;
• the new population is selected according to the fitness-based probability distribution;
• the chromosomes of the new population are modified by the mutation and crossover operators;
• after a number of generations, when no substantial improvements are observed any longer, the best chromosome is selected as the optimal solution; often the algorithm is stopped after a fixed number of iterations.

For the selection process we use the roulette-wheel technique:
• compute the fitness eval(v_i) of each chromosome v_i, i = 1, 2, ..., n;
• compute the total fitness F = sum over i of eval(v_i);
• compute the selection probability p_i of each chromosome v_i: p_i = eval(v_i) / F;
• compute the cumulative probability q_i of each chromosome v_i: q_i = p_1 + p_2 + ... + p_i.

The selection process consists in spinning the roulette wheel n times; each time a single chromosome is selected, as follows:
• generate a random number a in [0, 1];
• if a <= q_1, select the first chromosome; otherwise select the chromosome v_i, 2 <= i <= n, such that q_(i-1) < a <= q_i.

Obviously, some chromosomes will be selected several times; the best chromosomes give rise to several copies.

After selection, the crossover operator is applied. Using the crossover probability p_c, the number p_c * n of chromosomes undergoing crossover is determined. For each chromosome of the new population:
• generate a random number a in [0, 1];
• if a < p_c, the current chromosome is selected for crossover.
The selected chromosomes are then paired at random, and for each pair a random integer pos in {1, ..., l-1} is generated, where l is the chromosome length and pos is the crossover point. Two chromosomes

    b_1 b_2 ... b_pos b_(pos+1) ... b_l   and   c_1 c_2 ... c_pos c_(pos+1) ... c_l

are replaced by the descendants

    b_1 b_2 ... b_pos c_(pos+1) ... c_l   and   c_1 c_2 ... c_pos b_(pos+1) ... b_l.

Then the mutation operator is applied. The mutation probability p_m gives the number p_m * l * n of bits that will undergo mutation. For each bit of each chromosome:
• generate a random number a in [0, 1];
• if a < p_m, the bit is mutated.

After selection, crossover, and mutation, the new population is ready for the next evaluation.
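The encoding length, the decoding formula, and the roulette-wheel selection with cumulative probabilities can be sketched in Python as follows (an illustration of the formulas above; all names are ours):

    import math, random

    def chromosome_length(a, b, p):
        # smallest l with (b - a) * 10**p <= 2**l - 1
        return math.ceil(math.log2((b - a) * 10**p + 1))

    def decode(bits, a, b):
        # x = a + decimal(string_2) * (b - a) / (2**l - 1)
        l = len(bits)
        value = int("".join(map(str, bits)), 2)
        return a + value * (b - a) / (2**l - 1)

    def roulette(population, fitnesses, rng=random):
        total = sum(fitnesses)                    # F
        cum, s = [], 0.0
        for f in fitnesses:                       # cumulative q_i
            s += f / total
            cum.append(s)
        selected = []
        for _ in population:                      # spin the wheel n times
            a = rng.random()
            i = next(i for i, q in enumerate(cum) if a <= q)
            selected.append(population[i])
        return selected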
3 EVOLUTION STRATEGIES

3.1 Generalities

Evolution strategies (ES) arose from the need to solve optimization problems of the type: "find x* in D, a subset of R^n, with f(x*) <= f(x) for all x in D, where f : D -> R and D is a bounded region determined by the constraints imposed on x". For such a problem, evolution strategies are more suitable than genetic algorithms because they do not require the binary encoding of the data, which has the disadvantage of limiting precision. Evolution strategies represent individuals as vectors with real components and use mutation as the main evolution operator. In advanced evolution strategies, the representation of the individuals also incorporates the control parameters of the strategy. An individual is represented as a pair v = (x, sigma), where the vector x is an element of the search space and sigma^2 is the variance vector; sigma_i^2 is the variance of the perturbation undergone by the component x_i of the vector x during mutation. The vector sigma is the control parameter of the strategy.

The first evolution strategy algorithm was proposed in 1964, at the Technical University of Berlin, by Rechenberg and Schwefel, for solving a problem that required placing a flexible pipe inside a region of a given shape so that the cost would be as small as possible. The idea was: starting from a current approximation, another one was generated by a random perturbation based on a normal distribution, and the better of the two was kept. This strategy, called (1+1), did not operate with populations but followed the adaptation of a single individual under the action of mutation. The limits of this model led to the search for mechanisms involving several individuals in the evolution [5, 6, 7, 8, 9, 15, 16, 81, 82]. Among these we mention:

• The (mu, lambda) strategy: starting from the mu individuals of the current population, lambda > mu new individuals are generated by crossover and mutation, and from these the best mu are chosen to form the new population; if mu = 1, the best individual is chosen (the one for which the objective function is smallest).

• The (mu + lambda) strategy: starting from the mu individuals of the current population, lambda new individuals are generated by crossover and mutation and added to the current population; from the combined population, mu individuals are selected to form the new population. If mu = 1, only mutation is used to generate new individuals, since crossover obviously cannot be applied. If lambda = 1, selection amounts to removing the weakest individual (the one giving the largest value of the objective function) from the combined population. Clearly, if the newly generated individual is weaker than all the elements of the current population, the population remains unchanged.

• The (mu, kappa, lambda, rho) strategy: an extension of the (mu, lambda) strategy, characterized by:
a) kappa >= 1 represents the lifetime of the individuals, measured in generations; the value of kappa is decreased by 1 at each new generation, and an individual can be selected only while kappa > 0;
b) the mutation and crossover operations are controlled by the probabilities p_m and p_c respectively, as in genetic algorithms;
c) the number of parents used in the crossover operation is rho;
d) crossover operators specific to genetic algorithms, such as multi-point crossover, can also be used.
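The difference between comma and plus selection can be sketched in a few lines of Python (an illustration under our own naming; fitness is a callable and minimization is assumed, as in the problem statement above):

    def select_comma(offspring, fitness, mu):
        # (mu, lambda): the next generation comes from the offspring only
        ranked = sorted(offspring, key=fitness)
        return ranked[:mu]

    def select_plus(parents, offspring, fitness, mu):
        # (mu + lambda): parents compete with the offspring (elitist)
        ranked = sorted(parents + offspring, key=fitness)
        return ranked[:mu]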
• The fast evolution strategy: a variant, classified by its authors under evolutionary programming, characterized as follows:
a) it uses a specific mutation based on perturbing the elements with values generated according to the Cauchy distribution, whose density function is

    phi_t(x) = (1/pi) * t / (t^2 + x^2),  x in R, t > 0.

This type of perturbation has the advantage that it can generate descendants far away from their parents with a higher probability than the normal perturbation (ensuring a higher probability of escaping from local minima and an acceleration of the process of finding the optimum). The difference between the two density functions (normal and Cauchy) is illustrated in Figure 3.1.
b) the perturbations are independent, and the parameters sigma_i are determined by self-adaptation (with log-normal perturbation), as in classical evolution strategies;
c) no crossover is used, either on the object components or on the control parameters;
d) selection is of tournament type.

Figure 3.1: the density functions of the normal (dashed line) and Cauchy (solid line) distributions.

3.2 Evolution operators

3.2.1 Crossover

The crossover operator selects the rho parents with uniform probability. The cases rho = 2 and rho = mu are used most often. Because of the similarity between the representation of individuals in evolution strategies and in genetic algorithms, there is also a similarity between the types of crossover.

3.2.1.1 Discrete crossover

In the case rho = 2, let x^1 and x^2 be the randomly selected parents; for each component i = 1, 2, ..., n a random number q_i in [0, 1] is generated, following the uniform distribution. The component y_i of the descendant will be

    y_i = x^1_i  if q_i <= 0.5,    y_i = x^2_i  if q_i > 0.5.

When rho parents x^1, ..., x^rho are crossed, the component y_i of the descendant y is the component x^j_i of a randomly chosen parent. In the case rho = mu, the parent providing the component y_i of the descendant y is chosen from the whole population; for this reason the crossover is called global.

3.2.1.2 Intermediate crossover

In this case the component y_i of the descendant y is a convex combination of the corresponding components of the rho randomly chosen parents x^1, ..., x^rho:

    y_i = alpha_1 x^1_i + ... + alpha_rho x^rho_i,  i = 1, 2, ..., n,
    with alpha_1 + ... + alpha_rho = 1, alpha_j >= 0.

In general two parents x^1, x^2 are used:

    y_i = alpha x^1_i + (1 - alpha) x^2_i,  i = 1, 2, ..., n,

with alpha in [0, 1] chosen as the value of a uniformly distributed random variable and kept the same for all components. Intermediate crossover can be extended by changing the parents and the parameter alpha for each component of the descendant. All types of crossover apply in the same way to the variance vector and to the other control parameters. It has been observed experimentally that good results are obtained if discrete crossover is used for the position vectors and intermediate crossover for the strategy parameters.

3.2.2 Mutation

Mutation is the most important operator of evolution strategies. We have seen that an individual is a pair (x, sigma), where the vector sigma indicates how the position vector x is transformed by mutation. The control parameter sigma is itself subject to mutation. The individual (x, sigma) is transformed by mutation into (x', sigma'): (x', sigma') = mut(x, sigma). The vector x' is obtained according to the rule

    x' = x + sigma * N(0, 1)

or, componentwise,

    x'_i = x_i + sigma'_i * N_i(0, 1),  i = 1, 2, ..., n,

where N(0, 1) and N_i(0, 1) are random variables with mean 0 and variance 1. The mutation of sigma acts differently, according as mu = 1 or mu > 1, in the (mu, lambda) and (mu + lambda) models.
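The two recombination operators translate directly into Python (an illustrative sketch with our own function names; parents are lists of reals):

    import random

    def discrete_recombination(x1, x2, rng=random):
        # each component is taken from the first or second parent
        return [a if rng.random() <= 0.5 else b for a, b in zip(x1, x2)]

    def intermediate_recombination(x1, x2, rng=random):
        # convex combination with a single alpha for all components
        alpha = rng.random()
        return [alpha * a + (1 - alpha) * b for a, b in zip(x1, x2)]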
3.3 How evolution strategies work

The general structure of an evolution strategy is that of an evolutionary algorithm, with some differences, depending on the strategy, at the level of the operators.

3.3.1 The (1+1) strategy

Rechenberg's initial model considers a population formed of a single individual subjected to the mutation operator. After the descendant is obtained, the two members of the population are compared by means of the fitness function, and the better individual is kept. We use the following notation:
• k = a number of consecutive generations; usually k = 10n, where n is the dimension of the search space;
• s(k) = the number of successful mutations during the last k consecutive generations;
• p(k) = s(k)/k = the frequency of successful mutations during the last k generations;
• c = a constant, chosen (at Schwefel's suggestion [79]) as 0.817, this value having been derived for the sphere model.

    Algorithm ES(1+1)
    begin
        t := 1, P(t) = {(x(t), sigma(t))}
        evaluate f(x(t))
        while not cond(P(t)) do
        begin
            compute p(k)
            compute sigma(t+1) = mut(sigma(t)):
                sigma(t+1) := sigma(t) / c    if p(k) > 1/5
                sigma(t+1) := sigma(t) * c    if p(k) < 1/5
                sigma(t+1) := sigma(t)        if p(k) = 1/5
            compute x(t+1) = mut(x(t)):
                x(t+1) := x(t) + sigma(t+1) * N(0, 1)
            evaluate f(x(t+1))
            if f(x(t+1)) <= f(x(t))   {selection}
                then P(t+1) := {(x(t+1), sigma(t+1))}
                else P(t+1) := P(t)
            t := t + 1
        end
    end

Remarks.
1) cond(P(t)) is the stopping condition, usually given by the number of generations.
2) Schwefel proposed [79] a slightly different version of the mutation of the parameter sigma, with the same 1/5 threshold.
3) The (1+1) strategy works with populations formed of a single individual and does not use crossover. The rules used above for modifying the parameter sigma are versions of the "1/5 success rule" proposed by Rechenberg, which states that:
• the ratio between the number of successful mutations and the total number of mutations should be 1/5;
• if this ratio is greater than 1/5, the value of sigma should increase;
• if the ratio is smaller than 1/5, the value of sigma should decrease.
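A minimal runnable sketch of the (1+1) strategy with the 1/5 success rule is given below. It is an illustration, not the original algorithm: it uses a windowed variant in which the success frequency is evaluated once every k generations.

    import random

    def es_1_plus_1(f, x, sigma=1.0, c=0.817, generations=1000, rng=random):
        n = len(x)
        k = 10 * n                       # window for the success frequency
        fx = f(x)
        successes = window = 0
        for t in range(generations):
            child = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
            fc = f(child)
            if fc < fx:                  # success: the descendant is better
                x, fx = child, fc
                successes += 1
            window += 1
            if window == k:              # apply the 1/5 success rule
                p = successes / k
                if p > 0.2:
                    sigma /= c           # many successes: increase the step
                elif p < 0.2:
                    sigma *= c           # few successes: decrease the step
                successes = window = 0
        return x, fx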
3.3.2 The (mu + 1) strategy

The (1+1) strategy can be generalized by increasing the number of parents of each descendant and/or the number of descendants of a parent. In the (mu + 1) strategy, mu > 1 parents generate a single descendant using crossover and mutation. Crossover is applied both to the position vector and to the variance, and any of the operators presented above may be used. Mutation is applied following the principle of the (1+1) strategy, but there is no method for controlling the variance, the 1/5 rule being no longer applicable in this case. This drawback makes the (mu + 1) strategy little used.

3.3.3 Multi-descendant strategies

These strategies appeared from the desire to use more robust and more general methods of controlling the mutation parameters. This category includes the (mu + lambda) and (mu, lambda) strategies, which work with populations of mu > 1 parents and lambda descendants, which brings an increase in the speed of convergence. The mu individuals of the new generation are selected from:
• the intermediate population obtained by joining the mu individuals of the current generation with its lambda descendants, in the case of the (mu + lambda) strategy;
• the lambda descendants of the current population, in the case of the (mu, lambda) strategy.

Since the (mu, lambda) strategy changes its population completely at each new generation, it is not elitist. This quality allows it to leave the region of a local optimum and evolve toward the global optimum. By contrast, the (mu + lambda) strategy allows low-performing individuals of the old generation to survive; in this way the search process tends to favor local minima to the detriment of the global ones. For this reason the use of the (mu, lambda) strategy is recommended.

Crossover. Initially the (mu + lambda) and (mu, lambda) strategies were applied using only the mutation operator. It was later found that better results are obtained if crossover is also used, applied before mutation. Empirically, it was concluded that using discrete crossover for the position vectors and convex (intermediate) crossover for the strategy parameters leads to the best results. The crossover operator is applied lambda times to the population of parents, yielding an intermediate population of lambda individuals. A descendant is obtained by crossing rho parents, 1 <= rho <= mu; usually rho = 2 or rho = mu.

Mutation. We consider the individuals represented by pairs of the form (x, sigma), where x in R^n is the position vector and sigma^2 is the variance vector. Let N(0, 1) denote a random number following the normal distribution with mean 0 and variance 1. The standard mutation replaces the individual (x, sigma) by (x', sigma') obtained according to the rules

    sigma'_i = sigma_i * exp(tau_1 * N(0, 1) + tau_2 * N_i(0, 1))
    x'_i = x_i + sigma'_i * N_i(0, 1),  i = 1, 2, ..., n.

The parameters tau_1 and tau_2 control the overall mutation step size and the individual changes, respectively. Schwefel [77] proposed the following values for these parameters:

    tau_1 = c_1 / sqrt(2n)  and  tau_2 = c_2 / sqrt(2 sqrt(n)),

where c_1 and c_2 usually take the value 1. As the formulas above show, mutation acts first on the variance and then on the position vector. One may also work with a variance vector having all components equal; let sigma be their common value. In this case mutation works according to the rules

    sigma' = sigma * exp(tau * N(0, 1))
    x'_i = x_i + sigma' * N_i(0, 1),

that is, all components of the position vector are modified using the same variance; the parameter tau takes the value tau = c / sqrt(n).

Multi-descendant strategies work according to the following algorithm:

    begin
        t := 0
        initialize the population P(t)
        evaluate the individuals of P(t)
        while not cond(P(t)) do
        begin
            P'(t) := crossover(P(t))
            P''(t) := mutation(P'(t))
            evaluate P''(t)
            P(t+1) := selection(P''(t) united with M)
            t := t + 1
        end
    end

The initial population is built by choosing at random, with uniform probability, mu points x^i in R^n, i = 1, 2, ..., mu. If a point x situated in the neighborhood of the optimum is known, then x will be one of the individuals and the other mu - 1 are obtained from it by mutations. An efficient adaptation of the strategy parameters requires a sufficiently large diversity of the parent population. Thus the number mu must be greater than 1, and the ratio between the number of descendants and the number of parents must favor the descendants. A ratio lambda/mu of about 7 is recommended, and the (15, 100) strategy is frequently used.

Kursawe [54] showed that ensuring convergence is influenced by the crossover operator used; this depends on the form of the objective function, the dimension of the search space, and the number of strategy parameters. The stopping condition cond usually refers to the maximum number of generations, but other criteria can also be considered, among them:
1) the diversity of the population has fallen below a certain limit, a sign that we are in the neighborhood of a global optimum; diversity can be measured by the difference between the qualities associated with the best and the worst individual;
2) no significant improvements of the objective function are obtained any longer.

The set M takes one of the values: M = P(t) for the (mu + lambda) strategy, and M = empty set for the (mu, lambda) strategy.
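The self-adaptive log-normal mutation above can be sketched as follows (an illustration with c_1 = c_2 = 1; the function name is ours):

    import math, random

    def es_mutation(x, sigma, rng=random):
        n = len(x)
        tau1 = 1.0 / math.sqrt(2.0 * n)             # global factor
        tau2 = 1.0 / math.sqrt(2.0 * math.sqrt(n))  # individual factor
        common = rng.gauss(0.0, 1.0)                # same N(0,1) for all i
        new_sigma = [s * math.exp(tau1 * common + tau2 * rng.gauss(0.0, 1.0))
                     for s in sigma]
        # the variance is mutated first, then the position vector
        new_x = [xi + si * rng.gauss(0.0, 1.0)
                 for xi, si in zip(x, new_sigma)]
        return new_x, new_sigma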
3.3.4 Using correlated mutations

Each individual of the population is characterized not only by the vector x of variables but also by the parameter sigma of the strategy. The representation can be extended [72] by introducing the variance values c_ii = sigma_i^2 (1 <= i <= n) and the covariance values c_ij (1 <= i <= n-1, i+1 <= j <= n) of the n-dimensional normal distribution with probability density

    p(z) = sqrt(det(A) / (2 pi)^n) * exp(-(1/2) z^T A z).

To guarantee the positive definiteness of the covariance matrix A^(-1) = (c_ij), the rotation angles alpha_j are used instead of the coefficients c_ij. An individual is now represented in the form a = (x, sigma, alpha), where:
• x in R^n is the vector of objective variables;
• sigma in R^(n_sigma) is the vector of standard deviations of the normal distribution, 1 <= n_sigma <= n;
• alpha in R^(n_alpha) is the vector of angles defining the correlated mutations of x, with n_alpha in {0, ..., n(n-1)/2}.

The vector alpha allows the search to proceed in any direction; otherwise the directions parallel to the axes of the coordinate system would be favored. The strategy parameters sigma and alpha are modified by mutation, and the type of this operation depends on the values of n_sigma and n_alpha. Thus, for:
• n_sigma = 1 and n_alpha = 0 we obtain the standard mutation in which a single standard deviation controls the mutation of all components of x;
• n_sigma = n and n_alpha = 0 we obtain the standard mutation in which the values sigma_1, ..., sigma_n control the mutation of the corresponding components of the vector x;
• n_sigma = n and n_alpha = n(n-1)/2 we obtain the correlated mutations;
• n_sigma = 2 and n_alpha = n - 1: the value sigma_1^2 is used to perform the search in an arbitrary direction, and sigma_2^2 is used for all the directions perpendicular to it.

The mutation of an individual a = (x_1, ..., x_n, sigma_1, ..., sigma_(n_sigma), alpha_1, ..., alpha_(n_alpha)) acts as follows:

    sigma'_i = sigma_i * exp(tau' * N(0, 1) + tau * N_i(0, 1)),  1 <= i <= n_sigma
    alpha'_j = alpha_j + beta * N_j(0, 1),  1 <= j <= n_alpha
    x' = x + cov(sigma', alpha'),

where the vector cov is computed as

    cov = T z,  z = (z_1, ..., z_n),  z_i distributed N(0, sigma'_i^2),
    T = product over p = 1..n-1 and q = p+1..n of T_pq(alpha'_j),
    with j = (1/2)(2n - p)(p + 1) - 2n + q.

The rotation matrix T_pq(alpha'_j) is the identity matrix except for t_pp = t_qq = cos(alpha'_j) and t_pq = -t_qp = -sin(alpha'_j). For the factors tau, tau' and beta, Schwefel suggested the following values:

    tau = c_2 / sqrt(2 sqrt(n)),  tau' = c_1 / sqrt(2n),  beta = 0.0873 (about 5 degrees).

3.4 Convergence analysis

Using tools from the theory of Markov chains, sufficient conditions for convergence in a probabilistic sense have been obtained for evolution strategies. Sufficient (but not necessary) conditions that are simple to verify are:
i) the distribution used for mutation has infinite support (this is satisfied both by the normal and by the Cauchy distribution);
ii) selection is elitist (strategies of type (mu + lambda) satisfy this property);
iii) recombination is applied with a certain probability p_r.

From a practical point of view, convergence in infinite time is not of much use; what matters rather is the algorithm's ability to find better and better elements in passing from one generation to the next (the algorithm makes progress in the search process). An undesirable situation is the one in which this progress stops. There are two manifestations of this fact:
• Premature convergence. The algorithm gets stuck in a local optimum because the population is no longer diverse enough to sustain the exploration process.
• Stagnation. The algorithm gets stuck although the population is still diverse, but the evolutionary mechanisms are not strong enough to sustain exploration.
The solution to these problems lies in the appropriate choice of the operators and of the control parameters; there are as yet no theoretical results providing ways to avoid premature convergence or stagnation. The theoretical study of the speed of convergence is based on estimating progress rates for simple test functions (the sphere function and perturbations of it).
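A sketch of the correlated mutation is given below. It is only an illustration of the formulas of Section 3.3.4, under the assumptions c_1 = c_2 = 1 and n_sigma = n, n_alpha = n(n-1)/2; the angles are consumed in the order of the nested loop, which is one conventional way of indexing them.

    import math, random

    def correlated_mutation(x, sigma, alpha, rng=random):
        n = len(x)
        tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
        taup = 1.0 / math.sqrt(2.0 * n)
        beta = 0.0873                         # about 5 degrees
        common = rng.gauss(0.0, 1.0)
        new_sigma = [s * math.exp(taup * common + tau * rng.gauss(0.0, 1.0))
                     for s in sigma]
        new_alpha = [a + beta * rng.gauss(0.0, 1.0) for a in alpha]
        z = [s * rng.gauss(0.0, 1.0) for s in new_sigma]  # uncorrelated step
        j = 0
        for p in range(n - 1):                # apply the rotations T_pq
            for q in range(p + 1, n):
                c, s = math.cos(new_alpha[j]), math.sin(new_alpha[j])
                z[p], z[q] = c * z[p] - s * z[q], s * z[p] + c * z[q]
                j += 1
        new_x = [xi + zi for xi, zi in zip(x, z)]
        return new_x, new_sigma, new_alpha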
By estimating the progress rate, information has been obtained about how to choose the control parameters so that this rate can be maximized. There are various approaches and various measures of the progress of evolution strategies. From a practical point of view it is useful to know that evolution strategies have at most a linear speed of convergence.

In the absence of a complete theory of the field, many of the properties and design rules are derived from experimental studies. These are carried out on optimization problems constructed precisely so as to pose difficulties to the solving methods (for example, with many local minima or with a global minimum hard to reach because of the presence of "plateaus"). Many of the test functions used in the analysis of evolution strategies were originally built to test traditional optimization methods and proved difficult for them. Since evolution strategies involve random elements, different runs of the algorithm will produce different results. For this reason the experimental study can only be a statistical one, in which several independent runs are performed and the frequency of the situations in which the strategy succeeded is determined. The strategy is considered successful when the best element encountered over the generations (or the best element of the last generation) is sufficiently close to the optimum. Statistical studies are used to analyze the influence of the operators and of the control parameters on the effectiveness of the evolution strategy. Their value is limited by the fact that results obtained on test functions cannot be extrapolated to an arbitrary optimization problem. Corroborated with the theoretical results (obtained for simple test functions, such as the sphere model), however, they have led to heuristic criteria that enjoy some success in practice. From a statistical point of view, the mean and the variance of the optimal value discovered at each run are of interest.

3.5 Domains of applicability

Some of the applications of evolution strategies are:
• Biology and biotechnology: simulating the evolution of proteins, designing optical lenses, optimizing the parameters of a model of genetic signal transmission based on DNA transcription, optimizing fermentation processes.
• Chemistry and chemical engineering: determining the optimal electrolyte composition in electroplating processes, minimizing the energy of clusters in rare-gas molecules, estimating the parameters of kinetic models for the analysis of absorption spectra, identifying bands in spectra obtained by nuclear magnetic resonance.
• Computer-aided design: determining the parameters of a pneumatic shock absorber, optimizing the volume of structures in order to minimize instability, determining the optimal shape of devices, tuning the parameters of finite-element models for optimal structural design, optimizing the efficiency, sensitivity, and bandwidth of ultrasonic transducers, optimal design of the springs used in vehicle suspension systems.
• Physics and data analysis: determining the optimal configuration of defects in crystalline materials, estimating parameters in fluid dynamics problems, determining stable states in dissipative systems.
• Dynamic processes, modeling, and simulation: optimizing a complex socio-economic system, identifying the parameters of a model of the spread of a viral infection.
• Medicine and medical engineering: optimal control of prostheses, identifying the parameters of models used in pharmacology.
• Artificial intelligence: intelligent control of autonomous vehicles, determining the weights of neural networks.

4 EVOLUTIONARY PROGRAMMING AND GENETIC PROGRAMMING

Evolutionary programming and genetic programming work with populations that are no longer represented by binary or real strings, as in the case of genetic algorithms and evolution strategies, but by more complicated structures: programs, finite automata, and so on. From the point of view of the operators used, evolutionary programming is closer to evolution strategies (it uses mutation as the main operator, crossover being used rarely or not at all), whereas genetic programming is closer to genetic algorithms (its main operator is crossover).

4.1 Evolutionary programming

4.1.1 Generalities

Evolutionary programming was initiated by Fogel [26, 28] with the aim of generating intelligent behavior for an artificial system. Intelligent behavior refers to the ability of the system to make predictions about the informational environment in which it finds itself. The systems are modeled by Turing automata, and the informational environment is represented by a sequence of input symbols. A Turing automaton is a finite automaton endowed with an output tape, operating as follows: being in state p and reading the input symbol x, it moves to another state q and writes a symbol y on the output tape. Through this mechanism the Turing automaton transforms a sequence of input symbols into a sequence of output symbols. The behavior of the automaton is considered intelligent if it can predict the next symbol. The population is given by the transition diagrams of the automata, and the fitness of an individual is the greater, the closer the string of symbols produced by the automaton is to a "target" string.

The transition diagram of a deterministic finite automaton is represented by a directed, labeled multigraph in which the nodes are labeled with the states of the automaton, and the arcs represent the transitions and are labeled with the corresponding input and output symbols. As an example, consider the automaton of Figure 4.1, which checks whether a bit string contains an even or an odd number of positions equal to 1. The input alphabet is {0, 1}, and the output of a transition is 0 or 1 according as the string read so far contains an even or an odd number of digits equal to 1. The set of states is {even, odd} and the initial state is even.

Figure 4.1: the parity automaton, with transitions even --1/1--> odd, even --0/0--> even, odd --0/1--> odd, odd --1/0--> even.

4.1.2 The functioning of the Turing automaton

The population is formed of mu >= 1 individuals, each of them being a Turing automaton. Consider the example in the following figure [9].

Figure 4.2: a Turing automaton with states A, B, C and transitions labeled 0/c, 0/b, 0/b, 1/a, 1/c, 1/b.

The automaton has the states S = {A, B, C}, the input alphabet I = {0, 1}, and the output alphabet O = {a, b, c}. The transition between two states is given by the function delta : S x I -> S x O, defined by a label of the form i/o appearing on an edge between two states s_k and s_l, meaning that delta(s_k, i) = (s_l, o); that is, if the machine is in state s_k and receives the input symbol i in I, then it moves to state s_l and produces the output symbol o in O. Through this mechanism the automaton transforms a string of input symbols (interpreted as the machine's environment) into a string of output symbols.
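Such an automaton is easy to simulate; the following Python sketch (our own illustration) runs the parity automaton of Figure 4.1:

    def run_transducer(delta, state, inputs):
        # delta maps (state, input symbol) -> (next state, output symbol)
        outputs = []
        for symbol in inputs:
            state, out = delta[(state, symbol)]
            outputs.append(out)
        return outputs

    delta = {("even", "0"): ("even", "0"), ("even", "1"): ("odd", "1"),
             ("odd", "0"): ("odd", "1"), ("odd", "1"): ("even", "0")}
    print(run_transducer(delta, "even", "1011"))  # ['1', '1', '0', '1']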
The performance of the automaton with respect to the environment can be measured on the basis of its predictive capacity: each output symbol is compared with the next input symbol, and the value of the prediction is measured by means of a payoff function.

The evolutionary programming paradigm was implemented by Fogel working with a population of mu >= 1 parents which generate mu descendants through mutations applied to each parent. Mutation was implemented as a random change of the components of the automaton; a change can be made in five ways: changing an output symbol, modifying a transition, adding a state, removing a state, or changing the initial state. For each individual of the population, one of the five mutation operators is chosen uniformly at random. It is, however, possible to apply several mutation operators to the same individual, the number of mutations being either fixed or chosen according to a probability distribution. After the descendants are evaluated, the best mu individuals among parents and descendants are selected; a selection of (mu + mu) type is thus performed.

Fogel did not use crossover, which is why many researchers in the field of genetic algorithms criticized his method, considering it not powerful enough. Nevertheless, the theoretical and empirical results of the last 30 years have shown that the role of mutation in genetic algorithms was underestimated, while that of crossover was overestimated [3, 19, 21, 52].

4.1.3 Optimization using evolutionary programming

The current variants of evolutionary programming used in continuous-parameter optimization problems have much in common with evolution strategies, especially regarding the representation of individuals, the way mutation is performed, and the self-adaptation of parameters [97]. Initially, evolutionary programming worked with bounded spaces, products of intervals [u_i, v_i] in R^n with u_i < v_i. Later the search domain was extended to I = R^n, an individual being a vector a = x in I. In [22] the concept of meta-evolutionary programming is introduced, which assumes a self-adaptation mechanism similar to that of evolution strategies. In order to incorporate the vector of variances v in R^n, the space of individuals is extended to I = R^n x R^n.

The evaluation function Phi(a) is obtained from the objective function f(x) by scaling to positive values and, possibly, by imposing random modifications theta_k of the parameters; thus Phi(a) = delta(f(x), theta_k), where delta is the scaling function.

In standard evolutionary programming, mutation transforms x into x' according to the rule

    x'_i = x_i + sqrt(beta_i * Phi(x) + gamma_i) * N_i(0, 1),

where the proportionality constants beta_i and gamma_i are chosen according to the problem to be solved; usually, however, one takes beta_i = 1 and gamma_i = 0, so that the mutation becomes

    x'_i = x_i + sqrt(Phi(x)) * N_i(0, 1).

In meta-evolutionary programming the individual a = (x, v) is transformed by mutation into a' = mut(a) = (x', v') as follows:

    x'_i = x_i + sqrt(v_i) * N_i(0, 1)
    v'_i = v_i + sqrt(zeta * v_i) * N_i(0, 1),

where zeta has the role of ensuring that v'_i receives a positive value. If the variance nevertheless becomes negative or zero, it is assigned a small value epsilon > 0. It is considered that evolutionary programming encodes species rather than individuals; and since crossover does not act at the level of species, evolutionary programming does not use this type of operator.
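The meta-EP mutation above can be sketched as follows (an illustration with our own names; zeta and the small positive epsilon are free parameters of the sketch):

    import math, random

    def meta_ep_mutation(x, v, zeta=1.0, eps=1e-4, rng=random):
        # x'_i = x_i + sqrt(v_i) * N_i(0,1)
        new_x = [xi + math.sqrt(vi) * rng.gauss(0.0, 1.0)
                 for xi, vi in zip(x, v)]
        # v'_i = v_i + sqrt(zeta * v_i) * N_i(0,1), kept positive
        new_v = [max(vi + math.sqrt(zeta * vi) * rng.gauss(0.0, 1.0), eps)
                 for vi in v]
        return new_x, new_v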
After creating mu descendants from the mu parents, by applying mutation once to each parent, mu individuals are selected from the set of parents P(t) joined with that of the descendants P'(t). A stochastic variant of tournament selection with parameter q > 1 is used, which consists in the following: for each individual a_k in P(t) united with P'(t), q individuals are selected at random from P(t) united with P'(t) and their evaluation is compared with that of a_k. The number w_k in {0, 1, ..., q} of individuals less fit than a_k constitutes the score of a_k. Formally, this can be written

    w_k = sum over j = 1..q of { 1 if Phi(a_k) <= Phi(a_(i_j)); 0 otherwise },

where the indices i_j in {1, 2, ..., 2 mu} are uniform random values, drawn for each comparison. After this operation is performed for all 2 mu individuals, they are sorted in decreasing order of the scores w_i, 1 <= i <= 2 mu, and the best mu individuals are chosen to form the next generation P(t+1). The following algorithm results:

    begin
        t := 0
        initialize P(0) := {a_1(0), ..., a_mu(0)} in I^mu,
            where I = R^n x R^n and a_i = (x_i, v_i)
        evaluate P(0), computing Phi(a_j(0)) = delta(f(x_j(0)), theta_k)
        while (T(P(t)) is not true) do
        begin
            apply mutation: a'_i(t) := mut(a_i(t)), i = 1, ..., mu
            evaluate P'(t) := {a'_1(t), ..., a'_mu(t)},
                computing Phi(a'_i(t)) = delta(f(x'_i(t)), theta_k)
            select P(t+1) := turn_q(P(t) united with P'(t))
            t := t + 1
        end
    end

We pointed out above the similarity between evolution strategies and evolutionary programming. There are, however, also differences, the most obvious being at the level of:
• encoding: evolution strategies encode individuals, whereas evolutionary programming encodes species;
• selection: evolution strategies choose the best individuals from among parents and descendants, whereas evolutionary programming makes this choice from a number of individuals selected beforehand from the current population joined with that of the descendants.

Evolutionary programming has numerous applications, among which: continuous numerical optimization, the development of classifier systems, the training of neural networks, the design of control systems that can be modeled by finite automata, and the control of robot motion.
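The q-tournament selection described in this section can be sketched in Python as follows (our own illustration; fitness is a callable and minimization of Phi is assumed):

    import random

    def tournament_scores(pool, fitness, q, rng=random):
        # w_k = number of q random opponents that a_k does not lose to
        scores = []
        for a in pool:
            opponents = [rng.choice(pool) for _ in range(q)]
            scores.append(sum(1 for b in opponents
                              if fitness(a) <= fitness(b)))
        return scores

    def ep_selection(parents, offspring, fitness, q=10, rng=random):
        pool = parents + offspring            # the 2*mu individuals
        scores = tournament_scores(pool, fitness, q, rng)
        ranked = sorted(range(len(pool)), key=lambda i: scores[i],
                        reverse=True)         # best scores first
        return [pool[i] for i in ranked[:len(parents)]]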
4.2 Genetic programming

Genetic programming is a newer direction within evolutionary computation, developed by J. Koza [52] around 1990. Genetic programming is, in fact, a variant of genetic algorithms operating on populations made up of "computing structures"; from this point of view it is similar to evolutionary programming. The structures that make up the population are programs which, when executed, are candidate solutions of the problem. Genetic programming was initially developed with the aim of automatically generating programs that (approximately) solve given problems. Later its area of application extended toward evolutionary design, a field at the intersection of evolutionary computation, computer-aided design, and biomimetics (a subfield of biology that studies imitative processes in nature). Genetic programming follows the general structure of a genetic algorithm, using crossover as the main operator and mutation as a secondary operator. The particularities of genetic programming are related to the way individuals are represented, which also requires an appropriate choice of the operators.

4.2.1 Representation of individuals

In genetic programming, individuals are seen not as a succession of lines of code but as the derivation trees associated with the "word" they represent in the formal language associated with the programming language used. In practice, restricted languages are used, based on a small set of symbols associated with the variables and a set of operators; under these conditions, any program is in fact an expression in a general sense. The choice of symbols and operators is closely related to the problem to be solved. This choice essentially determines the results that will be obtained; there are, however, no general rules establishing the connection between a given problem and the sets of symbols and operators to be used, the important role falling to the programmer.

Koza proposed as a representation the prefix notation of expressions, which corresponds to the preorder traversal of the expression's structure tree. To simplify the description, we consider that we operate with "programs" that are expressions containing arithmetic, relational, and logical operators as well as calls of some mathematical functions. In this case the associated formal language is context-free, and each word (expression) can be assigned a description tree. The interior nodes of the tree are labeled with operators or function names, and the terminal ones are labeled with names of variables or constants. For example, the expression max(x*y, x + 5*y) is represented by the tree of Figure 4.3.

Figure 4.3: the tree of max(x*y, x + 5*y), written in prefix form as (max (* x y) (+ x (* 5 y))).

The set of interior nodes is called the function set F = {f_1, f_2, ..., f_nf}; in our example F = {max, *, +}. Each function f_i in F has arity (number of arguments) at least 1. The functions in F may be of various types:
• arithmetic: +, -, *, /;
• mathematical: sin, cos, exp, log;
• Boolean: AND, OR, NOT;
• conditional: if-then-else;
• repetitive: for, while, repeat.

The set of terminal nodes of the derivation tree is called the terminal set T = {t_1, t_2, ..., t_nt}; in our example T = {x, y, 5}. The sets F and T can be joined into a uniform group C = F united with T if the terminals are regarded as functions of arity zero. For genetic programming to work efficiently, the sets F and T must satisfy two conditions [102]:
• closure, meaning that each function in F is able to accept as argument any value or data type that can be returned by any function in C; this property prevents run-time errors;
• sufficiency, which requires that the functions of C be able to express the solutions of the problem, that is, at least one solution belongs to the set of all possible compositions of functions from C.

Some examples of closed sets C are:
• C = {AND, OR, NOT, x, y, true}, where x and y are Boolean variables;
• C = {+, -, *, x, y, 1, 0}, with x and y integer variables;
• C = {+, -, sin, cos, exp, x, y}, with x and y real variables.

There are sets for which the closure property does not hold; for example:
• C = {+, -, *, /, x, y, 1, 0}, with x and y real variables, is not closed because divisions by zero can be generated, such as x/0, (x + y)/(x - x), etc.;
• C = {+, -, sin, cos, log, x, y}, with x and y integer variables, is not closed because negative arguments of the logarithm can be generated, for example log(-x).
The set C = {+, -, sqrt, x} can be closed or not, depending on the domain of the values of x.

Closure can be forced by using "protected" functions. Such functions are:
• protected division: x div y = x/y if y is not 0, and x div y = 1 if y = 0;
• protected logarithm: log(x) = log|x| if x is not 0, and log(x) = 0 if x = 0;
• protected square root: sqrt(x) = sqrt(|x|).
If we do not want to use protection functions, then the fitness of invalid expressions must be drastically reduced; the problem is similar to that of penalty functions in optimization problems.
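The protected functions listed above are one-liners in Python (our own illustration of the definitions):

    import math

    def div(x, y):
        # protected division: x div y = 1 when y = 0
        return x / y if y != 0 else 1.0

    def plog(x):
        # protected logarithm: log|x| for x != 0, otherwise 0
        return math.log(abs(x)) if x != 0 else 0.0

    def psqrt(x):
        # protected square root: sqrt(|x|)
        return math.sqrt(abs(x))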
Sufficiency is guaranteed only for some problems, when theory or other methods tell us that a solution can be obtained by combining the elements of C. For example, logic tells us that C = {AND, OR, NOT, x, y} allows the implementation of any Boolean function, since it contains a complete set of connectives. If C is not sufficient, genetic programming can only evolve programs that achieve the best possible approximation. For example, the set C = {+, -, *, /, x, 0, 1, 2} can only give an approximation of exp(x), since this is a transcendental function (it cannot be expressed exactly by a finite algebraic expression). In this case genetic programming can only produce finite algebraic approximations of the type

    exp(x) ~ 1
    exp(x) ~ 1 + x
    exp(x) ~ 1 + x + x^2/2
    exp(x) ~ 1 + x + x^2/2 + x^3/6.

4.2.2 The initial population

The initial trees are generated by randomly choosing functions from C. For example, for C = {+, -, *, /, x, y, 0, 1, 2, 3} we may obtain trees such as those in Figure 4.4.

Figure 4.4: examples of randomly generated initial trees over C.

The size and shape of the initial programs are controlled by selecting nodes from F and T according to their depth in the tree. Trees can also be represented as nested lists; for example, the first two trees above are written [+ 2 x] and [* [- x 3] 2]. For this reason, population initialization in genetic programming is usually based on recursive procedures that return lists.

The "full" method selects nodes from F if their depth does not exceed a maximum value, and nodes from T otherwise. This technique produces an initial population whose trees have all their leaves on the same level (Figure 4.5).

Figure 4.5: trees generated with the full method, with all leaves at the maximum depth.

The "grow" method selects nodes from C if their depth is smaller than the maximum value, and from T otherwise. Since C also contains terminal elements, this method produces initial trees of various shapes and depths, as in Figure 4.6.

Figure 4.6: trees generated with the grow method, of various shapes and depths.

The "ramped half and half" method combines the two previous methods, to give the initial population greater diversity. It works according to the following algorithm:

    for i := 1 to max_depth do
    begin
        generate (50 / max_depth)% of the population using the "full" method with maximum depth i
        generate (50 / max_depth)% of the population using the "grow" method with maximum depth i
    end

The "full" and "grow" methods can be implemented with the following recursive procedure [102]:

    generate_expression(F, T, max_depth, method)
    begin
        if max_depth = 0 then
        begin
            select t in T
            insert t into the tree
        end
        else
        begin
            if method = full
                then select f in F
                else select f in F united with T
            insert f into the tree
            if f in F then
            begin
                n := arity of f
                for i := 1 to n do
                    generate_expression(F, T, max_depth - 1, method)
            end
        end
    end

The call generate_expression({+, -, *, /}, {x, y, 0, 1, 2, 3}, 3, full) may generate, for instance, the expression

    [* [+ [/ x 1] [- 2 0]] [- [+ y 3] [* 2 x]]]

which corresponds to the tree of Figure 4.7.

Figure 4.7: the tree of the expression generated with the full method.

The call generate_expression({+, -, *, /}, {x, y, 0, 1, 2, 3}, 3, grow) may generate the expression

    [+ [- 3 x] [* 2 [/ 1 y]]]

which corresponds to the tree of Figure 4.8.

Figure 4.8: the tree of the expression generated with the grow method.
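A runnable Python version of the generation procedure is sketched below (an illustration; the function and terminal sets are the ones used in the examples, and nested Python lists play the role of the bracketed lists above):

    import random

    FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2}      # name -> arity
    TERMINALS = ["x", "y", 0, 1, 2, 3]

    def generate_expression(max_depth, method, rng=random):
        # trees are nested lists in prefix form, e.g. ['+', 2, 'x']
        if max_depth == 0:
            return rng.choice(TERMINALS)
        if method == "full":
            f = rng.choice(list(FUNCTIONS))
        else:  # "grow": choose from F united with T
            f = rng.choice(list(FUNCTIONS) + TERMINALS)
        if f not in FUNCTIONS:
            return f                                   # a terminal was drawn
        return [f] + [generate_expression(max_depth - 1, method, rng)
                      for _ in range(FUNCTIONS[f])]

    def ramped_half_and_half(pop_size, max_depth, rng=random):
        # equal shares of full and grow trees for each depth 1..max_depth
        pop = []
        for depth in range(1, max_depth + 1):
            for method in ("full", "grow"):
                for _ in range(pop_size // (2 * max_depth)):
                    pop.append(generate_expression(depth, method, rng))
        return pop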
3.2.3 Evolution operators

Crossover consists in randomly selecting a crossover point (a node or an arc in the tree) in each parent and exchanging the subtrees that start at the crossover points. It is advisable that the crossover points not be selected uniformly at random, but that interior nodes be favored; for example, in 90% of the cases non-terminal nodes are chosen. The number of crossovers can be controlled by the crossover probability. An example of crossover is given in Figure 3.9: the parents (a*b)*sin(c) and a*(b+c) exchange the subtree a*b and the terminal a, producing the descendants a*sin(c) and (a*b)*(b+c).

Figure 3.9. An example of crossover

Although it is a secondary operator, mutation allows tree structures to be modified in ways that crossover cannot. Depending on its effect on the structure, there are three variants of mutation:

• simple mutation: changes the label of a randomly selected node (Figure 3.10)

Figure 3.10. Simple mutation

• expansion mutation: consists in replacing a terminal node with a subtree, built according to the same rules but which does not necessarily belong to the current population, as happens in the case of crossover; in Figure 3.11, (a+b)*sin(c) becomes (a+b)*sin(c+b)

Figure 3.11. Expansion mutation

• reduction mutation: consists in replacing a subtree with a terminal node; in Figure 3.12, (a*b)*sin(c) becomes a*sin(c)

Figure 3.12. Reduction mutation

In general, a node of the tree can be chosen and the subtree rooted at that node is replaced with another, randomly generated one. Thus, mutation can be seen as crossover with a randomly generated tree.

3.2.4 Running programs in genetic programming

In genetic programming, programs are represented as trees or lists; it is always possible to transform such a structure into C, C++, Java, Lisp, etc. code. But in most cases such an operation is inefficient because:
• initially, most programs have a very low fitness and will survive only a few generations
• many programs (good or less good) will be changed by crossover or mutation and would have to be recompiled.

A better approach is to interpret the programs instead of compiling them. Some programming languages, such as Lisp, already have an interpreter in their development environment. To use it we must make sure that the syntax is correct; for example, that round brackets are used instead of square ones. For all other programming languages such an interpreter must be built.

To interpret a program means to traverse it depth-first and to evaluate all its nodes, starting from the leaves. The depth-first traversal ensures that a node is evaluated only after the values of its arguments are known. Figure 3.13 shows an example: an expression tree whose nodes are annotated with the values computed bottom-up for a given value of x. After interpretation, the value of the root node is the value of the program.

Figure 3.13. Bottom-up evaluation of an expression tree

The possible applications of genetic programming are many and varied, the only difficulty being the definition of a suitable fitness function. The scaling and penalty techniques used in genetic algorithms remain the same, with the difference that, in order to decide whether a program is good or not, it must be executed one or more times, with different input data or in different contexts.

A class of problems in which genetic programming has proved very useful is symbolic regression. This is a technique used very often in data interpretation; it consists in finding a function, built as a combination of elementary functions, that approximates as well as possible a function known only through its values at given points. The term "symbolic" signifies the fact that we are not interested in finding optimal parameters (numbers) but optimal functions (expressions, symbolic representations).
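A minimal Python sketch of such an interpreter over the nested-list (prefix) representation, together with the sum-of-absolute-errors fitness used in the symbolic regression example that follows; the operator table, the protected div and the variable environment are our illustrative assumptions:

import math

def interpret(node, env):
    # terminals: variables are looked up in env, constants returned as-is
    if not isinstance(node, list):
        return env.get(node, node) if isinstance(node, str) else node
    # depth-first: evaluate the arguments before the node itself
    f, args = node[0], [interpret(a, env) for a in node[1:]]
    if f == '+': return args[0] + args[1]
    if f == '-': return args[0] - args[1]
    if f == '*': return args[0] * args[1]
    if f == 'div': return args[0] / args[1] if args[1] != 0 else 1.0  # protected
    if f == 'sin': return math.sin(args[0])
    if f == 'cos': return math.cos(args[0])
    if f == 'exp': return math.exp(args[0])
    raise ValueError('unknown function ' + str(f))

def fitness(program, points):
    # sum over the data set of |y_i - eval(program, x_i)|; lower is better
    return sum(abs(y - interpret(program, {'x': x})) for x, y in points)

# example: evaluate 2 + x*3 at x = 4
print(interpret(['+', 2, ['*', 'x', 3]], {'x': 4}))   # -> 14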
To use genetic programming for solving symbolic regression problems it is necessary:
• to have a set of data points, where each point represents the values taken by certain variables at a certain moment
• to select the variables that we consider dependent on the others
• to define a fitness function that measures the capacity of each program to determine the values of the dependent variables when the values of the independent ones are given
• to select adequate sets of functions and terminals; the terminals must include all the independent variables, and possibly others, while the functions must be selected on the basis of domain knowledge.

As an example [102], let us find the symbolic expression that best approximates the data set

    (xi, yi) ∈ {(-1.0, 0.0), (-0.9, -0.1629), (-0.8, -0.2624), ..., (1.0, 4.0)}

which was generated using the function y = f(x) = x + x^2 + x^3 + x^4, x ∈ [-1, 1]. The parameters of the genetic program are:
• population size = 1000
• function set = {+, -, *, log, exp, sin, cos, div}
• terminal set = {x}
• maximum depth = 4
• initial population generation method = full
• number of generations = 50
• crossover probability = 0.7
• mutation probability = 0
• fitness function = Σi |yi - eval(prog, xi)|

Some of the best programs obtained at various generations are [102]:
• at generation 1: [+ [- [log [exp x]] [+ [sin x] [- x x]]] [+ [exp [log x]] [sin [log x]]]], with fitness 8.20908
• at generation 2: [* [+ [+ [+ x x] [div x x]] x] [log [exp [* x x]]]], with fitness 7.0476
• at generation 3: [* [log [- [sin x] [exp x]]] [+ [cos [* x x]] [+ [+ x x] [cos x]]]], with fitness 4.74338
• at generation 6: [* [+ [+ [+ x [exp [log x]]] [div x x]] x] [log [exp x]]], with fitness 2.6334

One can see that the fitness decreases, which means that the search tends toward the optimal combination of functions; for example, at generation 26 the fitness 0.841868 is obtained.

Applied Soft Computing 4 (2004) 65–77

Extracting rules from trained neural network using GA for managing E-business

A. Ebrahim Elalfi (a,*), R. Haque (b), M. Esmel Elalami (a)
(a) Department of Computer Instructor Preparation, Faculty of Specific Education, Mansoura University, Mansoura, Egypt
(b) High Tech. International.com, Montreal, Que., Canada
(*) Corresponding author. E-mail address: ael alfi@hotmial.com (A.E. Elalfi).

Received 23 September 2002; received in revised form 13 August 2003; accepted 19 August 2003

Abstract

The ability to intelligently collect, manage and analyze information about customers and sellers is a key source of competitive advantage for an e-business. This ability provides an opportunity to deliver real-time marketing or services that strengthen customer relationships. It also enables an organization to gather business intelligence about a customer that can be used for future planning and programs. This paper presents a new algorithm for extracting accurate and comprehensible rules from databases via a trained artificial neural network (ANN) using a genetic algorithm (GA). The new algorithm does not depend on the ANN training algorithm, and it does not modify the training results. The GA is used to find the optimal values of the input attributes (chromosome), Xm, which maximize the output function ψk of output node k. The function ψk = f(xi, (WG1)i,j, (WG2)j,k) is a nonlinear exponential function, where (WG1)i,j and (WG2)j,k are the weight groups between the input and hidden nodes, and between the hidden and output nodes, respectively. The optimal chromosome is decoded and used to obtain a rule belonging to classk.
© 2003 Elsevier B.V. All rights reserved.
Keywords: E-business; Artificial neural network; Genetic algorithms; Personalization; Online shopping; Rule extraction

1. Introduction

E-commerce has evolved from consumers conducting basic transactions on the Web to a complete retooling of the way partners, suppliers and customers transact. Now one can link dealers and suppliers online, reducing both lag time and paperwork. One can move procurement online by setting up an extranet that links directly to vendors, cutting inventory carrying costs and becoming more responsive to one's customers. One can also streamline financial relationships with customers and suppliers by Web-enabling billing and payment systems.

Recent literature suggests that the Internet and WWW as a business transaction tool provide both firms and consumers with various benefits, including lower transaction costs, lower search costs, and a greater selection of goods [1]. The ability to provide content and services to individuals on the basis of knowledge about their preferences and behavior has become an important marketing tool [2]. A complete customer profile has two parts: factual and behavioral. The factual profile contains information, such as name, gender, and date of birth, that the personalization system obtains from the customer's factual data. The factual profile can also contain information derived from the transaction data. A behavioral profile models the customer's actions and is usually derived from transactional data.

Personalization begins with collecting customer data from various sources. This data might include histories of customers' web purchasing and browsing activities, as well as demographic and psychological information. After the data is collected, it must be prepared, cleaned, and stored in a data warehouse. Real-world data is dirty. Data cleaning, including the removal of contradictory and redundant data items and the elimination of irrelevant attributes, has been an important topic in data mining research and development [3].

Extracting rules from a given database via trained neural networks is important [4]. Although several algorithms have been proposed by several researchers [5,6], there is no algorithm which can be applied to any type of network, to any training algorithm, and to both discrete and continuous values [4]. A method for extracting M-of-N rules from trained artificial neural networks (ANN) was presented by Setiono [5]. However, that algorithm was based on standard three-layered feedforward networks, and the attributes of the database are assumed to have binary values -1 or 1. Hiroshi presented a decomposition algorithm that can be applied to multilayer ANNs and recurrent networks [6]. The units of the ANN are approximated by Boolean functions; the computational complexity of the approximation is exponential, so a polynomial algorithm was presented [7]. To reduce the computational complexity, higher-order terms were neglected; consequently, the extraction of accurate rules is not guaranteed. An approach for extracting rules from trained ANNs for regression was presented in [13].
Each rule in the extracted rule set corresponds to a subregion of the input space, and a linear function involving the relevant input attributes of the data approximates the network output for all data samples in this subregion. However, the method extracts rules from the trained ANN by approximating the hidden activation function h(x) = tanh(x) by either a three-piece or a five-piece linear function. This approximation yields less accuracy and makes the computation burdensome.

This paper presents a new algorithm for extracting rules from a trained neural network using a genetic algorithm. It does not depend on the training algorithm of the ANN and does not modify the training results. Also, the algorithm can be applied to discrete and continuous attributes. The algorithm does not make any approximation to the hidden unit activation function. Additionally, it can handle any number of hidden layers in the trained ANN. The extracted rules can be used to define a customer profile in order to make online shopping easy.

2. Problem formulation

A supervised ANN uses a set of training examples or records. These records include N attributes. Each attribute An (n = 1, 2, ..., N) can be encoded into a fixed-length binary sub-string {x1 ... xi ... xmn}, where mn is the number of possible values of attribute An. The element xi = 1 if its corresponding attribute value exists, while all the other elements are 0. So the proposed number of input nodes, I, in the input layer of the ANN is given by

    I = Σ(n=1..N) mn                                                        (1)

The input attribute vectors Xm presented to the input layer can be written as

    Xm = {x1 ... xi ... xI}m                                                (2)

where m = 1, 2, ..., M and M is the total number of input training patterns. The output class vector Ck (k = 1, 2, ..., K) can be encoded as a bit vector of fixed length K as follows:

    Ck = {ψ1 ... ψk ... ψK}                                                 (3)

where K is the number of different possible classes. If the output vector belongs to classk, then the element ψk is equal to 1 while all the other elements in the vector are zeros. Therefore, the proposed number of output nodes in the output layer of the ANN is K. Accordingly, the input and output nodes of the ANN are determined; the structure of the ANN is shown in Fig. 1.

Fig. 1. The structure of the ANN.

The ANN is trained on the encoded vectors of the input attributes and the corresponding vectors of the output classes. The training of the ANN proceeds until the desired convergence between the actual and the desired output is achieved. The convergence can be improved by changing the number of iterations, the number of hidden nodes (J), the learning rate, and the momentum rate. After training the ANN, two groups of weights are obtained. The first group, (WG1)i,j, includes the weights between input node i and hidden node j. The second group, (WG2)j,k, includes the weights between hidden node j and output node k. The activation function used in the hidden and output nodes of the ANN is the sigmoid function.

The total input to the jth hidden node, IHNj, is given by

    IHNj = Σ(i=1..I) xi (WG1)i,j                                            (4)

The output of the jth hidden node, OHNj, is given by

    OHNj = 1/(1 + exp(-Σ(i=1..I) xi (WG1)i,j))                              (5)

The total input to the kth output node, IONk, is given by

    IONk = Σ(j=1..J) (WG2)j,k /(1 + exp(-Σ(i=1..I) xi (WG1)i,j))            (6)

So the final value of the kth output node, ψk, is given by

    ψk = 1/(1 + exp(-Σ(j=1..J) (WG2)j,k /(1 + exp(-Σ(i=1..I) xi (WG1)i,j))))  (7)

The function ψk = f(xi, (WG1)i,j, (WG2)j,k) is an exponential function in xi, since (WG1)i,j and (WG2)j,k are constants. Its maximum output value is equal to one.
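As a sanity check on Eqs. (4)-(7), a small Python sketch of this forward pass; the array names are ours, and x is the binary input vector:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def psi(x, WG1, WG2):
    # Eqs. (4)-(5): hidden-node outputs OHN_j for the binary input vector x
    ohn = sigmoid(x @ WG1)        # x: (I,), WG1: (I, J)
    # Eqs. (6)-(7): output values psi_k for every output node k
    return sigmoid(ohn @ WG2)     # WG2: (J, K) -> vector of length K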
Definition. An input vector Xm belongs to classk iff the element ψk of Cm equals 1 and all other elements of Cm equal 0.

Consequently, to extract a relation (rule) between the input attributes Xm relating to a specific classk, one must find the input vector which maximizes ψk. This is an optimization problem and can be stated as:

Maximize

    ψk(xi) = 1/(1 + exp(-Σ(j=1..J) (WG2)j,k /(1 + exp(-Σ(i=1..I) xi (WG1)i,j))))  (8)

subject to:

    xi are binary values (0 or 1)                                            (9)

Since the objective function ψk(xi) is nonlinear and the constraints are binary, this is a nonlinear integer optimization problem. The genetic algorithm (GA) can be used to solve it. The following algorithm explains how the GA obtains the best chromosome, which maximizes the objective function ψk(xi):

Begin
    Take ψk(xi) as the fitness function
    Create the chromosome structure as follows:
        generate a number of slots equal to I, representing the input vector X
        put a random value 0 or 1 in each slot
    G = 0, where G is the generation counter
    Create the initial population P of T chromosomes P(t)G, t = 1 to T
    Evaluate the fitness function over P(t)G
    While the termination conditions are not satisfied do
        G = G + 1
        Select chromosomes from P(t)G-1 according to the roulette-wheel procedure
        Recombine them using crossover and mutation
        Update the population from P(t)G-1 to P(t)G
        Evaluate the fitness function over P(t)G
    Display the best chromosome that satisfies the conditions
End

For extracting a rule belonging to classk, the best chromosome must be decoded as follows:
• The best chromosome is divided into N segments.
• Each segment represents one attribute An (n = 1, 2, ..., N) and has a corresponding bit length mn, which represents its values.
• An attribute value is present if the corresponding bit in the best chromosome equals one, and absent otherwise.
• The operators "OR" and "AND" are used to correlate the present values of the same attribute and of different attributes, respectively.
• After obtaining the set of rules, refine the rules and cancel redundant attributes; e.g., if an attribute has three values A, B and C and a rule looks like "If attk has value A or B or C then classk", such an attribute can be dropped (it is redundant).

The overall methodology of rule extraction is shown in Fig. 2.

3. Generalization for multiple hidden layers

The objective function obtained in Eq. (8) can be generalized for an ANN which has more than one hidden layer. Fig. 3 shows an ANN that includes three hidden layers. The function ψk, in its final form for the kth output node, is given by

    ψk = 1/(1 + exp(-Σ(j3=1..J) [1/(1 + exp(-A))] (WG4)j3,k))               (10)

where

    A = Σ(j2=1..J) [1/(1 + exp(-Σ(j1=1..J) [1/(1 + exp(-Σ(i=1..I) xi (WG1)i,j1))] (WG2)j1,j2))] (WG3)j2,j3   (11)

Here xi are the input values, i = 1, 2, ..., I, where I is the total number of nodes in the input layer; j1 = 1, 2, ..., J indexes the first hidden layer, j2 the second, and j3 the third; J is the total number of nodes in each hidden layer; k = 1, 2, ..., K, where K is the total number of nodes in the output layer. (WG1)i,j1 is the weight group between the input layer node i and the first-hidden-layer node j1; (WG2)j1,j2 is the weight group between the first hidden layer, j1, and the second hidden layer, j2; (WG3)j2,j3 is the weight group between the second hidden layer, j2, and the third hidden layer, j3; (WG4)j3,k is the weight group between the third hidden layer, j3, and the output layer, k.
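A compact Python sketch of the GA loop described in Section 2, with roulette-wheel selection, one-point crossover and bit-flip mutation; all parameter defaults are illustrative, and the concrete fitness would be the psi function sketched above:

import random

def run_ga(fitness, I, pop_size=10, generations=1300, pc=0.25, pm=0.01):
    # population of random binary chromosomes, one slot per input node
    pop = [[random.randint(0, 1) for _ in range(I)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(c) for c in pop]   # psi_k lies in (0, 1), so all weights are positive
        total = sum(fits)
        def roulette():
            r, acc = random.uniform(0, total), 0.0
            for c, f in zip(pop, fits):
                acc += f
                if acc >= r:
                    return c
            return pop[-1]
        nxt = []
        while len(nxt) < pop_size:
            a, b = roulette()[:], roulette()[:]   # copies of two selected parents
            if random.random() < pc:              # one-point crossover
                cut = random.randrange(1, I)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for i in range(I):
                    if random.random() < pm:      # bit-flip mutation
                        child[i] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]
    return max(pop, key=fitness)                  # best chromosome of the final generation

# e.g., for output node k: run_ga(lambda x: float(psi(np.array(x), WG1, WG2)[k]), I=10)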
Fig. 2. Overall flowchart for the proposed methodology: the database is coded as bit strings and separated into input vectors Xm and corresponding output vectors Cm; an ANN with I input, J hidden and K output nodes is structured with random parameters (learning and momentum coefficients), re-drawing them until the error is satisfactory; the weight groups (WG1)i,j and (WG2)j,k are extracted; then, for each class k, an initial population is created and the fitness function ψk(xi) is maximized through selection, crossover and mutation; the fittest chromosomes, down to a certain level, are decoded into the equivalent rules for classk.

Fig. 3. ANN with three hidden layers: the input attribute vectors feed the input layer i, which is connected through the weight groups (WG1)i,j1, (WG2)j1,j2, (WG3)j2,j3 and (WG4)j3,k to the output layer k and the output class vectors.

4. Personalized marketing and customer retention strategies

As organizations attempt to develop marketing and customer retention strategies, they will need to collect visitor statistics and integrate data across systems. Additionally, there is a need to improve data about inventories. Personalization is a relatively new field, and different authors provide various definitions of the concept [11]. Fig. 4 shows the stages of personalization as an iterative process [2]. A framework has been presented for a system to identify individual user behavior, in order to make online shopping easy and to maximize user satisfaction [12]. Clearly, an individual user will act based on his or her preferences, attitude and personality, and each individual's behaviors, such as preferences and attitudes, differ from the others'. An individual's activities or expressions are monitored and captured using sensing devices. The individual user behaviors are recognized by pattern recognition systems (PRSs). Intelligent agents are used to make system strategies or plans based on the individual user behaviors and the product state, so that the system can act according to individual behaviors and make online shopping easy.

• A proposed record for products and inventories can have the following attributes: product name, color, store size, city, month, quantity, quantity sold, profit.
• A record for factual data includes: customer ID, customer name, gender, birth date, nationality.
• A record for transactional data may include the attributes: customer ID, date, time, store, product, coupon used.

Fig. 4. Stages of the personalization process.
5. Illustrative example

A given database, with four attributes and two different output classes, is shown in Table 1 [8].

Table 1. Example for the target concept "play tennis" [8]

Day   Outlook    Temperature  Humidity  Wind    Play tennis
D1    Sunny      Hot          High      Weak    No
D2    Sunny      Hot          High      Strong  No
D3    Overcast   Hot          High      Weak    Yes
D4    Rain       Mild         High      Weak    Yes
D5    Rain       Cool         Normal    Weak    Yes
D6    Rain       Cool         Normal    Strong  No
D7    Overcast   Cool         Normal    Strong  Yes
D8    Sunny      Mild         High      Weak    No
D9    Sunny      Cool         Normal    Weak    Yes
D10   Rain       Mild         Normal    Weak    Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
D13   Overcast   Hot          Normal    Weak    Yes
D14   Rain       Mild         High      Strong  No

The encoded values of the given database are shown in Table 2. The ANN is trained on the encoded input attribute vectors Xm and the corresponding output class vectors Cm. The number of input nodes is given by

    I = Σ(n=1..N) mn = m1 + m2 + m3 + m4 = 10

The number of output nodes is K = 2.

Table 2. Encoded database (Outlook, m1 = 3: x1 Sunny, x2 Overcast, x3 Rain; Temperature, m2 = 3: x4 Hot, x5 Mild, x6 Cool; Humidity, m3 = 2: x7 High, x8 Normal; Wind, m4 = 2: x9 Weak, x10 Strong; output Cm: ψ1 No, ψ2 Yes)

Xm    x1 x2 x3   x4 x5 x6   x7 x8   x9 x10   ψ1 ψ2
X1    1  0  0    1  0  0    1  0    1  0     1  0
X2    1  0  0    1  0  0    1  0    0  1     1  0
X3    0  1  0    1  0  0    1  0    1  0     0  1
X4    0  0  1    0  1  0    1  0    1  0     0  1
X5    0  0  1    0  0  1    0  1    1  0     0  1
X6    0  0  1    0  0  1    0  1    0  1     1  0
X7    0  1  0    0  0  1    0  1    0  1     0  1
X8    1  0  0    0  1  0    1  0    1  0     1  0
X9    1  0  0    0  0  1    0  1    1  0     0  1
X10   0  0  1    0  1  0    0  1    1  0     0  1
X11   1  0  0    0  1  0    0  1    0  1     0  1
X12   0  1  0    0  1  0    1  0    0  1     0  1
X13   0  1  0    1  0  0    0  1    1  0     0  1
X14   0  0  1    0  1  0    1  0    0  1     1  0

Table 3. Group of weights (WG1)i,j between the input and hidden nodes

Input nodes   H1         H2         H3         H4
x1            -4.09699   3.741246   -1.2106    -1.42853
x2            6.154562   -4.56639   0.349845   1.109533
x3            -0.82675   1.114981   0.153325   -0.47917
x4            -0.42227   0.2961     -0.19704   -0.55404
x5            4.128692   -3.07741   -0.15498   0.651919
x6            -2.73254   2.595217   -0.56767   -0.32539
x7            -4.93463   4.005334   -1.17037   -0.89697
x8            5.282225   -4.36782   0.235355   0.616702
x9            3.060052   -3.11607   1.106763   0.56799
x10           -3.63009   2.284223   -1.36338   -1.02158

Table 4. Group of weights (WG2)j,k between the hidden and output nodes

Output nodes   H1         H2         H3         H4
ψ1             -9.20896   9.012731   -1.2113    -0.90564
ψ2             9.22879    -9.00487   0.773881   1.218929

Table 5. Rule extraction for class "no" (ψ1 is maximum)

Rule 1 (fitness 0.99988; X = 1 0 0 1 1 1 1 0 0 1)
    Directly extracted: If Outlook is Sunny And Temperature is Hot or Mild or Cool And Humidity is High And Wind is Strong
    After refinement:   If Outlook is Sunny And Humidity is High And Wind is Strong
Rule 2 (fitness 0.999874; X = 1 0 1 0 0 1 1 0 0 1)
    Directly extracted: If Outlook is Sunny or Rain And Temperature is Cool And Humidity is High And Wind is Strong
    After refinement:   If Outlook is Sunny or Rain And Temperature is Cool And Humidity is High And Wind is Strong
Rule 3 (fitness 0.999867; X = 1 0 0 1 1 1 1 0 1 1)
    Directly extracted: If Outlook is Sunny And Temperature is Hot or Mild or Cool And Humidity is High And Wind is Weak or Strong
    After refinement:   If Outlook is Sunny And Humidity is High
Rule 4 (fitness 0.999849; X = 0 0 1 0 0 1 1 1 0 1)
    Directly extracted: If Outlook is Rain And Temperature is Cool And Humidity is High or Normal And Wind is Strong
    After refinement:   If Outlook is Rain And Temperature is Cool And Wind is Strong
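A short Python sketch of the decoding and refinement steps of Section 2, applied to the chromosomes of Table 5; the attribute layout follows Table 2, and the helper names are ours:

# attribute name -> ordered value labels, matching the bit layout of Table 2
ATTRS = [('Outlook', ['Sunny', 'Overcast', 'Rain']),
         ('Temperature', ['Hot', 'Mild', 'Cool']),
         ('Humidity', ['High', 'Normal']),
         ('Wind', ['Weak', 'Strong'])]

def decode(bits):
    clauses, i = [], 0
    for name, values in ATTRS:
        chosen = [v for v, b in zip(values, bits[i:i + len(values)]) if b]
        i += len(values)
        # refinement: an attribute with all of its values present is redundant
        if chosen and len(chosen) < len(values):
            clauses.append(name + ' is ' + ' or '.join(chosen))
    return 'If ' + ' And '.join(clauses)

print(decode([1, 0, 0, 1, 1, 1, 1, 0, 0, 1]))
# -> If Outlook is Sunny And Humidity is High And Wind is Strong (rule 1 of Table 5)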
Table 6. Rule extraction for class "yes" (ψ2 is maximum)

Rule 1 (fitness 0.99998; X = 0 1 1 0 0 0 0 0 1 0)
    Directly extracted: If Outlook is Overcast or Rain And Wind is Weak
    After refinement:   If Outlook is Overcast or Rain And Wind is Weak
Rule 2 (fitness 0.999972; X = 0 1 0 0 0 0 0 0 0 0)
    Directly extracted: If Outlook is Overcast
    After refinement:   If Outlook is Overcast
Rule 3 (fitness 0.999960; X = 1 1 0 0 0 0 0 1 0 0)
    Directly extracted: If Outlook is Sunny or Overcast And Humidity is Normal
    After refinement:   If Outlook is Sunny or Overcast And Humidity is Normal

Table 7. RITIO-induced rule set from Table 1 [9]

Rule 1: If Outlook is Sunny And Humidity is High Then CLASS No
Rule 2: If Outlook is Overcast And Humidity is High Then CLASS Yes
Rule 3: If Humidity is Normal Then CLASS Yes
Rule 4: If Humidity is Normal And Wind is Weak Then CLASS Yes
Rule 5: If Outlook is Rain And Humidity is High And Wind is Weak Then CLASS Yes
Rule 6: If Outlook is Rain And Humidity is Normal And Wind is Strong Then CLASS No
Rule 7: If Outlook is Rain And Humidity is High And Wind is Strong Then CLASS No

The convergence between the actual and the desired output was achieved with 4 hidden nodes, a learning coefficient of 0.55, a momentum coefficient of 0.65 and 30,000 iterations. The allowable error equals 0.000001. Table 3 shows the first group of weights, (WG1)i,j, between each input node and the hidden nodes. The second group of weights, (WG2)j,k, between each hidden node and the output nodes, is shown in Table 4.

The GA is then applied to the function ψ1 in order to get the input attribute vector which maximizes it. The GA has a population of 10 individuals evolving during 1300 generations. The crossover and mutation probabilities were 0.25 and 0.01, respectively. The output chromosomes of the play and don't-play target classes are sorted in descending order of their fitness values. The threshold levels of the two target classes are 0.99996 and 0.999849, respectively. Therefore, both the local and the global maxima of the output chromosomes have been determined and are translated into rules. Tables 5 and 6 present the best sets of rules belonging to the don't-play and play targets, respectively. Table 7 shows the RITIO-induced set of rules for the same database [9]. Although RITIO gives a good indication of algorithm stability over different databases, its rule number 3 is not verified; the algorithm proposed here shows that all extracted rules are verified.

6. Application and results

The MONK'S problems are benchmark binary classification tasks in which robots are described in terms of six characteristics, and a rule is given which specifies the attributes that determine membership of the target class [10]. The six attributes and their values are shown in Table 8.

Table 8. The attributes and their values of the MONK1'S database [10]

Robot characteristics (attributes)   Nominal values
Head shape                           Round, square, octagon
Body shape                           Round, square, octagon
Is smiling                           Yes, no
Holding                              Sword, flag, balloon
Jacket colour                        Red, yellow, green, blue
Has tie                              Yes, no

The two rules that determine membership of the target class in the MONK1'S database are shown in Table 9.

Table 9. The two rules that define the target class

Rule 1: If Head Shape Value = Body Shape Value THEN Robot is in Target Class
Rule 2: If Jacket Color = Red THEN Robot is in Target Class

The ANN is trained on 123 input vectors Xm; the corresponding output class vectors Cm are shown in Table 10. The number of input nodes is I = 17 and the number of output nodes is K = 2. The convergence between the actual and the desired output was achieved with 6 hidden nodes, a learning coefficient of 0.25, a momentum coefficient of 0.85 and 31,999 iterations.
The allowable error equals 0.0000001. Table 11 shows the first group of weights, (WG1)i,j, between each input node and the hidden nodes. The second group of weights, (WG2)j,k, between each hidden node and the output nodes, is shown in Table 12.

Table 10. The MONK1'S database (excerpt) [10]

Xm    Head shape  Body shape  Is smiling  Holding  Jacket colour  Has tie   Target
1     Round       Round       Yes         Sword    Green          Yes       Yes
2     Round       Round       Yes         Flag     Yellow         Yes       Yes
3     Round       Square      Yes         Sword    Green          Yes       No
4     Round       Octagon     Yes         Flag     Blue           Yes       No
...
55    Square      Round       Yes         Sword    Green          Yes       No
56    Square      Square      Yes         Sword    Green          Yes       Yes
57    Square      Square      Yes         Flag     Red            No        Yes
58    Square      Octagon     No          Balloon  Red            Yes       Yes
...
120   Octagon     Round       No          Sword    Red            Yes       Yes
121   Octagon     Round       No          Balloon  Yellow         No        No
122   Octagon     Octagon     No          Flag     Yellow         No        Yes
123   Octagon     Octagon     No          Flag     Green          No        Yes

Table 11. Group of weights (WG1)i,j between each input node and the hidden nodes

Input nodes   H1         H2         H3         H4         H5         H6
x1            -5.08851   -6.40872   2.478146   -0.53785   3.331379   -1.01267
x2            4.094656   -0.55311   2.24007    -1.00648   -6.64513   -0.53136
x3            2.711605   7.121283   -2.49793   -0.15809   0.468151   -0.3962
x4            -2.9641    -7.48084   1.351769   -0.69977   -6.00667   -0.18359
x5            0.929943   7.760751   -2.3443    -0.53314   5.33333    -0.2059
x6            3.494829   0.138298   2.217123   -0.63468   -1.24655   -0.4458
x7            0.475753   0.275564   0.829914   -1.09122   -1.47744   -0.8716
x8            0.358807   0.269779   0.623271   -1.23704   -1.61803   -0.92063
x9            -0.10996   0.243966   0.019956   -0.29096   -1.02741   0.006704
x10           0.385337   -0.31376   0.989733   -0.58041   -0.54741   -0.50737
x11           -0.13311   -0.07916   0.539239   -1.02715   -0.74859   -0.77975
x12           -7.31878   12.26899   -4.98723   0.279794   -4.79433   0.471633
x13           2.941625   -3.99095   1.822638   -0.49974   0.666357   -1.03168
x14           2.469945   -4.1919    2.270769   -0.57977   0.686182   -1.01134
x15           2.658616   -3.47783   2.435963   -0.62123   1.15922    -0.59382
x16           0.48247    0.314717   0.777509   -0.83715   -1.61191   -0.56232
x17           0.878135   0.340808   0.315489   -0.77439   -2.04905   -1.21304

Table 12. Group of weights (WG2)j,k between each hidden node and the output nodes

Output nodes   H1          H2         H3         H4         H5         H6
1              13.3740     -14.5207   -6.48067   -0.40159   11.70462   -0.52939
2              -13.37457   14.52426   6.48808    0.07072    -11.7054   0.33697

Table 13. The set of rules belonging to the target class

Rule 1 (fitness 0.9999)
    Directly extracted: If Jacket Color is Red
    After refinement:   If Jacket Color is Red
Rule 2 (fitness 0.99947; X = 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0)
    Directly extracted: If Head Shape is Octagon AND Body Shape is Octagon AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon AND Jacket Color is Red OR Yellow OR Green OR Blue
    After refinement:   If Head Shape is Octagon AND Body Shape is Octagon
Rule 3 (fitness 0.99946; X = 0 1 0 0 1 0 1 1 1 1 1 0 0 0 0 1 1)
    Directly extracted: If Head Shape is Square AND Body Shape is Square AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon AND Has Tie is Yes OR No
    After refinement:   If Head Shape is Square AND Body Shape is Square
Rule 4 (fitness 0.99845; X = 1 0 0 1 0 0 1 1 1 1 1 0 0 0 0 0 0)
    Directly extracted: If Head Shape is Round AND Body Shape is Round AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon
    After refinement:   If Head Shape is Round AND Body Shape is Round
Table 14. Accuracy results of different algorithms on the MONK1'S database [9]

Database   HCV (%)   C4.5 (%)   RITIO (%)   C4.5 rules (%)   Proposed algorithm (%)
MONK1'S    100       83.3       97.37       100              100

The GA has a population of 10 individuals evolving during 1225 generations. The crossover and mutation probabilities are 0.28 and 0.002, respectively. The output chromosomes for the target class are sorted according to their fitness values down to the level 0.99845. Table 13 presents the best set of rules belonging to the target class according to the fitness values. As Table 13 shows, the rules extracted by the proposed algorithm and the standard rules given in Table 9 are identical, which is a good indication of the algorithm's stability. The accuracy of the proposed algorithm among different algorithms for the MONK1'S database is shown in Table 14 [9].

The discovered rules for hypothetical individual person data and the products are of the following format:

IF PRODUCT = Hat THEN Profit = Medium.
IF Color = Blue THEN Profit = High.
IF MONTH = June THEN Profit = Medium.
IF MONTH = December THEN Profit = High.

7. Conclusions

A novel machine learning algorithm for extracting comprehensible rules has been presented in this paper. It does not need the computational complexity of the DNF algorithm. It takes all input attributes into consideration, so it produces accurate rules, while other algorithms such as DNF use only the input attributes up to a certain level. Also, it uses only part of the weights to extract the rules belonging to a given class, so it has a lower computational time compared with other algorithms. The proposed methodology does not make any approximation to the activation function.

The user profile information is stored in a database along with a unique user ID and password. A data warehouse repository with such data can be analyzed. This algorithm can help devise rules to govern which messages are offered to an anonymous prospect, how to counter points of resistance, and when to attempt to close a sale. Future work should consist of more experiments with other data sets, as well as more elaborate experiments to optimize the GA parameters of the proposed algorithm.

References

[1] J. Jhang, H. Jain, K. Ramamurthy, Effective design of electronic commerce environments: a proposed theory of congruence and an illustration, IEEE Trans. Systems Man Cybernet. Part A: Syst. Hum. 30 (4) (2000) 456–471.
[2] G. Adomavicius, A. Tuzbilin, Using data mining methods to build customer profiles, IEEE Comput. 34 (2) (2001) 74–82.
[3] X. Wu, D. Urpani, Induction by attribute elimination, IEEE Trans. Knowl. Data Eng. 11 (5) (1999) 805–812.
[4] H. Tsukimoto, Extracting rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 377–389.
[5] R. Setiono, Extracting M-of-N rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 512–519.
[6] F. Wotawa, G. Wotawa, Deriving qualitative rules from neural networks - a case study for ozone forecasting, AI Communications 14 (2001) 23–33, IOS Press.
[7] H. Tsukimoto, Extracting rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 377–389.
[8] T.M. Mitchell, Machine Learning, 1997.
[9] X. Wu, D. Urpani, Induction by attribute elimination, IEEE Trans. Knowl. Data Eng. 11 (5) (1999) 805–812.
[10] http://www.cse.unsw.edu.au/~cs3411/C4.5/Data.
[11] Comm. ACM, Special Issue on Personalization, vol. 43, no. 8, 2000.
[12] A.E. El-Alfy, R. Haque, Y.
Al-Ohali, A framework to employ multi AI systems to facilitate easy online shopping, http://www-3.ibm.com/easy/eou ext.nsf/Publish/2049. [13] R. Setiono, W.K. Leow, J.M. Zurada, Extraction of rules from artificial neural networks for nonlinear regression, IEEE Trans. Neural Networks 13 (3) (2002) 564–577.