Expert Systems: Principles and Programming, Fourth Edition

Chapter 5
Inexact Reasoning

At a Glance

Table of Contents

Overview
Objectives
Instructor Notes
Quick Quizzes
Discussion Questions
Key Terms

Lecture Notes

Overview

In this chapter, we continue the discussion of reasoning under uncertainty. In the previous chapter, the main paradigm for this type of reasoning was probabilistic reasoning and Bayes’ Theorem. In this chapter, we cover fuzzy logic, which has applications in image processing and control, fuzzy rule-based machine learning classification, clustering, and function approximation. Two versions of CLIPS have been developed for expert systems that rely heavily on fuzzy reasoning: NRC fuzzyCLIPS and Togai InfraLogic Fuzzy CLIPS.

Probability was originally developed for ideal games of chance where the same experiment could be reproduced indefinitely. Reproducible uncertainty arises (in the statistical sense) when there is a large population and the results are averaged over many trials. We can give the odds but no statement of certainty. While subjective probability has its uses, there are alternative theories to deal with inexact reasoning where the antecedent, the conclusion, and even the meaning of the rule itself are uncertain to some extent.

Chapter objectives

After completing this chapter, students will be able to answer the following:

What are the sources of uncertainty in rules?
What are some methods of dealing with uncertainty?
What is the Dempster-Shafer theory?
What is the theory of uncertainty based on fuzzy logic?
What are some commercial applications of fuzzy logic?

Instructor Notes

Uncertainty and Rules

This section provides an overview of rules and uncertainty. We have already seen that expert systems can operate with uncertain information. There are several sources of uncertainty in rules:

1. Uncertainty related to individual rules
2. Uncertainty due to conflict resolution
3. Uncertainty due to incompatibility of rules

The goal of the knowledge engineer is to minimize, or eliminate, these uncertainties if possible. Minimizing uncertainty is part of the verification of a rule. Verification is concerned with the correctness of the system’s building blocks: its rules. Even if all the rules are correct, it does not necessarily mean that the system will give a correct answer. We use the term verification to mean minimizing the local uncertainties, while the term validation refers to minimizing the global uncertainties of the entire expert system. There are uncertainties associated not only with the creation of rules but also with the assignment of values.

The ad hoc introduction of formulas such as fuzzy logic to a probabilistic system introduces a problem: the expert system then lacks a sound theoretical foundation based on classical probability. The danger of ad hoc methods is the lack of a complete theory to guide the application or warn of inappropriate situations.

As to conflict resolution, if the knowledge engineer specifies the explicit priority of rules, there is a potential source of error since the priority may not be optimum or correct.

One cause of uncertainty is the potential contradiction of rules: rules may fire with contradictory consequents, possibly as a result of antecedents not being specified properly. A second source of uncertainty is subsumption of rules: one rule is subsumed by another if a portion of its antecedent is a subset of another rule.

There is uncertainty in conflict resolution with regard to the priority of firing, which may depend on a number of factors, including:

1. Explicit priority of rules
2. Implicit priority of rules
   a. Specificity of patterns
   b. Recency of facts matching patterns
   c. Ordering of patterns
      (i) Lexicographic
      (ii) Means-Ends Analysis
   d. Ordering that rules are entered

Regarding recency of facts, whenever a fact is entered in the working memory, it receives a unique timetag indicating when it was entered. The order in which rules are entered in the expert system may also be a factor in conflict resolution: if the inference engine cannot prioritize rules, an arbitrary choice has to be made.

Redundant rules are accidentally entered by the knowledge engineer or accidentally occur when a rule is modified by pattern deletion. Deciding which redundant rule to delete is not a trivial matter. Uncertainty arising from missing rules occurs if the human expert forgets or is unaware of a rule. Another cause of uncertainty is data fusion: the fusing of data from different types of information.

Certainty Factors

Another method of dealing with uncertainty uses certainty factors.

1. Difficulties with the Bayesian Method

The Bayesian method is useful in medicine (and geology) because we are determining the probability of a specific disease (or the location of a mineral deposit) given certain symptoms. The problem is that it is usually impossible to determine complete probability values for all these symptoms (or for all known geological hypotheses about the 92 natural elements). Evidence tends to accumulate piece by piece over time, and each new piece of evidence requires its own probability.

2. Belief and Disbelief

The probability that I have a disease plus the probability that I don’t have it equals one; stated another way, the probability that I have a disease is one minus the probability that I don’t have it. Nevertheless, it was found that physicians were reluctant to state their knowledge in the form:

$P(H \mid E) = 1 - P(H' \mid E)$

where E represents evidence. This reluctance arises because physicians express their knowledge as likelihoods of belief and disbelief, not as probabilities.
The problem is that although there may be a cause-and-effect relationship between E and H, there may be none between E and H’; yet the equation implies a cause-and-effect relationship between E and H’ whenever there is one between E and H.

3. Measures of Belief and Disbelief

The certainty factor, CF, is a way of combining belief and disbelief into a single number. This has two uses:

a) The certainty factor can be used to rank hypotheses in order of importance.
b) The certainty factor indicates the net belief in a hypothesis based on some evidence.

A positive CF means the evidence supports the hypothesis. A CF of 1 means the evidence definitely proves the hypothesis. A CF equal to zero could mean either that there is no evidence or that the belief and the disbelief completely cancel each other. A negative CF means that the evidence favors the negation of the hypothesis: there is more reason to disbelieve the hypothesis than to believe it.

In MYCIN, a rule’s antecedent CF must be greater than 0.2 for the antecedent to be considered true and activate the rule. This threshold value is a way of minimizing the activation of rules that only weakly suggest the hypothesis; it improves the efficiency of the system by preventing the activation of many rules of little or no value. A combining function can be used to combine two CF values.

4. Difficulties with Certainty Factors

Although MYCIN was very successful in diagnosis, there were difficulties with the theoretical foundations of certainty factors. There was some basis for the CF values in probability theory and confirmation theory, but the CFs were also partly ad hoc. Also, the CF values could be the opposite of conditional probabilities.

Dempster-Shafer Theory

The Dempster-Shafer theory is a method of inexact reasoning. It is based on the work of Dempster, who attempted to model uncertainty by a range of probabilities rather than as a single probabilistic number.

1. The Dempster-Shafer theory assumes that there is a fixed set of mutually exclusive and exhaustive elements called the environment and symbolized by the Greek letter $\Theta$:

$\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$

The environment is another term for the universe of discourse in set theory. Consider the following:

$\Theta = \{\text{rowboat}, \text{sailboat}, \text{destroyer}, \text{aircraft carrier}\}$

These are all mutually exclusive elements. Now, we could ask the question, “What are the military boats?” The answer would be a subset of $\Theta$:

$\{\theta_3, \theta_4\} = \{\text{destroyer}, \text{aircraft carrier}\}$

Likewise, the answer to the question, “What boat is powered by oars?” is the set:

$\{\theta_1\} = \{\text{rowboat}\}$

This set is called a singleton because it has only one element. Each subset of $\Theta$ is a possible answer to the question. Since the elements are mutually exclusive and the environment is exhaustive, there can be only one correct answer to a question. Each subset can be considered an implied proposition:

The correct answer is: $\{\theta_1, \theta_2, \theta_3\}$
The correct answer is: $\{\theta_1, \theta_3\}$

All subsets of the environment can be drawn as a hierarchical lattice with $\Theta$ at the top and the null set $\emptyset$ at the bottom. An environment is called a frame of discernment when its elements may be interpreted as possible answers, and only one answer is correct. If the answer is not in the frame, then the frame must be enlarged to accommodate the additional knowledge of elements.

2. Mass Functions and Ignorance

In Bayesian theory, the posterior probability changes as evidence is acquired. In Dempster-Shafer theory, the belief in evidence may vary. We talk about the degree of belief in evidence as analogous to the mass of a physical object. Thus, the mass supports the belief, and the evidence measure is the amount of mass. The fundamental difference between Dempster-Shafer theory and probability theory is the treatment of ignorance.
The Dempster-Shafer theory does not force belief to be assigned to ignorance or to the refutation of a hypothesis. Mass is assigned only to those subsets of the environment to which you wish to assign belief. Any belief not assigned to a specific subset is considered no belief, or nonbelief, and is simply associated with the environment. Every set in the power set of the environment that has mass greater than 0 is a focal element. A mass can formally be expressed as a function that maps each element of the power set into a real number in the interval 0 to 1, stated as follows:

$m : P(\Theta) \to [0, 1]$

3. Combining Evidence

Dempster’s rule combines masses to produce a new mass that represents a consensus of the original, possibly conflicting evidence. The new mass is a consensus because it tends to favor agreement rather than disagreement by including only masses in the set intersections. The set intersections represent common elements of evidence.

In evidential reasoning, the evidence is said to induce an evidential interval, or range of belief in the evidence. The lower bound is called the support (Spt) in evidential reasoning and Bel in Dempster-Shafer theory. The upper bound is called the plausibility (Pls). The support, or belief function, Bel, is the total belief of a set and all its subsets; it is sometimes called the belief measure.

4. The moving mass analogy is helpful in understanding support and plausibility. The main concepts are as follows:

   1. The support is the mass assigned to a set and all its subsets.
   2. Mass of a set can move freely into its subsets.
   3. Mass in a set cannot move into its supersets.
   4. Moving mass from a set into its subset can only contribute to the plausibility of the subset, not its support.
   5. Mass in the environment, $\Theta$, can move into any subset since $\Theta$ is outside all.
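
The combination of evidence and the evidential interval can be illustrated with a short Python sketch. It uses the boat environment from the example above; the mass values 0.7 and 0.9, the function names, and the use of frozensets for subsets are assumptions made for illustration, not values from the text. The sketch multiplies masses, assigns each product to the set intersection, and renormalizes by the mass that fell on empty intersections (the normalization step discussed next). Plausibility is computed in the standard way, as the total mass of all focal elements that intersect the set.

```python
from itertools import product

# Environment (frame of discernment); element names follow the boat example.
THETA = frozenset({"rowboat", "sailboat", "destroyer", "aircraft carrier"})


def combine(m1, m2):
    """Dempster's rule: products of masses are assigned to set intersections."""
    combined = {}
    conflict = 0.0  # mass that falls on empty intersections
    for (x, mass_x), (y, mass_y) in product(m1.items(), m2.items()):
        common = x & y
        if common:
            combined[common] = combined.get(common, 0.0) + mass_x * mass_y
        else:
            conflict += mass_x * mass_y
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting; no consensus exists")
    # Normalize so that the combined masses again sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def bel(m, s):
    """Support (Bel): total mass of the set and all its subsets."""
    return sum(v for x, v in m.items() if x <= s)


def pls(m, s):
    """Plausibility (Pls): total mass of every focal element that intersects s."""
    return sum(v for x, v in m.items() if x & s)


# Hypothetical evidence: one source suggests a military boat, another a destroyer.
# Mass not assigned to a specific subset stays with the environment THETA.
m1 = {frozenset({"destroyer", "aircraft carrier"}): 0.7, THETA: 0.3}
m2 = {frozenset({"destroyer"}): 0.9, THETA: 0.1}

m3 = combine(m1, m2)
destroyer = frozenset({"destroyer"})
print("Bel:", round(bel(m3, destroyer), 3), "Pls:", round(pls(m3, destroyer), 3))
```

With these assumed masses no intersection is empty, so no renormalization is needed: the combined mass on the destroyer singleton is 0.63 + 0.27 = 0.9, and the remaining 0.1 stays on supersets of it, which is why it adds to the plausibility of the singleton but not to its support.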
One difficulty with the Dempster-Shafer theory occurs with normalization, which may lead to results that are contrary to our expectations. The problem with normalization is that it ignores the belief that the object being considered does not exist.

Approximate Reasoning

In this section, we discuss a theory of uncertainty based on fuzzy logic and concerned with quantifying and reasoning using natural language, where words have ambiguous meanings. Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth: truth values between completely true and completely false. The term soft computing means computing that is not based on classical two-valued logic, and includes fuzzy logic, neural networks, and probabilistic reasoning.

1. Fuzzy Sets and Natural Language

A discrimination function, or characteristic function, is a way of representing which objects are members of a set: 1 means an object is an element of the set; 0 means that it is not. Sets to which this type of representation applies are called crisp sets, in contrast to fuzzy sets. Fuzzy logic addresses the middle ground, the gray areas of partially true, partially false situations that make up most of human reasoning. It is based on the assumption that everything consists of degrees: age, beauty, wealth, etc.

In fuzzy sets, an object may belong partially to a set. The degree of membership is measured by a generalization of the characteristic function called the membership function or compatibility function. A particular value of the membership function, e.g. 0.5, is called a grade of membership. A fuzzy truth value is called a fuzzy qualifier. Unlike crisp propositions, which are not allowed to have quantifiers, fuzzy propositions may have fuzzy quantifiers. The term compatibility means how well one object conforms to some attribute. There are many types of membership functions.
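
As a minimal sketch of the difference between a characteristic function and a membership function, the Python below grades membership in a hypothetical set TALL. The 180 cm crisp threshold, the 170 cm and 190 cm breakpoints, and the linear ramp between them are assumptions chosen purely for illustration; they are not taken from the text, and many other shapes are possible.

```python
def crisp_tall(height_cm, threshold=180.0):
    """Characteristic function of a crisp set: 1 for a member, 0 otherwise."""
    return 1 if height_cm >= threshold else 0


def fuzzy_tall(height_cm, low=170.0, high=190.0):
    """Membership function: the grade rises linearly from 0 at low to 1 at high."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


# Compare the all-or-nothing crisp answer with the graded fuzzy answer.
for height in (165, 175, 180, 185, 195):
    print(height, crisp_tall(height), round(fuzzy_tall(height), 2))
```

The crisp function jumps from 0 to 1 at a single height, whereas the fuzzy function assigns intermediate grades of membership to heights in the gray area between the two breakpoints.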
Membership functions may be constructed from a person’s opinions or from a group of people’s opinions. In an expert system, the membership function will be constructed from the expert’s opinion. In a membership function, the crossover point is where $\mu = 0.5$. The S-function is often used in fuzzy sets as a membership function.

2. Fuzzy Set Operations

An ordinary crisp set is a special case of a fuzzy set whose membership function takes only the values $\{0, 1\}$ rather than values in the interval $[0, 1]$. All the definitions, proofs, and theorems of fuzzy sets must be compatible in the limit as the fuzziness goes to 0 and the fuzzy sets become crisp sets. The following is a summary of fuzzy set operations:

set equality: $A = B$ iff $\mu_A(x) = \mu_B(x)$ for all $x \in X$
set complement: $A'$, where $\mu_{A'}(x) = 1 - \mu_A(x)$ for all $x \in X$
set containment: $A \subseteq B$ iff $\mu_A(x) \le \mu_B(x)$ for all $x \in X$
proper subset: $A \subset B$ iff $\mu_A(x) \le \mu_B(x)$ for all $x \in X$ and $\mu_A(x) < \mu_B(x)$ for at least one $x \in X$
set union: $A \cup B$, where $\mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x))$ for all $x \in X$
set intersection: $A \cap B$, where $\mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x))$ for all $x \in X$
set product: $AB$, where $\mu_{AB}(x) = \mu_A(x)\,\mu_B(x)$
power of a set: $A^N$, where $\mu_{A^N}(x) = (\mu_A(x))^N$
probabilistic sum: $A \mathbin{\hat{+}} B$, where $\mu_{A \hat{+} B}(x) = \mu_A(x) + \mu_B(x) - \mu_A(x)\,\mu_B(x)$
bounded sum: $A \oplus B$, where $\mu_{A \oplus B}(x) = \min(1, \mu_A(x) + \mu_B(x))$
bounded product: $A \odot B$, where $\mu_{A \odot B}(x) = \max(0, \mu_A(x) + \mu_B(x) - 1)$
bounded difference: $A \mathbin{|-|} B$, where $\mu_{A |-| B}(x) = \max(0, \mu_A(x) - \mu_B(x))$
concentration: $\mathrm{CON}(A)$, where $\mu_{\mathrm{CON}(A)}(x) = (\mu_A(x))^2$
dilation: $\mathrm{DIL}(A)$, where $\mu_{\mathrm{DIL}(A)}(x) = (\mu_A(x))^{0.5}$
intensification: $\mathrm{INT}(A)$, where $\mu_{\mathrm{INT}(A)}(x) = 2(\mu_A(x))^2$ for $0 \le \mu_A(x) \le 0.5$, and $\mu_{\mathrm{INT}(A)}(x) = 1 - 2(1 - \mu_A(x))^2$ for $0.5 < \mu_A(x) \le 1$
normalization: $\mathrm{NORM}(A)$, where $\mu_{\mathrm{NORM}(A)}(x) = \mu_A(x) / \max\{\mu_A(x)\}$

3. Fuzzy Relations

A relation from set A to set B is a subset of the Cartesian product:

$A \times B = \{(a, b) \mid a \in A \text{ and } b \in B\}$

If X and Y are universal sets, then a fuzzy relation is:

$R = \{\mu_R(x, y) / (x, y) \mid (x, y) \in X \times Y\}$

The composition of relations is the net effect of applying one relation after another.
For the case of two binary relations P and Q, the composition of their relations is the binary relation R:

$R(A, C) = Q(A, B) \circ P(B, C)$

where:

R(A, C) is a relation between A and C
Q(A, B) is a relation between A and B
P(B, C) is a relation between B and C
A, B, and C are sets and $\circ$ is the composition operator.

4. Linguistic Variables

One important application of fuzzy sets is computational linguistics, where the goal is to calculate with natural language statements in a way analogous to the way that logic calculates with logical statements. Fuzzy sets and linguistic variables can be used to quantify the meaning of natural language, which can then be manipulated. Linguistic variables must have a valid syntax and semantics.

5. Extension Principle

The extension principle is a very important concept that defines how to extend the domain of a given crisp function to include fuzzy sets. Using this principle, any ordinary or crisp function from mathematics, science, engineering, business, and so forth can be extended to work in a fuzzy domain with fuzzy sets. This principle makes fuzzy sets applicable to all fields.

6. Fuzzy Logic

Just as classical logic forms the basis of conventional expert systems, fuzzy logic forms the basis of fuzzy expert systems. Besides dealing with uncertainty, fuzzy expert systems are also capable of modeling commonsense reasoning. Fuzzy logic may be considered an extension of multivalued logic. Fuzzy logic is the logic of approximate reasoning: the inference of a possibly imprecise conclusion from a set of possibly imprecise premises.

7. Fuzzy Rules

As a simple example of fuzzy set operators, we consider the problem of pattern recognition in this section.

8. Possibility and Probability

The term possibility has a specific meaning in fuzzy theory: it refers to allowed values. A possibility distribution is not the same as a probability distribution, which describes the frequency of expected occurrence of some random variable.

9. Translation Rules

Translation rules specify how modified or composite propositions are generated from their elementary propositions. Such rules fall into categories of Type I, II, III, and IV:

Type I: modification rules
Type II: composition rules (conditional composition, conjunctive composition, disjunctive composition)
Type III: quantification rules
Type IV: qualification rules (truth qualification, probability qualification, possibility qualification)

The State of Uncertainty and Commercial Applications of Fuzzy Logic

There are two mountains: the mountain of logic and the mountain of uncertainty. Expert systems are built on the mountain of logic and must reach valid conclusions given a set of premises. The conclusions are valid given that:

1. The rules were written correctly
2. The facts upon which the inference engine generates valid conclusions are true facts

Today, fuzzy logic and Bayesian theory are most often used for uncertainty. Classical Bayesian, convex set Bayesian, Dempster-Shafer, Kyburg, and the possibility theory of fuzzy logic are among a few methods that are widely available as software tools today.

Quick Quiz

1. Probability has been called by mathematicians a theory of _____ uncertainty.
Answer: reproducible

2. An intelligent system is any system that can operate with _____ information.
Answer: uncertain

3. _____ can be viewed as minimizing local uncertainties.
Answer: Verification

4. _____ is uncertainty as a result of one rule contradicting another rule.
Answer: Contradiction of rules

5. Whenever a fact is entered in the working memory of a computer, it receives a unique _____.
Answer: timetag

6. The order that rules are entered into an expert system may be a factor in conflict resolution (true / false).
Answer: true

7. The purpose of _____ is to help the system keep executing a certain task without becoming distracted by very recent facts that have entered working memory.
Answer: means-ends analysis (MEA)

8. Rules that have the same consequence and evidence are called _____ rules.
Answer: redundant

9. Rules that the human expert forgets, or is unaware of, are called _____ rules.
Answer: missing

10. A(n) _____ CF value indicates that the evidence supports the hypothesis.
Answer: positive

11. A CF value of _____ indicates that the evidence proves the hypothesis.
Answer: 1

12. A(n) _____ is a way of minimizing the activation of rules which only weakly suggest a hypothesis.
Answer: threshold value

13. The attenuation factor is based on the assumption that all the evidence is known with _____.
Answer: certainty

14. The Dempster-Shafer theory is characterized by modeling uncertainty by a(n) _____ rather than a single probabilistic number.
Answer: range of values

15. The Dempster-Shafer theory assumes there is a fixed set of mutually exclusive and exhaustive elements called the _____ and symbolized by the Greek letter _____.
Answer: environment, $\Theta$

16. A(n) _____ set has only one element.
Answer: singleton

17. The Dempster-Shafer theory refers to the degree of belief in evidence as analogous to the _____ of a physical object.
Answer: mass

18. A fundamental difference between the Dempster-Shafer theory and probabilistic theory is the treatment of _____.
Answer: ignorance

19. In the Dempster-Shafer theory, instead of restricting belief to a single value, there is a(n) _____ in belief of the evidence.
Answer: range

20. The _____ function is sometimes called the belief measure.
Answer: Bel

21. The main concepts of the moving mass analogy state that mass in a set (can / cannot) move into its supersets.
Answer: cannot

22. Computing that is not based on the classical two-valued logic is sometimes called _____ computing.
Answer: soft

23. Set equality for fuzzy sets is defined as _____.
Answer: $\mu_A(x) = \mu_B(x)$ for all $x \in X$

24. Can fuzzy set A and A’ overlap (yes / no)?
Answer: yes

25. A(n) _____ is sometimes called a mapping because it associates elements from one domain to another.
Answer: relation

26. An application of fuzzy sets to natural language is _____.
Answer: computational linguistics

27. The term _____ in fuzzy theory refers to allowed values.
Answer: possibility

28. Very likely, unlikely, not very likely are examples of _____.
Answer: fuzzy qualifiers

Discussion Questions

1. What are the three major uncertainties in a rule-based expert system?
2. What is the difference between verification and validation in an expert system?
3. What are the five causes of uncertainty with regard to the compatibility of rules?
4. What is data fusion?
5. What is the problem associated with belief and disbelief in the MYCIN system?
6. Interpret a CF value of 0.
7. How does a threshold value improve the efficiency of an expert system?
8. What is the purpose of combining CF values?
9. What is the main difference between the way ignorance is regarded in Dempster-Shafer and probability theory?
10. How is fuzzy logic associated with camcorders?

Key Terms

Certainty Factor: A number that combines belief and disbelief into a single number.
Dempster-Shafer Theory: A method of inexact reasoning, also called the theory of belief functions.
Extension Principle: Defines how to extend the domain of a given crisp function to include fuzzy sets.
Fuzzy Reasoning: The inference of a possibly imprecise conclusion from a set of possibly imprecise premises.
Fuzzy Set Operations: Operations analogous to those of ordinary crisp sets.
Possibility: Refers to allowed values.
Relation: A mapping which associates the elements from one domain to another.
Reproducible Uncertainty: Involves a large population where results are averaged over many trials.
Validation: The process of minimizing global uncertainties.
Verification: The process of minimizing local uncertainties.