Expert Systems: Principles and Programming, Fourth Edition
Chapter 5
Inexact Reasoning
At a Glance
Table of Contents

Overview

Objectives

Instructor Notes

Quick Quizzes

Discussion Questions

Key Terms
Lecture Notes
Overview
In this chapter, we continue the discussion of reasoning under uncertainty. In the previous
chapter, the main paradigm for this type of reasoning was probabilistic reasoning and Bayes’
Theorem. In this chapter, we cover fuzzy logic, which has applications in image processing and
control, fuzzy rule-based machine learning classification, clustering, and function approximation.
Two versions of CLIPS have been developed for expert systems that rely heavily on fuzzy reasoning:
NRC fuzzyCLIPS and Togai InfraLogic Fuzzy CLIPS.
Probability was originally developed for ideal games of chance, where the same experiment could
be reproduced indefinitely. Reproducible uncertainty applies (in the statistical sense) when there
is a large population and the results are averaged over many trials. We can give the odds but no
statement of certainty. While subjective probability has its uses, there are alternative theories for
dealing with inexact reasoning, where the antecedent, the conclusion, and even the meaning of the
rule itself are uncertain to some extent.
Chapter objectives
After completing this chapter, students will be able to answer the following:
• What are the sources of uncertainty in rules?
• What are some methods of dealing with uncertainty?
• What is the Dempster-Shafer theory?
• What is the theory of uncertainty based on fuzzy logic?
• What are some commercial applications of fuzzy logic?
Instructor Notes
Uncertainty and Rules
This section provides an overview of rules and uncertainty. We have already seen that expert systems can operate
with uncertain information. There are several sources of uncertainty in rules:
1. Uncertainty related to individual rules
2. Uncertainty due to conflict resolution
3. Uncertainty due to incompatibility of rules
The goal of the knowledge engineer is to minimize, or eliminate, these uncertainties if possible. Minimizing
uncertainty is part of the verification of the rule. Verification is concerned with the correctness of the system’s
building blocks, the rules. Even if all the rules are correct, it does not necessarily mean that the system will give a
correct answer. We use the term verification to mean minimizing the local uncertainties and the term validation
to mean minimizing the global uncertainties of the entire expert system. There are uncertainties associated not only
with the creation of rules but also with the assignment of values.
The ad hoc introduction of formulas such as fuzzy logic to a probabilistic system introduces a problem: the expert
system then lacks a sound theoretical foundation based on classical probability. The danger of ad hoc methods is the
lack of a complete theory to guide the application or warn of inappropriate situations.
As to conflict resolution, if the knowledge engineer specifies the explicit priority of rules, there is a potential source
of error since the priority may not be optimum or correct. One cause of uncertainty is the potential contradiction of
rules: rules may fire with contradictory consequents, possibly as a result of antecedents not being specified
properly. A second source of uncertainty is subsumption of rules: one rule is subsumed by another if a portion of
its antecedent is a subset of another rule. There is also uncertainty in conflict resolution with regard to the priority
of firing, which may depend on a number of factors, including:
1. Explicit priority of rules
2. Implicit priority of rules
   a. Specificity of patterns
   b. Recency of facts matching patterns
   c. Ordering of patterns
      (i) Lexicographic
      (ii) Means-Ends Analysis
   d. Ordering that rules are entered
Regarding recency of facts, whenever a fact is entered in the working memory, it receives a unique timetag
indicating when it was entered. The order in which rules are entered in the expert system may also be a factor in
conflict resolution: if the inference engine cannot prioritize rules, an arbitrary choice has to be made. Redundant
rules may be accidentally entered by the knowledge engineer or may accidentally occur when a rule is modified by
pattern deletion. Deciding which redundant rule to delete is not a trivial matter. Uncertainty arising from missing
rules occurs if the human expert forgets or is unaware of a rule. Another cause of uncertainty is data fusion, the
fusing of data from different types of information.
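The priority factors listed in this section (explicit priority, specificity of patterns, recency of timetags, and
order of entry) can be illustrated with a small sketch. The Activation class, its field names, and the particular
order in which the criteria are applied are assumptions made for illustration; they are not taken from the text, and
real inference engines offer several different strategies.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Activation:
    """A rule whose antecedent is satisfied (hypothetical structure)."""
    rule: str
    salience: int              # explicit priority set by the knowledge engineer
    specificity: int           # number of patterns in the antecedent
    timetags: Tuple[int, ...]  # timetags of the facts matching the patterns
    entry_order: int           # position in which the rule was entered


def conflict_key(act: Activation):
    # Assumed ordering: explicit priority first, then specificity,
    # then recency of the newest matching fact, then order of entry.
    return (-act.salience, -act.specificity, -max(act.timetags), act.entry_order)


def select(agenda):
    """Pick the activation that fires next under the assumed ordering."""
    return min(agenda, key=conflict_key)


agenda = [
    Activation("r1", salience=0, specificity=2, timetags=(3, 5), entry_order=0),
    Activation("r2", salience=0, specificity=2, timetags=(4, 9), entry_order=1),
    Activation("r3", salience=10, specificity=1, timetags=(1,), entry_order=2),
]
print(select(agenda).rule)      # explicit priority wins
print(select(agenda[:2]).rule)  # tie on priority and specificity, recency decides
```

If two activations tie on every criterion except order of entry, the choice is effectively arbitrary, which is the
source of uncertainty the notes describe.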
Certainty Factors
Another method of dealing with uncertainty uses certainty factors.
1. Difficulties with the Bayesian Method
The Bayesian method is useful in medicine (and geology) because we are determining the probability of
a specific disease (or the location of a mineral deposit) given certain symptoms. The problem is that it is usually
impossible to determine the probabilities of all these symptoms with complete values (or of all known geological
hypotheses about the 92 natural elements). Evidence tends to accumulate piece by piece over time, and each new
piece of evidence requires its own probability.
2. Belief and Disbelief
The probability that I have a disease plus the probability that I don’t have the disease equals one; stated
another way, the probability that I have a disease is one minus the probability that I don’t have it. Even so, it
was found that physicians were reluctant to state their knowledge in the form:
P(H | E) = 1 - P(H' | E)
where E represents evidence. This reluctance stems from the physicians thinking in terms of likelihoods of belief
and disbelief, not in terms of probabilities. The problem is that, although this equation implies a cause-and-effect
relationship between E and H, there may be no cause-and-effect relationship between E and H’; yet the equation
implies a cause-and-effect relationship between E and H’ whenever there is one between E and H.
3. Measures of Belief and Disbelief
The certainty factor, CF, is a way of combining belief and disbelief into a single number. This has two uses:
a) The certainty factor can be used to rank hypotheses in order of importance.
b) The certainty factor, CF, indicates the net belief in a hypothesis based on some evidence.
A positive CF means the evidence supports the hypothesis. A CF of 1 means the evidence definitely proves
the hypothesis. A CF equal to zero could mean that there is no evidence or that the belief and the disbelief
completely cancel each other. A negative CF means that the evidence favors the negation of the hypothesis:
there is more reason to disbelieve the hypothesis than to believe it.
In MYCIN, a rule’s antecedent CF must be greater than 0.2 for the antecedent to be considered true and
activate the rule. This threshold value is a way of minimizing the activation of rules that only weakly suggest
the hypothesis; this improves the efficiency of the system by preventing the activation of many rules that have
little or no value. A combining function can be used to combine two CF values.
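The threshold test and a combining function can be sketched as follows. The notes do not state the combining
formula, so the one used here is an assumption: it is the commonly cited MYCIN-style function for two certainty
factors that bear on the same hypothesis. The function names are hypothetical.

```python
THRESHOLD = 0.2  # antecedent CF must exceed this for the rule to activate


def rule_activates(antecedent_cf: float) -> bool:
    """True if the antecedent is believed strongly enough to fire the rule."""
    return antecedent_cf > THRESHOLD


def combine_cf(cf1: float, cf2: float) -> float:
    """Combine two certainty factors for the same hypothesis.

    Assumed formula (commonly cited MYCIN-style combining function):
      both non-negative: cf1 + cf2 * (1 - cf1)
      both negative:     cf1 + cf2 * (1 + cf1)
      mixed signs:       (cf1 + cf2) / (1 - min(|cf1|, |cf2|))
    """
    for cf in (cf1, cf2):
        if not -1.0 <= cf <= 1.0:
            raise ValueError("certainty factors must lie in [-1, 1]")
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    denominator = 1 - min(abs(cf1), abs(cf2))
    if denominator == 0:
        raise ValueError("cannot combine CF = +1 with CF = -1")
    return (cf1 + cf2) / denominator


print(round(combine_cf(0.6, 0.4), 4))   # two supporting pieces of evidence
print(round(combine_cf(0.6, -0.4), 4))  # conflicting evidence
print(rule_activates(0.15))             # too weak to activate the rule
```

With this formula, two supporting pieces of evidence reinforce each other without the result ever exceeding 1,
and conflicting evidence pulls the combined value toward zero.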
4. Difficulties with Certainty Factors
In MYCIN, which was very successful in diagnosis, there were difficulties with the theoretical foundations of
certainty factors: there was some basis for the CF values in probability theory and confirmation theory, but the
CF values were also partly ad hoc. Also, the CF values could be the opposite of conditional probabilities.
Dempster-Shafer Theory
The Dempster-Shafer theory is a method of inexact reasoning. It is based on the work of Dempster who attempted
to model uncertainty by a range of probabilities rather than as a single probabilistic number.
1. The Dempster-Shafer theory assumes that there is a fixed set of mutually exclusive and exhaustive elements
called the environment and symbolized by the Greek letter Θ:
Θ = {θ1, θ2, ..., θN}
The environment is another term for the universe of discourse in set theory. Consider the following:
Θ = {rowboat, sailboat, destroyer, aircraft carrier}
These are all mutually exclusive elements. Now, we could ask the question, “What are the military boats?”
The answer would be a subset of Θ:
{θ3, θ4} = {destroyer, aircraft carrier}
Likewise, the answer to the question, “What boat is powered by oars?” is the set:
{θ1} = {rowboat}
This set is called a singleton because it has only one element. Each subset of Θ is a possible answer to the
question. Since the elements are mutually exclusive and the environment is exhaustive, there can be only one
correct answer to a question.
Each subset can be considered an implied proposition:
The correct answer is: {θ1, θ2, θ3}
The correct answer is: {θ1, θ3}
All subsets of the environment can be drawn as a hierarchical lattice with Θ at the top and the null set ∅ at
the bottom.
An environment is called a frame of discernment when its elements may be interpreted as possible answers,
and only one answer is correct. If the answer is not in the frame, then the frame must be enlarged to
accommodate the additional knowledge of elements.
2. Mass Functions and Ignorance
In Bayesian theory, the posterior probability changes as evidence is acquired. In Dempster-Shafer theory, the
belief in evidence may vary: we talk about the degree of belief in evidence as analogous to the mass of a
physical object. Thus, the mass supports the belief. The evidence measure is the amount of mass.
The fundamental difference between Dempster-Shafer theory and probability theory is the treatment of
ignorance. The Dempster-Shafer theory does not force belief to be assigned to ignorance or to the refutation of a
hypothesis. Mass is only assigned to those subsets of the environment to which you wish to assign belief;
any belief not assigned to a specific subset is considered no belief, or nonbelief, and is just associated with the
environment.
Every set in the power set of the environment that has mass greater than 0 is a focal element. Every mass can
formally be expressed as a function that maps each element of the power set into a real number in the interval 0
to 1, stated as follows:
m: P(Θ) → [0, 1]
3. Combining Evidence
Dempster’s rule combines masses to produce a new mass that represents a consensus of the original, possibly
conflicting evidence. The new mass is a consensus because it tends to favor agreement rather than
disagreement by including only masses in the set intersections. The set intersections represent common
elements of evidence. In evidential reasoning, the evidence is said to induce an evidential interval, or range
of belief in the evidence. The lower bound is called the support (Spt) in evidential reasoning and Bel in
Dempster-Shafer theory. The upper bound is called the plausibility (Pls). The support, or belief function Bel, is
the total belief of a set and all its subsets; it is sometimes called the belief measure.
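As a rough illustration of combining evidence, the sketch below applies Dempster’s rule to two mass functions over
the boat environment used earlier and then computes the evidential interval [Bel, Pls]. The mass values (0.7 and
0.9) and all function and variable names are hypothetical, chosen only for this example. Plausibility is computed
here as the total mass of the focal elements that intersect the set, which is the standard definition but is not
spelled out in these notes.

```python
from itertools import product

THETA = frozenset({"rowboat", "sailboat", "destroyer", "aircraft carrier"})
MILITARY = frozenset({"destroyer", "aircraft carrier"})
DESTROYER = frozenset({"destroyer"})

# Hypothetical evidence; unassigned mass stays with the environment THETA.
m1 = {MILITARY: 0.7, THETA: 0.3}
m2 = {DESTROYER: 0.9, THETA: 0.1}


def combine(ma, mb):
    """Dempster's rule: multiply masses, keep set intersections, normalize."""
    combined, conflict = {}, 0.0
    for (a, mass_a), (b, mass_b) in product(ma.items(), mb.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass falling on the null set
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting")
    return {s: mass / (1.0 - conflict) for s, mass in combined.items()}


def belief(m, s):
    """Bel: total mass of the set and all its subsets."""
    return sum(mass for focal, mass in m.items() if focal <= s)


def plausibility(m, s):
    """Pls: total mass of every focal element that intersects the set."""
    return sum(mass for focal, mass in m.items() if focal & s)


m12 = combine(m1, m2)
for name, subset in (("destroyer", DESTROYER), ("military", MILITARY)):
    print(name, round(belief(m12, subset), 4), round(plausibility(m12, subset), 4))
```

In this example the two pieces of evidence never conflict, so the normalization step leaves the products unchanged;
when some intersections are empty, the division by (1 - conflict) is the normalization step of Dempster’s rule.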
4. The moving mass analogy is helpful in understanding support and plausibility. The main concepts are as
follows:
• The support is the mass assigned to a set and all its subsets.
• Mass of a set can move freely into its subsets.
• Mass in a set cannot move into its supersets.
• Moving mass from a set into its subset can only contribute to the plausibility of the subset, not its
support.
• Mass in the environment, Θ, can move into any subset since Θ is outside all of them.
5. One difficulty with the Dempster-Shafer theory occurs with normalization and may lead to results that are
contrary to our expectations. The problem with normalization is that it ignores the belief that the object being
considered does not exist.
Approximate Reasoning
In this section, we discuss a theory of uncertainty based on fuzzy logic and concerned with quantifying and
reasoning using natural language where words have ambiguous meanings. Fuzzy logic is a superset of conventional
(Boolean) logic that has been extended to handle the concept of partial truth, that is, truth values between
completely true and completely false.
The term soft computing means computing that is not based on classical two-valued logic, and includes fuzzy
logic, neural networks, and probabilistic reasoning.
1. Fuzzy Sets and Natural Language
A discrimination function, or characteristic function, is a way of representing which objects are members of a
set: 1 means an object is an element of the set; 0 means that it is not. Sets to which this type of representation
applies are called crisp sets, in contrast to fuzzy sets. Fuzzy logic addresses the middle ground, the gray areas
of partially true, partially false situations that make up most of human reasoning. It is based on the
assumption that everything consists of degrees: age, beauty, wealth, etc.
In fuzzy sets, an object may belong partially to a set. The degree of membership is measured by a generalization
of the characteristic function called the membership function or compatibility function. A particular value of
the membership function, e.g. 0.5, is called a grade of membership.
A fuzzy truth value is called a fuzzy qualifier. Unlike crisp propositions, which are not allowed to have
quantifiers, fuzzy propositions may have fuzzy quantifiers. The term compatibility means how well one object
conforms to some attribute. There are many types of membership functions. Membership functions may be
constructed from a person’s opinions or from a group of people’s opinions. In an expert system, the
membership function will be constructed from the expert’s opinion. In a membership function, the crossover
point is where μ = 0.5. The S-function is often used in fuzzy sets as a membership function.
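The S-function and its crossover point can be sketched as follows. The notes do not give the formula, so the
piecewise-quadratic form below (Zadeh’s S-function, with the crossover fixed at the midpoint of the two end points)
is an assumption, as are the parameter names alpha and gamma and the sample values for a fuzzy set such as TALL.

```python
def s_function(x: float, alpha: float, gamma: float) -> float:
    """Piecewise-quadratic S-shaped membership function (assumed form).

    Membership is 0 at or below alpha, 1 at or above gamma, and 0.5 at the
    crossover point beta = (alpha + gamma) / 2.
    """
    if alpha >= gamma:
        raise ValueError("alpha must be less than gamma")
    beta = (alpha + gamma) / 2
    if x <= alpha:
        return 0.0
    if x <= beta:
        return 2 * ((x - alpha) / (gamma - alpha)) ** 2
    if x <= gamma:
        return 1 - 2 * ((x - gamma) / (gamma - alpha)) ** 2
    return 1.0


# Hypothetical membership function for TALL, with heights in feet.
for height in (5.0, 5.5, 6.0, 6.5, 7.0):
    print(height, s_function(height, alpha=5.0, gamma=7.0))
```

Here a height of 6.0 is the crossover point: its grade of membership in the hypothetical set TALL is 0.5.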
2. Fuzzy Set Operations
An ordinary crisp set is a special case of a fuzzy set whose membership function takes only the values {0, 1}.
All the definitions, proofs, and theorems of fuzzy sets must be compatible in the limit as the fuzziness goes to 0
and the fuzzy sets become crisp sets.
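To make the limiting case concrete, the sketch below implements a few standard fuzzy set operations (complement,
union as max, intersection as min, concentration, and dilation) on discrete fuzzy sets and checks that they
behave like ordinary set operations when every grade is 0 or 1. Representing a fuzzy set as a dictionary from
element to grade, and the sample grades themselves, are assumptions made for illustration.

```python
def complement(a):
    return {x: 1 - mu for x, mu in a.items()}


def union(a, b):
    # Both sets are assumed to be defined over the same universe (same keys).
    return {x: max(a[x], b[x]) for x in a}


def intersection(a, b):
    return {x: min(a[x], b[x]) for x in a}


def concentration(a):
    return {x: mu ** 2 for x, mu in a.items()}


def dilation(a):
    return {x: mu ** 0.5 for x, mu in a.items()}


tall = {"ann": 0.2, "bob": 0.5, "cy": 1.0}        # hypothetical grades
crisp_tall = {"ann": 0.0, "bob": 1.0, "cy": 1.0}  # fuzziness reduced to 0

# A fuzzy set and its complement can overlap ...
print(intersection(tall, complement(tall)))
# ... but in the crisp limit the overlap disappears.
print(intersection(crisp_tall, complement(crisp_tall)))
print(concentration(tall)["bob"], dilation(concentration(tall))["bob"])
```

The first result has nonzero grades, so the fuzzy set and its complement overlap; the second is zero everywhere,
matching the crisp rule that a set and its complement are disjoint.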
The following is a summary of fuzzy set operations:
• set equality, A = B: μA(x) = μB(x) for all x ∈ X
• set complement, A’: μA’(x) = 1 - μA(x)
• set containment, A ⊆ B: μA(x) ≤ μB(x) for all x ∈ X
• proper subset, A ⊂ B: μA(x) ≤ μB(x) for all x ∈ X and μA(x) < μB(x) for at least one x ∈ X
• set union, A ∪ B: μA∪B(x) = max(μA(x), μB(x)) for all x ∈ X
• set intersection, A ∩ B: μA∩B(x) = min(μA(x), μB(x)) for all x ∈ X
• set product, AB: μAB(x) = μA(x) μB(x)
• power of a set, A^N: μA^N(x) = (μA(x))^N
• probabilistic sum, A+B: μA+B(x) = μA(x) + μB(x) - μA(x) μB(x)
• bounded sum, A ⊕ B: μA⊕B(x) = min(1, μA(x) + μB(x))
• bounded product, A ⊙ B: μA⊙B(x) = max(0, μA(x) + μB(x) - 1)
• bounded difference, A |-| B: μA|-|B(x) = max(0, μA(x) - μB(x))
• concentration, CON(A): μCON(A)(x) = (μA(x))^2
• dilation, DIL(A): μDIL(A)(x) = (μA(x))^0.5
• intensification, INT(A): μINT(A)(x) = 2(μA(x))^2 for 0 ≤ μA(x) ≤ 0.5, and 1 - 2(1 - μA(x))^2 for 0.5 < μA(x) ≤ 1
• normalization, NORM(A): μNORM(A)(x) = μA(x) / max{μA(x)}
3. Fuzzy Relations
A relation from set A to set B is a subset of the Cartesian product:
A × B = {(a, b) | a ∈ A and b ∈ B}
If X and Y are universal sets, then
R = {μR(x, y) / (x, y) | (x, y) ∈ X × Y}
The composition of relations is the net effect of applying one relation after another. For the case of two binary
relations P and Q, the composition of their relations is the binary relation R:
R(A, C) = Q(A, B) ∘ P(B, C)
where:
R(A, C) is a relation between A and C
Q(A, B) is a relation between A and B
P(B, C) is a relation between B and C
A, B, and C are sets
and ∘ is the composition operator.
4. Linguistic Variables
One important application of fuzzy sets is computational linguistics, where the goal is to calculate with natural
language statements in a way analogous to the way that logic calculates with logic statements. Fuzzy sets and
linguistic variables can be used to quantify the meaning of natural language, which then can be manipulated.
Linguistic variables must have a valid syntax and semantics.
5. Extension Principle
The extension principle is a very important concept that defines how to extend the domain of a given crisp
function to include fuzzy sets. Using this principle, any ordinary or crisp function from mathematics, science,
engineering, business, and so forth can be extended to work on a fuzzy domain with fuzzy sets. This principle
makes fuzzy sets applicable to all fields.
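A minimal sketch of the extension principle for a discrete fuzzy set follows. It assumes the usual formulation, in
which the grade of each output value y is the maximum grade among the inputs x with f(x) = y; the notes do not state
the formula, and the fuzzy set "about zero" and its grades are hypothetical.

```python
def extend(f, fuzzy_set):
    """Apply a crisp function f to a discrete fuzzy set.

    Assumed rule: the grade of y is the largest grade of any x with f(x) == y.
    """
    image = {}
    for x, mu in fuzzy_set.items():
        y = f(x)
        image[y] = max(image.get(y, 0.0), mu)
    return image


# Hypothetical fuzzy set "about zero" over a few integers.
about_zero = {-2: 0.2, -1: 0.7, 0: 1.0, 1: 0.7, 2: 0.3}

# Extending the crisp function f(x) = x * x to this fuzzy set.
print(extend(lambda x: x * x, about_zero))
```

Both -2 and 2 map to 4, so the grade of 4 in the result is the larger of their two grades.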
6. Fuzzy Logic
Just as classical logic forms the basis of conventional expert systems, fuzzy logic forms the basis of fuzzy
expert systems. Besides dealing with uncertainty, fuzzy expert systems are also capable of modeling
commonsense reasoning.
Fuzzy logic may be considered an extension of multivalued logic. Fuzzy logic is the logic of approximate
reasoning, that is, the inference of a possibly imprecise conclusion from a set of possibly imprecise premises.
7. Fuzzy Rules
As a simple example of fuzzy set operators, we consider the problem of pattern recognition in this section.
8. Possibility and Probability
The term possibility has a specific meaning in fuzzy theory: it refers to allowed values. A possibility
distribution is not the same as a probability distribution, which gives the frequency of expected occurrence of
some random variable.
9. Translation Rules
Translation rules specify how modified or composite propositions are generated from their elementary
propositions. Such rules fall into categories of Type I, II, III, and IV:
Type I: modification rules
Type II: composition rules (conditional composition, conjunctive composition, disjunctive composition)
Type III: quantification rules
Type IV: qualification rules (truth qualification, probability qualification, possibility qualification)
The State of Uncertainty and Commercial Applications of Fuzzy Logic
There are two mountains: the mountain of logic and the mountain of uncertainty. Expert systems are built on the
mountain of logic and must reach valid conclusions given a set of premises. The conclusions are valid given that:
1. The rules were written correctly
2. The facts upon which the inference engine generates valid conclusions are true facts
Today, fuzzy logic and Bayesian theory are most often used for uncertainty. Classical Bayesian, convex set
Bayesian, Dempster-Shafer, Kyburg, and the possibility theory of fuzzy logic are among the methods that are
widely available as software tools.
Quick Quiz
1. Probability has been called by mathematicians a theory of _____ uncertainty.
Answer: reproducible
2. An intelligent system is any system that can operate with _____ information.
Answer: uncertain
3. _____ can be viewed as minimizing local uncertainties.
Answer: Verification
4. _____ is uncertainty as a result of one rule contradicting another rule.
Answer: Contradiction of rules
5. Whenever a fact is entered in the working memory of a computer, it receives a unique ____.
Answer: timetag
6. The order that rules are entered into an expert system may be a factor in conflict resolution
(true / false).
Answer: true
7. The purpose of _____ is to help the system keep executing a certain task without becoming
distracted by very recent facts that have entered working memory.
Answer: means-ends analysis (MEA)
8. Rules that have the same consequence and evidence are called ____ rules.
Answer: redundant
9. Rules that the human expert forgets, or is unaware of, are called _____ rules.
Answer: missing
10. A(n) _____ CF value indicates that the evidence supports the hypothesis.
Answer: positive
11. A CF value of _____ indicates that the evidence proves the hypothesis.
Answer: 1
12. A(n) _____ is a way of minimizing the activation of rules which only weakly suggest a
hypothesis.
Answer: threshold value
13. The attenuation factor is based on the assumption that all the evidence is known with ____.
Answer: certainty
14. The Dempster-Shafer theory is characterized by modeling uncertainty by a(n) _____ rather
than a single probabilistic number.
Answer: range of values
15. The Dempster-Shafer theory assumes there is a fixed set of mutually exclusive and
exhaustive elements called the _____ and symbolized by the Greek letter _____.
Answer: environment, Θ
16. A(n) ____ set has only one element.
Answer: singleton
17. The Dempster-Shafer theory refers to the degree of belief in evidence as analogous to the
_____ of a physical object.
Answer: mass
18. A fundamental difference between the Dempster-Shafer theory and probabilistic theory is
the treatment of _____.
Answer: ignorance
19. In the Dempster-Shafer theory, instead of restricting belief to a single value, there is a(n)
_____ in belief of the evidence.
Answer: range
20. The _____ function is sometimes called the belief measure.
Answer: Bel
21. One of the main concepts of the moving mass analogy states that mass in a set (can / cannot) move
into its supersets.
Answer: cannot
22. Computing that is not based on the classical two-valued logic is sometimes called _____
computing.
Answer: soft
23. Set equality for fuzzy sets is defined as _____.
Answer: μA(x) = μB(x) for all x ∈ X
24. Can fuzzy set A and A’ overlap (yes / no)?
Answer: yes
25. A(n) _____ is sometimes called a mapping because it associates elements from one domain
to another.
Answer: relation
26. An application of fuzzy sets to natural language is _____.
Answer: computational linguistics
27. The term _____ in fuzzy theory refers to allowed values.
Answer: possibility
28. Very likely, unlikely, not very likely are examples of _____.
Answer: fuzzy qualifiers
Discussion Questions
1. What are the three major uncertainties in a rule-based expert system?
2. What is the difference between verification and validation in an expert system?
3. What are the five causes of uncertainty with regard to the compatibility of rules?
4. What is data fusion?
5. What is the problem associated with belief and disbelief in the MYCIN system?
6. Interpret a CF value of 0.
7. How does a threshold value improve the efficiency of an expert system?
8. What is the purpose of combining CF values?
9. What is the main difference between the way ignorance is regarded in Dempster-Shafer and
probability theory?
10. How is fuzzy logic associated with camcorders?
Key Terms
• Certainty Factor: A number that combines belief and disbelief into a single number.
• Dempster-Shafer Theory: A method of inexact reasoning, also called the theory of belief
functions.
• Extension Principle: Defines how to extend the domain of a given crisp function to
include fuzzy sets.
• Fuzzy Reasoning: The inference of a possibly imprecise conclusion from a set of
possibly imprecise premises.
• Fuzzy Set Operations: Operations analogous to those of ordinary crisp sets.
• Possibility: Refers to allowed values.
• Relation: A mapping which associates the elements from one domain to another.
• Reproducible Uncertainty: Involves a large population where results are averaged over
many trials.
• Validation: The process of minimizing global uncertainties.
• Verification: The process of minimizing local uncertainties.