Chapter 12

advertisement
Chapter
12
Rule-Based Systems
Chapter Objectives
State that the field of artificial intelligence focuses on problems that cannot be solved with
traditional computing techniques.
Explain how intelligent problem-solving methods define a problem as a state space to be
searched.
Summarize how data mining techniques employ a state space search to build generalized models
from data.
Describe how rule-based systems separate problem-solving knowledge from the reasoning
mechanism used to apply the knowledge.
Illustrate how an expert system is a special type of rule-based system that contains the
knowledge of one or more human experts.
Explain how building an expert system requires the skills of a knowledge engineer who is
responsible for extracting and transforming the knowledge of one or more experts.
List the types of problems that can be solved with a rule-based approach.
Describe how an expert system shell is used to decrease the amount of time necessary to build a
rule-based system.
Explain how a rule-based system can be built using a top-down or a bottom-up problem-solving
approach.
_________
The purpose of data mining and knowledge discovery is to turn data into models for
decision making. Although data mining is an appropriate solution methodology for
many applications, there are times when this approach is not feasible.An obvious scenario
is any situation lacking quality data to be analyzed. Fortunately, knowledge
comes in many forms and can be gathered in many ways. Because of this, when data
mining is not a viable choice, other options for building useful decision-making models
may be available.
In Part IV we examine two alternative methods for building models to aid in the
decision-making process. Expert systems are computer programs designed to emulate
the behavior of a human being who is expert at solving problems within a specialized
area. Intelligent agents are computational entities capable of autonomously achieving
goals by executing needed actions. Data mining, expert systems, and intelligent agents all
exist under the umbrella of artificial intelligence problem-solving techniques.Although
all three approaches are distinct, each has a primary goal of creating intelligent systems.
In Section 12.1 we provide a brief introduction to the field of artificial intelligence
(AI) and describe the types of problems of interest to AI practitioners. In Section 12.2
we offer a general approach for AI problem solving and show how this model is applied
to data mining. In Section 12.3 we discuss rule-based expert systems. Expert systems are
of interest because, like data mining, they focus on building and applying generalized
models to specific problems. However, unlike data mining applications, which construct
models from data, expert systems are built by extracting knowledge from one or more
human experts. In Section 12.4 we provide two examples to illustrate how rules and
rule-based techniques are used for representing and reasoning with knowledge.
12.1 Exploring Artificial Intelligence
The history of artificial intelligence (AI) dates back to the 1940s, starting with the
work of Warren McCulloch and Walter Pitts (1943), Claude Shannon (1950), and
Alan Turing (1950). Since its beginnings, the AI field has experienced many successes
as well as failures.Two of AI’s greatest commercial successes have been in the areas of
expert systems and data mining.Table 12.1 displays a list of common areas of current
study within the AI field.
Because of its extensive scope, a single all-encompassing definition of AI is difficult
to state. Let’s examine four plausible definitions. Problem solving is a common
theme exhibited by each definition.
1. AI is the study of problems that cannot be solved using traditional algorithmic
techniques.
2. AI is the study of how to make computers do things which at the moment
people do better (Rich and Knight, 1991).
•
CD-4 Chapter 12 Rule-Based Systems
3. AI is the study of computations that make it possible to perceive, reason, and
act (Winston, 1992).
4. AI is the study of finding polynomial solutions to exponentially hard problems.
Each definition gives some insight about the types of problems studied by AI
practitioners. The first definition tells us that problems such as how to update credit
card accounts, print payroll checks, and find square roots of numbers are not of interest
to AI researchers, as these problems can be easily solved with simple algorithms.
The second and third definitions emphasize the human side of the field. Problems
such as how to make computers see (computer vision), understand the written or spoken
language (natural language processing and speech recognition), tutor an algebra
student (intelligent tutoring systems), or play a mean game of chess (game playing) do
appeal to the AI community.
The fourth definition is of particular interest because it emphasizes the difficult
nature of the problems studied by AI researchers.An exponentially hard problem is
any problem that cannot be solved in a reasonable amount of time with a traditional
algorithmic approach.A classic problem that fits this category is the traveling salesperson
problem. One description of the problem is as follows.
Given a set of n cities, with one of the cities designated as the starting position, visit all
of the cities (and arrive back home) while keeping the total distance traveled as short as
possible.
• Exploring Artificial Intelligence CD-5
Table 12.1 • Areas of Study Within Artificial Intelligence
12.1
Scheduling Problems
Expert Systems
Game Playing
Intelligent Agents
Intelligent Database Retrieval
Intelligent Tutoring Systems
Machine Learning
Natural Language Processing
Planning
Robotics and Computer Vision
Speech Recognition
Theorem Proving
Although the problem is easy enough to understand, a good solution is not readily
apparent. A first thought is to simply let the traveling salesperson try each alternative
route. For a weekly route, the salesperson might travel a new course each week until all
possibilities have been exhausted. Given this, a weekly itinerary of four cities allows the
salesperson to know a best course of travel after just six weeks. Unfortunately, for the
general case of n cities, there are a total of (n – 1)! possible paths.Therefore when the
number of cities increases to 10, the total count of possible tours expands to 362,880.
Trying one path per week, our salesperson will be traveling a new route for over 6975
years! Even a computer program that knows the travel cost between each pair of cities
will require a significant amount of computing time to check all possible paths.
The traveling salesperson problem can easily be generalized. An interesting variation
of the problem follows.
A school district has a fixed number of buses with which to transport a population of
students to school each day. An individual student should not have to board their bus
before 7:30 A.M. Determine the route and stop locations for each bus so that the students
are transported to the school on time.
The traveling salesperson problem and its variations are notoriously difficult to
solve.To apply a brute force technique of trying each alternative path simply allows for
too many possibilities.Although no one has discovered an efficient method that can find
the shortest path for a given set of points, many methods have been studied that seem to
work well in most situations. These methods use one or more heuristics. Recall that
heuristics are rules of thumb that tend to give good enough answers in most situations
but never guarantee a best solution. One popular approach uses the nearest neighbor
heuristic, which tells us to always travel the route of the next-closest city. By applying
the nearest neighbor heuristic, we limit the total number of tour possibilities while still
having a reasonable chance of achieving an acceptable solution. Let’s look at an example.
Figure 12.1 displays four cities that are directly connected to each other.
Individual paths are labeled with a cost for traveling from one city to the connecting
city.That is, a 10-unit cost is seen when traveling directly between cities A and C.The
route from city A to C through city B shows a total cost of 8 units. Let’s assume city A
as the starting point and apply the nearest neighbor heuristic.The path traveled by the
nearest neighbor algorithm is A-B-C-D-A, which gives a unit cost of 32. In this case,
the path determined by the heuristic is one of two best paths. The alternative best
path, which simply reverses the ordering of the first path, is A-D-C-B-A.To see that
the nearest neighbor heuristic does not always give a best answer, let’s apply the
method to the graph in Figure 12.2. Starting at A, the best route is A-B-C-D-A with
a total cost of 33 units.The nearest neighbor algorithm travels the route A-C-D-B-A
with a total cost of 35 units.
The method employed by the nearest neighbor heuristic of defining a problem as
a search through a set of limited possibilities is central to AI problem solving.The idea
•
CD-6 Chapter 12 Rule-Based Systems
for solving a difficult problem is to find a solution in a moderate amount of time that
works within the boundaries of the constraints defining the problem. For the school
district busing problem, our chances of finding a minimal cost solution are nearly impossible.
However, a solution that lies within the district’s fixed budget is a likely option.
The next section expands on the notion of heuristic search.
12.2 Problem Solving as a State Space Search
Two issues fundamental to AI problem solving are how to represent knowledge and
how to reason with this knowledge to find problem solutions. It is often helpful to
think of the process of finding a solution to a difficult problem as a search through a
state space of possibilities. A state space is defined as the set of all possible problem
states.A reasoning strategy that allows us to search through this state space in a reasonable
amount of time will ultimately lead to a problem solution.The components of a
state space search are as follows:
• One or more initial states
• One or more goal states
• Problem Solving as a State Space Search CD-7
Figure 12.1 • Starting at city A, the nearest neighbor heuristic finds a
12.2
shortest path (A-B-C-D-A)
A
C
B
D
5
3
15
12
9
10
• A set of rules that move the search from the current state to a new state
• A control strategy that examines the current state and determines the next rule to
apply
The basic solution strategy is quite simple.The first step is to describe how to
represent an individual state of the state space. After this, one of the initial states is
chosen as a starting position. Next, the control strategy chooses a rule to apply.The result
of applying the rule leads the search to a new state. If the new state is a solution to
the problem, the search terminates. If the new state does not represent a solution, the
process of choosing and applying a new rule is repeated until a solution is found or
the search is terminated.An example illustrates the process.
The Water Jug Problem
The water jug problem offers a clear example for us to follow the process of state
space search.A description of the water jug problem follows.
Suppose we have two jugs.One jug is capable of holding 3 gallons of water.A second
jug can hold up to 4 gallons of water.There are no measurement lines on either jug.
CD-8 Chapter 12
•
• Rule-Based Systems
Figure 12.2 The nearest neighbor heuristic chooses the path A-C-D-B-A
even though A-B-C-D-A is a better choice
C
B
A
D
15
5
6
12
10
4
Therefore we can never determine the exact amount of water in either jug. However, by
looking in either jug, we can determine if the jug is empty, full, or contains some water.
We are to use some combination of the rules displayed in Table 12.2 to place 2 gallons
of water into the 4-gallon jug.
Before we look at a solution, you may wish to solve the problem on your own. If
you choose to come up with your own solution,we encourage you to record your answer
so you can compare your result with ours. In any case, to find a solution, we must
first determine a representation for an individual problem state.As a representation need
only keep track of the amount of water in each jug, an obvious choice is to display each
state as an ordered pair (x,y) where x is the amount of water in the 4-gallon jug and y is
the amount of water in the 3-gallon jug.We can assume an initial problem state of (0,0)
without any loss of generality. A goal state is any state showing a value of 2 for x.
Therefore we represent the goal as (2,?) where ? indicates a “don’t care” condition.
Next,we need a set of rules to move from one state to a new state.Table 12.2 offers
a listing of eight such rules. The column labeled Required Conditions shows the
preconditions to be met for each rule to be applied.The column labeled Action tells us
• Problem Solving as a State Space Search CD-9
Table 12.2 • Rules for the Water Jug Problem
12.2
Required Resultant
Action Conditions State
1. Fill the 4-gallon jug. The 4-gallon jug is not full. (4,y)
2. Fill the 3-gallon jug. The 3-gallon jug is not full. (x,3)
3. Empty the 4-gallon jug The 4-gallon jug is not empty. (0,y)
onto the ground.
4. Empty the 3-gallon jug The 3-gallon jug is not empty. (x,0)
onto the ground.
5. Pour water from the
3-gallon jug into the 4-gallon The total amount of water in both jugs (4,y – (4 – x))
jug until the 4-gallon jug is full. is > = 4 and the 3-gallon jug is not empty.
6. Pour water from the 4-gallon The total amount of water in both jugs (x – (3 – y),3)
jug into the 3-gallon jug until is > = 3 and the 4-gallon jug is not empty.
the 3-gallon jug is full.
7. Pour all the water from the The total amount of water in both jugs (x + y,0)
3-gallon jug into the 4-gallon jug. is <=4 and the 3-gallon jug is not empty.
8. Pour all the water from the The total amount of water in both jugs (0, x + y)
4-gallon jug into the 3-gallon jug. is <=3 and the 4-gallon jug is not empty.
how to apply each rule.The column displaying Resultant State computes the new state
following the application of each rule.
The final ingredient for completing the problem description is a control strategy.
In general, the search through a state space can be either depth-first or breadth-first.A
left-to-right depth-first search moves vertically deeper into the search space before
expanding the search in a horizontal manner.A breadth-first search moves horizontally
through the search space before extending the search to the next level. Figure 12.3
applies each technique to a hypothetical state space having node A as the initial state.
A depth-first or breadth-first search can be applied using either forward or backward
chaining.To apply forward chaining, we look at the antecedent (required conditions
in Table 12.2) of a rule to determine if the conditions of the rule have been
met by what is currently known. If the conditions have been met, the rule fires, and
the consequent of the rule is added to the current state of knowledge.The water jug
problem is best solved using forward chaining. We will describe backward chaining
in the next section.
CD-10 Chapter 12
• Rule-Based Systems
•
Figure 12.3 A hypothetical state space representation: A depth-first
search through the space follows A-B-E-F-C-G-I-J-H-D
whereas a breadth-first search is given by the path
A-B-C-D-E-F-G-H-I-J
A
CBD
JI
HGFE
To apply forward chaining, we examine the Required Conditions column of Table
12.2 and see that either rule 1 or rule 2 can be applied to the start state of (0,0).The result
of applying rule 1 to the initial state gives the state node (4,0). Applying rule 2 to
the initial state results in a new state node of (0,3).The arrow labeled step 1 in Figure
12.4 points to the search space after the application of both rules to the initial state.
We must now decide whether the search will be depth-first or breadth-first. For
our problem, a depth-first search tells us to examine nodes that follow the path given
by (4,0) until a solution is found or a dead end is encountered.With a breadth-first
search, we also generate the immediate children of node (4,0). However, if a solution
is not given by a child of (4,0),we then move across the search space and generate the
children of (0,3).We will solve the problem with a depth-first approach.
Table 12.2 tells us that rules 2 and 6 can be applied to the state given by (4,0).
Rule 3 is also applicable; however, because its application returns us to the initial state,
the rule is not considered.To proceed, rule 2 generates (4,3) and rule 6 gives us state
node (1,3).The arrow labeled step 2 in Figure 12.4 points to the updated search space.
As neither node is a goal state, we proceed to investigate rules applicable to the state
(4,3). Rule 3 or rule 4 can be applied to state node (4,3). However, the application of
either rule leads us to a state that is already in the state space.Therefore the state node
(4,3) represents a dead-end in the search space.We abandon this state and attempt to
generate new states using state node (1,3).
Rule 4 is the only rule applicable to state node (1,3).Applying this rule adds (1,0)
to the next level of the search space.The new state space is shown in Figure 12.4 to
the right of the arrow labeled step 3. Continuing from state node (1,0),Table 12.2 tells
• Problem Solving as a State Space Search CD-11
Figure 12.4 • A depth-first solution for the water jug problem
12.2
(0,0)
(0,3) (4,0)
Rule 1 Rule 2
(0,0)
Initial State
Step 1 (0,0)
(0,3) (4,0)
Rule 1 Rule 2
Step 2
(1,3) (4,3)
Rule 2 Rule 6
(0,0)
(0,3) (4,0)
Rule 1 Rule 2
Step 3
(1,3) (4,3)
Rule 2 Rule 6
(1,0)
Rule 4
(0,0)
(0,3) (4,0)
Rule 1 Rule 2
Step 4, 5 & 6
(1,3) (4,3)
Rule 2 Rule 6
(1,0)
Rule 4
(0,1)
Rule 8
(4,1)
Rule 1
(2,3)
Rule 6
us that the only valid rule is rule 8, which instructs us to pour all of the water from
the 4-gallon jug into the 3-gallon jug.The new state node (0,1) is added to the state
space. Once again, checking the rules,we see that rule 1 can be applied to the current
state.The resultant state is (4,1).The single rule applicable to this new state is rule 6,
which instructs us to pour water from the 4-gallon jug into the 3-gallon jug until the
3-gallon jug is full. The resulting state is (2,3), which represents a goal state, thereby
giving us a solution to the problem.The arrow labeled steps 4, 5, & 6 in Figure 12.4
shows the final search space leading to a solution.
Achieving the goal is useless without knowing how the goal was determined.
Therefore our solution must be stated as a sequence of steps.We can write the solution
by following the path in the search space leading to the goal. Here are the steps
for the solution to the water jug problem:
1. Apply Rule 1: Fill the 4-gallon jug.
2. Apply Rule 6: Pour water from the 4-gallon jug into the 3-gallon jug until
the 3-gallon jug is full.
3. Apply Rule 4: Empty the 3-gallon jug onto the ground.
4. Apply Rule 8: Pour all the water from the 4-gallon jug into the 3-gallon jug.
5. Apply Rule 1: Fill the 4-gallon jug.
6. Apply Rule 6: Pour water from the 4-gallon jug into the 3-gallon jug until
the 3-gallon jug is full.
Applying a breadth-first search to the rules in Table 12.2 reveals a second solution
to the water jug problem. Both solutions can be seen in Figure 12.5, which displays
the complete state space generated by the rules defining the water jug problem.
Although simple, the water jug problem gives us insight into how problems are
solved with a state space search strategy. More difficult problems require sophisticated
heuristics to limit the size of the search space. As is the case with the nearest neighbor
heuristic described earlier, most time-saving heuristics provide a good enough answer,
however, they rarely offer a best result.
Backward Chaining
Forward chaining is appropriate when we wish to determine all possible outcomes
from a set of facts and rules.With forward chaining, we apply rules whose antecedent
conditions match a set of known facts. The process continues until a desired goal is
achieved or additional information cannot be added to what is currently known to be
true. However, many problems are best solved by starting at a goal state and working
•
CD-12 Chapter 12 Rule-Based Systems
backwards. One such problem is finding a path through a maze.As a second example,
when diagnosing a patient, a medical doctor will frequently use a combination of
both forward and backward reasoning. As a first step, the patient offers an initial set of
symptoms.The doctor then uses the symptoms to reason in a forward fashion to determine
a set of one or more possible diseases (goals). The goals are each tested in
some order by continuing to examine and question the patient until a diagnosis can
be made. If a diagnosis cannot be determined, the doctor recommends a series of laboratory
tests. Forward reasoning with the results of the tests leads to a new set of possible
diseases, and the process is repeated.
In general, any goal that can be stated as a question is a likely candidate for backward
chaining. Here is a list of additional problems appropriate for a backward-reasoning
approach:
1. Is a specific credit card holder likely to accept the life insurance promotional
offering contained with their next billing statement?
2. Which data mining tool should I apply to a given dataset?
3. Is the current loan applicant likely to default on a new car loan?
• Problem Solving as a State Space Search CD-13
Figure 12.5 • The complete state space for the water jug problem
12.2
(0,0)
(0,3) (4,0)
Rule 1 Rule 2
(1,3) (4,3)
Rule 2 Rule 6
(1,0)
Rule 4
(0,1)
Rule 8
(4,1)
Rule 1
(2,3)
Rule 6
(3,0)
Rule 7
(3,3)
Rule 2
(4,2)
Rule 5
(0,2)
Rule 3
(2,0)
Rule 7
4. What delivery method should I use to send my package to Seattle?
5. My mother-in-law is currently living in our home. Can I claim her as a
dependent?
Backward chaining attempts to satisfy a goal by examining the consequent of individual
rules.The process of backward chaining creates a tree structure known as a goal
tree. The top-level node of the goal tree is a goal state.A goal tree represents a partial
state space and is useful in that only those states pertinent to a solution are examined.
Backward chaining is best illustrated with an example. Let’s see how backward
chaining works with the hypothetical production rules displayed in Table 12.3.We assume
that facts a and c are initially true and attempt to prove that g is true.The procedure
is as follows:
1. The first step is to check the set of facts to determine if g is initially true. As g
is not currently true,we create a node representing the goal to be proved.The
goal node is shown in the upper-left portion of Figure 12.6.
2. Next, we examine the rules in Table 12.3 in an attempt to find a rule whose
consequent states the goal is true. Rule 3 tells us that if y and x are true, then
g is true.Therefore the antecedent conditions of rule 3 are added to the goal
tree.The updated goal tree is shown to the right of the arrow labeled rule 3 in
Figure 12.6.The rule’s and conditional is specified by the arc passing through
both antecedent conditions.
3. Proceeding across the goal tree in a left-to-right fashion,we ask if condition y
is currently part of our list of known facts. As it is not, y becomes a new goal
to be satisfied.We examine Table 12.3 to find a rule whose consequent states
that y is true. Rule 4, which states that w must be true for y to be true, is such
CD-14 Chapter 12
Table 12.3
• Rule-Based Systems
• A Hypothetical Set of Production Rules
Production Rules Known Facts
1. If a then b a, c
2. If c and d then x
3. If x and y then g
4. If w then y
5. If w then d
6. If b then y
a rule.The antecedent condition of rule 4 is added to the goal tree.The updated
tree is seen by following the arrow labeled rule 4.
4. Next, we must decide if the process of building the goal tree is to proceed
depth-first or breadth-first. A depth-first strategy makes w a new goal to be
pursued. A breath-first strategy looks for other rules having y as their consequent
before proceeding vertically deeper into the goal tree. As backward
chaining is usually implemented as a depth-first procedure,we follow a depthfirst
approach and make w the current goal.
5. Given that w is not a fact,we look for a rule whose then part indicates that w is
true. As no such rule exists, we have encountered a dead end.This requires us
to attempt to satisfy y with another rule.We check the table rules and find that
rule 6 states y is true provided b is true.The new goal tree is shown in Figure
12.6 to the right of the arrow labeled rule 6.
• Problem Solving as a State Space Search CD-15
Figure 12.6 • Creating a goal tree
12.2
g
Rule 3
goal
g
xy
Rule 4
g
xy
w
Rule 6 g
xy
wb
Rule 1
g
xy
wb
a
Rule 2 g
xy
wb
a
Rule 5
dc
g
xy
wb
a
dc
w
6. Next,we check to see if b is currently a fact. It is not. However, rule 1 tells us
that b is true provided a is a true fact.The added information is shown in the
goal tree and is labeled by the arrow rule 1.
7. To continue,we check the list of facts and discover that a is known to be true.
As a is true, we can move up the goal tree.That is, because a is true, we know
we can prove b is true by applying rule 2. Because b can be proven true,we can
also prove y to be true (rule 1).We must still prove x is true to conclude the
initial goal.
8. Rule 2 tells us that x is true if c and d are true.We add this information to the
goal tree.The updated tree is seen to the right of rule 2 in Figure 12.6.
9. As c is a known fact,we attempt to show d is true. Rule 5 tells us that d is true
if w is a true statement.This new information creates the goal tree seen by following
the arrow labeled rule 5.
10. Next, we attempt to prove w is true. As w is not currently known to be true,
we look for a rule whose consequent states w as true. Such a rule does not exist.
Therefore we must try to prove x is true in a new way. Because another
rule stating x as its consequent does not exist,we must attempt to find another
way to prove g is true. Because a second rule stating g as a consequent does
not exist, the process of creating the goal tree terminates, and we conclude
that g cannot be proved true.
The goal tree makes it clear that to prove g we must first find a way to show that
w is true. As w was not initially stated as a known fact, and there are no rules to prove
w as true, it appears that all hope of showing g as true is lost. However, a third option
frequently made available with rule-based systems is to ask the user about the truth
value of a particular fact. The technique is appropriate when the questions asked of
the user are specific to the user’s particular problem. Questions about a user’s age, income
range, and gender are representative of the type of questions falling into this category.
We will examine rule-based systems in more detail in Section 12.4. First, we
describe the connection between data mining and state space search.
Data Mining and State Space Search
A state space is the set of all possible problem states for a given application.We can
view the process of building a data mining model as a state space search.A goal state is
defined by a predetermined criteria for success such as a minimum value for a performance
measure. At a minimum, an initial problem state description will contain the
following information:
CD-16 Chapter 12
• Rule-Based Systems
• A learning strategy and a learner technique for implementing the strategy
• A set of training and test data as appropriate
• A selection of input and/or output attributes
• Initial parameter settings for the chosen data mining tool
• The model created by applying the learner technique to the training data
• Predictiveness and predictability scores for all categorical input attributes and attribute
significance scores for all numeric input attributes
• A measure of performance (e.g., test set accuracy) specifying a goodness score for
the current state
As there is neither sufficient time nor resources to generate all possible models,
we rely on heuristic rules to guide us in choosing a model to best represent the data in
question. Here are five rules to help guide a state space search through the set of all
possible data mining models. In each case, the application of a rule creates a new current
state, which in turn generates a new learner model.
1. IF the learner model was created using backpropagation
AND the rms does not fall below a maximal range
THEN double the number of epochs seen in the current state.
2. IF test set accuracy is below an accepted level of performance
AND training data is limited
THEN add the test data to the training set data
AND test future models using cross-validation.
3. IF test set accuracy is below an accepted level of performance
THEN remove all attributes of little predictive value.
4. IF numerical attribute a is suspected to be predictive of class membership
AND attribute a turns out to be of little predictive value
THEN replace the values of attribute a with their base 2 logarithmic equivalents.
5. IF the learner model is ESX
AND learning is unsupervised
AND the current learner model shows more than five clusters
THEN lower the value of the similarity parameter by a value of 10.
Hundreds of heuristic rules such as those presented here can be stated.As we develop
our data mining skills,we create our own favorite set of model building rules as
12.2
• Problem Solving as a State Space Search CD-17
well as our own set of sequences for applying these rules. Fortunately, the nature of
the data mining tools we use are quite forgiving in that they allow us to find acceptable
models when we follow less than optimal paths within the state space of all possible
models.
12.3 Expert Systems
Expert systems initially came to light in the early 1970s with the development of
three programs each containing the knowledge of one or more human experts.The
first program, DENDRAL (Buchanan et al., 1969), was designed to imply molecular
structure from information provided by a mass spectrometer.The system incorporated
heuristic rules used by chemists to limit the space of possible solutions. The second
success was a system called MYCIN (Shortliffe, 1976). MYCIN contained approximately
450 rules that were used to diagnose infectious blood diseases. MYCIN differed
from DENDRAL in that it associated a measure of uncertainty with each rule.
The third system, PROSPECTOR (Duda et al., 1979), incorporated a probabilistic
approach to suggest exploratory drilling sites for several types of minerals.
Building an expert system requires the knowledge of at least one expert.
Unbounded problems that do not have agreed-upon solutions—such as how to handle
the national debt or how to choose stocks to buy—are not viable problems for expert
system application. Today, expert systems are used to help make decisions and
decrease the amount of time spent on problem-solving tasks. When appropriate, an
expert system can speed up the problem-solving process by a factor of 10.The following
is a small sampling of areas where expert systems have been successfully applied:
• Should an individual or a company be given a loan?
• What is an appropriate listing of homes to be viewed by a particular family?
• What are the insurance needs (life, health,home, car, etc.) for a particular individual?
• Can a life insurance policy be offered to an applicant? If so, how much insurance
can be obtained by the applicant and at what premium?
• How can individual hotel rooms be allocated so as to maximize profit?
• What are the possible causes of a computer hardware failure?
• How should weapons be allocated to enemy targets?
• How many round-trip flights should be scheduled over the Christmas holiday
between Chicago and Boston?
CD-18 Chapter 12
• Rule-Based Systems
• What should be the cost and payment schedules for those responsible for polluting
the environment?
• When should regular maintenance be performed on my automobile?
You may notice that some of these application areas coincide with problems that
can be solved through a data mining approach. In one of the end-of-chapter exercises
you are asked to review this list and determine which problems are also candidates for
a data mining solution.
The Components of an Expert System
Figure 12.7 displays the three major components of an expert system. The knowledge
base contains the domain knowledge about the problem at hand.The domain
knowledge is usually declarative and is often stored as a set of facts and production
rules. Some expert systems store declarative knowledge as objects or network structures.
The knowledge base may also contain metaknowledge, which represents
knowledge about the knowledge contained in the expert system.
The inference engine determines which parts of the knowledge base will be
applied to a specific problem.When the domain knowledge is stored as a set of production
rules, the inference engine uses forward or backward chaining to determine
which rules apply to the current situation. Regardless of the search technique, the inference
engine must have a conflict resolution strategy when more than one rule
applies to a specific circumstance. One approach to conflict resolution is to use a set of
metarules to determine the order of rule application. Once a rule has been chosen,
the inference engine executes the rule and adds the knowledge contained in the rule
consequent to the current state.
The third major expert system component is the user interface. The user interface
allows the user to interact with the expert system.The interface queries the user
for information about a specific problem and offers explanations about the results of
each reasoning process.
Separating Knowledge and Reasoning
Figure 12.7 shows that—unlike conventional programs—an expert system separates
the knowledge contained within the program and the reasoning mechanism used to
apply this knowledge.This is a general strategy used by all rule-based systems.You saw
this separation of knowledge and control in our description of the water jug problem.
The following are three reasons why this strategy is particularly appealing:
12.3
• Expert Systems CD-19
1. Knowledge can be easily added to or deleted from the knowledge base without
disturbing the code contained in the user interface or the inference engine.
2. Several reasoning strategies can be applied to the same knowledge base.
3. Once an expert system has been built, additional expert systems can be built
by simply replacing the knowledge contained in the existing system with the
knowledge for each new application.
An expert system shell is a tool that implements this third feature. An expert
system shell is essentially an expert system with its domain knowledge removed. A
shell contains a user interface and an inference engine. When the user interface and
the inference engine are linked with a particular knowledge base, an expert system results.
Shells are used for quick and efficient expert system development.
The single most important criteria to consider in choosing a shell is that the shell
must adapt well to the problem at hand.Trying to fit an expert system shell to a particular
problem is a serious mistake and will likely result in project failure.
Sophisticated expert system shells allow for several knowledge representation and reasoning
process options.
Several public domain expert system shells are readily available for download from
various Web sites.Two expert system shells are of particular interest. The Java Expert
System Shell (Jess) is freely available from Sandia National Laboratories at
http://herzberg.ca.sandia.gov/jess. Jess contains many of the features seen with expensive
expert system shells.The ES Expert System shell, described in the October/November
1990 issue of BYTE magazine, is available for free download at ftp.uu.net:/pub/ai/
expert-sys/ as summers.tar.Z. ES is easy to use, implements both forward and backward
chaining, and has the capability of reasoning about uncertain information.
CD-20 Chapter 12
Figure 12.7
• Rule-Based Systems
• An expert system architecture
User
Interface
Inference
Engine
Knowledge
Base
User
Developing an Expert System
Building a large-scale expert system is a tedious and difficult process. Figure 12.8
shows the task can be divided into five major subproblems. The arrows indicate that
we can revisit each subproblem as necessary. Let’s take a closer look at each step of the
expert system development cycle.
Step 1: Problem Definition
Problem definition is the first major phase of the expert system building process.
Once a problem domain has been identified, a specific task within this domain is
clearly defined. For example, one possible problem domain is disease diagnosis.A specific
task within this domain is the diagnosis of infectious blood diseases.
Step 2: Knowledge Acquisition
During knowledge acquisition, a knowledge engineer interacts with one or more
human experts in an attempt to capture their knowledge. Knowledge acquisition is
the most difficult part of the expert system building process.This is because much of
an expert’s knowledge is in a compiled form. Compiled knowledge is any knowledge
not easily extracted and interpreted.
Knowledge acquisition techniques can be divided into two major categories.
Bottom-up methods are inductive in that the knowledge engineer observes experts
as they solve real or hypothetical problems within the application domain.The observation
involves recording and making generalizations about the details of the problemsolving process. After this, the experts and the knowledge engineer meet several
times to determine if the important elements of the problem-solving process have
• Expert Systems CD-21
Figure 12.8 • The expert system development cycle
12.3
1 Problem
Definition
Knowledge
Representation
Knowledge
Acquisition
Knowledge
Programming
Testing
&
Evaluation
2
4
3
been correctly generalized.Top-down methods are deductive and include conducting
structured interviews with one or more experts as well as having experts play the
role of a teacher or learning partner.
Step 3: Knowledge Representation
It is the responsibility of the knowledge engineer to represent the knowledge obtained
from the knowledge acquisition procedure in a format suitable for processing by a computer
program. Although several knowledge representation techniques exist, the most
common method is to represent an expert’s knowledge as a set of production rules.
Step 4: Knowledge Programming
Knowledge programming is the process of encoding the knowledge extracted
from an expert into a knowledge base. In order to perform this task, the knowledge
engineer must decide on what knowledge engineering tool he or she will use. In a
majority of cases, a sophisticated expert system shell—which allows for a multitude of
knowledge representation and reasoning strategies—is sufficient for implementing the
system. In rare cases, an expert system will need to be developed from scratch by writing
a computer program.The main advantage of writing a computer program to implement
the system is flexibility. The main disadvantages are length of development
time and increased development cost.
Step 5: Testing and Evaluation
An expert system must perform as expected and must meet the cosmetic demands of
those individuals that will be using the system on a regular basis. The following are
three common expert system evaluation techniques:
• Verification asks if the expert system solves problems by using the same reasoning
process as that used by the expert(s) whose knowledge is contained within the
system.To test this, the expert system is given a set of problems to be solved. For
each problem, the analytical procedure of the expert system is matched against
the reasoning process used by the human expert(s).
• Validation seeks to determine if the expert system performs within an accepted
level of accuracy.A validation asks if the expert system is able to correctly solve
the same problems as those solved by the expert(s) used to build the system.
Unlike a system verification, where the focus is on the internal workings of the
system, the focus of an expert system validation is external.
CD-22 Chapter 12
• Rule-Based Systems
• Reliability asks if the expert system is able to perform with an accepted level of
consistency. Candidates for reliability testing include the user interface, the
knowledge base, and the inference engine.The ability of an expert system to handle
uncertain information is also measured during a reliability testing procedure.
Size, cost, and development time for a specific application varies with the task and
the building tools available.With large-scale applications, a great deal of caution must
be exercised to ensure that the cost of a project will not exceed the potential profits
realized by the developed software.The following is an important rule to follow when
building a large-scale expert system.
Initially model a scaled-down version of the expert system being developed.The scaleddown
model is then expanded and modified as necessary until the final product results.
This technique is referred to as rapid prototyping.
With rapid prototyping, major problems such as poor task definition, inappropriate
inference methods, and the like can be immediately dealt with. Also, demonstration
versions can be used to show that the system will be worthwhile once it has been
completed.
12.4 Structuring a Rule-Based System
Many of our problem-solving skills can be expressed in the form of rules. Recipe
books offer step-by-step procedures to help us cook tasty meals. Nearly all of our
mathematical skills are based on rules.We use rules to decide whether we should send
our packages by way of regular postal service, Federal Express, or UPS.We even have
rules to help us when we engage in social conversation.
Experts are also able to express their problem-solving knowledge in the form of
rules. Because of this, nearly all of today’s expert systems have at least some rule-based
components. In this section we provide two examples to illustrate how rules and rulebased
techniques are used for representing and reasoning with knowledge. However,
before we begin,we find it instructive to make a distinction between the terms expert
system and knowledge-based system.
Strictly speaking, an expert system must contain the knowledge of one or more
domain experts.However, many useful rule-based systems have been built from knowledge
extracted from textbooks and manuals. Such systems are often incorrectly referred
to as expert systems because they are able to provide useful information from responses
to simple questions.Although these programs are not expert systems, they do fall under
the more general category of knowledge-based systems. Knowledge-based systems
12.4
• Structuring a Rule-Based System CD-23
include all rule-based systems that have been built from knowledge extracted from one
or more experts, as well as those systems built with the help of other sources of information.
We avoid confusion by using the term rule-based system to include any system
that stores its knowledge in the form of a set of production rules. Let’s look at our
first example!
Example 1: Form 1040 Tax Dependency
Our lives as well as the laws used to determine our tax obligations change each year.
For this reason, most of us who are brave enough to fill out our own tax forms need at
least some regular assistance. Help with various tax documents comes in many forms.
One option is to telephone the IRS, leave a message, and wait for a return call. An
IRS office in close proximity offers a second possibility.An appealing option is to buy
a software package able to walk us through our tax season form-filing woes. Finally,
we may be able to find answers to our tax questions on the Internet. Let’s explore this
final option in more detail.
Suppose the federal government wishes to significantly decrease the number of
telephone calls it receives during tax season. Their plan is to build a Web site able to
help the average taxpayer with their tax questions. A Web site supported by a rulebased
system is an excellent implementation choice for several reasons. First, most tax
laws are given as rules. Second, tax booklets contain clear step-by-step instructions for
filling out tax forms.Third, the separation of knowledge and reasoning in a rule-based
system allows the rule base to be easily modified whenever the tax laws change.
Finally, requested tax assistance is invariably in the form of one or more questions
readily structured for a backward-reasoning approach.
As a first step, a menu listing all possible categories of available help items must be
designed. For our example, we concentrate on but one of the many possible tax assistance
categories. Specifically, we wish to design a rule base to help taxpayers determine
if a person they are supporting in some way qualifies as a dependent. The
knowledge we need to construct the rule base is contained in the text of a federal
1040 document.Therefore the knowledge acquisition process amounts to us examining
and interpreting the contents of the textual description given for the dependency
criteria.
The year 2001 federal tax form 1040 shows us that five tests must be met for a
person to qualify as a dependent. Here is a list of the five tests together with specific
details about the criteria for passing each test:
1. Relationship test. The person must either be your relative or have lived in
your home as a family member all year. If the person is not your relative, the
relationship must not violate local law.
•
CD-24 Chapter 12 Rule-Based Systems
2. Joint return test. If the person is married, he or she cannot file a joint return.
However, the person can file a joint return if the return is filed only as a
claim for refund and no tax liability would exist for either spouse if they had
filed separate returns.
3. Citizen or resident test. The person must be a U.S. citizen or resident alien
or a resident of Canada or Mexico.There is an exception for certain adopted
children.
4. Income test. The person’s gross income must be less than $2,900. However,
your child’s gross income can be $2,900 or more if he or she was either under
age 19 at the end of 2001 or under age 24 at the end of 2001 and was a student.
5. Support test. You must provide over half of the person’s total support in
2001. However, there are two exceptions to this test: one for children of divorced
or separated parents and one for persons supported by two or more
taxpayers.
The knowledge required to construct the rule base is contained within the description
of the five tests. Our next problem is to represent this knowledge as a set of
production rules.This potentially difficult task is simplified if we first create a goal tree
for the information contained within the descriptions for each test. The goal tree is
useful in that it allows us to clearly visualize the problem and provides a base from
which to construct a solution. Figure 12.9 displays the first level of the goal tree representing
a solution for the dependency problem. The arc indicates that all five tests
must be satisfied for a person to be claimed as a dependent.
Our next step is to continue developing the goal tree by creating a subgoal structure
for each condition shown at the first tree level.The process terminates when the goal tree
contains leaf nodes whose truth value can be easily determined by a potential user of the
• Structuring a Rule-Based System CD-25
Figure 12.9 • A first-level goal tree for dependency exemption
12.4
Person Is a
Dependent
Support Test Is
Satisfied
Relationship
Test Is
Satisfied
Joint Return
Test Is
Satisfied
Income Test Is
Satisfied
Citizen or
Resident Test
Is Satisfied
system.We discuss the structure of the goal tree for the citizen or resident test (shown in
Figure 12.10) and leave construction of the remainder of the goal tree as an exercise.
Upon revisiting the citizen or resident test, we see that it is natural to distinguish
candidate dependents who are residents of the United States from those who are nonresidents.
The second level of the goal tree in Figure 12.10 shows the distinction by
creating two nodes, one of which states that the resident test is satisfied and a second indicating
that the nonresident test is satisfied.We can further differentiate residents into
the subcategories United States citizen and resident alien. Likewise, we can distinguish
nonresidents who are adopted children from those individuals who reside in Canada
or Mexico.The tests to determine if someone is a citizen of the United States or a resident
of Canada or Mexico are shown as leaf nodes within the goal tree.However, to
complete the paths for resident alien is satisfied and adopted child test is satisfied requires
additional research. IRS publication 519 holds the needed information.
IRS publication 519 states that a person qualifies as a resident alien if the person
holds a green card or satisfies the substantial presence test. The substantial presence test as
given in the publication is as follows.
CD-26 Chapter 12
Figure 12.10
• Rule-Based Systems
• A goal tree for the citizen/resident test
Rule 1
Person Is a
Dependent
Support Test Is
Satisfied
Relationship
Test Is
Satisfied
Joint Return
Test Is
Satisfied
Income Test Is
Satisfied
Citizen or
Resident Test
Is Satisfied
Green Card
Test Is
Satisfied
Presence Test
Is Satisfied
Resident Alien
Is Satisfied
C/M Test Is
Satisfied
Adopted Child
Test Is
Satisfied
United States
Citizen
Non-Resident
Test Is
Satisfied
Resident Test
Is Satisfied
183 Days
During Last 3
Years
31 Days
Current Year
Resident of
Mexico
Resident of
Canada
Child Is
Adopted
Foreign Test Is
Satisfied
Lived with
Taxpayer
Entire Year
Lived in
Foreign
Country
Citizen/Alien
Is Satisfied
United States
Resident
Not United
States
Resident
Country/Child
Is Satisfied
Rule 2
Rule 3
Rule 11
Rule 10 Rule 8
Rule 6
Rule 4
Rule 9
Rule 7
Rule 5
The candidate dependent must have lived in the United States for at least 31 days of
the current year and must have lived in the United States for a sum total of at least
183 days over the past three years.A maximum of 1/3 of the 183 days can come from
the year prior to the current year.Also, a maximum of 1/6 of the 183 days can be allocated
from two years prior to the current year.
Please note, for a final implementation, the 183-day requirement necessitates further
subdivision. Finally, an adopted nonresident child who lived in a foreign country
with the taxpayer also satisfies the citizen or resident test.
Once the goal tree is complete, mapping the tree to a set of production rules is
straightforward.We can either begin at the top or the bottom of the tree structure.
Let’s start at the top.The actual structure of the rules depends on the choice of a rulebased
shell.We present the rules in a generic format. Here is the top-level rule represented
by the goal tree:
Rule 1: Dependency Test
IF relationship test is satisfied
AND joint return test is satisfied
AND citizen or resident test is satisfied
AND income test is satisfied
AND support test is satisfied
THEN person is dependent
Continuing to the next tree level with rules for the citizen or resident test we
have:
Rule 2: Citizen or Resident Test
IF resident test is satisfied
OR nonresident test is satisfied
THEN citizen or resident test is satisfied
As an alternative,we could write this rule as two separate rules, each with one antecedent
condition. Specifically,
Rule 2a: Citizen or Resident Test 1
IF resident test is satisfied
THEN citizen or resident test is satisfied
Rule 2b: Citizen or Resident Test 2
IF nonresident test is satisfied
THEN citizen or resident test is satisfied
Continuing from left to right, we complete the rule set. The list of remaining
rules follows.
•
Structuring a Rule-Based System CD-27
Rule 3: Resident Test Rule 4: Nonresident Test
IF United States resident IF not United States resident
AND citizen/alien is satisfied AND country/child is satisfied
THEN resident test is satisfied THEN nonresident test is satisfied
Rule 5: Citizen/Alien Test Rule 6: Country/Child Test
IF United States citizen IF C/M test is satisfied
OR resident alien is satisfied OR adopted child test is satisfied
THEN citizen/alien is satisfied THEN country/child is satisfied
Rule 7: Resident Alien Test Rule 8: C/M Test
IF green card test is satisfied IF resident of Canada
OR presence test is satisfied OR resident of Mexico
THEN resident alien is satisfied THEN C/M test is satisfied
Rule 9: Presence Test Rule 10: Adopted Child Test
IF 31 days current year IF child is adopted
AND 183 days during last 3 years AND foreign test is satisfied
THEN presence test is satisfied THEN adopted child test is satisfied
Rule 11: Foreign Country Test
IF lived in foreign country
AND lived with taxpayer entire year
THEN foreign test is satisfied
12.4
The rules may be applied with forward or backward chaining. However, the natural
inference strategy is backward chaining with the goal person is a dependent. If backward
chaining is applied, rule inference to decide if the citizen or resident test is satisfied
begins by asking the user one of 11 questions. The questions are defined by the antecedent
conditions seen in the rules linked to the leaf nodes of the tree. For a final
implementation, text statements clearly explaining each question are associated with
the antecedent conditions of each rule.
The questions defined by the antecedent conditions can be asked in any order.
However, questions about candidate dependents who are residents of the United
States are more likely than questions about nonresidents.Therefore pursuing the conditions
associated with rule 3 before the conditions seen with rule 4 is appropriate.
Because of this, a conflict resolution strategy to determine a best rule ordering is necessary.
Finally, depending on the rule-based tool, it may be necessary to write additional
rules to indicate negations of satisfied conditions. Here is a top-level rule stating
the candidate has failed at least one of the five tests and therefore does not qualify as a
dependent.
•
CD-28 Chapter 12 Rule-Based Systems
Rule 12: Dependency Failure
IF relationship test is not satisfied
OR joint return test is not satisfied
OR citizen or resident test is not satisfied
OR income test is not satisfied
OR support test is not satisfied
THEN person is not a dependent
The approach we have taken to solve the dependency problem is a top-down
strategy because we started by building a tree containing the most general conditions
for satisfying the goal.We continued to break down each subgoal into a simpler
format until we created a structure containing leaf nodes with questions to be
answered by the naïve user.The process was straightforward in that the information
we needed to build the rule base had been designed and written for us. Our task
was simply that of restructuring the information in a format amenable for knowledge
programming. Our next example also uses a top-down strategy but is more
difficult because it requires us to first interpret unstructured knowledge before we
build the goal tree.
Example 2: Choosing a Data Mining Technique
As we have seen, each data mining method has its own list of advantages and disadvantages.
For example, backpropagation neural networks are better at handling
datasets containing noise but are poor at explaining their behavior. For our second
example, we wish to incorporate knowledge about the individual data mining
methods into a rule-based system.The rule base will help us determine which data
mining technique is appropriate in any given situation. A description of the problem
follows.
Given a set of data containing attributes and values to be mined together with information
about the nature of the data and the problem to be solved, develop a rule-based
system to choose an appropriate data mining technique.
The task is to build a rule-based system to determine which data mining technique
to choose for a specific application. Unlike the previous example, we do not
have a single document clearly describing the problem to be solved.Therefore a major
portion of the work falls under the category of knowledge acquisition. One knowledge
acquisition strategy is to build the system based on knowledge gathered from this
text as well as other textbooks, journals, and magazines.A more appealing approach is
•
Structuring a Rule-Based System CD-29
to interview a data mining expert. Here is a paraphrase of a partial interaction between
a knowledge engineer and a data mining expert who has been asked about the
circumstances under which she decides to select a decision tree model to help solve a
particular problem.
KE: “What are the major factors that lead you to consider a decision tree
building tool as a best choice for a particular problem?”
E:“Well, learning must be supervised and there can be only one output attribute.”
KE:“Are there any other factors that must hold true?”
E:“Oh yes, the output attribute has to be categorical. It can’t be a numeric representation.
Of course, you can define discrete categories for numbers, but this is usually a poor
choice.”
KE:“So you’re saying that learning must be supervised, there must be but one
output attribute, and the output attribute must be categorical for you to consider
a decision tree approach?”
E:“Yes, that’s right!”
KE: “If the conditions you have mentioned hold true, are there any other
techniques besides a decision tree approach that could be used?”
E:“Yes, you could use Bayes classifier or a production rule generator.There are others.”
KE:“What makes you want to choose a decision tree over these other methods?”
E: “Decision trees are particularly good at explaining their results.Also, when information
about the distribution of the data is unknown, I tend to prefer a decision tree approach.”
KE:“Why does a lack of knowledge about the data help you choose a decision
tree model?”
E: “Unlike some approaches, the decision tree technique does not require me to make
assumptions about the data distribution.”
KE: “Suppose the necessary conditions for selecting a decision tree are satisfied.
What circumstances would make you prefer an alternative approach?”
E:“Well, I don’t like to use a decision tree when most or all of the data is numeric.The
tree will likely have too many conditional tests for it to be informative. Also, the rules
tend to be difficult to interpret.”
Figure 12.11 shows a partial goal tree for choosing a data mining technique
where the subtree with top-level node Decision Tree Is Selected was developed based on
12.4
•
CD-30 Chapter 12 Rule-Based Systems
the interaction of the knowledge engineer and the expert.The production rule corresponding
to the goal node Input Criteria Are Satisfied is of particular interest:
IF some input attributes are categorical
OR data distribution test is satisfied
THEN input attribute criteria are satisfied
The truth of the first antecedent condition is not easily determined because the
word some leaves us with a degree of uncertainty about whether the condition has
been met. Suppose there are a total of 10 input attributes. Is the condition satisfied if
• Structuring a Rule-Based System CD-31
Figure 12.11 • A partial goal tree for choosing a data mining technique
12.4
Technique
Selected
Association
Rule Strategy
Selected
Supervised
Strategy
Selected
Unsupervised
Strategy
Selected
Supervised
Model
Chosen
Learning
Is Supervised
Backpropogation
Learning
Is Selected
Decision Tree
Is Selected
Linear
Regression
Is Selected
Desirable
Criteria Test Is
Satisfied
Output
Constraints
are Satisfied
Output
Attribute Is
Singular
Output
Attribute Is
Categorical
Input Attribute
Criteria are
Satisfied
Explanation
Is Required
Production
Rules
are Required
Data
Distribution
Test Is
Satisfied
Some Input
Attributes Are
Categorical
...
......
Data
Distribution Is
Not Normal
Data
Distribution Is
Unknown
two input attributes are categorical? If not, how about three input attributes? An obvious
solution is to state a specific criterion to be met for the condition to hold true.
For example,we could rephrase the rule as follows:
IF 30% of all input attributes are categorical
OR data distribution test is satisfied
THEN input attribute criteria are satisfied
We now have a clear criterion against which to test the truth value of the antecedent
condition.To illustrate, suppose our dataset contains 50 input attributes, 15 of
which are categorical. Since 30% of 50 is 15, the antecedent condition is satisfied.
However, if only 14 categorical input attributes exist, the antecedent condition will not
be met, and the data distribution test must be satisfied for the rule to fire. Common sense
tells us that the rule antecedent would prove to be of better use if we could associate a
degree of truth with the categorical attribute condition. Associating measures of certainty
with rule antecedent and consequent conditions is the topic of the next chapter.
12.5 Chapter Summary
Artificial intelligence concentrates on developing solutions to difficult problems that
cannot be solved using traditional computing techniques. The central theme behind
AI problem solving is to define a problem as a search through a space of possible problem
states.
The key to successful problem solving is to represent each state in a way that allows
a reasoning strategy to search through a subset of the state space in a reasonable
amount of time.The ability to limit the search to the relevant states increases the likelihood
of finding a problem solution.
AI problem-solving methods often represent knowledge as a set of production
rules.Two fundamental reasoning strategies used for rule-based problem solving are
forward chaining and backward chaining. Forward chaining is applied when we wish
to determine all possible outcomes from a set of rules and facts. Backward chaining is
appropriate when we have a question to be answered or a specific goal to be satisfied.
Expert systems are computer programs designed to emulate the behavior of one or
more human beings who are expert at solving problems within a specialized area. An
expert system contains a knowledge base, which holds the domain knowledge for a specific
problem, an inference engine, which reasons about stored knowledge, and a user interface,
which allows the user to interact with the system.The user interface queries the
user for information and offers explanations about the results of each reasoning process.
Human experts often state their problem-solving knowledge as a set of rules.
Therefore most expert systems have a rule-based component. A rule-based system is
any system that stores its knowledge as a set of production rules.
•
CD-32 Chapter 12 Rule-Based Systems
Potentially difficult rule-based applications are simplified if we first create a goal
tree.A goal tree is a tree structure whose top-level (root) node is a goal state. Nodes at
each level of the tree represent preconditions to be satisfied to traverse to the next
higher tree level. A goal tree is useful in that it allows us to clearly visualize the problem
and provides a base from which to construct a solution.
As rule-based systems separate knowledge from the strategies used to reason
about the knowledge, they can be easily modified. Also, rule-based systems have the
ability to handle uncertain information and can explain their behavior.
12.6 Key Terms
Backward chaining. This reasoning strategy places emphasis on creating new
knowledge by determining what must be true for a goal to be achieved. If knowledge
is in the form of a set of rules, rule consequents are examined to find a rule
that, when executed, will achieve a goal. If such a rule is found, and the rule antecedent
is determined to be true, the rule is applied. If the antecedent is not immediately
true, the antecedent condition becomes a new goal and the process is
recursively applied.
Bottom-up knowledge acquisition. Any knowledge acquisition method that
makes use of induction.
Breadth-first search. A search strategy that expands a search space horizontally before
moving deeper into the search space.
Compiled knowledge. Knowledge not easily interpreted and/or extracted.
Conflict resolution strategy. A technique for choosing a rule from the set of all applicable
rules.
Depth-first search. A search strategy that moves vertically deeper into the search
space before expanding the search in a horizontal manner.
Expert system. A computer program designed to emulate the behavior of a human
being who is expert at solving problems within a specialized area.
Expert system shell. A structure that contains a user interface and an inference engine.
When linked with a knowledge base, an expert system results.
Exponentially hard problem. Any problem that cannot be solved in a reasonable
amount of time with a traditional algorithmic approach.
Forward chaining. This reasoning strategy emphasizes the creation of new knowledge
from what is known to be true. If the knowledge is in the form of rules, rule
antecedents are examined to determine which rules can be applied.
Goal tree. A tree structure whose top-level (root) node is a goal state. Nodes at each
level of the tree represent preconditions that must be satisfied in order to traverse
the tree to the next higher level.
12.6
• Key Terms CD-33
Inference engine. The component of an expert system that determines and applies
selected parts of the knowledge base to a specific problem.
Intelligent agent. A computational entity capable of autonomously achieving goals
by executing needed actions.
Knowledge base. The component of an expert system that contains the domain
knowledge.
Knowledge-based system. A general term referring to any rule-based system that
has been built from knowledge extracted from one or several sources.
Knowledge engineer. A person who interacts with an expert in an attempt to capture
his or her knowledge.
Knowledge programming. The process of encoding the knowledge extracted from
an expert into a knowledge base.
Metaknowledge. Knowledge about knowledge.
Nearest neighbor heuristic. A heuristic technique used to navigate within a state
space.The technique tells us to always move to the next closest state.
State space. The set of all possible problem states.
Rapid prototyping. The process of building a small-scale model of an application
being developed.The scaled-down model is then expanded upon and modified
as necessary until the final product results.
Reliability. Testing an expert system to determine if it performs within an accepted
level of consistency.
Rule-based system. Any system that stores its knowledge as a set of production rules.
Top-down knowledge acquisition. Any knowledge acquisition method that
makes use of deductive reasoning.
User interface. The component part of an expert system that communicates with
the user.
Validation. Testing an expert system to determine if it performs within an accepted
level of accuracy.
Verification. Testing an expert system to determine if it uses the same reasoning
process as the expert(s) used to build the system.
12.7 Exercises
Review Questions
1. Differentiate between the following terms:
a. Top-down and bottom-up knowledge acquisition
b. Forward and backward chaining
c. Breadth-first and depth-first search
d. Knowledge and Metaknowledge
e. Knowledge-based system and expert system
f. Validation and verification
2. List several heuristics that you use in your daily activities.
3. List two or more situations where you have used the nearest neighbor heuristic.
4. Which of the expert system application areas in Section 12.3 could also be
approached with a data mining solution?
5. Here is a bottom-up knowledge acquisition approach for the example described
in the second part of Section 12.4. Recall that the example dealt with
building a rule-based system to choose a data mining tool.
• Present data mining problems one at a time together with corresponding
datasets to a data mining expert.
• Ask the expert to verbalize how he or she selects a data mining technique
for each problem. Attempt to determine how the expert chose a technique
for each problem by recording and analyzing the verbalizations.
• Meet with the expert as necessary to discuss and modify your conclusions
about how the technique for each problem was chosen.
Compare this approach to the one described by the original example.Which
approach will require more time from start to finish in order to build a working
system? Which method is more likely to give a best result?
6. Can the water jug problem be applied via backward chaining? Why or why not?
Computational Questions
1. Use the rules in Table 12.2 to write the sequence of actions leading to a solution
of the water jug problem by following a breadth-first search.
2. Three missionaries and three cannibals find themselves on one side of a river.
They have agreed that they would like to get to the other side.The missionaries
want to arrange the trip across the river so that the number of missionaries
on either side of the river is never less than the number of cannibals who are
on the same side. The only boat available holds a maximum of two people.
Draw the state space for this problem making sure that the missionaries never
find themselves at risk of being eaten.
3. The eight-puzzle is a square tray containing eight square tiles. Each tile is
numbered.The remaining ninth tile is uncovered and represents a blank space.
A tile that is adjacent to the blank space can be slid into that space.A game
consists of a starting position and a goal position.The goal is to transform the
starting position into the goal by sliding the tiles. Draw the state space for the
eight-puzzle problem shown below. Can you think of any heuristics to help
limit the state space search? (Hint: Think of each move as repositioning the
blank space.)
Start
Goal
283
164
7 5
123
8 4
765
4. State space representations can be simplistic or quite complex. Consider the
game of tic-tac-toe. Each state of the state space can be nicely represented as a
3 3 grid. However, even this simple problem results in a very large state
space.Develop several heuristics that help you determine an appropriate move
at any point within the state space.
5. Show how the production rules given in Table 12.3 are applied in a forward
manner to conclude g is true. Assume a and c are true.
6. Provide the production rules for the subtree shown in Figure 12.11 representing
the goal Decision Tree Is Selected.
7. Build a goal tree for one or more of the remaining federal tax dependency
tests.
8. Build goal tree for one or more of the following problems:
• Selecting a package delivery method
• Fossil or tree identification
• Deciding on which federal tax form to use (1040EZ, 1040A, 1040)
• Determining someone’s astrological sign
• Finding an apartment
• Grading a coin
• An expert dating service
• Choosing a university to attend
• Determining whether or not you can give blood
• Determining if you can file an electronic federal tax return
• Determining what type of pet would be best for you
Download