Chapter 12
Rule-Based Systems

Chapter Objectives

• State that the field of artificial intelligence focuses on problems that cannot be solved with traditional computing techniques.
• Explain how intelligent problem-solving methods define a problem as a state space to be searched.
• Summarize how data mining techniques employ a state space search to build generalized models from data.
• Describe how rule-based systems separate problem-solving knowledge from the reasoning mechanism used to apply the knowledge.
• Illustrate how an expert system is a special type of rule-based system that contains the knowledge of one or more human experts.
• Explain how building an expert system requires the skills of a knowledge engineer who is responsible for extracting and transforming the knowledge of one or more experts.
• List the types of problems that can be solved with a rule-based approach.
• Describe how an expert system shell is used to decrease the amount of time necessary to build a rule-based system.
• Explain how a rule-based system can be built using a top-down or a bottom-up problem-solving approach.

The purpose of data mining and knowledge discovery is to turn data into models for decision making. Although data mining is an appropriate solution methodology for many applications, there are times when this approach is not feasible. An obvious scenario is any situation lacking quality data to be analyzed. Fortunately, knowledge comes in many forms and can be gathered in many ways. Because of this, when data mining is not a viable choice, other options for building useful decision-making models may be available.

In Part IV we examine two alternative methods for building models to aid in the decision-making process. Expert systems are computer programs designed to emulate the behavior of a human being who is expert at solving problems within a specialized area. Intelligent agents are computational entities capable of autonomously achieving goals by executing needed actions.
Data mining, expert systems, and intelligent agents all exist under the umbrella of artificial intelligence problem-solving techniques. Although all three approaches are distinct, each has a primary goal of creating intelligent systems. In Section 12.1 we provide a brief introduction to the field of artificial intelligence (AI) and describe the types of problems of interest to AI practitioners. In Section 12.2 we offer a general approach for AI problem solving and show how this model is applied to data mining. In Section 12.3 we discuss rule-based expert systems. Expert systems are of interest because, like data mining, they focus on building and applying generalized models to specific problems. However, unlike data mining applications, which construct models from data, expert systems are built by extracting knowledge from one or more human experts. In Section 12.4 we provide two examples to illustrate how rules and rule-based techniques are used for representing and reasoning with knowledge.

12.1 Exploring Artificial Intelligence

The history of artificial intelligence (AI) dates back to the 1940s, starting with the work of Warren McCulloch and Walter Pitts (1943), Claude Shannon (1950), and Alan Turing (1950). Since its beginnings, the AI field has experienced many successes as well as failures. Two of AI’s greatest commercial successes have been in the areas of expert systems and data mining. Table 12.1 displays a list of common areas of current study within the AI field.

Table 12.1 • Areas of Study Within Artificial Intelligence
Scheduling Problems
Expert Systems
Game Playing
Intelligent Agents
Intelligent Database Retrieval
Intelligent Tutoring Systems
Machine Learning
Natural Language Processing
Planning
Robotics and Computer Vision
Speech Recognition
Theorem Proving

Because of its extensive scope, a single all-encompassing definition of AI is difficult to state. Let’s examine four plausible definitions. Problem solving is a common theme exhibited by each definition.

1. AI is the study of problems that cannot be solved using traditional algorithmic techniques.
2. AI is the study of how to make computers do things which at the moment people do better (Rich and Knight, 1991).
3. AI is the study of computations that make it possible to perceive, reason, and act (Winston, 1992).
4. AI is the study of finding polynomial solutions to exponentially hard problems.

Each definition gives some insight about the types of problems studied by AI practitioners. The first definition tells us that problems such as how to update credit card accounts, print payroll checks, and find square roots of numbers are not of interest to AI researchers, as these problems can be easily solved with simple algorithms. The second and third definitions emphasize the human side of the field. Problems such as how to make computers see (computer vision), understand the written or spoken language (natural language processing and speech recognition), tutor an algebra student (intelligent tutoring systems), or play a mean game of chess (game playing) do appeal to the AI community.

The fourth definition is of particular interest because it emphasizes the difficult nature of the problems studied by AI researchers. An exponentially hard problem is any problem that cannot be solved in a reasonable amount of time with a traditional algorithmic approach. A classic problem that fits this category is the traveling salesperson problem. One description of the problem is as follows.

Given a set of n cities, with one of the cities designated as the starting position, visit all of the cities (and arrive back home) while keeping the total distance traveled as short as possible.

Although the problem is easy enough to understand, a good solution is not readily apparent.
A first thought is to simply let the traveling salesperson try each alternative route. For a weekly route, the salesperson might travel a new course each week until all possibilities have been exhausted. Given this, a weekly itinerary of four cities allows the salesperson to know a best course of travel after just six weeks. Unfortunately, for the general case of n cities, there are a total of (n – 1)! possible paths. Therefore, when the number of cities increases to 10, the total count of possible tours expands to 362,880. Trying one path per week, our salesperson will be traveling a new route for over 6975 years! Even a computer program that knows the travel cost between each pair of cities will require a significant amount of computing time to check all possible paths.

The traveling salesperson problem can easily be generalized. An interesting variation of the problem follows.

A school district has a fixed number of buses with which to transport a population of students to school each day. An individual student should not have to board their bus before 7:30 A.M. Determine the route and stop locations for each bus so that the students are transported to the school on time.

The traveling salesperson problem and its variations are notoriously difficult to solve. To apply a brute force technique of trying each alternative path simply allows for too many possibilities. Although no one has discovered an efficient method that can find the shortest path for a given set of points, many methods have been studied that seem to work well in most situations. These methods use one or more heuristics. Recall that heuristics are rules of thumb that tend to give good enough answers in most situations but never guarantee a best solution. One popular approach uses the nearest neighbor heuristic, which tells us to always travel the route of the next-closest city.
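The contrast between exhaustive search and the nearest neighbor heuristic can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own code. The function names are hypothetical, and the edge costs in `EDGES` are an assumed assignment of the six costs in the chapter's Figure 12.2 example, chosen so that the tour totals match the ones the text reports (33 for the best tour, 35 for the nearest neighbor tour). The original figure may label the edges differently.

```python
from itertools import permutations
from math import factorial


def tour_count(n):
    """Number of round trips through n cities from a fixed start: (n - 1)!."""
    return factorial(n - 1)


def symmetric(edges):
    """Expand {(a, b): cost} so that a cost can be looked up in either direction."""
    costs = {}
    for (a, b), cost in edges.items():
        costs[(a, b)] = cost
        costs[(b, a)] = cost
    return costs


def tour_cost(tour, costs):
    """Total cost of a tour given as a list of cities that ends where it starts."""
    return sum(costs[(a, b)] for a, b in zip(tour, tour[1:]))


def nearest_neighbor(start, cities, costs):
    """Heuristic: from the current city, always move to the closest unvisited city."""
    tour = [start]
    unvisited = set(cities) - {start}
    while unvisited:
        closest = min(unvisited, key=lambda city: costs[(tour[-1], city)])
        tour.append(closest)
        unvisited.remove(closest)
    return tour + [start]


def brute_force(start, cities, costs):
    """Exhaustive search: evaluate every ordering of the remaining cities."""
    others = [city for city in cities if city != start]
    tours = ([start, *order, start] for order in permutations(others))
    return min(tours, key=lambda tour: tour_cost(tour, costs))


# Assumed edge costs (hypothetical assignment, see the note above).
EDGES = {("A", "B"): 10, ("A", "C"): 4, ("A", "D"): 5,
         ("B", "C"): 12, ("B", "D"): 15, ("C", "D"): 6}
COSTS = symmetric(EDGES)
CITIES = ["A", "B", "C", "D"]

print(tour_count(4), tour_count(10))      # 6 362880
print(round(tour_count(10) / 52))         # about 6978 years at one tour per week

greedy = nearest_neighbor("A", CITIES, COSTS)
best = brute_force("A", CITIES, COSTS)
print(greedy, tour_cost(greedy, COSTS))   # ['A', 'C', 'D', 'B', 'A'] 35
print(best, tour_cost(best, COSTS))       # ['A', 'B', 'C', 'D', 'A'] 33
```

The heuristic examines only one tour, so its running time grows slowly with the number of cities, but as the last two lines show it can miss the best tour.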
By applying the nearest neighbor heuristic, we limit the total number of tour possibilities while still having a reasonable chance of achieving an acceptable solution. Let’s look at an example.

Figure 12.1 displays four cities that are directly connected to each other. Individual paths are labeled with a cost for traveling from one city to the connecting city. For example, traveling directly between cities A and C costs 10 units, whereas the route from city A to C through city B has a total cost of 8 units. Let’s assume city A as the starting point and apply the nearest neighbor heuristic. The path traveled by the nearest neighbor algorithm is A-B-C-D-A, which gives a total cost of 32 units. In this case, the path determined by the heuristic is one of two best paths. The alternative best path, which simply reverses the ordering of the first path, is A-D-C-B-A.

To see that the nearest neighbor heuristic does not always give a best answer, let’s apply the method to the graph in Figure 12.2. Starting at A, the best route is A-B-C-D-A with a total cost of 33 units. The nearest neighbor algorithm travels the route A-C-D-B-A with a total cost of 35 units.

The method employed by the nearest neighbor heuristic of defining a problem as a search through a set of limited possibilities is central to AI problem solving. The idea for solving a difficult problem is to find a solution in a moderate amount of time that works within the boundaries of the constraints defining the problem. For the school district busing problem, finding a minimal cost solution is nearly impossible. However, a solution that lies within the district’s fixed budget is a likely option. The next section expands on the notion of heuristic search.

12.2 Problem Solving as a State Space Search

Two issues fundamental to AI problem solving are how to represent knowledge and how to reason with this knowledge to find problem solutions.
It is often helpful to think of the process of finding a solution to a difficult problem as a search through a state space of possibilities. A state space is defined as the set of all possible problem states. A reasoning strategy that allows us to search through this state space in a reasonable amount of time will ultimately lead to a problem solution. The components of a state space search are as follows:

• One or more initial states
• One or more goal states
• A set of rules that move the search from the current state to a new state
• A control strategy that examines the current state and determines the next rule to apply

Figure 12.1 • Starting at city A, the nearest neighbor heuristic finds a shortest path (A-B-C-D-A). [Graph of cities A, B, C, and D with edge costs 5, 3, 15, 12, 9, and 10.]

The basic solution strategy is quite simple. The first step is to describe how to represent an individual state of the state space. After this, one of the initial states is chosen as a starting position. Next, the control strategy chooses a rule to apply. The result of applying the rule leads the search to a new state. If the new state is a solution to the problem, the search terminates. If the new state does not represent a solution, the process of choosing and applying a new rule is repeated until a solution is found or the search is terminated. An example illustrates the process.

Figure 12.2 • The nearest neighbor heuristic chooses the path A-C-D-B-A even though A-B-C-D-A is a better choice. [Graph of cities A, B, C, and D with edge costs 15, 5, 6, 12, 10, and 4.]

The Water Jug Problem

The water jug problem offers a clear example for us to follow the process of state space search. A description of the water jug problem follows.

Suppose we have two jugs. One jug is capable of holding 3 gallons of water. A second jug can hold up to 4 gallons of water. There are no measurement lines on either jug. Therefore we can never determine the exact amount of water in either jug.
However, by looking in either jug, we can determine if the jug is empty, full, or contains some water. We are to use some combination of the rules displayed in Table 12.2 to place 2 gallons of water into the 4-gallon jug.

Before we look at a solution, you may wish to solve the problem on your own. If you choose to come up with your own solution, we encourage you to record your answer so you can compare your result with ours. In any case, to find a solution, we must first determine a representation for an individual problem state. As a representation need only keep track of the amount of water in each jug, an obvious choice is to display each state as an ordered pair (x,y) where x is the amount of water in the 4-gallon jug and y is the amount of water in the 3-gallon jug. We can assume an initial problem state of (0,0) without any loss of generality. A goal state is any state showing a value of 2 for x. Therefore we represent the goal as (2,?) where ? indicates a “don’t care” condition.

Next, we need a set of rules to move from one state to a new state. Table 12.2 offers a listing of eight such rules. The column labeled Required Conditions shows the preconditions to be met for each rule to be applied. The column labeled Action tells us how to apply each rule. The column displaying Resultant State computes the new state following the application of each rule.

Table 12.2 • Rules for the Water Jug Problem

Action | Required Conditions | Resultant State
1. Fill the 4-gallon jug. | The 4-gallon jug is not full. | (4,y)
2. Fill the 3-gallon jug. | The 3-gallon jug is not full. | (x,3)
3. Empty the 4-gallon jug onto the ground. | The 4-gallon jug is not empty. | (0,y)
4. Empty the 3-gallon jug onto the ground. | The 3-gallon jug is not empty. | (x,0)
5. Pour water from the 3-gallon jug into the 4-gallon jug until the 4-gallon jug is full. | The total amount of water in both jugs is >= 4 and the 3-gallon jug is not empty. | (4, y – (4 – x))
6. Pour water from the 4-gallon jug into the 3-gallon jug until the 3-gallon jug is full. | The total amount of water in both jugs is >= 3 and the 4-gallon jug is not empty. | (x – (3 – y), 3)
7. Pour all the water from the 3-gallon jug into the 4-gallon jug. | The total amount of water in both jugs is <= 4 and the 3-gallon jug is not empty. | (x + y, 0)
8. Pour all the water from the 4-gallon jug into the 3-gallon jug. | The total amount of water in both jugs is <= 3 and the 4-gallon jug is not empty. | (0, x + y)

The final ingredient for completing the problem description is a control strategy. In general, the search through a state space can be either depth-first or breadth-first. A left-to-right depth-first search moves vertically deeper into the search space before expanding the search in a horizontal manner. A breadth-first search moves horizontally through the search space before extending the search to the next level. Figure 12.3 applies each technique to a hypothetical state space having node A as the initial state.

A depth-first or breadth-first search can be applied using either forward or backward chaining. To apply forward chaining, we look at the antecedent (required conditions in Table 12.2) of a rule to determine if the conditions of the rule have been met by what is currently known. If the conditions have been met, the rule fires, and the consequent of the rule is added to the current state of knowledge. The water jug problem is best solved using forward chaining. We will describe backward chaining later in this section.
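The pieces of the water jug problem description (state representation, rules, and control strategy) can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation. The names `RULES`, `is_goal`, and `depth_first` are hypothetical, and the control strategy shown is one assumed reading of a left-to-right depth-first search: all new children of a state are generated before the search moves deeper, and a rule is ignored when it leads to a state already in the search space.

```python
# A state is the ordered pair (x, y): x gallons in the 4-gallon jug,
# y gallons in the 3-gallon jug.
# Each rule is (number, required condition, resultant state), as in Table 12.2.
RULES = [
    (1, lambda x, y: x < 4,                 lambda x, y: (4, y)),
    (2, lambda x, y: y < 3,                 lambda x, y: (x, 3)),
    (3, lambda x, y: x > 0,                 lambda x, y: (0, y)),
    (4, lambda x, y: y > 0,                 lambda x, y: (x, 0)),
    (5, lambda x, y: x + y >= 4 and y > 0,  lambda x, y: (4, y - (4 - x))),
    (6, lambda x, y: x + y >= 3 and x > 0,  lambda x, y: (x - (3 - y), 3)),
    (7, lambda x, y: x + y <= 4 and y > 0,  lambda x, y: (x + y, 0)),
    (8, lambda x, y: x + y <= 3 and x > 0,  lambda x, y: (0, x + y)),
]


def is_goal(state):
    """The goal (2,?) is any state with 2 gallons in the 4-gallon jug."""
    return state[0] == 2


def depth_first(state, generated, path):
    """Forward-chaining, depth-first search.

    `generated` holds every state already placed in the search space.
    `path` is the list of (rule number, resulting state) pairs that led
    from the initial state to `state`.
    """
    if is_goal(state):
        return path
    # Fire every rule whose required conditions hold, keeping only new states.
    children = []
    for number, condition, result in RULES:
        if condition(*state):
            new_state = result(*state)
            if new_state not in generated:
                generated.add(new_state)
                children.append((number, new_state))
    # Explore the children left to right, going deeper before going wider.
    for number, child in children:
        found = depth_first(child, generated, path + [(number, child)])
        if found is not None:
            return found
    return None  # dead end: no rule leads to a new state


solution = depth_first((0, 0), {(0, 0)}, [])
for number, state in solution:
    print(f"Apply rule {number}: {state}")
# Rules applied in order: 1, 6, 4, 8, 1, 6, ending in state (2, 3)
```

Because the rules live in a table and the search procedure never mentions jugs, the same `depth_first` function could be pointed at a different rule set, which is the separation of knowledge and control that rule-based systems rely on.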
Figure 12.3 • A hypothetical state space representation: a depth-first search through the space follows A-B-E-F-C-G-I-J-H-D, whereas a breadth-first search is given by the path A-B-C-D-E-F-G-H-I-J. [Tree diagram with nodes A through J.]

To apply forward chaining, we examine the Required Conditions column of Table 12.2 and see that either rule 1 or rule 2 can be applied to the start state of (0,0). The result of applying rule 1 to the initial state gives the state node (4,0). Applying rule 2 to the initial state results in a new state node of (0,3). The arrow labeled step 1 in Figure 12.4 points to the search space after the application of both rules to the initial state.

We must now decide whether the search will be depth-first or breadth-first. For our problem, a depth-first search tells us to examine nodes that follow the path given by (4,0) until a solution is found or a dead end is encountered. With a breadth-first search, we also generate the immediate children of node (4,0). However, if a solution is not given by a child of (4,0), we then move across the search space and generate the children of (0,3). We will solve the problem with a depth-first approach.

Table 12.2 tells us that rules 2 and 6 can be applied to the state given by (4,0). Rule 3 is also applicable; however, because its application returns us to the initial state, the rule is not considered. To proceed, rule 2 generates (4,3) and rule 6 gives us state node (1,3). The arrow labeled step 2 in Figure 12.4 points to the updated search space. As neither node is a goal state, we proceed to investigate rules applicable to the state (4,3). Rule 3 or rule 4 can be applied to state node (4,3). However, the application of either rule leads us to a state that is already in the state space. Therefore the state node (4,3) represents a dead end in the search space. We abandon this state and attempt to generate new states using state node (1,3).
Rule 4 is the only rule that leads from state node (1,3) to a state not already in the search space. Applying this rule adds (1,0) to the next level of the search space. The new state space is shown in Figure 12.4 to the right of the arrow labeled step 3. Continuing from state node (1,0), Table 12.2 tells us that the only rule producing a new state is rule 8, which instructs us to pour all of the water from the 4-gallon jug into the 3-gallon jug. The new state node (0,1) is added to the state space. Once again, checking the rules, we see that rule 1 can be applied to the current state. The resultant state is (4,1). The single rule that produces a new state from (4,1) is rule 6, which instructs us to pour water from the 4-gallon jug into the 3-gallon jug until the 3-gallon jug is full. The resulting state is (2,3), which represents a goal state, thereby giving us a solution to the problem. The arrow labeled steps 4, 5, & 6 in Figure 12.4 shows the final search space leading to a solution.

Figure 12.4 • A depth-first solution for the water jug problem. [Search trees for the initial state and after steps 1 through 6. In the final tree, (0,0) leads to (4,0) by rule 1 and to (0,3) by rule 2; (4,0) leads to (4,3) by rule 2 and to (1,3) by rule 6; the path then continues from (1,3) to (1,0) by rule 4, to (0,1) by rule 8, to (4,1) by rule 1, and to (2,3) by rule 6.]

Achieving the goal is useless without knowing how the goal was determined. Therefore our solution must be stated as a sequence of steps. We can write the solution by following the path in the search space leading to the goal. Here are the steps for the solution to the water jug problem:

1. Apply Rule 1: Fill the 4-gallon jug.
2. Apply Rule 6: Pour water from the 4-gallon jug into the 3-gallon jug until the 3-gallon jug is full.
3. Apply Rule 4: Empty the 3-gallon jug onto the ground.
4. Apply Rule 8: Pour all the water from the 4-gallon jug into the 3-gallon jug.
5. Apply Rule 1: Fill the 4-gallon jug.
6. Apply Rule 6: Pour water from the 4-gallon jug into the 3-gallon jug until the 3-gallon jug is full.

Applying a breadth-first search to the rules in Table 12.2 reveals a second solution to the water jug problem. Both solutions can be seen in Figure 12.5, which displays the complete state space generated by the rules defining the water jug problem.

Although simple, the water jug problem gives us insight into how problems are solved with a state space search strategy. More difficult problems require sophisticated heuristics to limit the size of the search space. As is the case with the nearest neighbor heuristic described earlier, most time-saving heuristics provide a good enough answer; however, they rarely offer a best result.

Backward Chaining

Forward chaining is appropriate when we wish to determine all possible outcomes from a set of facts and rules. With forward chaining, we apply rules whose antecedent conditions match a set of known facts. The process continues until a desired goal is achieved or additional information cannot be added to what is currently known to be true. However, many problems are best solved by starting at a goal state and working backwards. One such problem is finding a path through a maze. As a second example, when diagnosing a patient, a medical doctor will frequently use a combination of both forward and backward reasoning. As a first step, the patient offers an initial set of symptoms. The doctor then uses the symptoms to reason in a forward fashion to determine a set of one or more possible diseases (goals). The goals are each tested in some order by continuing to examine and question the patient until a diagnosis can be made. If a diagnosis cannot be determined, the doctor recommends a series of laboratory tests. Forward reasoning with the results of the tests leads to a new set of possible diseases, and the process is repeated.
In general, any goal that can be stated as a question is a likely candidate for backward chaining. Here is a list of additional problems appropriate for a backward-reasoning approach:

1. Is a specific credit card holder likely to accept the life insurance promotional offering contained with their next billing statement?
2. Which data mining tool should I apply to a given dataset?
3. Is the current loan applicant likely to default on a new car loan?
4. What delivery method should I use to send my package to Seattle?
5. My mother-in-law is currently living in our home. Can I claim her as a dependent?

Figure 12.5 • The complete state space for the water jug problem. [State space tree rooted at (0,0). One branch runs from (0,0) to (4,0) by rule 1, then to (4,3) by rule 2 and to (1,3) by rule 6, and continues through (1,0), (0,1), (4,1), and (2,3) by rules 4, 8, 1, and 6. The other branch runs from (0,0) to (0,3) by rule 2 and continues through (3,0), (3,3), (4,2), (0,2), and (2,0) by rules 7, 2, 5, 3, and 7.]

Backward chaining attempts to satisfy a goal by examining the consequent of individual rules. The process of backward chaining creates a tree structure known as a goal tree. The top-level node of the goal tree is a goal state. A goal tree represents a partial state space and is useful in that only those states pertinent to a solution are examined. Backward chaining is best illustrated with an example.

Let’s see how backward chaining works with the hypothetical production rules displayed in Table 12.3. We assume that facts a and c are initially true and attempt to prove that g is true. The procedure is as follows:

1. The first step is to check the set of facts to determine if g is initially true. As g is not currently true, we create a node representing the goal to be proved. The goal node is shown in the upper-left portion of Figure 12.6.
2. Next, we examine the rules in Table 12.3 in an attempt to find a rule whose consequent states the goal is true.
Rule 3 tells us that if y and x are true, then g is true. Therefore the antecedent conditions of rule 3 are added to the goal tree. The updated goal tree is shown to the right of the arrow labeled rule 3 in Figure 12.6. The rule’s and conditional is specified by the arc passing through both antecedent conditions.
3. Proceeding across the goal tree in a left-to-right fashion, we ask if condition y is currently part of our list of known facts. As it is not, y becomes a new goal to be satisfied. We examine Table 12.3 to find a rule whose consequent states that y is true. Rule 4, which states that w must be true for y to be true, is such a rule. The antecedent condition of rule 4 is added to the goal tree. The updated tree is seen by following the arrow labeled rule 4.
4. Next, we must decide if the process of building the goal tree is to proceed depth-first or breadth-first. A depth-first strategy makes w a new goal to be pursued. A breadth-first strategy looks for other rules having y as their consequent before proceeding vertically deeper into the goal tree. As backward chaining is usually implemented as a depth-first procedure, we follow a depth-first approach and make w the current goal.
5. Given that w is not a fact, we look for a rule whose then part indicates that w is true. As no such rule exists, we have encountered a dead end. This requires us to attempt to satisfy y with another rule. We check the table rules and find that rule 6 states y is true provided b is true. The new goal tree is shown in Figure 12.6 to the right of the arrow labeled rule 6.
6. Next, we check to see if b is currently a fact. It is not. However, rule 1 tells us that b is true provided a is a true fact. The added information is shown in the goal tree that follows the arrow labeled rule 1.
7. To continue, we check the list of facts and discover that a is known to be true. As a is true, we can move up the goal tree. That is, because a is true, we know we can prove b is true by applying rule 1. Because b can be proven true, we can also prove y to be true (rule 6). We must still prove x is true to conclude the initial goal.
8. Rule 2 tells us that x is true if c and d are true. We add this information to the goal tree. The updated tree is seen to the right of rule 2 in Figure 12.6.
9. As c is a known fact, we attempt to show d is true. Rule 5 tells us that d is true if w is a true statement. This new information creates the goal tree seen by following the arrow labeled rule 5.
10. Next, we attempt to prove w is true. As w is not currently known to be true, we look for a rule whose consequent states w as true. Such a rule does not exist. Therefore we must try to prove x is true in a new way. Because another rule stating x as its consequent does not exist, we must attempt to find another way to prove g is true. Because a second rule stating g as a consequent does not exist, the process of creating the goal tree terminates, and we conclude that g cannot be proved true.

Table 12.3 • A Hypothetical Set of Production Rules

Production Rules | Known Facts
1. If a then b | a, c
2. If c and d then x |
3. If x and y then g |
4. If w then y |
5. If w then d |
6. If b then y |

Figure 12.6 • Creating a goal tree. [Sequence of goal trees rooted at the goal g, expanded in turn by rules 3, 4, 6, 1, 2, and 5.]

The goal tree makes it clear that to prove g we must first find a way to show that w is true. As w was not initially stated as a known fact, and there are no rules to prove w as true, it appears that all hope of showing g as true is lost. However, a third option frequently made available with rule-based systems is to ask the user about the truth value of a particular fact. The technique is appropriate when the questions asked of the user are specific to the user’s particular problem.
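The backward chaining procedure traced above can be sketched as a short recursive function. This is a minimal illustration under stated assumptions, not a full inference engine. The names `RULES` and `prove` are hypothetical, the rules are those of Table 12.3, and the `pending` argument is an added safeguard against circular rule chains that the chapter's example does not need.

```python
# Production rules from Table 12.3 as (antecedents, consequent) pairs.
RULES = [
    (("a",), "b"),       # 1. If a then b
    (("c", "d"), "x"),   # 2. If c and d then x
    (("x", "y"), "g"),   # 3. If x and y then g
    (("w",), "y"),       # 4. If w then y
    (("w",), "d"),       # 5. If w then d
    (("b",), "y"),       # 6. If b then y
]


def prove(goal, facts, rules=RULES, pending=frozenset()):
    """Depth-first backward chaining: return True if `goal` can be proved.

    `facts` is the set of statements known to be true.
    `pending` holds the goals currently being pursued, so that a goal
    is never used to prove itself.
    """
    if goal in facts:
        return True
    if goal in pending:
        return False
    for antecedents, consequent in rules:
        # Consider only rules whose consequent states the goal is true,
        # then try to prove each antecedent as a new subgoal.
        if consequent == goal and all(
            prove(condition, facts, rules, pending | {goal})
            for condition in antecedents
        ):
            return True
    return False  # no rule proves the goal: a dead end


print(prove("y", {"a", "c"}))        # True  (rule 6, then rule 1)
print(prove("g", {"a", "c"}))        # False (x needs d, and d needs w)
print(prove("g", {"a", "c", "w"}))   # True  once w is known
```

The last call treats w as a known fact, which is the effect of asking the user about w and receiving a confirmation.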
Questions about a user’s age, income range, and gender are representative of the type of questions falling into this category. We will examine rule-based systems in more detail in Section 12.4. First, we describe the connection between data mining and state space search.

Data Mining and State Space Search

A state space is the set of all possible problem states for a given application. We can view the process of building a data mining model as a state space search. A goal state is defined by a predetermined criterion for success such as a minimum value for a performance measure. At a minimum, an initial problem state description will contain the following information:

• A learning strategy and a learner technique for implementing the strategy
• A set of training and test data as appropriate
• A selection of input and/or output attributes
• Initial parameter settings for the chosen data mining tool
• The model created by applying the learner technique to the training data
• Predictiveness and predictability scores for all categorical input attributes and attribute significance scores for all numeric input attributes
• A measure of performance (e.g., test set accuracy) specifying a goodness score for the current state

As there is neither sufficient time nor resources to generate all possible models, we rely on heuristic rules to guide us in choosing a model to best represent the data in question. Here are five rules to help guide a state space search through the set of all possible data mining models. In each case, the application of a rule creates a new current state, which in turn generates a new learner model.

1. IF the learner model was created using backpropagation AND the rms error does not fall below a maximum acceptable value THEN double the number of epochs seen in the current state.
2. IF test set accuracy is below an accepted level of performance AND training data is limited THEN add the test data to the training set data AND test future models using cross-validation.
3. IF test set accuracy is below an accepted level of performance THEN remove all attributes of little predictive value.
4. IF numerical attribute a is suspected to be predictive of class membership AND attribute a turns out to be of little predictive value THEN replace the values of attribute a with their base 2 logarithmic equivalents.
5. IF the learner model is ESX AND learning is unsupervised AND the current learner model shows more than five clusters THEN lower the value of the similarity parameter by a value of 10.

Hundreds of heuristic rules such as those presented here can be stated. As we develop our data mining skills, we create our own favorite set of model building rules as well as our own set of sequences for applying these rules. Fortunately, the nature of the data mining tools we use is quite forgiving in that they allow us to find acceptable models when we follow less than optimal paths within the state space of all possible models.

12.3 Expert Systems

Expert systems initially came to light in the early 1970s with the development of three programs each containing the knowledge of one or more human experts. The first program, DENDRAL (Buchanan et al., 1969), was designed to infer molecular structure from information provided by a mass spectrometer. The system incorporated heuristic rules used by chemists to limit the space of possible solutions. The second success was a system called MYCIN (Shortliffe, 1976). MYCIN contained approximately 450 rules that were used to diagnose infectious blood diseases. MYCIN differed from DENDRAL in that it associated a measure of uncertainty with each rule.
The third system, PROSPECTOR (Duda et al., 1979), incorporated a probabilistic approach to suggest exploratory drilling sites for several types of minerals.

Building an expert system requires the knowledge of at least one expert. Unbounded problems that do not have agreed-upon solutions, such as how to handle the national debt or how to choose stocks to buy, are not viable problems for expert system application. Today, expert systems are used to help make decisions and decrease the amount of time spent on problem-solving tasks. When appropriate, an expert system can speed up the problem-solving process by a factor of 10. The following is a small sampling of areas where expert systems have been successfully applied:

• Should an individual or a company be given a loan?
• What is an appropriate listing of homes to be viewed by a particular family?
• What are the insurance needs (life, health, home, car, etc.) for a particular individual?
• Can a life insurance policy be offered to an applicant? If so, how much insurance can be obtained by the applicant and at what premium?
• How can individual hotel rooms be allocated so as to maximize profit?
• What are the possible causes of a computer hardware failure?
• How should weapons be allocated to enemy targets?
• How many round-trip flights should be scheduled over the Christmas holiday between Chicago and Boston?
• What should be the cost and payment schedules for those responsible for polluting the environment?
• When should regular maintenance be performed on my automobile?

You may notice that some of these application areas coincide with problems that can be solved through a data mining approach. In one of the end-of-chapter exercises you are asked to review this list and determine which problems are also candidates for a data mining solution.

The Components of an Expert System

Figure 12.7 displays the three major components of an expert system.
The knowledge base contains the domain knowledge about the problem at hand. The domain knowledge is usually declarative and is often stored as a set of facts and production rules. Some expert systems store declarative knowledge as objects or network structures. The knowledge base may also contain metaknowledge, which represents knowledge about the knowledge contained in the expert system.

The inference engine determines which parts of the knowledge base will be applied to a specific problem. When the domain knowledge is stored as a set of production rules, the inference engine uses forward or backward chaining to determine which rules apply to the current situation. Regardless of the search technique, the inference engine must have a conflict resolution strategy when more than one rule applies to a specific circumstance. One approach to conflict resolution is to use a set of metarules to determine the order of rule application. Once a rule has been chosen, the inference engine executes the rule and adds the knowledge contained in the rule consequent to the current state.

The third major expert system component is the user interface. The user interface allows the user to interact with the expert system. The interface queries the user for information about a specific problem and offers explanations about the results of each reasoning process.

Separating Knowledge and Reasoning

Figure 12.7 shows that, unlike conventional programs, an expert system separates the knowledge contained within the program from the reasoning mechanism used to apply this knowledge. This is a general strategy used by all rule-based systems. You saw this separation of knowledge and control in our description of the water jug problem. The following are three reasons why this strategy is particularly appealing:

1. Knowledge can be easily added to or deleted from the knowledge base without disturbing the code contained in the user interface or the inference engine.
2. Several reasoning strategies can be applied to the same knowledge base.
3. Once an expert system has been built, additional expert systems can be built by simply replacing the knowledge contained in the existing system with the knowledge for each new application.

An expert system shell is a tool that implements this third feature. An expert system shell is essentially an expert system with its domain knowledge removed. A shell contains a user interface and an inference engine. When the user interface and the inference engine are linked with a particular knowledge base, an expert system results.

Shells are used for quick and efficient expert system development. The single most important criterion to consider in choosing a shell is that the shell must adapt well to the problem at hand. Trying to force a problem to fit a particular expert system shell is a serious mistake and will likely result in project failure. Sophisticated expert system shells allow for several knowledge representation and reasoning process options.

Several public domain expert system shells are readily available for download from various Web sites. Two expert system shells are of particular interest. The Java Expert System Shell (Jess) is freely available from Sandia National Laboratories at http://herzberg.ca.sandia.gov/jess. Jess contains many of the features seen with expensive expert system shells. The ES Expert System shell, described in the October/November 1990 issue of BYTE magazine, is available for free download at ftp.uu.net:/pub/ai/expert-sys/ as summers.tar.Z. ES is easy to use, implements both forward and backward chaining, and has the capability of reasoning about uncertain information.

[Figure 12.7: An expert system architecture (User, User Interface, Inference Engine, Knowledge Base)]

Developing an Expert System

Building a large-scale expert system is a tedious and difficult process. Figure 12.8 shows the task can be divided into five major subproblems.
The arrows indicate that we can revisit each subproblem as necessary. Let’s take a closer look at each step of the expert system development cycle.

[Figure 12.8: The expert system development cycle (Problem Definition, Knowledge Acquisition, Knowledge Representation, Knowledge Programming, Testing & Evaluation)]

Step 1: Problem Definition

Problem definition is the first major phase of the expert system building process. Once a problem domain has been identified, a specific task within this domain is clearly defined. For example, one possible problem domain is disease diagnosis. A specific task within this domain is the diagnosis of infectious blood diseases.

Step 2: Knowledge Acquisition

During knowledge acquisition, a knowledge engineer interacts with one or more human experts in an attempt to capture their knowledge. Knowledge acquisition is the most difficult part of the expert system building process. This is because much of an expert’s knowledge is in a compiled form. Compiled knowledge is any knowledge not easily extracted and interpreted.

Knowledge acquisition techniques can be divided into two major categories. Bottom-up methods are inductive in that the knowledge engineer observes experts as they solve real or hypothetical problems within the application domain. The observation involves recording and making generalizations about the details of the problem-solving process. After this, the experts and the knowledge engineer meet several times to determine if the important elements of the problem-solving process have been correctly generalized. Top-down methods are deductive and include conducting structured interviews with one or more experts as well as having experts play the role of a teacher or learning partner.

Step 3: Knowledge Representation

It is the responsibility of the knowledge engineer to represent the knowledge obtained from the knowledge acquisition procedure in a format suitable for processing by a computer program.
Although several knowledge representation techniques exist, the most common method is to represent an expert’s knowledge as a set of production rules.

Step 4: Knowledge Programming

Knowledge programming is the process of encoding the knowledge extracted from an expert into a knowledge base. In order to perform this task, the knowledge engineer must decide which knowledge engineering tool he or she will use. In a majority of cases, a sophisticated expert system shell, which allows for a multitude of knowledge representation and reasoning strategies, is sufficient for implementing the system. In rare cases, an expert system will need to be developed from scratch by writing a computer program. The main advantage of writing a computer program to implement the system is flexibility. The main disadvantages are length of development time and increased development cost.

Step 5: Testing and Evaluation

An expert system must perform as expected and must meet the cosmetic demands of those individuals who will be using the system on a regular basis. The following are three common expert system evaluation techniques:

• Verification asks if the expert system solves problems by using the same reasoning process as that used by the expert(s) whose knowledge is contained within the system. To test this, the expert system is given a set of problems to be solved. For each problem, the analytical procedure of the expert system is matched against the reasoning process used by the human expert(s).
• Validation seeks to determine if the expert system performs within an accepted level of accuracy. A validation asks if the expert system is able to correctly solve the same problems as those solved by the expert(s) used to build the system. Unlike a system verification, where the focus is on the internal workings of the system, the focus of an expert system validation is external.
• Reliability asks if the expert system is able to perform with an accepted level of consistency. Candidates for reliability testing include the user interface, the knowledge base, and the inference engine. The ability of an expert system to handle uncertain information is also measured during a reliability testing procedure.

Size, cost, and development time for a specific application vary with the task and the building tools available. With large-scale applications, a great deal of caution must be exercised to ensure that the cost of a project will not exceed the potential profits realized by the developed software. The following is an important rule to follow when building a large-scale expert system.

Initially model a scaled-down version of the expert system being developed. The scaled-down model is then expanded and modified as necessary until the final product results.

This technique is referred to as rapid prototyping. With rapid prototyping, major problems such as poor task definition, inappropriate inference methods, and the like can be immediately dealt with. Also, demonstration versions can be used to show that the system will be worthwhile once it has been completed.

12.4 Structuring a Rule-Based System

Many of our problem-solving skills can be expressed in the form of rules. Recipe books offer step-by-step procedures to help us cook tasty meals. Nearly all of our mathematical skills are based on rules. We use rules to decide whether we should send our packages by way of regular postal service, Federal Express, or UPS. We even have rules to help us when we engage in social conversation.

Experts are also able to express their problem-solving knowledge in the form of rules. Because of this, nearly all of today’s expert systems have at least some rule-based components. In this section we provide two examples to illustrate how rules and rule-based techniques are used for representing and reasoning with knowledge.
However, before we begin, we find it instructive to make a distinction between the terms expert system and knowledge-based system. Strictly speaking, an expert system must contain the knowledge of one or more domain experts. However, many useful rule-based systems have been built from knowledge extracted from textbooks and manuals. Such systems are often incorrectly referred to as expert systems because they are able to provide useful information from responses to simple questions. Although these programs are not expert systems, they do fall under the more general category of knowledge-based systems. Knowledge-based systems include all rule-based systems that have been built from knowledge extracted from one or more experts, as well as those systems built with the help of other sources of information. We avoid confusion by using the term rule-based system to include any system that stores its knowledge in the form of a set of production rules. Let’s look at our first example!

Example 1: Form 1040 Tax Dependency

Our lives as well as the laws used to determine our tax obligations change each year. For this reason, most of us who are brave enough to fill out our own tax forms need at least some regular assistance. Help with various tax documents comes in many forms. One option is to telephone the IRS, leave a message, and wait for a return call. An IRS office in close proximity offers a second possibility. An appealing option is to buy a software package able to walk us through our tax season form-filing woes. Finally, we may be able to find answers to our tax questions on the Internet. Let’s explore this final option in more detail.

Suppose the federal government wishes to significantly decrease the number of telephone calls it receives during tax season. Its plan is to build a Web site able to help average taxpayers with their tax questions.
A Web site supported by a rule-based system is an excellent implementation choice for several reasons. First, most tax laws are given as rules. Second, tax booklets contain clear step-by-step instructions for filling out tax forms. Third, the separation of knowledge and reasoning in a rule-based system allows the rule base to be easily modified whenever the tax laws change. Finally, requested tax assistance is invariably in the form of one or more questions readily structured for a backward-reasoning approach.

As a first step, a menu listing all possible categories of available help items must be designed. For our example, we concentrate on but one of the many possible tax assistance categories. Specifically, we wish to design a rule base to help taxpayers determine if a person they are supporting in some way qualifies as a dependent. The knowledge we need to construct the rule base is contained in the text of a federal 1040 document. Therefore the knowledge acquisition process amounts to us examining and interpreting the contents of the textual description given for the dependency criteria.

The year 2001 federal tax form 1040 shows us that five tests must be met for a person to qualify as a dependent. Here is a list of the five tests together with specific details about the criteria for passing each test:

1. Relationship test. The person must either be your relative or have lived in your home as a family member all year. If the person is not your relative, the relationship must not violate local law.
2. Joint return test. If the person is married, he or she cannot file a joint return. However, the person can file a joint return if the return is filed only as a claim for refund and no tax liability would exist for either spouse if they had filed separate returns.
3. Citizen or resident test. The person must be a U.S. citizen or resident alien or a resident of Canada or Mexico. There is an exception for certain adopted children.
4. Income test. The person’s gross income must be less than $2,900. However, your child’s gross income can be $2,900 or more if he or she was either under age 19 at the end of 2001 or under age 24 at the end of 2001 and was a student.
5. Support test. You must provide over half of the person’s total support in 2001. However, there are two exceptions to this test: one for children of divorced or separated parents and one for persons supported by two or more taxpayers.

The knowledge required to construct the rule base is contained within the description of the five tests. Our next problem is to represent this knowledge as a set of production rules. This potentially difficult task is simplified if we first create a goal tree for the information contained within the descriptions for each test. The goal tree is useful in that it allows us to clearly visualize the problem and provides a base from which to construct a solution. Figure 12.9 displays the first level of the goal tree representing a solution for the dependency problem. The arc indicates that all five tests must be satisfied for a person to be claimed as a dependent.

[Figure 12.9: A first-level goal tree for dependency exemption. The root node Person Is a Dependent is joined by an arc to the five test nodes.]

Our next step is to continue developing the goal tree by creating a subgoal structure for each condition shown at the first tree level. The process terminates when the goal tree contains leaf nodes whose truth value can be easily determined by a potential user of the system. We discuss the structure of the goal tree for the citizen or resident test (shown in Figure 12.10) and leave construction of the remainder of the goal tree as an exercise.
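One way to picture the first-level goal tree is as a small data structure. The following Python sketch is only an illustration under stated assumptions: the GoalNode class, its kind field, and the answers dictionary are hypothetical names that do not come from the text or from any particular shell. An AND node plays the role of the arc in Figure 12.9, and a node with no children is a leaf whose truth value is supplied by the user.

```python
from dataclasses import dataclass, field


@dataclass
class GoalNode:
    """A goal tree node (hypothetical class); a node without children is a leaf."""
    name: str
    kind: str = "AND"  # how the children combine: "AND" (an arc) or "OR"
    children: list = field(default_factory=list)

    def is_satisfied(self, answers):
        if not self.children:
            # Leaf: the truth value comes straight from the user's answers.
            return answers.get(self.name, False)
        results = [child.is_satisfied(answers) for child in self.children]
        return all(results) if self.kind == "AND" else any(results)


# The five first-level tests named in the text.
FIRST_LEVEL_TESTS = [
    "relationship test is satisfied",
    "joint return test is satisfied",
    "citizen or resident test is satisfied",
    "income test is satisfied",
    "support test is satisfied",
]

# Root goal: all five tests must hold, so the root is an AND node.
dependent = GoalNode(
    "person is a dependent", "AND", [GoalNode(t) for t in FIRST_LEVEL_TESTS]
)

all_pass = {t: True for t in FIRST_LEVEL_TESTS}
one_fails = dict(all_pass)
one_fails["income test is satisfied"] = False

print(dependent.is_satisfied(all_pass))   # True
print(dependent.is_satisfied(one_fails))  # False
```

In this sketch, developing the tree further would amount to giving a first-level node its own children, for example by making the citizen or resident node an OR node over its two subgoals.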
Upon revisiting the citizen or resident test, we see that it is natural to distinguish candidate dependents who are residents of the United States from those who are nonresidents. The second level of the goal tree in Figure 12.10 shows the distinction by creating two nodes, one of which states that the resident test is satisfied and a second indicating that the nonresident test is satisfied. We can further differentiate residents into the subcategories United States citizen and resident alien. Likewise, we can distinguish nonresidents who are adopted children from those individuals who reside in Canada or Mexico. The tests to determine if someone is a citizen of the United States or a resident of Canada or Mexico are shown as leaf nodes within the goal tree. However, to complete the paths for resident alien is satisfied and adopted child test is satisfied requires additional research. IRS Publication 519 holds the needed information.

IRS Publication 519 states that a person qualifies as a resident alien if the person holds a green card or satisfies the substantial presence test. The substantial presence test as given in the publication is as follows.
The candidate dependent must have lived in the United States for at least 31 days of the current year and must have lived in the United States for a sum total of at least 183 days over the past three years. A maximum of 1/3 of the 183 days can come from the year prior to the current year. Also, a maximum of 1/6 of the 183 days can be allocated from two years prior to the current year.

Please note, for a final implementation, the 183-day requirement necessitates further subdivision. Finally, an adopted nonresident child who lived in a foreign country with the taxpayer also satisfies the citizen or resident test.

[Figure 12.10: A goal tree for the citizen/resident test, with nodes labeled by the corresponding Rules 1 through 11]

Once the goal tree is complete, mapping the tree to a set of production rules is straightforward. We can either begin at the top or the bottom of the tree structure. Let’s start at the top. The actual structure of the rules depends on the choice of a rule-based shell. We present the rules in a generic format.
Here is the top-level rule represented by the goal tree:

Rule 1: Dependency Test
IF relationship test is satisfied
AND joint return test is satisfied
AND citizen or resident test is satisfied
AND income test is satisfied
AND support test is satisfied
THEN person is dependent

Continuing to the next tree level with rules for the citizen or resident test we have:

Rule 2: Citizen or Resident Test
IF resident test is satisfied
OR nonresident test is satisfied
THEN citizen or resident test is satisfied

As an alternative, we could write this rule as two separate rules, each with one antecedent condition. Specifically,

Rule 2a: Citizen or Resident Test 1
IF resident test is satisfied
THEN citizen or resident test is satisfied

Rule 2b: Citizen or Resident Test 2
IF nonresident test is satisfied
THEN citizen or resident test is satisfied

Continuing from left to right, we complete the rule set. The list of remaining rules follows.

Rule 3: Resident Test
IF United States resident
AND citizen/alien is satisfied
THEN resident test is satisfied

Rule 4: Nonresident Test
IF not United States resident
AND country/child is satisfied
THEN nonresident test is satisfied

Rule 5: Citizen/Alien Test
IF United States citizen
OR resident alien is satisfied
THEN citizen/alien is satisfied

Rule 6: Country/Child Test
IF C/M test is satisfied
OR adopted child test is satisfied
THEN country/child is satisfied

Rule 7: Resident Alien Test
IF green card test is satisfied
OR presence test is satisfied
THEN resident alien is satisfied

Rule 8: C/M Test
IF resident of Canada
OR resident of Mexico
THEN C/M test is satisfied

Rule 9: Presence Test
IF 31 days current year
AND 183 days during last 3 years
THEN presence test is satisfied

Rule 10: Adopted Child Test
IF child is adopted
AND foreign test is satisfied
THEN adopted child test is satisfied

Rule 11: Foreign Country Test
IF lived in foreign country
AND lived with taxpayer entire year
THEN foreign test is satisfied

The rules may be applied with forward or backward chaining. However, the natural inference strategy is backward chaining with the goal person is a dependent. If backward chaining is applied, rule inference to decide if the citizen or resident test is satisfied begins by asking the user one of 11 questions. The questions are defined by the antecedent conditions seen in the rules linked to the leaf nodes of the tree. For a final implementation, text statements clearly explaining each question are associated with the antecedent conditions of each rule.

The questions defined by the antecedent conditions can be asked in any order. However, questions about candidate dependents who are residents of the United States are more likely than questions about nonresidents. Therefore pursuing the conditions associated with rule 3 before the conditions seen with rule 4 is appropriate. Because of this, a conflict resolution strategy to determine a best rule ordering is necessary.

Finally, depending on the rule-based tool, it may be necessary to write additional rules to indicate negations of satisfied conditions. Here is a top-level rule stating the candidate has failed at least one of the five tests and therefore does not qualify as a dependent.
Rule 12: Dependency Failure
IF relationship test is not satisfied
OR joint return test is not satisfied
OR citizen or resident test is not satisfied
OR income test is not satisfied
OR support test is not satisfied
THEN person is not a dependent

The approach we have taken to solve the dependency problem is a top-down strategy because we started by building a tree containing the most general conditions for satisfying the goal. We continued to break down each subgoal into a simpler format until we created a structure containing leaf nodes with questions to be answered by the naïve user. The process was straightforward in that the information we needed to build the rule base had been designed and written for us. Our task was simply that of restructuring the information in a format amenable for knowledge programming. Our next example also uses a top-down strategy but is more difficult because it requires us to first interpret unstructured knowledge before we build the goal tree.

Example 2: Choosing a Data Mining Technique

As we have seen, each data mining method has its own list of advantages and disadvantages. For example, backpropagation neural networks are better at handling datasets containing noise but are poor at explaining their behavior. For our second example, we wish to incorporate knowledge about the individual data mining methods into a rule-based system. The rule base will help us determine which data mining technique is appropriate in any given situation. A description of the problem follows.

Given a set of data containing attributes and values to be mined together with information about the nature of the data and the problem to be solved, develop a rule-based system to choose an appropriate data mining technique.

The task is to build a rule-based system to determine which data mining technique to choose for a specific application.
Unlike the previous example, we do not have a single document clearly describing the problem to be solved. Therefore a major portion of the work falls under the category of knowledge acquisition. One knowledge acquisition strategy is to build the system based on knowledge gathered from this text as well as other textbooks, journals, and magazines. A more appealing approach is to interview a data mining expert. Here is a paraphrase of a partial interaction between a knowledge engineer and a data mining expert who has been asked about the circumstances under which she decides to select a decision tree model to help solve a particular problem.

KE: “What are the major factors that lead you to consider a decision tree building tool as a best choice for a particular problem?”
E: “Well, learning must be supervised and there can be only one output attribute.”
KE: “Are there any other factors that must hold true?”
E: “Oh yes, the output attribute has to be categorical. It can’t be a numeric representation. Of course, you can define discrete categories for numbers, but this is usually a poor choice.”
KE: “So you’re saying that learning must be supervised, there must be but one output attribute, and the output attribute must be categorical for you to consider a decision tree approach?”
E: “Yes, that’s right!”
KE: “If the conditions you have mentioned hold true, are there any other techniques besides a decision tree approach that could be used?”
E: “Yes, you could use a Bayes classifier or a production rule generator. There are others.”
KE: “What makes you want to choose a decision tree over these other methods?”
E: “Decision trees are particularly good at explaining their results. Also, when information about the distribution of the data is unknown, I tend to prefer a decision tree approach.”
KE: “Why does a lack of knowledge about the data help you choose a decision tree model?”
E: “Unlike some approaches, the decision tree technique does not require me to make assumptions about the data distribution.”
KE: “Suppose the necessary conditions for selecting a decision tree are satisfied. What circumstances would make you prefer an alternative approach?”
E: “Well, I don’t like to use a decision tree when most or all of the data is numeric. The tree will likely have too many conditional tests for it to be informative. Also, the rules tend to be difficult to interpret.”

Figure 12.11 shows a partial goal tree for choosing a data mining technique where the subtree with top-level node Decision Tree Is Selected was developed based on the interaction of the knowledge engineer and the expert. The production rule corresponding to the goal node Input Attribute Criteria Are Satisfied is of particular interest:

IF some input attributes are categorical
OR data distribution test is satisfied
THEN input attribute criteria are satisfied

[Figure 12.11: A partial goal tree for choosing a data mining technique]

The truth of the first antecedent condition is not easily determined because the word some leaves us with a degree of uncertainty about whether the condition has been met. Suppose there are a total of 10 input attributes. Is the condition satisfied if two input attributes are categorical? If not, how about three input attributes? An obvious solution is to state a specific criterion to be met for the condition to hold true.
For example, we could rephrase the rule as follows:

IF at least 30% of all input attributes are categorical
OR data distribution test is satisfied
THEN input attribute criteria are satisfied

We now have a clear criterion against which to test the truth value of the antecedent condition. To illustrate, suppose our dataset contains 50 input attributes, 15 of which are categorical. Since 30% of 50 is 15, the antecedent condition is satisfied. However, if only 14 categorical input attributes exist, the antecedent condition will not be met, and the data distribution test must be satisfied for the rule to fire. Common sense tells us that the rule antecedent would prove to be of better use if we could associate a degree of truth with the categorical attribute condition. Associating measures of certainty with rule antecedent and consequent conditions is the topic of the next chapter.

12.5 Chapter Summary

Artificial intelligence concentrates on developing solutions to difficult problems that cannot be solved using traditional computing techniques. The central theme behind AI problem solving is to define a problem as a search through a space of possible problem states. The key to successful problem solving is to represent each state in a way that allows a reasoning strategy to search through a subset of the state space in a reasonable amount of time. The ability to limit the search to the relevant states increases the likelihood of finding a problem solution.

AI problem-solving methods often represent knowledge as a set of production rules. Two fundamental reasoning strategies used for rule-based problem solving are forward chaining and backward chaining. Forward chaining is applied when we wish to determine all possible outcomes from a set of rules and facts. Backward chaining is appropriate when we have a question to be answered or a specific goal to be satisfied.
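The difference between the two strategies can be made concrete with a minimal Python sketch. It reuses the wording of Rules 2 through 11 from the dependency example, but the dictionary encoding, the function names backward_chain and forward_chain, and the representation of user answers as a set of confirmed leaf conditions are all assumptions made for illustration. In particular, "not United States resident" is treated here as a separate condition that the user confirms, and no conflict resolution or uncertainty handling is attempted.

```python
# Consequent -> (operator, antecedents). Wording follows Rules 2-11 of the
# dependency example; the encoding itself is a hypothetical choice.
RULES = {
    "citizen or resident test is satisfied":
        ("OR", ["resident test is satisfied", "nonresident test is satisfied"]),
    "resident test is satisfied":
        ("AND", ["United States resident", "citizen/alien is satisfied"]),
    "nonresident test is satisfied":
        ("AND", ["not United States resident", "country/child is satisfied"]),
    "citizen/alien is satisfied":
        ("OR", ["United States citizen", "resident alien is satisfied"]),
    "country/child is satisfied":
        ("OR", ["C/M test is satisfied", "adopted child test is satisfied"]),
    "resident alien is satisfied":
        ("OR", ["green card test is satisfied", "presence test is satisfied"]),
    "C/M test is satisfied":
        ("OR", ["resident of Canada", "resident of Mexico"]),
    "presence test is satisfied":
        ("AND", ["31 days current year", "183 days during last 3 years"]),
    "adopted child test is satisfied":
        ("AND", ["child is adopted", "foreign test is satisfied"]),
    "foreign test is satisfied":
        ("AND", ["lived in foreign country", "lived with taxpayer entire year"]),
}


def backward_chain(goal, facts, rules=RULES):
    """Start from a goal and work back to conditions confirmed in `facts`."""
    if goal in facts:
        return True
    if goal not in rules:
        return False  # a leaf condition the user did not confirm
    operator, antecedents = rules[goal]
    results = [backward_chain(a, facts, rules) for a in antecedents]
    return all(results) if operator == "AND" else any(results)


def forward_chain(facts, rules=RULES):
    """Start from `facts` and derive every consequent that follows."""
    known = set(facts)
    changed = True
    while changed:  # keep sweeping the rule base until nothing new is added
        changed = False
        for consequent, (operator, antecedents) in rules.items():
            if consequent in known:
                continue
            hits = [a in known for a in antecedents]
            if all(hits) if operator == "AND" else any(hits):
                known.add(consequent)
                changed = True
    return known


GOAL = "citizen or resident test is satisfied"
green_card_holder = {"United States resident", "green card test is satisfied"}

print(backward_chain(GOAL, green_card_holder))                  # True
print(backward_chain(GOAL, {"United States resident"}))         # False
print(sorted(forward_chain(green_card_holder) - green_card_holder))
```

Backward chaining touches only the rules on a path to the stated goal, whereas forward chaining reports every conclusion the facts support, which is why the former fits a question-answering setting such as the tax example.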
Expert systems are computer programs designed to emulate the behavior of one or more human beings who are expert at solving problems within a specialized area. An expert system contains a knowledge base, which holds the domain knowledge for a specific problem, an inference engine, which reasons about stored knowledge, and a user interface, which allows the user to interact with the system. The user interface queries the user for information and offers explanations about the results of each reasoning process.

Human experts often state their problem-solving knowledge as a set of rules. Therefore most expert systems have a rule-based component. A rule-based system is any system that stores its knowledge as a set of production rules.

Potentially difficult rule-based applications are simplified if we first create a goal tree. A goal tree is a tree structure whose top-level (root) node is a goal state. Nodes at each level of the tree represent preconditions to be satisfied to traverse to the next higher tree level. A goal tree is useful in that it allows us to clearly visualize the problem and provides a base from which to construct a solution.

As rule-based systems separate knowledge from the strategies used to reason about the knowledge, they can be easily modified. Also, rule-based systems have the ability to handle uncertain information and can explain their behavior.

12.6 Key Terms

Backward chaining. This reasoning strategy places emphasis on creating new knowledge by determining what must be true for a goal to be achieved. If knowledge is in the form of a set of rules, rule consequents are examined to find a rule that, when executed, will achieve a goal. If such a rule is found, and the rule antecedent is determined to be true, the rule is applied. If the antecedent is not immediately true, the antecedent condition becomes a new goal and the process is recursively applied.
Bottom-up knowledge acquisition. Any knowledge acquisition method that makes use of induction.
Breadth-first search. A search strategy that expands a search space horizontally before moving deeper into the search space.
Compiled knowledge. Knowledge not easily interpreted and/or extracted.
Conflict resolution strategy. A technique for choosing a rule from the set of all applicable rules.
Depth-first search. A search strategy that moves vertically deeper into the search space before expanding the search in a horizontal manner.
Expert system. A computer program designed to emulate the behavior of a human being who is expert at solving problems within a specialized area.
Expert system shell. A structure that contains a user interface and an inference engine. When linked with a knowledge base, an expert system results.
Exponentially hard problem. Any problem that cannot be solved in a reasonable amount of time with a traditional algorithmic approach.
Forward chaining. This reasoning strategy emphasizes the creation of new knowledge from what is known to be true. If the knowledge is in the form of rules, rule antecedents are examined to determine which rules can be applied.
Goal tree. A tree structure whose top-level (root) node is a goal state. Nodes at each level of the tree represent preconditions that must be satisfied in order to traverse the tree to the next higher level.
Inference engine. The component of an expert system that determines and applies selected parts of the knowledge base to a specific problem.
Intelligent agent. A computational entity capable of autonomously achieving goals by executing needed actions.
Knowledge base. The component of an expert system that contains the domain knowledge.
Knowledge-based system. A general term referring to any rule-based system that has been built from knowledge extracted from one or several sources.
Knowledge engineer. A person who interacts with an expert in an attempt to capture his or her knowledge.
Knowledge programming. The process of encoding the knowledge extracted from an expert into a knowledge base.
Metaknowledge. Knowledge about knowledge.
Nearest neighbor heuristic. A heuristic technique used to navigate within a state space. The technique tells us to always move to the next closest state.
State space. The set of all possible problem states.
Rapid prototyping. The process of building a small-scale model of an application being developed. The scaled-down model is then expanded upon and modified as necessary until the final product results.
Reliability. Testing an expert system to determine if it performs within an accepted level of consistency.
Rule-based system. Any system that stores its knowledge as a set of production rules.
Top-down knowledge acquisition. Any knowledge acquisition method that makes use of deductive reasoning.
User interface. The component part of an expert system that communicates with the user.
Validation. Testing an expert system to determine if it performs within an accepted level of accuracy.
Verification. Testing an expert system to determine if it uses the same reasoning process as the expert(s) used to build the system.

12.7 Exercises

Review Questions

1. Differentiate between the following terms:
a. Top-down and bottom-up knowledge acquisition
b. Forward and backward chaining
c. Breadth-first and depth-first search
d. Knowledge and metaknowledge
e. Knowledge-based system and expert system
f. Validation and verification
2. List several heuristics that you use in your daily activities.
3. List two or more situations where you have used the nearest neighbor heuristic.
4. Which of the expert system application areas in Section 12.3 could also be approached with a data mining solution?
5. Here is a bottom-up knowledge acquisition approach for the example described in the second part of Section 12.4. Recall that the example dealt with building a rule-based system to choose a data mining tool.
• Present data mining problems one at a time, together with corresponding datasets, to a data mining expert.
• Ask the expert to verbalize how he or she selects a data mining technique for each problem. Attempt to determine how the expert chose a technique for each problem by recording and analyzing the verbalizations.
• Meet with the expert as necessary to discuss and modify your conclusions about how the technique for each problem was chosen.
Compare this approach to the one described by the original example. Which approach will require more time from start to finish in order to build a working system? Which method is more likely to give the best result?
6. Can the water jug problem be solved via backward chaining? Why or why not?

Computational Questions
1. Use the rules in Table 12.2 to write the sequence of actions leading to a solution of the water jug problem by following a breadth-first search.
2. Three missionaries and three cannibals find themselves on one side of a river. They have agreed that they would like to get to the other side. The missionaries want to arrange the trip across the river so that the number of missionaries on either side of the river is never less than the number of cannibals who are on the same side. The only boat available holds a maximum of two people. Draw the state space for this problem, making sure that the missionaries never find themselves at risk of being eaten.
3. The eight-puzzle is a square tray containing eight square tiles. Each tile is numbered. The ninth position in the tray is uncovered and represents a blank space. A tile that is adjacent to the blank space can be slid into that space. A game consists of a starting position and a goal position. The goal is to transform the starting position into the goal position by sliding the tiles. Draw the state space for the eight-puzzle problem shown below. Can you think of any heuristics to help limit the state space search? (Hint: Think of each move as repositioning the blank space.)
   Start      Goal
   2 8 3      1 2 3
   1 6 4      8 _ 4
   7 _ 5      7 6 5
   (_ marks the blank space.)
4. State space representations can be simplistic or quite complex. Consider the game of tic-tac-toe. Each state of the state space can be nicely represented as a 3 × 3 grid. However, even this simple problem results in a very large state space. Develop several heuristics that help you determine an appropriate move at any point within the state space.
5. Show how the production rules given in Table 12.3 are applied in a forward manner to conclude g is true. Assume a and c are true.
6. Provide the production rules for the subtree shown in Figure 12.11 representing the goal Decision Tree Is Selected.
7. Build a goal tree for one or more of the remaining federal tax dependency tests.
8. Build a goal tree for one or more of the following problems:
• Selecting a package delivery method
• Fossil or tree identification
• Deciding on which federal tax form to use (1040EZ, 1040A, 1040)
• Determining someone’s astrological sign
• Finding an apartment
• Grading a coin
• An expert dating service
• Choosing a university to attend
• Determining whether or not you can give blood
• Determining if you can file an electronic federal tax return
• Determining what type of pet would be best for you
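
As a starting point for Computational Question 3, the hint about repositioning the blank space can be sketched in Python. The sketch below is only one possible encoding, not the chapter's own method: the 9-tuple representation with 0 for the blank, and the names successors, misplaced_tiles, and bfs, are assumptions introduced here for illustration. It generates neighboring states by moving the blank, counts misplaced tiles as one candidate heuristic, and runs a plain breadth-first search from the start position to the goal position.

```python
from collections import deque

# Assumed encoding: each state is a 9-tuple read row by row, 0 is the blank.
START = (2, 8, 3,
         1, 6, 4,
         7, 0, 5)
GOAL = (1, 2, 3,
        8, 0, 4,
        7, 6, 5)


def successors(state):
    """Return every state reachable by sliding one adjacent tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    result = []
    # Think of each move as repositioning the blank: up, down, left, right.
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        new_row, new_col = row + d_row, col + d_col
        if 0 <= new_row < 3 and 0 <= new_col < 3:
            target = new_row * 3 + new_col
            cells = list(state)
            cells[blank], cells[target] = cells[target], cells[blank]
            result.append(tuple(cells))
    return result


def misplaced_tiles(state, goal):
    """One candidate heuristic: count the tiles (not the blank) out of place."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)


def bfs(start, goal):
    """Breadth-first search; returns the list of states from start to goal, or None."""
    parent = {start: None}      # also serves as the set of visited states
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None


if __name__ == "__main__":
    path = bfs(START, GOAL)
    print("number of moves:", len(path) - 1)
    print("misplaced tiles at start:", misplaced_tiles(START, GOAL))
```

Because breadth-first search expands the state space level by level, the path it returns uses the fewest possible moves. A heuristic such as misplaced_tiles could instead be used to order the frontier so that fewer states are expanded, which is the kind of pruning the question asks you to think about.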