Backward chaining

Assume the same representation of rules as in forward chaining, i.e.
If <antecedent 1> is true, <antecedent 2> is true, …, <antecedent i> is true
Then <consequent> is true.

Rule interpretation starts with (i) an empty fact base and (ii) a list of goals which the system tries to derive, and consists of the following steps:
1 Form a stack initially composed of all "top-level" goals.
2 Consider the first goal on the stack, and gather all of the rules capable of satisfying this goal.
3 For each of these rules, examine the rule's premises:
• If all premises are satisfied, execute the rule to infer its conclusion, and remove the satisfied goal from the stack.
• If there is a premise which is not satisfied, look for rules by means of which this premise can be derived; if such rules exist, add the premise as a sub-goal on the top of the stack, and go to 2.
• If no rule exists to satisfy the unknown premise, place a query to the user and add the supplied value to the fact base. If the premise still cannot be satisfied, consider the next rule which has the initial goal as its conclusion.
4 If all rules that can satisfy the current goal have been attempted and all have failed, the goal is unsatisfiable; remove it from the stack and go to 2. If the stack is empty (i.e. all of the goals have been satisfied), stop.

The fruit identification example

Assuming that we do not have any information about the object that we are trying to recognize, let the "top-level" goal be (fruit = (? X)).

Step 1. Initial fact base: ( ). Initial stack of goals: ((fruit = (? X))).
Step 2. The rules capable of satisfying this goal are: 1, 1A, 6, 7, 8, 9, 10, 11, 12, 13, 13A, 13B.
Step 3. Consider Rule 1. Its first premise is (shape = long). There is no datum in the FB matching this premise, and no rule has (shape = (? Y)) as its conclusion. Therefore, a query is placed to the user to acquire the shape of the fruit under consideration. Assume that the user replies that the fruit is round.
Current fact base: ((shape = round))
Rule 1 fails, and Rule 6 is examined next. The first premise of Rule 6 results in a new goal, which is added at the top of the current stack of goals.
Current stack of goals: ((fruitclass = (? Y)) (fruit = (? X)))
Three rules are capable of satisfying the newly stated goal, namely 2, 2A and 3. The first premise of Rule 2, (shape = round), matches a datum in the FB. The second premise leads to a new query regarding the diameter of the fruit. Assume that the answer is (diameter = 1 inch).
Current fact base: ((shape = round) (diameter = 1 inch))
Rules 2 and 2A fail, and Rule 3 is examined next. It succeeds, so a new conclusion, (fruitclass = tree), is added to the FB and the first goal is removed from the current stack of goals.
Current fact base: ((shape = round) (diameter = 1 inch) (fruitclass = tree))
Now Rules 6, 7 and 8 fail, and Rule 9 is examined next. Its first premise succeeds, but its second premise places a new query regarding the color of the fruit. Assume that the user enters (color = red), which fails Rules 9 and 10. The first two premises of Rule 11 are satisfied; its third premise concerns the seedclass. Rules 4 and 5 have (seedclass = (? Z)) as their conclusions, so the seedclass becomes a new sub-goal. The only premise of Rule 4, (seedcount = 1), cannot be derived by any rule, so a query about the seedcount is placed to the user; assume that the answer is (seedcount = 1).
Rule 4 succeeds and (seedclass = stonefruit) is added to the FB. As a result, all premises of Rule 11 hold, and its conclusion, (fruit = cherry), is added to the FB. Note that this proves our top-level goal.
Current fact base: ((shape = round) (diameter = 1 inch) (fruitclass = tree) (color = red) (seedcount = 1) (seedclass = stonefruit) (fruit = cherry))
Current stack of goals: ( )
Step 4. Stop; no more goals remain to be proved.
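The goal-directed control loop above can be sketched in code. The Python fragment below is only an illustration of the idea: it uses recursion in place of the explicit goal stack, the (premises, conclusion) rule format and the toy rules are assumptions made for the sketch (not the actual fruit rule base), the canned answers dictionary stands in for querying the user, and cycle detection is omitted.

    # Minimal sketch of the backward-chaining procedure described above.
    # Rules are (premises, conclusion) pairs over simple string facts; the
    # rule base, fact names and the ask_user callback are illustrative only.

    def prove(goal, rules, facts, ask_user):
        """Try to satisfy one goal, extending the fact base as a side effect."""
        if goal in facts:                         # already known
            return True
        for premises, conclusion in rules:        # rules that can satisfy the goal
            if conclusion != goal:
                continue
            all_hold = True
            for p in premises:
                if p in facts or prove(p, rules, facts, ask_user):
                    continue                      # premise held or was derived as a sub-goal
                if not any(r[1] == p for r in rules) and ask_user(p):
                    facts.add(p)                  # no rule can derive it: query the user
                    continue
                all_hold = False                  # premise failed: try the next rule
                break
            if all_hold:
                facts.add(conclusion)             # fire the rule
                return True
        return False                              # every candidate rule failed

    # Toy run in the spirit of the fruit example (names are hypothetical):
    rules = [({"shape = long", "color = yellow"}, "fruit = banana"),
             ({"shape = round", "diameter = 1 inch"}, "fruitclass = tree")]
    facts = {"shape = round"}
    answers = {"diameter = 1 inch": True}         # canned user replies
    print(prove("fruitclass = tree", rules, facts, answers.get))   # True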
Mixed modes of chaining

Consider the following set of rules:
Rule 1: F & H => K
Rule 2: E & A => K
Rule 3: E & B => H
Rule 4: A & G => B
Rule 5: B & D => H
Rule 6: G & D => E
Rule 7: A & B => D
Rule 8: A & C => G
Rules 1, 2 and 3 are backward chaining rules; Rules 4 through 8 are forward chaining rules. Assume that A and C are the only known facts, and that we want to infer K (the only top-level goal). Because the set of rules is divided into two subsets, backward and forward chaining rules, there are two ways in which these rules can be applied:
1 Forward chaining rules have a higher priority, thus they are the first ones to be tried.
2 Backward chaining rules have a higher priority.

Priority to forward chaining rules

Assuming that the forward chaining rules have a higher priority, the inference process is carried out as follows, given FB: (A, C) and goal K.
Step 1: Attempt to fire only forward chaining rules for as long as possible.
Rule 8 fires, thus the current FB becomes (A, C, G).
Rule 4 fires, thus the current FB becomes (A, C, G, B).
Rule 7 fires, thus the current FB becomes (A, C, G, B, D).
Rule 5 fires, thus the current FB becomes (A, C, G, B, D, H).
Rule 6 fires, thus the current FB becomes (A, C, G, B, D, H, E).
Step 2: When no more forward chaining rules can fire and the goal has not been derived, proceed with the backward chaining rules. Rule 1 is attempted, but fails because F cannot be derived. Rule 2 succeeds, thus K is proven.

Priority to backward chaining rules

Assuming that the backward chaining rules have a higher priority, the inference process is carried out as follows:
Step 1: Search for a backward chaining rule which has K as its conclusion. Rule 1 is such a rule; it will fire only if F and H are satisfied. Therefore, F and H become new goals.
Step 1A: Search for a backward chaining rule whose conclusion is F. There is no such rule, therefore this goal fails.
Step 1B: Search for a backward chaining rule whose conclusion is H. Rule 3 is such a rule, and it in turn creates two new goals, E and B. However, there are no backward chaining rules whose conclusions are F, E or B, which is why the forward chaining rules must be activated next.
Step 2: Given the fact base (A, C), the forward chaining rules fire in the following order:
Rule 8 fires, thus the current FB becomes (A, C, G).
Rule 4 fires, thus the current FB becomes (A, C, G, B). The derivation of B satisfies one of the goals on the stack of goals; the remaining goals are F and E.
Rule 7 fires, thus the current FB becomes (A, C, G, B, D).
Rule 5 fires, thus the current FB becomes (A, C, G, B, D, H).
Rule 6 fires, thus the current FB becomes (A, C, G, B, D, H, E). The derivation of E satisfies one of the current goals, which leaves F as the only goal that remains to be proved. However, no more forward chaining rules can fire, meaning that F cannot be proved by either forward chaining or backward chaining rules.
Step 3: Because our top-level goal was to prove K, we look for another backward chaining rule which has K as its conclusion. Rule 2 is such a rule, and it can fire because both of its premises are satisfied at this point, thus proving our original goal K.
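As a companion sketch, the forward-chaining phase used above can be written as a simple fixed-point loop. The fragment below encodes all eight rules as (premises, conclusion) pairs and, for brevity, ignores the forward/backward split, so a pure forward pass from (A, C) already derives K; the data layout and function name are assumptions of the sketch.

    # Minimal sketch of a forward-chaining pass over the eight rules above:
    # keep firing any rule whose premises are all in the fact base until the
    # goal appears or no rule can add a new fact.

    RULES = [({"F", "H"}, "K"),   # Rule 1
             ({"E", "A"}, "K"),   # Rule 2
             ({"E", "B"}, "H"),   # Rule 3
             ({"A", "G"}, "B"),   # Rule 4
             ({"B", "D"}, "H"),   # Rule 5
             ({"G", "D"}, "E"),   # Rule 6
             ({"A", "B"}, "D"),   # Rule 7
             ({"A", "C"}, "G")]   # Rule 8

    def forward_chain(facts, rules, goal):
        facts = set(facts)
        changed = True
        while changed and goal not in facts:
            changed = False
            for premises, conclusion in rules:
                if premises <= facts and conclusion not in facts:
                    facts.add(conclusion)         # the rule fires
                    changed = True
        return facts

    print(forward_chain({"A", "C"}, RULES, goal="K"))
    # e.g. {'A', 'C', 'G', 'B', 'D', 'H', 'E', 'K'}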
Completeness of the chaining algorithms

Consider the following set of sentences:
∀x hungry(x) => likes(x, apple)
∀x ¬hungry(x) => likes(x, grapes)
∀x likes(x, apple) => likes(x, fruits)
∀x likes(x, grapes) => likes(x, fruits)
Assume that we want to prove likes(Bob, fruits). Obviously, this is true if likes(Bob, apple) v likes(Bob, grapes) is true, and this disjunction is always true: either hungry(Bob) holds, which gives the first disjunct, or ¬hungry(Bob) holds, which gives the second. Neither of the chaining algorithms, however, will allow us to infer likes(Bob, fruits). The reason is that ∀x ¬hungry(x) => likes(x, grapes) is not a Horn formula. Chaining algorithms, which use generalized MP as the only inference rule, are incomplete for non-Horn KBs.

Why do we need a stronger inference rule?

Consider the following example: Bob wants to take CS 501 next semester. This class will meet either MW 6:45 -- 8:00 or TR 6:45 -- 8:00. Bob has to be at his soccer sessions MTF 5:30 -- 8:30. Can he take CS 501?
Initial KB:
MW(CS501, 645--800) v TR(CS501, 645--800)
MW(CS501, 645--800) & Busy(Bob, M, 530--830) => nogood-class(Bob)
TR(CS501, 645--800) & Busy(Bob, T, 530--830) => nogood-class(Bob)
Busy(Bob, M, 530--830)
Busy(Bob, T, 530--830)
Possible inferences:
MW(CS501, 645--800) => nogood-class(Bob)   (A)
TR(CS501, 645--800) => nogood-class(Bob)   (B)

The resolution rule can help

We can draw more inferences if we look at MW(CS501, 645--800) v TR(CS501, 645--800) as describing two different cases:
• Case 1: MW(CS501, 645--800) is true, in which case nogood-class(Bob) is true by means of (A).
• Case 2: TR(CS501, 645--800) is true, in which case nogood-class(Bob) is true by means of (B).
The answer to the initial query is derived no matter which case is the right one. This type of reasoning is called case analysis, and it can be carried out by means of the resolution rule as follows:
Resolving ¬MW(CS501, 645--800) v nogood-class(Bob) with MW(CS501, 645--800) v TR(CS501, 645--800) yields nogood-class(Bob) v TR(CS501, 645--800).
Resolving ¬TR(CS501, 645--800) v nogood-class(Bob) with nogood-class(Bob) v TR(CS501, 645--800) yields nogood-class(Bob) v nogood-class(Bob), i.e. nogood-class(Bob).
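To show the case analysis mechanically, here is a small Python sketch of binary propositional resolution applied to the clause form of the example; the shortened literal names (MW, TR, and NG for nogood-class(Bob)) and the representation of clauses as sets of literal strings are assumptions made for readability. The two calls mirror the two resolution steps written out above.

    # Sketch of binary propositional resolution; clauses are frozensets of
    # literal strings, with "~" marking negation.

    def resolve(c1, c2):
        """Return all resolvents of two clauses."""
        out = []
        for lit in c1:
            neg = lit[1:] if lit.startswith("~") else "~" + lit
            if neg in c2:
                out.append(frozenset((c1 - {lit}) | (c2 - {neg})))
        return out

    clauses = [frozenset({"MW", "TR"}),     # MW v TR
               frozenset({"~MW", "NG"}),    # equivalent to MW => NG
               frozenset({"~TR", "NG"})]    # equivalent to TR => NG

    step1 = resolve(clauses[0], clauses[1])[0]   # {TR, NG}
    step2 = resolve(step1, clauses[2])[0]        # {NG}: nogood-class(Bob)
    print(step1, step2)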
The resolution rule revisited

Recall the resolution rule for propositional logic: from A v B and ¬B v C, infer A v C (equivalently, from ¬A => B and B => C, infer ¬A => C). There are two different ways to interpret this rule:
1 As describing two cases:
• B is true and ¬B is false, in which case C must be true;
• ¬B is true and B is false, in which case A must be true.
In both cases, A v C is true.
2 Because implication is transitive, the resolution rule lets us link the premise of one implication with the conclusion of the other, ignoring the intermediate sentence B; that is, it lets us derive a new implication.
Neither of these can be done with MP, because MP derives only atomic conclusions.

Consider the following propositional version of generalized MP:
A1 & A2 & … & Am => B
D1 & D2 & … & Dn => C
From these two formulas, making use of the monotonicity of PL, we can infer:
A1 & A2 & … & Am & D1 & D2 & … & Dn => B
Assume now that B = B1 v B2 v … v Bk and C = Ai:
A1 & A2 & … & Am => B1 v B2 v … v Bk
D1 & D2 & … & Dn => Ai
A1 & A2 & … & A(i-1) & D1 & D2 & … & Dn & A(i+1) & … & Am => B1 v B2 v … v Bk

Consider the following cases for a formula of the form A1 & A2 & … & Am => B1 v … v Bk:
1 If all the A's hold, then at least one B holds.
2 If m = 0, the formula degenerates to B1 v B2 v … v Bk.
3 If k = 1, the formula has the form A1 & A2 & … & Am => B1.
4 If k = 0, then A1 & A2 & … & Am => False, which is equivalent to ¬(A1 & A2 & … & Am), which is in turn equivalent to ¬A1 v … v ¬Am. If, in addition, m = 1, we can represent negated formulas such as ¬student(Bob), or equivalently student(Bob) => False. If m = 0 as well, we have True => False, which represents a contradiction.
That is, formulas of the form A1 & A2 & … & Am => B1 v … v Bk are general enough to represent any logical formula. If a KB is comprised only of formulas of this type, we say that it is in normal form. We need a generalized version of the resolution rule to work with such KBs.

The generalized resolution rule: definition

Generalized resolution is the following rule of inference:
A1 & A2 & … & Ai & … & Am => B1 v B2 v … v Bk
D1 & D2 & … & Dn => C1 v C2 v … v Cj v … v Cx
A1 & … & A(i-1) & A(i+1) & … & Am & D1 & … & Dn => B1 v … v Bk v C1 v … v C(j-1) v C(j+1) v … v Cx
Here Ai = Cj, which is why they are dropped from the l.h.s. and the r.h.s. of the resulting implication, respectively.
Here is the alternative version of generalized resolution:
A1 v A2 v … v Ai v … v Ap
C1 v C2 v … v Cj v … v Cx
A1 v … v A(i-1) v A(i+1) v … v Ap v C1 v … v C(j-1) v C(j+1) v … v Cx
Here Ai = ¬Cj, which is why both can be dropped.

Proving formulas by the resolution rule

If we have a FOL KB, then Ai and Cj (in the first version, or ¬Cj in the second version) are considered the same if there is a substitution σ such that subst(σ, Ai) = subst(σ, Cj) (or subst(σ, Ai) = subst(σ, ¬Cj), respectively). Assuming that all formulas in the KB are in normal form, we can apply the resolution rule in forward or backward chaining algorithms.
Example: Consider the following KB:
¬P(w) v Q(w), P(x) v R(x), ¬Q(y) v S(y), ¬R(z) v S(z)
Using forward chaining, the following conclusions can be derived:
From ¬P(w) v Q(w) and ¬Q(y) v S(y), with {y / w}: ¬P(w) v S(w)
From ¬P(w) v S(w) and P(x) v R(x), with {w / x}: S(x) v R(x)
From S(x) v R(x) and ¬R(z) v S(z), with {x / A, z / A}: S(A) v S(A), i.e. S(A)
The derived clauses are called resolvents.

Completeness of the chaining process with the resolution rule

Chaining with the resolution rule is still an incomplete inference procedure. To see why, consider an empty KB from which you want to derive P v ¬P. This is a valid formula, and therefore it follows from any KB, including the empty KB. However, using only the resolution rule in this way, we cannot prove it. Assume instead that we add its negation to the KB: ¬(P v ¬P) is equivalent to ¬P & P, i.e. to the two clauses ¬P and P. Adding the negation of a valid formula to the existing KB introduces a contradiction into that KB. If we can derive False from KB & ¬A, then we have shown that KB |- A (which is sound, because KB |= A exactly when KB & ¬A is unsatisfiable). In the example above, resolving P with ¬P yields Nil, which proves P v ¬P.

The refutation method

The inference procedure that proves a formula by showing that its negation, if added to the KB, leads to a contradiction is called refutation. The following procedure implements the refutation method:
1 Negate the theorem to be proved, and add the result to the set of axioms.
2 Put the list of axioms into normal form.
3 Until there is no resolvable pair of clauses, do:
a Find resolvable clauses, and resolve them.
b Add the result of the resolution to the list of clauses.
c If Nil is produced, stop (the theorem has been proved by refutation).
4 Stop (the theorem cannot be proved from the axioms).
The refutation method is a complete inference procedure.

Example

Consider the following set of axioms:
¬hungry(w) v likes(w, apple)
hungry(x) v likes(x, grapes)
¬likes(y, apple) v likes(y, fruits)
¬likes(z, grapes) v likes(z, fruits)
Assume that we want to prove likes(Bob, fruits). Therefore, we must add ¬likes(Bob, fruits) to the set of axioms. The refutation then proceeds as follows:
From ¬hungry(w) v likes(w, apple) and ¬likes(y, apple) v likes(y, fruits), with {y / w}: ¬hungry(w) v likes(w, fruits)
From ¬hungry(w) v likes(w, fruits) and hungry(x) v likes(x, grapes), with {w / x}: likes(x, fruits) v likes(x, grapes)
From likes(x, fruits) v likes(x, grapes) and ¬likes(z, grapes) v likes(z, fruits), with {z / x}: likes(x, fruits)
From likes(x, fruits) and ¬likes(Bob, fruits), with {x / Bob}: Nil
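The refutation loop itself can be sketched as a saturation procedure. The version below is propositional, with the likes/hungry axioms already instantiated for Bob (a full first-order prover would additionally need unification, i.e. the subst step above); the clause encoding and the helper names resolvents and refute are assumptions of the sketch.

    # Propositional sketch of the refutation method: add the negated theorem,
    # then resolve until the empty clause (Nil) appears or nothing new is derived.
    from itertools import combinations

    def resolvents(c1, c2):
        out = set()
        for lit in c1:
            neg = lit[1:] if lit.startswith("~") else "~" + lit
            if neg in c2:
                out.add(frozenset((c1 - {lit}) | (c2 - {neg})))
        return out

    def refute(clauses):
        """Return True if the empty clause can be derived by resolution."""
        clauses = set(clauses)
        while True:
            new = set()
            for c1, c2 in combinations(clauses, 2):
                for r in resolvents(c1, c2):
                    if not r:                     # Nil produced: contradiction found
                        return True
                    new.add(r)
            if new <= clauses:                    # saturated without producing Nil
                return False
            clauses |= new

    axioms = [frozenset({"~hungry(Bob)", "likes(Bob, apple)"}),
              frozenset({"hungry(Bob)", "likes(Bob, grapes)"}),
              frozenset({"~likes(Bob, apple)", "likes(Bob, fruits)"}),
              frozenset({"~likes(Bob, grapes)", "likes(Bob, fruits)"})]
    negated_goal = frozenset({"~likes(Bob, fruits)"})
    print(refute(axioms + [negated_goal]))        # True: likes(Bob, fruits) is proved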
Problems with the resolution rule

Resolution proofs are exponential in the worst case, which is why we must use strategies to direct the search. The following two ideas can help here:
1 Every resolution should involve the negated theorem or a derived clause which has used the negated theorem directly or indirectly.
2 Always remember what your goal is, so that, given what you currently have, you can find the difference and, using your intuition, try to reduce this difference to get closer to the goal.
These ideas can be implemented as resolution strategies working at a metalogical level. Among the most popular strategies are the following:
1 Unit preference. Always prefer a single-literal clause when doing resolution. This strategy was found to be efficient only for relatively small problems.
2 Set of support. Every resolution involves the negated theorem or a clause derived from it, directly or indirectly. The set of such clauses, together with the negated theorem, is called the set of support; initially it contains only the negated theorem.
3 The breadth-first strategy. First resolve all possible pairs of initial clauses, then resolve all possible pairs from the resulting set together with the initial set, and so on.
4 Input resolution. Every resolution uses one of the initial clauses or the negated theorem. If we also allow resolutions in which one clause is an ancestor of the other, we obtain the strategy called linear resolution.
5 Subsumption. Eliminate all sentences subsumed by other sentences. For example, if P(x) is in the KB, then P(A) will not be added if inferred, because it is subsumed by P(x).
Note that in order to apply the refutation method, we must first convert the initial set of sentences into normal form.
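As an illustration of the set-of-support strategy (item 2 above), the refutation loop from the previous sketch can be restricted so that every resolution step uses at least one clause descended from the negated theorem. The fragment below is an assumption-laden sketch that reuses the resolvents helper and the axioms and negated_goal clauses defined in the refutation sketch.

    # Set-of-support restriction: only resolve pairs in which at least one
    # clause is the negated theorem or a descendant of it.

    def refute_with_sos(axioms, negated_theorem):
        sos = {negated_theorem}                   # support set: negated theorem only
        others = set(axioms)
        while True:
            new = set()
            for c1 in sos:
                for c2 in sos | others:
                    if c1 == c2:
                        continue
                    for r in resolvents(c1, c2):
                        if not r:                 # Nil: theorem proved
                            return True
                        new.add(r)
            if new <= sos | others:
                return False                      # saturated without Nil
            sos |= new                            # resolvents join the support set

    print(refute_with_sos(axioms, negated_goal))  # True

The pruning this buys is exactly the one described above: pairs consisting only of original axioms are never resolved with each other.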