CaMeL: Learning Method Preconditions for HTN Planning

Okhtay Ilghami & Dana S. Nau
Department of Computer Science, University of Maryland, College Park, MD 20742-3255
{okhtay,nau}@cs.umd.edu

Héctor Muñoz-Avila
Dept. of Comp. Sci. and Eng., Lehigh University, Bethlehem, PA 18015
munoz@cse.lehigh.edu

David W. Aha
Navy Center for Applied Research in AI, Naval Research Laboratory (Code 5515), Washington, DC 20375
aha@aic.nrl.navy.mil

Copyright © 2002, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

A great challenge in using any planning system to solve real-world problems is the difficulty of acquiring the domain knowledge that the system will need. We present a way to address part of this problem, in the context of Hierarchical Task Network (HTN) planning, by having the planning system incrementally learn conditions for HTN methods under expert supervision. We present a general formal framework for learning HTN methods, and a supervised learning algorithm, named CaMeL, based on this formalism. We present theoretical results about CaMeL's soundness, completeness, and convergence properties. We also report experimental results about its speed of convergence under different conditions. The experimental results suggest that CaMeL has the potential to be useful in real-world applications.

Introduction

A great challenge in using any planning system to solve real-world problems is the difficulty of acquiring the domain knowledge and the associated control rules (i.e., rules that help the planner explore its search space efficiently) that abstract the real-world domain. One way to address this issue is to design the planning system to learn the constituents of the planning domain and the associated control rules. This requires the system to be supervised by a domain expert who solves instances of the problems in that domain, resulting in a supervised learning process. In this paper, we discuss a supervised incremental learning algorithm in a Hierarchical Task Network (HTN) planning context.

In recent years, several researchers have reported work on the HTN planning formalism and its applications (Wilkins 1990; Currie & Tate 1991; Erol, Hendler, & Nau 1994). The hierarchical semantics of this kind of planning gives us the ability to model planning problems in domains that are naturally hierarchical. A good example is planning in military environments, where conventional linear STRIPS-style planners (Fikes & Nilsson 1971) cannot be exploited to abstract the planning problems accurately. An example of using HTN planning in such environments is a system called HICAP (Muñoz-Avila et al. 1999), which has been used to assist with the authoring of plans for noncombatant evacuation operations (NEOs). To support plan authoring, HICAP integrates the SHOP hierarchical planner (Nau et al. 1999) together with a case-based reasoning (CBR) system named NaCoDAE (Aha & Breslow 1997).

As with any incremental learning problem, there are at least two approaches that one might consider for learning HTN methods. First, a lazy (e.g., CBR) approach can be used to directly replay plans previously generated by the human expert. It assumes that plans which were successfully used in situations similar to the current situation are likely to work now. Second, an eager approach can be used to induce methods that could be used to mimic human expertise.
In either approach, adding new training samples, which represent human expert activities while solving an HTN planning problem, is expected to yield better approximations of the domain. However, due to the complexity of the semantics of HTN planning, one should carefully define the inputs and outputs of the learning algorithm and what learning means in this context. In this paper, we adopt an eager approach. We also introduce a theoretical basis for formally defining algorithms that learn preconditions for HTN methods. This formalism models situations when:
• General information is available concerning the possible decompositions of tasks into subtasks, but without sufficient details to determine the conditions under which each decomposition will succeed.
• Plan traces, known to be successful or unsuccessful, are available for certain problem instances.
Such situations occur in several important practical domains, such as the domain of NEOs (DoD 1994; Lamber 1992), in which military doctrine provides the planner with general information about how to conduct a NEO mission, but does not specify details. We also introduce CaMeL (Candidate Elimination Method Learner), an algorithm that instantiates this formalism. We state theorems about CaMeL's soundness, completeness, and convergence properties. Our experimental results reveal the speed with which CaMeL converges in different situations and suggest that CaMeL has the potential for use in deployed systems.

Hierarchical Task Network Planning

In an HTN planning system, instead of having traditional STRIPS-style operators with delete and add lists used to achieve goal predicates (Fikes & Nilsson 1971), the main goal of planning is to accomplish a list of given tasks. Each task can be decomposed into several subtasks using predefined methods. Each possible decomposition represents a new branch in the search space of the problem. At the bottom level of this hierarchy lie primitive tasks, which can be executed directly using an atomic operator. In summary, the plan still consists of a list of instantiations of operators, partially ordered in some planners and fully ordered in others, but the correctness definition of the plan differs. In traditional planning, a plan is correct if it is executable and the goal state is a subset of the world state after the plan's execution (i.e., each goal atom is achieved by some operator in the plan). In HTN planning, a plan is correct if it is executable in the initial state of the world and it achieves the task list that is given as an input in the planning problem, using the methods defined as a part of the planning domain. In other words, the main focus of an HTN planner is task decomposition, while a traditional planner focuses on achieving the desired state. In this paper we use a form of HTN planning called Ordered Task Decomposition (Nau et al. 1999) in which, at each point in the planning process, the planner has a totally ordered list of tasks to accomplish.

An HTN domain is a triple (T, M, O) where:
• T is a list of tasks. Each task has a name and zero or more arguments, each of which is either a variable symbol (we denote variable symbols by names that begin with question marks, such as ?x) or a constant symbol. Each task can be either primitive or non-primitive. A primitive task represents a concrete action, while a non-primitive task must be decomposed into simpler subtasks.
• M is a collection of methods, each of which is a triple m = (NT, DEC, P), where NT is a non-primitive task, DEC is a totally ordered list of tasks called a decomposition of NT, and P (the set of preconditions) is a boolean formula of first-order predicate calculus. Every free variable in P must appear in the argument list of NT, and every variable in DEC must appear either in the argument list of NT or somewhere in P. We will assume that each method m can be uniquely identified by its first two parts, NT and DEC (i.e., there will be no two different methods (NT, DEC, P) and (NT, DEC, P′) such that P ≠ P′).
• O is a collection of operators, each of which is a triple o = (PT, DEL, ADD), where PT is a primitive task, and DEL and ADD are the sets of logical atoms that will be respectively deleted from and added to the world state when the operator is executed. All variables in DEL and ADD must appear in the argument list of PT. We also assume that, for each primitive task t ∈ T, there is at most one operator (PT, DEL, ADD) such that t unifies with PT (i.e., each primitive task can be performed in at most one way).
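For concreteness, here is one way these three constituents might be encoded. This is purely an illustrative sketch (ours, not the paper's implementation): atoms and tasks are tuples of symbols, and, anticipating the representational assumption adopted later in the paper, preconditions are restricted to conjunctions stored as sets of atoms rather than arbitrary first-order formulas.

```python
from dataclasses import dataclass

def is_variable(symbol: str) -> bool:
    """Variable symbols begin with a question mark, e.g. "?x"."""
    return symbol.startswith("?")

@dataclass(frozen=True)
class Task:
    name: str            # e.g. "move"
    args: tuple          # variable or constant symbols, e.g. ("?t", "?s", "?d")
    primitive: bool      # primitive tasks are executed by a single operator

@dataclass(frozen=True)
class Method:
    """m = (NT, DEC, P): decompose non-primitive task NT into the totally
    ordered task list DEC whenever precondition P holds."""
    nt: Task
    dec: tuple           # totally ordered tuple of Tasks
    pre: frozenset       # conjunction of atoms, each atom a tuple of symbols

@dataclass(frozen=True)
class Operator:
    """o = (PT, DEL, ADD): executing primitive task PT deletes the atoms
    in DEL from the world state and adds the atoms in ADD."""
    pt: Task
    delete: frozenset
    add: frozenset

@dataclass(frozen=True)
class Domain:
    tasks: tuple
    methods: tuple
    operators: tuple
```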
An HTN planning problem is a triple (I, S, D), where I (the initial task list) is a totally ordered list of ground instances of tasks (i.e., members of T) to be performed, S (the initial state of the world) is a set of ground logical atoms, and D = (T, M, O) is an HTN domain.

A binding θ is a set of pairs (V, C), where V is a variable symbol and C is a constant symbol. In other words, θ is a substitution that replaces variables with constants. The result of applying a binding θ to an expression e is denoted by eθ. The inverse of the binding θ, which replaces every constant C with its corresponding variable V (in θ), is denoted by θ⁻¹.

Let m = (NT, DEC, P) be a method. Then m̄ = (NT, DEC) is called the incomplete version of m. Similarly, if mθ = ((NT)θ, (DEC)θ, Pθ) is an instance of a method m, then the incomplete version of mθ is m̄θ = ((NT)θ, (DEC)θ). We assume that no two different methods (NT, DEC, P) and (NT, DEC, P′) can exist such that P ≠ P′. Therefore, there is a one-to-one correspondence between a set of methods {m1, . . . , mk} and their incomplete versions {m̄1, . . . , m̄k}, allowing each incomplete version m̄i to represent mi. Let D = (T, M, O) be an HTN domain. Then, if M̄ denotes the set of incomplete versions of all methods in M, D̄ = (T, M̄, O) is defined to be the incomplete version of D.

A method (NT, DEC, P) is applicable in a state S if there is a binding θ that binds all variables in NT's argument list such that Pθ is satisfied in S.
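Under the conjunctive encoding sketched above, this applicability test reduces to searching for a binding θ of the method's argument variables under which every precondition atom appears in S. A brute-force illustration (the helper names and tuple encoding are ours, not part of the formalism):

```python
from itertools import product

def substitute(atom, theta):
    """Apply binding theta (a dict from variables to constants) to an atom."""
    return tuple(theta.get(symbol, symbol) for symbol in atom)

def find_applicable_binding(method_args, precondition, state):
    """Return a binding of the method's argument variables under which every
    atom of the (conjunctive) precondition holds in `state`, or None if no
    such binding exists.  Brute force: try every assignment of variables to
    constants mentioned in the state."""
    variables = [a for a in method_args if a.startswith("?")]
    constants = sorted({s for atom in state for s in atom[1:]})
    for values in product(constants, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if all(substitute(atom, theta) in state for atom in precondition):
            return theta
    return None

# Example: a binding that makes a move method applicable.
state = {("truck", "truck1"), ("at", "truck1", "city1")}
pre = {("truck", "?t"), ("at", "?t", "?s")}
print(find_applicable_binding(("?t", "?s", "?d"), pre, state))
```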
A partial solution tree for a ground instance g of a task in an HTN domain D = (T, M, O) is an ordered tree τ having the following properties:
• τ's root is g;
• Each node of τ is a ground instance of a task;
• For each non-leaf node N of τ, there is a method m = (NT, DEC, P) and a binding θ such that N = (NT)θ and (C1, . . . , Ck) = (DEC)θ, where (C1, . . . , Ck) is the list of the children of N. mθ = ((NT)θ, (DEC)θ, Pθ) is called the method instance for N.

A partial solution forest for an HTN planning problem (I, S, D) is a totally ordered list (τ1, . . . , τk) of partial solution trees, one for each member of I. In a partial solution forest F = (τ1, . . . , τk), a node u occurs before a node v in each of the following cases:
• u and v are in different trees τi and τj, respectively, where i < j.
• There is a node w such that u is the i-th child of w and v is the j-th child of w, where i < j.
• u and v have ancestors u′ and v′, respectively, such that u′ occurs before v′. (Note that we may have u′ = u and/or v′ = v.)
Given these definitions, it follows that the leaves of a partial solution forest are totally ordered by the "occurs before" relation.

A solution for a planning problem (I, S, D) is a partial solution forest F for (I, S, D) having the following properties:
• For every leaf node L of F, there is an operator o = (PT, DEL, ADD) and a binding θ such that L = (PT)θ. oθ is called the operator instance for L.
• For every non-leaf node N of F, let m = (NT, DEC, P) be the method instance for N. Let (L1, . . . , Lk) be the totally ordered list of leaf nodes that occur before N, and let o1, . . . , ok, respectively, be the operator instances for these nodes. Then P is satisfied in the state produced by starting with S and applying o1, . . . , ok sequentially.

Two HTN domains D1 and D2 are equivalent if and only if, for any arbitrary task list I and state of the world S, the planning problems (I, S, D1) and (I, S, D2) have exactly the same set of possible solution forests. For example, let T = {NT1(?x), NT2(?x), PT(?x)} be a set of two non-primitive tasks and one primitive task, let o = (PT(?x), {}, {}) be an operator with empty add and delete lists that achieves the primitive task PT(?x), let m1 = (NT1(?x), (NT2(?x)), a(?x)) and m1′ = (NT1(?x), (NT2(?x)), a(?x) ∧ b(?x)) be two methods that can decompose the non-primitive task NT1(?x) into NT2(?x), and let m2 = (NT2(?x), (PT(?x)), b(?x)) be a method for decomposing the non-primitive task NT2(?x) into the primitive task PT(?x). Then the domains D1 = (T, {m1, m2}, {o}) and D2 = (T, {m1′, m2}, {o}) are equivalent.
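Returning to the solution definition above, checking its second property amounts to simple state progression: each operator instance deletes its DEL atoms and adds its ADD atoms, and the method instance's precondition is then evaluated in the state that results. A minimal sketch (ours), again treating states as sets of ground atoms:

```python
def apply_operator(state: frozenset, delete: frozenset, add: frozenset) -> frozenset:
    """Execute one operator instance: remove its DEL atoms, add its ADD atoms."""
    return (state - delete) | add

def state_before_node(initial_state, operator_instances):
    """The state in which a method instance's precondition must hold: the
    result of applying, in order, the (delete, add) pairs of the operator
    instances for all leaf nodes that occur before the decomposed node."""
    state = frozenset(initial_state)
    for delete, add in operator_instances:
        state = apply_operator(state, delete, add)
    return state
```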
Inputs to the Learning Algorithm

Motivation

For supervised learning of domains (either in an action-based or an HTN planning environment), two possible forms of input are:
• A set of previously generated plans. These plans can be generated in several ways (e.g., by a human expert in that domain). The learning process consists of extracting domain information from these plans.
• A collection of plan traces, which contain not only the correct solution for a planning problem, but also information about inferences derived and decisions made while this plan was generated (e.g., by a human expert).
The second form of input is preferable because it will result in faster and more accurate learning; plan traces contain much more information about the domain being learned than plans do. For this reason, most previous related work has used the second form of input. For example, in PRODIGY, a system that uses learning techniques in an action-based planning context, derivational traces are used in the learning process (Veloso & Carbonell 1993). These traces contain information about the planner's internal right and wrong decisions in addition to the final solution.

In this paper, we also use this second form of input, with appropriate adaptations for use in an HTN-planning environment rather than an action-based planning environment. In addition to its efficiency advantages, this form is well suited to our other goals. Ultimately, we want to develop a learning mechanism that can be used for HICAP (Muñoz-Avila et al. 1999), an interactive plan authoring system for HTN plans that allows manual editing of the plans by the user. We intend to soon develop a supervisor module on top of HICAP that tracks a user's edits during plan authoring. This sequence of edits corresponds roughly to the plan traces that we use as input to CaMeL.

In addition to the plan traces, we will assume that the input to the learning algorithm includes the incomplete version of the domain. By definition, the incomplete version includes the operator definitions, which seems reasonable given that operators usually denote concrete/simple actions with obvious effects.

Definitions

We are now ready to formally define plan traces and our inputs. A plan trace Π consists of (1) a solution to an HTN planning problem and (2) for each internal node N in the solution forest, an incomplete version of all instances of methods that were applicable. This will obviously include the instance of the method that was actually used to decompose N.

The inputs for an HTN method learning algorithm consist of:
• The incomplete version D̄ of a domain D.
• A set of n HTN planning problems {(Ij, Sj, D)}, where 1 ≤ j ≤ n.
• A plan trace Πj for each of these problems.
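As an illustration, a plan trace might be recorded with structures like the following; this is a sketch with hypothetical field names (not HICAP's or SHOP's actual format). The `applicable` annotation is what distinguishes a trace from a bare solution, and it is also what later yields negative examples:

```python
from dataclasses import dataclass

@dataclass
class TraceNode:
    """One node of the solution forest, annotated as the plan-trace
    definition above requires.  `used` and `applicable` are meaningful
    only for internal (non-primitive) nodes."""
    task: tuple        # ground task instance, e.g. ("move", "truck1", "city1", "city2")
    used: tuple        # incomplete version (NT, DEC) of the method instance used here
    applicable: list   # incomplete versions of ALL method instances applicable here
    children: list     # child TraceNodes; empty for (primitive) leaf nodes

@dataclass
class PlanTrace:
    initial_state: frozenset   # ground atoms of the initial world state
    roots: list                # one TraceNode per task in the initial task list
```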
Outputs of the Learning Algorithm

One of the challenges in any learning algorithm is how to define a criterion to evaluate its output. Often, there is not enough information in the input or there are not enough training samples to derive an optimal output. Therefore, learning algorithms may return a set of candidate answers instead of a single answer. This phenomenon can affect the definition of soundness and completeness, which play a crucial role in evaluating outputs in a planning context. We will assume that every constituent of the domains with which we are dealing is deterministic and that training samples are not noisy. Therefore, there may be two reasons why a complete and correct definition of a method cannot be induced:
• Lack of knowledge about the structure of the domain: For example, suppose that a learning algorithm does not know whether the preconditions of the methods being learned consist of only conjunctions, or whether disjunctions are also allowed. Suppose this learning algorithm is trying to learn a method m that decomposes the task move(?t, ?s, ?d), and assume that the algorithm's inputs include two instances of m: one to decompose the task instance move(truck1, city1, city2) and the other to decompose the task instance move(truck2, city3, city4). Suppose also that these two method instances are applied, respectively, in the following world states:
{truck(truck1), at(truck1, city1), color(truck1, blue)}
{truck(truck2), at(truck2, city3), color(truck2, red)}
If we give these two instances of m to an HTN method learner, then both truck(?t) ∧ at(?t, ?s) and truck(?t) ∧ at(?t, ?s) ∧ (color(?t, blue) ∨ color(?t, red)) can be preconditions for method m, and the learner has no way to tell which one is the precondition of m if it does not know that only conjunctions are allowed.
• Insufficient coverage of the domain information by training samples: Consider another example, with the same method as in the last example, and three instances: move(truck1, city1, city2), move(truck2, city3, city4), and move(truck3, city5, city6). The states of the world in which these three instances are applied are, respectively:
{truck(truck1), at(truck1, city1), color(truck1, blue)}
{truck(truck2), at(truck2, city3), color(truck2, blue)}
{truck(truck3), at(truck3, city5), color(truck3, red)}
Suppose that the algorithm knows that the preconditions of the methods in this specific domain contain only conjunctions. Knowing this, if the algorithm only encounters the first two examples, it will not be able to induce that color(?t, blue) is not in the preconditions of this method, no matter how well it works, simply because there is not enough information in the given input to infer this. The best the algorithm can do is to say that the precondition of this method is a generalization of {truck(?t), at(?t, ?s), color(?t, blue)}. (A conjunction C1 of logical atoms is more general than another conjunction C2 of logical atoms if and only if the set of logical atoms appearing in C1 is a subset of the set of logical atoms appearing in C2.) When given the third training sample, the algorithm will be able to determine that color(?t, blue) is not part of the method's precondition.

In this paper, we will assume that the exact form of the preconditions of methods for the target domain is known a priori. Thus, the only reason that the exact method preconditions cannot be induced will be due to insufficient training.

Before we can formally present soundness and completeness definitions in the context of HTN method learning, we need to define consistent answers: A domain D1 is consistent with another domain D2 with respect to a set of pairs (Ij, Sj) of a task list Ij and a state of the world Sj if and only if, for every j, the plan traces for the HTN planning problems (Ij, Sj, D1) and (Ij, Sj, D2) are exactly the same. This definition says that although there may be differences in the methods and/or operators of D1 and D2, when solving problems relative to the task list Ij and state Sj for each j, the resulting traces are identical.

Using the above definition, we can now formally define soundness and completeness for HTN method learning algorithms. In order to simplify these definitions, we will assume that the output of an HTN method learner is a set of HTN domains rather than a set of possible method preconditions. This is a reasonable assumption because the incomplete version of the domain is given to the learning algorithm as input, and the algorithm induces method preconditions for that domain. These preconditions are the only missing part in the domain definition. Therefore, after inducing these preconditions, the learner can build a complete HTN domain from the incomplete domain given as input.

The definition of soundness is as follows: Consider an HTN method learning algorithm whose inputs are
• D̄, the incomplete version of a domain D;
• A set of HTN planning problems {(Ij, Sj, D)}, where 1 ≤ j ≤ n;
• A plan trace Πj for each of these HTN planning problems.
An HTN method learning algorithm is sound if, whenever it returns a set of HTN domains, each of them is consistent with D with respect to the set of all (Ij, Sj) pairs. Consider an HTN method learner whose inputs are the three listed above. Then this algorithm is complete if, for every domain D′ that is consistent with D with respect to the set of all (Ij, Sj) pairs, the algorithm's answer includes a domain that is equivalent to D′.
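With conjunctions encoded as sets of atoms, the generality relation used in the example above (C1 is more general than C2 iff C1's atoms are a subset of C2's) is a one-line subset test. A tiny sketch (ours), anticipating the version-space machinery introduced below:

```python
def more_general(c1: frozenset, c2: frozenset) -> bool:
    """C1 is more general than C2 iff C1's atoms form a subset of C2's."""
    return c1 <= c2

# The induced precondition is a generalization of the over-specific guess:
assert more_general(frozenset({("truck", "?t"), ("at", "?t", "?s")}),
                    frozenset({("truck", "?t"), ("at", "?t", "?s"),
                               ("color", "?t", "blue")}))
```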
Another useful notion is convergence. Intuitively, it tells us whether an algorithm is expected to find a final answer in finite time. An algorithm converges to the correct answer in a domain D that satisfies our restrictions and assumptions if and only if it is given a finite set of plan traces for the HTN planning problems (Ij, Sj, D) as input, and it terminates and outputs a set of HTN domains, each of which is consistent with D with respect to the set of all possible pairs of an initial task list and a world state. Clearly, all of the domains in the output set of a method learner that has already converged must be equivalent to each other.

Algorithm Implementation

Motivation

The main goal of learning HTN methods is to be able to generate plans for new planning problems (or queries). This ability is obtained by learning how to plan using a set of previously generated plans or plan traces. In the machine learning literature, two entirely different kinds of learning are discussed: lazy learning and eager learning. In the purest form of lazy learning, training samples are simply stored. At query time, the algorithm compares the query instance with its recorded training samples. Thus, learning time is minimized, while query time can be high, especially if no attempt is made to prune the number of stored training samples. On the other hand, eager learners induce an abstract concept during the training process. At query time, this concept, rather than the training samples themselves, is used to answer the query. Thus, learning time is higher than for purely lazy algorithms, while query time is usually lower.

In the context of learning method preconditions, lazy learning has been done using CBR, which involves locating those training samples that are most similar to the planning problem given as the query (Veloso, Muñoz-Avila, & Bergmann 1996; Hanney & Keane 1996; Lenz et al. 1998). This problem is then solved by adapting the solutions stored in the retrieved training samples. Our focus in this paper is on eager learning. Some of the advantages of using eager learning in our context are:
• Less query time: Lazy learning is useful when the number of training samples and the frequency of query arrivals are small. However, when these numbers are large, finding the most similar training samples can be time-consuming (assuming that no smart indexing method is used). In such situations, eager learners are preferable.
• Reduced knowledge base size: Once the methods are learned, the system can discard the training samples and use the induced methods to solve new planning problems. In other words, the learned methods act as a compact summary of the training cases. In contrast, purely lazy approaches require that, for every new planning problem, all previously seen examples be revisited and therefore stored in the algorithm's knowledge base. Also, although several algorithms exist for significantly reducing storage requirements for case-based classifiers, they do not yet exist for case-based HTN planners.
• Easier plan generation: If we learn methods completely, then generating a plan for a new planning problem becomes much easier: the induced methods can be given to a hierarchical planner, which can then automatically generate plans for new planning problems. In contrast, a case-based planner must dynamically decide, for each new problem, which stored case or combination of cases to apply.
As mentioned earlier, HTN method learning algorithms may not be given enough information in the training set to derive a single exact domain as output. Therefore, the algorithm may return a set of domains. In these situations, a policy is needed to decide which possible domains should be output by the algorithm. Two extreme policies are:
• The minimal policy: A possible domain is added to the output set if and only if there is enough evidence in the input to prove that this domain must be in the output.
• The maximal policy: A possible domain is added to the output set if and only if there is not enough evidence in the input to prove that this domain must not be in the output.
For example, suppose that an HTN method learner is trying to learn the preconditions of a method m that is used to decompose the task move(?t, ?s, ?d). Consider the following two cases:
1. Suppose the algorithm is told that there was an instance move(truck1, city1, city2) of method m that was applied when the world state was {truck(truck1), at(truck1, city1)}. Using this training sample (which is a positive sample, since it tells us when a method is applicable), a minimal approach will yield: "The method m can be applied whenever truck(?t) ∧ at(?t, ?s) is true in the world state." However, a maximal approach will yield: "Method m is always applicable." This is because, according to its input, the algorithm has no way to prove that method m is not applicable in some cases.
2. Suppose the algorithm is told that there was a world state where {truck(truck1), at(truck1, city2)} was true, but m was not applicable to decompose the task move(truck1, city1, city2). (Recall that the algorithm's inputs are plan traces, so it knows exactly which methods are applicable to decompose a task in each state.) If this negative training sample is the only input to the learning algorithm, a minimal approach will answer "Method m is never applicable," because no examples have been given that can prove that m is applicable in some cases. However, a maximal policy will answer: "Method m is applicable whenever the expression ¬(truck(?t) ∧ at(?t, ?d)) is satisfied in the world state."
The minimal policy yields a sound algorithm, while the maximal policy yields a complete one. However, both of these extreme approaches perform poorly, and neither is both sound and complete. This is because there may be possible domains whose presence in the output set can be neither proved nor disproved using the current input, simply because there is not enough information in the input to do so and more training samples are required. These possible domains are discarded under the minimal policy and added to the output set under the maximal policy. One way to obtain better performance is to track the possible domains that cannot be proved or disproved, so that they can be assessed in the future, after more training samples are acquired. Thus, we need an algorithm for tracking possible domains that preferably maintains both soundness and completeness.

Candidate Elimination

Candidate elimination is a well-known machine learning algorithm introduced in (Mitchell 1977). Several extensions of the original algorithm have been proposed (Hirsh 1994; Sebag 1995; Hirsh, Mishra, & Pitt 1997). Candidate elimination is based on the concept of a version space: the set of possible explanations of the concept that is being learned.
The version space is represented by two sets: a set G of maximally general predicates that could explain the concept, and a set S of maximally specific such predicates. Every concept between these two boundaries is a member of the version space and is a possible explanation of the concept being learned. If enough training samples are given, the version space converges to a single answer (i.e., the sets S and G become equivalent). It is also possible for the version space to collapse if the training samples are inconsistent.

In our context, using candidate elimination has two main advantages. First, the resulting algorithm is sound and complete if its output is the set of all possible domains in which each method's set of preconditions can be any member of its corresponding version space. Second, candidate elimination is an incremental algorithm: whenever the algorithm acquires a new training sample, it simply updates its version spaces and discards that training sample afterwards. There is no need to keep the training samples.

Candidate elimination is indeed a very general algorithm. However, to apply it to a specific application, a generalization topology (a lattice) on the set of possible concepts must be defined (i.e., the generalization/specialization relation, along with the top and bottom of the lattice). In our algorithm CaMeL, every method has a corresponding version space, and each member of a method's version space is a possible precondition for that method.

CaMeL requires both negative and positive examples. The concept of a "negative" example in a planning context may not be clear. However, in our context, negative examples can be generated easily. In our definitions, the input to a method learner includes plan traces, each of which lists all applicable method instances that can be used to decompose a task instance in a specific world state. Therefore, if other methods could have been used to decompose this task (i.e., in some other world states), we can infer that they were not applicable in those specific world states, and they can hence serve as negative examples.

CaMeL: An HTN Method Learner Based on Candidate Elimination

Before detailing CaMeL, we need to introduce the notion of normalization. Suppose that an HTN method learner is trying to learn the preconditions of a method m that is used to decompose the task move(?t, ?s, ?d). Assume that two instances of this method are given. The first instance, m1, is used to decompose the task move(truck1, city1, city2) in the state S1 = {truck(truck1), at(truck1, city1)}, while the second instance, m2, is used to decompose the task move(truck2, city3, city4) in the state S2 = {truck(truck2), at(truck2, city3)}. Clearly, these two training samples contain the same piece of information. Intuitively, it is: "You can move an object ?t from any starting location ?s to any destination ?d if ?t is a truck and it is initially at ?s." But how is a learning algorithm supposed to derive such a general statement from such a specific example? The statement contains three variables, while the facts are about specific constants such as truck1, city1, and city2. A generalization process is required that changes these constants to variables. This is roughly what happens during normalization. Consider a ground instance mθ of a method m that decomposes task t in world state S. Then Sθ⁻¹ is called a normalization of S with respect to mθ, and θ⁻¹ is called a normalizer for S.
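Normalization is straightforward to implement once bindings are dictionaries: invert θ and map it over every atom of the state. A sketch (ours) reproducing the example above:

```python
def normalize(state, theta):
    """Compute the normalization S(theta^-1): replace each constant that
    theta binds with its variable, leaving other symbols untouched."""
    inverse = {const: var for var, const in theta.items()}
    return {tuple(inverse.get(s, s) for s in atom) for atom in state}

theta1 = {"?t": "truck1", "?s": "city1", "?d": "city2"}
s1 = {("truck", "truck1"), ("at", "truck1", "city1")}
theta2 = {"?t": "truck2", "?s": "city3", "?d": "city4"}
s2 = {("truck", "truck2"), ("at", "truck2", "city3")}

# Both training samples normalize to the same generalized fact:
assert normalize(s1, theta1) == normalize(s2, theta2) == \
       {("truck", "?t"), ("at", "?t", "?s")}
```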
For the above example, both the normalization of S1 with respect to m1 and the normalization of S2 with respect to m2 yield {truck(?t), at(?t, ?s)}. The normalization process replaces constants that play the same role in different training examples (e.g., truck1 and truck2) with a variable (e.g., ?t) in order to generalize the facts that are given as input. Fact generalization is indeed a basic strategy in most eager learning algorithms.

The pseudo-code of our algorithm, CaMeL, is given in Figure 1.

Given:
  D̄ = (T, M̄, O), the incomplete version of D = (T, M, O)
  I = {I1, . . . , In}, a set of task lists
  S = {S1, . . . , Sn}, a set of world states
  Π = {Π1, . . . , Πn}, a set of plan traces, one per planning problem (Ii, Si, D)

CaMeL(D̄, I, S, Π) =
  FOR each method m̄j ∈ M̄
    Initialize a version space VSj
  FOR each plan trace Πi ∈ Π
    FOR each method m̄j ∈ M̄
      inst = LocateInstances(Πi, m̄j)
      FOR each (loc, θ, np) ∈ inst
        Si′ = ComputeState(D̄, Si, Πi, loc)
        VSj = CE(Normalize(Si′, θ), VSj, np)
    IF Converged(VS1, . . . , VSm)
      RETURN all VSj's
  RETURN all VSj's

Figure 1: The CaMeL method-learning algorithm

The algorithm's subroutines are as follows:
• LocateInstances(Π, m̄), where Π is a plan trace and m̄ = (NT, DEC) is an incomplete method, is a function that returns a set of triples (loc, θ, np). Each of these triples corresponds to one of the places where an instance (NT)θ of the non-primitive task NT was decomposed in Π. loc denotes the number of operators that occur before this task instance in the plan trace Π, and np is a boolean variable indicating whether the method instance mθ was applicable to decompose (NT)θ.
• ComputeState(D̄, S, Π, loc) is a function that computes the world state after applying the first loc operators of the plan trace Π in the incomplete HTN domain D̄, where the initial world state is S.
• CE(S, VS, np) is an implementation of the candidate elimination algorithm on version space VS with training sample S. np is a boolean variable that indicates whether S is a negative or positive example.
• Converged(VS1, . . . , VSm) checks whether the algorithm has converged. As mentioned before, a method learning algorithm converges to an answer when and only when all of the members of its output set are equivalent to each other. CaMeL uses this fact to verify whether it has converged to an answer after processing each training sample. In the general case, this function can be extremely expensive to evaluate in practice. However, in many cases where additional restrictions on the form of the preconditions are given, this function can be computed more quickly.
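To make the CE(S, VS, np) step concrete, here is a minimal candidate-elimination sketch specialized to the conjunctive precondition language used in our examples, where a hypothesis covers a state iff its atoms are a subset of the state's atoms, so "more general" is the subset relation. This is our illustration, not CaMeL's actual implementation, and it sidesteps the general lattice machinery required by the assumptions discussed in the next section:

```python
class VersionSpace:
    """Candidate elimination for one method's precondition, restricted to
    conjunctions of atoms.  A hypothesis is a frozenset of atoms; in this
    language the specific boundary is always a single conjunction."""

    def __init__(self):
        self.s = None               # specific boundary: one conjunction, or None
        self.g = {frozenset()}      # general boundary; {} means "always applicable"

    def update(self, state, positive):
        """The CE(S, VS, np) step: fold one normalized training state in."""
        state = frozenset(state)
        if positive:
            # Minimally generalize S so it covers every positive seen so far,
            # and drop general hypotheses that fail to cover this positive.
            self.s = state if self.s is None else self.s & state
            self.g = {g for g in self.g if g <= state}
        else:
            new_g = set()
            for g in self.g:
                if not g <= state:          # already excludes this negative
                    new_g.add(g)
                else:
                    # Minimal specializations: add one atom the negative lacks,
                    # drawn from S so the result stays more general than S.
                    for atom in (self.s or frozenset()) - state:
                        new_g.add(g | {atom})
            # Keep only the maximally general (subset-minimal) hypotheses.
            self.g = {g for g in new_g if not any(h < g for h in new_g)}

    def converged(self):
        return self.s is not None and self.g == {self.s}

vs = VersionSpace()
vs.update({("truck", "?t"), ("at", "?t", "?s"), ("color", "?t", "blue")}, True)
vs.update({("color", "?t", "blue")}, False)   # m was inapplicable here
print(vs.s)   # frozenset with truck(?t), at(?t,?s), color(?t,blue)
print(vs.g)   # {{truck(?t)}, {at(?t,?s)}} -- color alone no longer suffices
```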
Algorithm Analysis

In this section, we discuss the assumptions and restrictions a domain must satisfy in order for CaMeL to work correctly. We also state a few theorems about CaMeL in the framework we have defined.

Two kinds of constants may appear in an HTN plan trace Π or its corresponding HTN planning problem (I, S, D). First, explicit constants appear explicitly in the domain definition D (e.g., in the effects of operators, the argument lists of tasks, or the decomposition lists of methods). Second, implicit constants are those that do not appear in D explicitly. These constants appear in a plan trace because some of the variables in D were instantiated to them while the plan trace was created. For example, consider an HTN method m that decomposes the task go(?x, ?y) into a primitive task walk(?x, ?y), with the precondition weather(good) ∧ at(?x). Now, if the two atoms at(home) and weather(good) are in the current world state and an instance of the method m is used to decompose the task go(?x, ?y) to the subtask walk(home, station) in the corresponding plan trace, then the constants home and station are implicit constants, while good is an explicit constant in the HTN domain definition. As another example, in the blocks world domain, table is an explicit constant while the names of blocks are implicit constants. When normalization is used to generalize the training samples, the following crucial assumption is made:

Assumption 1 No constant can be both implicit and explicit.

Another assumption must be made because we use the candidate elimination algorithm. In order for candidate elimination to work properly, the terms more general, more specific, etc., must be defined for the set of possible members of the version space. CaMeL uses version spaces to represent the possible preconditions of methods, which in general can be any formula of first-order predicate calculus. Unfortunately, these terms cannot be defined for the set of all possible boolean formulas in first-order predicate calculus.

Assumption 2 Preconditions of the methods have a known form, and this form is such that the relations more general, less general, more specific, less specific, maximally general, maximally specific, minimal generalization, and minimal specialization can be defined for them.

This assumption defines the learning algorithm's representational bias. A representational bias defines the states in the search space of a learning algorithm (Gordon & desJardins 1995). It guarantees that we can generalize given facts about the training samples (Mitchell 1980). Given these assumptions, the following theorems can be proved:

Theorem 1 CaMeL is a sound and complete HTN method learner.

Theorem 2 For any given HTN domain D that satisfies our restrictions and assumptions, there exists a finite set of plan traces that causes CaMeL to converge to D.

We omit the proofs here due to lack of space. The proof of Theorem 1 is straightforward: it proceeds by applying the relevant theorems about version space algorithms repeatedly, once for each method to be learned. The proof of Theorem 2 is more complicated, as it has to deal with interactions among tasks and their subtasks, sub-subtasks, and so forth.

Experiments

Theorem 2 says that there always exists a set of training samples that causes CaMeL to converge if the domain definition satisfies our restrictions and assumptions. However, this theorem does not give us any information about the size of such a set. What we need in practice is an answer to the question "How many samples, on average, are needed to converge?" In this section, we discuss our experiments to answer this question.

Test Domain

The domain we used is a simplified and abstracted version of a NEO (Noncombatant Evacuation Operation) planning domain. NEOs are conducted to assist the U.S. Department of State with evacuating noncombatants, nonessential military personnel, selected host-nation citizens, and third-country nationals whose lives are in danger, from locations in a host foreign nation to an appropriate safe haven. The decision-making process for a NEO is conducted at three increasingly specific levels: strategic, operational, and tactical. The strategic level involves global and political considerations such as whether to perform the NEO.
The operational level involves considerations such as determining the size and composition of the execution force. The tactical level is the concrete level, which assigns specific resources to specific tasks. This hierarchical structure makes HTNs a natural choice for NEO planning.

Simulating a Human Expert

One goal of our work is to learn methods for HTN planners in military planning domains. These domains are usually complicated, requiring many samples to learn each method, and it is difficult to obtain such training samples for military domains. Even if we had access to real-world NEO training samples, those samples would need to be classified by human experts, and the concepts learned by CaMeL would need to be tested by human experts to assess their correctness. This would be very expensive and time-consuming. To overcome this problem, we decided to simulate a human expert. We used a correct hierarchical planner to generate plan traces for random planning problems in an HTN domain. Then we fed these plan traces to CaMeL and observed its behavior until it converged to the set of methods used to generate these plan traces.

The hierarchical planner we used is a slightly modified version of SHOP (Nau et al. 1999). In SHOP, if more than one method is applicable in some situation, the method that appears first in the SHOP knowledge base is always chosen. Since in our framework there is no ordering on the set of methods, we changed this behavior so that SHOP chooses one of the applicable methods randomly at each point. We also changed the output of SHOP from a simple plan to a plan trace.

Generating the Training Set

In order to generate each plan trace, we had to generate a random NEO planning problem and feed it to the modified version of SHOP. To generate a random NEO planning problem, every possible state atom was assigned a random variable, indicating whether or not it should be present in the initial state of the world (e.g., should there be an airport in a specific city) or what value its corresponding state atom should have (e.g., should the hostility level be hostile, neutral, or permissive). In our preliminary experiments, we noticed that the distributions of these random variables did not affect our experiments very much. However, there is one exception to this rule: the probability P that there is an airport in a city makes a great deal of difference. Therefore, we decided to assign a uniform distribution to all random variables other than P and to perform experiments with several different values of P. We conducted eleven sets of experiments, with P = 1/12, 2/12, . . . , 11/12.
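As an illustration of this setup, random initial states might be generated as follows. The atom and feature names here are ours (the actual NEO domain is richer), but the sampling scheme follows the description above: every feature is uniform except the airport probability P:

```python
import random

def random_neo_state(cities, p_airport, rng=None):
    """Sample a random initial world state: each city independently gets
    an airport with probability P; other features are sampled uniformly."""
    rng = rng or random.Random()
    state = set()
    for city in cities:
        if rng.random() < p_airport:
            state.add(("airport", city))
    state.add(("hostility", rng.choice(["hostile", "neutral", "permissive"])))
    return state

# Eleven experimental conditions: P = 1/12, 2/12, ..., 11/12.
training_sets = {k: [random_neo_state(["city1", "city2", "city3"], k / 12)
                     for _ in range(100)]
                 for k in range(1, 12)}
```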
Results

After generating the eleven sets of training samples for P = 1/12, 2/12, . . . , 11/12, we fed each training set to CaMeL until it converged. Figure 2 shows the number of plan traces needed in each case in order for CaMeL to converge. Figure 3 shows the time in seconds CaMeL needed to converge in each of those cases. (These experiments were conducted on a Sun Ultra 10 machine with a 440 MHz SUNW UltraSPARC-IIi CPU and 128 megabytes of RAM.)

Figure 2: Number of plan traces needed to converge

Figure 3: CPU time used by CaMeL to converge

As can be seen, the number of required plan traces and the convergence time are minimized when the probability P of a city having an airport is approximately 50%. For other values of P, methods are harder to learn. When P is close to 0, the hard-to-learn methods are those whose preconditions require cities to have airports, because the cases where these methods are applicable somewhere in the given plan traces are so rare that the learner cannot easily induce their preconditions. When P is close to 1, the hard-to-learn methods are those whose preconditions do not require cities to have airports: the probability that there accidentally is an airport whenever these methods are applicable is so high that the learner cannot induce that an airport's presence is not actually required.

Although it takes CaMeL tens and sometimes a few hundred training samples to learn all of the methods in the domain, CaMeL learns several of the methods in the NEO domain very quickly. Figure 4 shows how many of the methods are learned completely as a function of the number of plan traces, for the cases where P = 3/12, 5/12, 7/12, 9/12.

Figure 4: Speed of convergence for different training sets. The total number of methods in the domain is 17.

From examining the raw data that went into this figure, we have observed that:
1. When P is close to 50%, all methods are learned very quickly.
2. When P is close to 0, methods whose preconditions do not require cities to have airports are learned very quickly.
3. When P is close to 1, methods whose preconditions require cities to have airports are learned very quickly.
We believe that these observations are quite important: when P is close to zero, methods that do not use airports are more likely to be used to decompose the tasks and are therefore more important than the other methods. The opposite is true when P is close to 1. In other words, CaMeL learns the most useful methods quickly, suggesting that CaMeL may be of use in real-world domains even if only a small number of training samples are available.

Related Work

Much of the work on the integration of learning and planning has focused on conventional action-based planners. Usually, this work, as formulated in (Minton 1990), aims to speed up the plan generation process or to increase the quality of the generated plans by learning search control rules. These rules give the planner knowledge to help it decide at choice points, and they include selection rules (which recommend using an operator in a specific situation), rejection rules (which recommend not using an operator in a specific situation, or avoiding a world state), and preference rules (which indicate that some operators are preferable in specific situations). Generally speaking, the input
Kautukam and Kambhampati (Kautukam & Kambhampati 1994), discuss the induction of explanationbased control rules in partial ordered planning. In (Leckie & Zukerman 1998), inductive methods are used to learn search control rules. There has been some recent work on applying various learning algorithms to induce task hierarchies. In (Garland, Ryall, & Rich 2001), a technique called programming by demonstration is used to build a system in which a domain expert performs a task by executing actions and then reviews and annotates a log of the actions. This information is then used to learn hierarchical task models. KnoMic (van Lent & Laird 1999) is a learning-by-observation system that extracts knowledge from observations of an expert performing a task and generalizes this knowledge to a hierarchy of rules. These rules are then used by an agent to perform the same task. Another aspect concerning the integration of planning and learning is automatic domain knowledge acquisition. In this framework, the planner does not have the full definition of the planning domain and tries to learn this definition by experimentation. In (Gil 1992; 1994), a dynamic environment in which the preconditions or effects of operators change during the time is introduced and methods to derive these preconditions and effects dynamically is discussed. In (Gil 1993), instead of revising existing operators, new operators are acquired by direct analogy with existing operators, decomposition of monolithic operators into meaningful suboperators and experimentation with partially-specified operators. Several systems have integrated machine learning and planning. For example, PRODIGY (Minton et al. 1989) is an architecture that integrates planning and learning in its several modules (Veloso et al. 1995). SCOPE (Estlin 1998) is a system that learns domain-specific control rules for a partial-ordered planner that improve both planning efficiency and plan quality (Estlin & Mooney 1997) and uses both EBL and Inductive Logic Programming (ILP) techniques. SOAR (Laird, Rosenbloom, & Newell 1986) is a general cognitive architecture for developing systems that exhibit intelligent behavior. Conclusion and Future Work In this paper, we introduced CaMeL, an algorithm that integrates machine learning techniques with HTN planning. Our ultimate goal is embedding CaMeL as a module in HICAP to help its users in the planning process. CaMeL is supposed to observe domain experts while they are solving instances of HTN planning problems, and gather and generalize information on how these experts solved these problems, so that it can assist other users in future planning problems. As our preliminary experiments suggest, CaMeL can quickly (i.e., with a small number of plan traces) learn the methods that are most useful in a planning domain. This suggests that CaMeL may potentially be useful in real-world applications, because it may be able to generate plans for many problems even before it has fully learned all of the methods in a domain. CaMeL is an incremental algorithm. Therefore, even if has not been given enough training samples in order to converge, it should be able to approximate the methods that have not yet been fully learned. Our future work will include developing techniques to do these approximations. We also intend to integrate CaMeL into HICAP, and conduct subject study experiments with domain experts to obtain their judgments about the quality and validity of the generated plans. 
Acknowledgments

This work was supported in part by the following grants, contracts, and awards: Air Force Research Laboratory F306029910013 and F30602-00-2-0505, Army Research Laboratory DAAL0197K0135, and the University of Maryland General Research Board. Opinions expressed in this paper are those of the authors and do not necessarily reflect opinions of the funding agencies.

References

Aha, D. W., and Breslow, L. A. 1997. Refining conversational case libraries. In Proceedings of the Second International Conference on Case-Based Reasoning, 267–278. Providence, RI: Springer-Verlag.
Currie, K., and Tate, A. 1991. O-Plan: The open planning architecture. Artificial Intelligence 52:49–86.
DoD. 1994. Joint tactics, techniques and procedures for noncombat evacuation operations. Technical Report Joint Report 3-07.51, Second Draft, Department of Defense.
Erol, K.; Hendler, J.; and Nau, D. S. 1994. HTN planning: Complexity and expressivity. In Proceedings of the Twelfth National Conference on Artificial Intelligence. AAAI Press/MIT Press.
Estlin, T. A., and Mooney, R. J. 1997. Learning to improve both efficiency and quality of planning. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, 1227–1232.
Estlin, T. 1998. Using Multi-Strategy Learning to Improve Planning Efficiency and Quality. Ph.D. Dissertation, Department of Computer Sciences, University of Texas at Austin.
Etzioni, O. 1993. A structural theory of explanation-based learning. Artificial Intelligence 60:93–139.
Fikes, R. E., and Nilsson, N. J. 1971. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence 2:189–208.
Garland, A.; Ryall, K.; and Rich, C. 2001. Learning hierarchical task models by defining and refining examples. In First International Conference on Knowledge Capture, 44–51.
Gil, Y. 1992. Acquiring Domain Knowledge for Planning by Experimentation. Ph.D. Dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
Gil, Y. 1993. Learning new planning operators by exploration and experimentation. In Proceedings of the AAAI Workshop on Learning Action Models.
Gil, Y. 1994. Learning by experimentation: Incremental refinement of incomplete planning domains. In Proceedings of the Eleventh International Conference on Machine Learning.
Gordon, D., and desJardins, M. 1995. Evaluation and selection of biases in machine learning. Machine Learning 20:5–22.
Hanney, K., and Keane, M. T. 1996. Learning adaptation rules from a case-base. In Proceedings of the Third European Workshop on CBR. Lausanne, Switzerland: Springer.
Hirsh, H.; Mishra, N.; and Pitt, L. 1997. Version spaces without boundary sets. In Proceedings of the Fourteenth National Conference on Artificial Intelligence. AAAI Press/MIT Press.
Hirsh, H. 1994. Generalizing version spaces. Machine Learning 17:5–45.
Kautukam, S., and Kambhampati, S. 1994. Learning explanation-based search control rules for partial-order planning. In Proceedings of the Twelfth National Conference on Artificial Intelligence.
Laird, J. E.; Rosenbloom, P. S.; and Newell, A. 1986. Chunking in SOAR: The anatomy of a general learning mechanism. Machine Learning 1:11–46.
Lamber, K. S. 1992. Noncombatant evacuation operations: Plan now or pay later. Technical report, Naval War College.
Langley, P. 1996. Elements of Machine Learning. San Francisco, CA: Morgan Kaufmann Publishers.
Leckie, C., and Zukerman, I. 1998. Inductive learning of search control rules for planning.
Artificial Intelligence 101:63–98.
Lenz, M.; Bartsch-Spörl, B.; Burkhard, H. D.; and Wess, S., eds. 1998. Case-Based Reasoning Technology: From Foundations to Applications. Berlin: Springer.
Minton, S. N.; Knoblock, C. A.; Kuokka, D. R.; Gil, Y.; Joseph, R. L.; and Carbonell, J. G. 1989. PRODIGY 2.0: The manual and tutorial. Technical Report CMU-CS-89-146, School of Computer Science, Carnegie Mellon University.
Minton, S. 1988. Learning effective search control knowledge: An explanation-based approach. Technical Report CMU-CS-88-133, School of Computer Science, Carnegie Mellon University.
Minton, S. N. 1990. Quantitative results concerning the utility of explanation-based learning. Artificial Intelligence 42:363–391.
Mitchell, T. M.; Mahadevan, S.; and Steinberg, L. 1985. LEAP: A learning apprentice for VLSI design. In Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 573–580. Los Angeles, CA: Morgan Kaufmann.
Mitchell, T. M. 1977. Version spaces: A candidate elimination approach to rule learning. In Proceedings of the Fifth International Joint Conference on Artificial Intelligence, 305–310. Cambridge, MA.
Mitchell, T. M. 1980. The need for biases in learning generalizations. Technical Report CBM-TR-117, Rutgers University.
Muñoz-Avila, H.; McFarlane, D.; Aha, D. W.; Ballas, J.; Breslow, L. A.; and Nau, D. S. 1999. Using guidelines to constrain interactive case-based HTN planning. In Proceedings of the Third International Conference on Case-Based Reasoning, 288–302. Providence, RI: Springer.
Nau, D. S.; Cao, Y.; Lotem, A.; and Muñoz-Avila, H. 1999. SHOP: Simple hierarchical ordered planner. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 968–973. Stockholm: AAAI Press.
Sebag, M. 1995. 2nd order understandability of disjunctive version spaces. In Workshop on Machine Learning and Comprehensibility, IJCAI-95.
Sleeman, D.; Langley, P.; and Mitchell, T. M. 1982. Learning from solution paths: An approach to the credit assignment problem. AI Magazine 3:48–52.
van Lent, M., and Laird, J. 1999. Learning hierarchical performance knowledge by observation. In Proceedings of the Sixteenth International Conference on Machine Learning, 229–238. San Francisco, CA: Morgan Kaufmann.
Veloso, M., and Carbonell, J. G. 1993. Derivational analogy in PRODIGY: Automating case acquisition, storage, and utilization. Machine Learning 10:249–278.
Veloso, M.; Carbonell, J. G.; Pérez, A.; Borrajo, D.; Fink, E.; and Blythe, J. 1995. Integrating planning and learning: The PRODIGY architecture. Journal of Experimental and Theoretical Artificial Intelligence 7:81–120.
Veloso, M.; Muñoz-Avila, H.; and Bergmann, R. 1996. Case-based planning: Selected methods and systems. AI Communications 9:128–137.
Wilkins, D. E. 1990. Can AI planners solve practical problems? Computational Intelligence 6:232–246.