From: AAAI Technical Report SS-94-06. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

A Decision Theoretic Planning Assistant*

Nathaniel G. Martin and James F. Allen
Department of Computer Science
University of Rochester 14627-0226
{martin, james}@cs.rochester.edu

*This material is based on work supported by the U.S. Air Force, Rome Laboratory, research contract no. F30602-91C-0010.

Abstract

We discuss three difficulties in applying decision theory directly to planning: the computational complexity of reasoning about probability, the difficulty of gathering knowledge of probability and the difficulty of specifying behavior using utilities. We then describe a decision theoretic planning assistant that avoids some of the problems. In particular, instead of generating plans it assists a traditional planner, thereby avoiding the complexity of generating and evaluating plans. It reasons about probability based on its experience, so it does not require precise probabilities supplied in advance, and its behavior is specified through goals.

1 Introduction

Traditional planners operate in restricted domains in which knowledge is certain and a single goal is specified. In realistic situations, planners must be able to balance achieving several goals of lower priority against achieving a single high priority goal. Without multiple prioritized goals, a planner will produce the best plan even if that plan is terrible. For example, if a planner knows only about trains, it will build a plan to ship oranges to Hawaii by rail. Also, unless it attends to circumstances unrelated to its goals, it may achieve its goals at an unacceptable cost. For example, a robot railroad engineer will speed unless it knows that losing an engine is worse than missing a schedule. Because traditional planners are driven only by their goals, they cannot decide to forgo a goal that can be achieved only at unacceptable cost.

Decision theory computes a preference order over competing goals of differing priorities and probabilities and so promises a significant increase in the expressiveness of planning languages. It allows for a unified treatment of multiple prioritized goals and uncertainty. Unfortunately, this increased expressiveness is costly.

1.1 Constraints on Decision Theory in Planning

Though decision theory allows one to encode information about uncertainty using probability and to encode information about multiple goals using utilities, its use in planning systems is complicated by three factors.

1. Reasoning about probability is computationally complex,
2. necessary probabilities are often unknown and
3. the relationship between utility and behavior is not well understood.

The computational complexity of reasoning about probability is high. Though Bayes nets have limited expressiveness (they can only encode propositional knowledge), inferring probabilities using them is NP-hard [Cooper, 1987]. Because first-order logics of probability are more expressive, the complexity of inferring probabilities using them is worse than undecidable [Abadi and Halpern, 1989]. Decision theoretic planners can reduce the complexity of their task by encoding the STRIPS assumption in independence assumptions [Wellman, 1990]. Such planners can use Bayes nets [Pearl, 1988] to provide a graphical method of representing these assumptions, allowing one to encode reasonable independence assumptions for probabilistic systems. Unfortunately, these techniques only minimize the increased cost of decision theory; they do not eliminate it.

The precise knowledge of probability required for decision theory also constrains its use. Probability can be understood as a representation of human uncertainty. In areas like medicine where articulate experts have access to statistical data, it can represent these experts' uncertainty.
Unfortunately, such subjective probabilities are impractical in domains lacking articulate experts. As an alternative to asking experts, probabilities can be estimated from observations. For example, no one knows the probability that customers leave things under tables in fast food restaurants, even though robots designed to clean such restaurants might benefit from this information. The robot cleaning the floor does have the opportunity to gather information about the likelihood of things being left on the floor. From these observations, it could estimate the desired probability. It can also use information about the precision of the estimates it has computed.

Though recent work [Haddawy and Hanks, 1990; Wellman and Doyle, 1992] explores the relationship between goals and utilities, how to specify a utility function that will cause a planner to behave properly is still poorly understood. When we specify a goal to a traditional planner we know something about the resulting state of the world in which we execute a successful plan, namely that the goal will hold. We know nothing about the state of the world after the execution of a decision theoretic planner's plan. We may want our planner to achieve many things, but a successful decision theoretic plan may do nothing because the risk of action is too great. This uncertainty about the state of the world after a decision theoretic planner's plan is executed makes it hard to design the utilities so a planner will achieve something concrete. It also makes it hard to evaluate the plan produced. Indeed, the only reasonable evaluation is performing the same analysis the planner did.

We have built a decision theoretic planning assistant that collects statistics and suggests the actions that are most likely to achieve effects given assumptions about the state of the world. In doing so, it avoids the computational complexity of generating and evaluating plans and it requires no prior knowledge of probability.
It also treats each event mentioned in the plan it executes as a single goal, allowing a simple utility function.

2 A Decision Theoretic Planning Assistant

The Decision Theoretic Planning Assistant (DTPA) was constructed using an extension of Rhet [Martin, 1993]. This planning assistant provides three services to a traditional planner:

1. it advises the planner about the likelihood of events;
2. it executes plans by choosing the actions most likely to accomplish the planner's goals; and,
3. it monitors the actions to ensure that they have the effects the traditional planning module hypothesized and notifies the planner when a low probability event causes the plan to go awry.

The planning assistant's knowledge of probability is based on the number of instances of event types it has seen. The event types are arranged in a hierarchy such that all event instances of children are event instances of the parent. Therefore, the planning assistant has more accurate knowledge of the probability of event types higher in the hierarchy than of event types lower in the hierarchy.

2.1 The Language

Rhet [Allen and Miller, 1991] is a knowledge representation system that provides, among other things, inference like Prolog's, a type system, and an equality reasoning system that allows function terms. To build the DTPA, Rhet was extended to represent probability, utility and the number of occurrences of events efficiently.

The probability function is the confidence interval for the proportion of occurrences of a reference event type that are also occurrences of a target event type. For example, if an engineer has been asked to load a car one hundred times, but has taken the appropriate action only fifty times, the probability at the .95 confidence level (a .05 significance level) would be [.4 .6] (i.e., the probability is somewhere between .4 and .6). Utilities are numbers, and expected utilities are calculated by multiplying both the lower bound and the upper bound of the confidence interval by the utility. Therefore, like probabilities, expected utilities are intervals.

The language provides a function best-event-type that takes a confidence level, a set of possible action event types and a goal event type as parameters. It returns the set of possible action event types among which it cannot choose. It also provides a function execute that takes a plan (i.e., a set of event types and constraints on these event types) as an argument. This function sends requests to the agents who are most likely to be able to cause the events specified in the plan.

The DTPA represents causation by the predicate might-cause. This predicate focuses the DTPA's attention on only those actions that the programmer has explicitly stated might cause the goal. Because the goal is possible after any action, the DTPA would consider too many actions if it relied only on probability.

2.2 Example

As a running example, consider a computerized planner trying to get some oranges from Avon to Bath. There is a rail line between Avon and Bath. In Avon, there is a warehouse with oranges, an engine, E1, and a boxcar, B1. The warehouse is managed by a manager, W1. One plan that achieves this goal is: ask W1 to load the oranges into B1, then ask E1 to couple to B1, drive to Bath and unload the car. We will concentrate on the first step of the plan, loading B1 with oranges.

This example comes from the TRAINS domain [Allen and Schubert, 1991], a transportation domain containing engineers, managers of factories, warehouses and production centers. The DTPA was tested in this domain and this example was derived from one of those tests.

In the TRAINS domain, the DTPA does not itself perform physical actions. Instead, all of its actions are requests to appropriate agents to perform actions.
For example, when the planning assistant chooses W1-Load to try to get oranges loaded into a railroad car, it actually sends a radio message (i.e., a Lisp expression through a TCP/IP connection) to the agent requesting that the agent load the car. Agents send reports when they receive a request, when they initiate a requested action and when they complete the requested action. Actions are programs the agents execute; they may or may not have their intended effects. Therefore, any one of these reports may fail to reach the DTPA. If the planning assistant does not receive a report, either the action could have failed or the report could have been lost.

2.3 Advising a Planner

All planners must make assumptions as they plan. Some early planning systems build their assumptions into their knowledge representation language. For example, STRIPS [Fikes and Nilsson, 1971] cannot represent events it does not cause. First-order languages can represent more complex interactions between actions, but this increased expressiveness is bought at the cost of making inference difficult. STRIPS can generate plans because it assumes that its knowledge is complete and accurate. Planners built with more expressive languages cannot make that assumption, so it is difficult to prove that the result of the plan will be the goal.

For example, suppose the planner asks W1 to load oranges into B1, then asks E1 to take B1 to Bath. The oranges will get to Bath only if the oranges are in the car when E1 leaves and remain in the car until E1 arrives at Bath. Two possible problems arise. E1 might not wait until the oranges are loaded, and E1 might not wait until it gets to Bath to unload the oranges. In the first case, the oranges remain in Avon; in the second, they end up on the tracks between Avon and Bath (see footnote 1).

Footnote 1: Robots are stupid. We forgot to tell one of our simulated engineers to wait until it got to the station to unload its oranges and the oranges ended up on the track. It took quite a while to find them too.

Allen et al. [1991] describe a planner that generates plans by proving that the planner's goals will hold if it executes its plan. The planner generates plans by searching through the space of plans, which, here, is a space of assumptions about the future. The planner can make two types of assumptions: that it can execute any action at any time and that the effects of the actions persist until they are needed. If the planner can prove a contradiction using an assumption, it abandons that assumption and tries another. When it finds a set of assumptions that is consistent with the current state of the world and that allows it to prove its goal will hold, it accepts the assumed actions under the assumed constraints as its plan.

The DTPA helps the planner choose appropriate assumptions. To do so, the planner supplies the planning assistant with a confidence level, a set of candidate event types, an event type that it wants to cause and a temporal interval. The planning assistant returns the set of event types most likely to cause the desired event type. This set contains the event types among which the planning assistant cannot choose, which indicates that any of these assumptions is equally valid as far as the planning assistant knows.

Example: In constructing its simple plan, the planner can choose W1 or E1 to load the oranges into B1. Unless the planner has explicit knowledge that one agent is preferable to the other, it cannot choose. The planning assistant can use its knowledge of previous occurrences of requests to load to make the choice. Suppose we have three event types named:

1. E1-Load
2. W1-Load
3. Loaded

The first is the set of event instances that include a request to E1 to load a car, the second the set of event instances that include a request to W1 to load a car and the third the set of event instances in which a car is loaded.
The planner and the planning assistant both know that either request might result in the car being loaded.

    [might-cause E1-Load Loaded]
    [might-cause W1-Load Loaded]

Which agent should the planner ask to load the car? The planner asks the DTPA for its suggestion using best-event-type as follows:

    (best-event-type .95 Loaded E1-Load W1-Load)

Depending on the statistics and asserted probabilities, this function call will return either E1-Load, W1-Load, or both. Given the statistics below, best-event-type returns W1-Load.

    [occ so-far 100 E1-Load]
    [occ so-far 35 E1-Load Loaded]
    [occ so-far 100 W1-Load]
    [occ so-far 65 W1-Load Loaded]

On the other hand, if the numbers of occurrences are different, calling best-event-type returns (W1-Load E1-Load).

    [occ so-far 100 E1-Load]
    [occ so-far 50 E1-Load Loaded]
    [occ so-far 100 W1-Load]
    [occ so-far 60 W1-Load Loaded]

In the second case the DTPA has no useful information.

The DTPA can answer questions about probability quickly because the planner constrains the question. It gives the DTPA the target event type and a set of reference event types. It is easy for the DTPA to make decisions in such constrained circumstances.

2.3.1 Persistence assumptions

Persistence assumptions (i.e., assumptions that properties the planner knows to be true now will continue to be true into the future) are common in planning because, even if the planner knows it caused an event that made a property true, it does not know that the property remains true until that property is needed. For example, assuming that oranges remain in the railroad car until they get to their destination is a persistence assumption.

One can describe event types in which the event instances of the event types are separated by another event instance.
In the previous example, we described an event type that consisted of event instances constructed from one event type following another. We could similarly describe an event type consisting of three event instances: the two we are interested in and a third that separates the other two.

Example: Suppose the planning assistant's knowledge base is the same as in the previous example. In addition, the planner has two additional event types: Load-Couple, in which loading the oranges is followed by coupling the loaded car to the train, and Load-Event-Couple, in which some unspecified event occurs between the loading and the coupling. The planner describes these event types using the following Rhet axioms, which are Horn clauses where the head is separated from the rest of the clause by the symbol "<". The first states that an instance of the Load-Couple action event type occurs when an instance of the Load action event type occurs before an instance of a Couple event type. The second states that an instance of the Load-Event-Couple type occurs when an instance of the Load event type occurs, then any event instance occurs and finally an instance of the Couple event type occurs.

    [[element-of ?ei_3 Load-Couple] <
     [element-of ?ei_1 Load]
     [element-of ?ei_2 Couple]
     [Time-Before [time ?ei_1] [time ?ei_2]]]

    [[element-of ?ei_4 Load-Event-Couple] <
     [element-of ?ei_1 Load]
     [element-of ?ei_2 Event]
     [element-of ?ei_3 Couple]
     [Time-Before [time ?ei_1] [time ?ei_2]]
     [Time-Before [time ?ei_2] [time ?ei_3]]]

The planner also knows that either of these event types might move the oranges.

    [might-cause Load-Couple Move]
    [might-cause Load-Event-Couple Move]

The planner would then call best-event-type as follows.

    (best-event-type .95 Move Load-Couple Load-Event-Couple)

Depending on the statistics, this function call will return either Load-Couple, Load-Event-Couple, or both. Here, if the confidence intervals overlap, the planner knows that the planning assistant has no information that contradicts the necessary persistence assumptions. Because the confidence intervals are incomparable, the planning assistant has no information indicating that an event occurring between the loading and the coupling changes the probability of getting the commodity moved. Therefore, as far as the planning assistant knows, the properties necessary to move the commodities successfully persist through the occurrence of any event.

Suppose the numbers of occurrences are as below.

    [occ so-far 100 Load-Event-Couple]
    [occ so-far 35 Load-Event-Couple Move]
    [occ so-far 100 Load-Couple]
    [occ so-far 65 Load-Couple Move]

Here, calling best-event-type returns the event type Load-Couple. The planner would know that it is important whether or not an event instance occurs between the loading of the oranges and the coupling of the engine. Suppose, on the other hand, that the numbers of occurrences are different.

    [occ so-far 100 Load-Event-Couple]
    [occ so-far 50 Load-Event-Couple Move]
    [occ so-far 100 Load-Couple]
    [occ so-far 60 Load-Couple Move]

Calling best-event-type, as above, returns the set (Load-Couple Load-Event-Couple). The planner can reason that it has insufficient information at the .95 confidence level to assume that an event instance separating the loading of the oranges and the coupling of the engine makes the action fail to achieve its goal. The planner may choose to assume that the necessary properties for the event to hold continue to hold through any event instance it can describe.

2.4 Executing a Plan

The DTPA executes a plan by assuming that there are no interactions between the event types specified in the plan. It can make this assumption because the planner and the planning assistant work in the same domain of discourse. The planner will have considered all known interactions between the events; if there are other interactions, the DTPA will also be unaware of them. The assumption allows the planning assistant to deal with each event in the plan as a separate goal.

The DTPA treats the execution of a plan as a sequence of forced choices. It first tries to make the choice by comparing confidence intervals. If this fails, it abstracts the choice by removing constraints on the actions. If it cannot make a choice at an acceptable level of abstraction, it tries to make a choice using the maximum likelihood estimate. If this fails it chooses at random.

2.4.1 Abstraction Hierarchy

The abstraction hierarchy has the set of all event instances at its root. Below the root are more specific sets of events. For example, all actions are event types that have an agent associated with them. The DTPA chooses events to execute from the set of actions it can cause. This is a specialization of action events. The actions the DTPA can cause are further specialized by adding constraints onto them. For example, loading a car in Avon is a specialization of loading a car anywhere.

Defining event types is one of the most difficult tasks in writing programs for the DTPA. The task is difficult because describing even simple causal models requires many event types. The task is more difficult when one tries to arrange event types into a useful hierarchy.

The task of generating event hierarchies is difficult because the programmer controls the DTPA's learning process by defining event types. The DTPA can only learn probabilities about the event types, so if its programmer has not added a particular event type, the DTPA will not waste time considering it. This attribute is vitally important because the effectiveness of the DTPA's choices is constrained by the number of event types it must consider. If it must consider many event types, it will need a great deal of experience before it can say with confidence that one is better than all the others. On the other hand, if it chooses from only a few choices, a few examples may allow it to make a clear choice.
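The interval comparison behind best-event-type, and its dependence on the amount of experience, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say how the DTPA computes its confidence intervals, so the Wilson score interval is used here because it happens to reproduce the quoted interval [.4 .6] for 50 successes in 100 trials. The function names, the dictionary of counts and the rule "keep every candidate whose upper bound reaches the largest lower bound" are hypothetical stand-ins, not the Rhet implementation.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, trials, confidence=0.95):
    """Return (low, high) for a success proportion (Wilson score interval).

    Assumption: the interval method is our choice; the paper does not name one.
    """
    if trials <= 0:
        return (0.0, 1.0)  # no experience: the probability could be anything
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return (centre - half, centre + half)


def best_event_type(confidence, counts):
    """Hypothetical analogue of best-event-type.

    counts maps an action event type to (times it caused the goal, times observed).
    Returns the candidates that the intervals cannot separate from the best one.
    """
    intervals = {name: wilson_interval(y, n, confidence)
                 for name, (y, n) in counts.items()}
    best_low = max(low for low, _ in intervals.values())
    return sorted(name for name, (_, high) in intervals.items() if high >= best_low)


# Statistics from the loading example: a clear choice, then no choice.
print(best_event_type(0.95, {"E1-Load": (35, 100), "W1-Load": (65, 100)}))
print(best_event_type(0.95, {"E1-Load": (50, 100), "W1-Load": (60, 100)}))
# Same success rates as the first call but only 20 observations each:
# the intervals widen and overlap, so no choice can be made.
print(best_event_type(0.95, {"E1-Load": (7, 20), "W1-Load": (13, 20)}))
```

Under these assumptions the first call returns only W1-Load, while the second and third return both candidates; the third shows that the success rates of the first call are no longer enough to decide once the number of observations drops.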
The DTPA uses the event type hierarchy by looking for the most specific event type that might cause the event type requested by the planner. If the plan specifies an event type in the planning assistant's event hierarchy, it searches for the action event type most likely to cause that event type directly. The situation is more complicated when the planning assistant's event hierarchy does not match the planner's. Here, the planning assistant chooses a random element of the planner's event type and searches for the most specific event type in its hierarchy that contains the random element. In doing this, the planning assistant assumes that the plan the planner has specified depends on any instance of the event type specified. The random element of the event type chosen will be such an instance and any element of an event type containing the random element will also be such an instance. The properties that characterize the event type specified will also characterize the event type the planner eventually chooses.

Once the planner has found appropriate event types, it assigns a constant utility to each of the event types so selected. It then uses the DTPA's search capabilities to look for the event type that it can cause that is most likely to cause that event type.

Example: The following shows a part of the abstraction hierarchy for loading cars. All actions are events and all loadings are actions. At the next level the hierarchy splits between a manager loading and an engineer loading. The event instances in which a manager loads are further split into loading by the different managers.

    Event
      Action
        Load
          Manager-Load
            W1-Load
          Engineer-Load
            E1-Load

The abstraction hierarchy is further elaborated by selectively removing preconditions on the execution of the program associated with the action. The hierarchy below shows that W1 can load a car when the car and the stuff to load are available, or it can ignore one or both of the preconditions. Every execution in which the precondition holds is also an instance of an execution in which the precondition may or may not hold.

    W1-Load
      W1-Load(Here(Car))
        W1-Load(Here(Stuff), Here(Car))
      W1-Load(Here(Stuff))
        W1-Load(Here(Stuff), Here(Car))

This type of abstraction is problematic. Because event types may appear more than once in the hierarchy (e.g., W1-Load(Here(Stuff), Here(Car))), there may be more event types at the more abstract level than at the more concrete levels. If this occurs, abstracting makes choices harder rather than easier.

2.4.2 Choosing an Event

The planning assistant causes each event specified in the plan by searching for the action event most likely to cause the specified event and executing the action associated with that event. It first selects the set of event types that might-cause the event specified in the plan. This set can be further restricted by the programmer. From the restricted set, it selects those events that it can cause (i.e., requests for action), then chooses among this set based on the conditional probability of the specified event given those event types, choosing the event type with the highest such conditional probability.

The DTPA first chooses at the .95 level and, if a choice can be made at that level, it is returned. If the choice function returns a set of candidates among which it cannot choose, the planning assistant abstracts the choice by removing some of the preconditions of the choice. This tack might be successful because the DTPA's set of observations may be spread among a smaller set of event types, making the confidence intervals narrower. If abstraction does not help it choose based on confidence intervals, it chooses from the most specific set using the maximum likelihood estimate (see footnote 2). If the maximum likelihood estimates for the necessary conditional probabilities are equal, choice is impossible, so the planning assistant chooses randomly (see footnote 3).

Footnote 2: The planning assistant computes a maximum likelihood estimate using the formula [formula not recoverable], where y is the number of times the event type under consideration has caused the desired event type and n is the number of times the event type under consideration has been observed. It uses this formula because it gives the lowest score to the event that has failed most often even if none has succeeded.

Footnote 3: It would be interesting to try choosing using the maximum likelihood estimate at different levels of abstraction before choosing randomly. At present it does not.

Example: If the DTPA has the following statistics, it cannot decide at the most concrete level of abstraction. The probability intervals it computes are [0.19 0.44] for the concrete action involving E1 and [0.37 0.63] for the action involving W1.

    [occ so-far 50 E1-Load(Here(Stuff), Here(Car))]
    [occ so-far 15 E1-Load(Here(Stuff), Here(Car)) Loaded]
    [occ so-far 50 W1-Load(Here(Stuff), Here(Car))]
    [occ so-far 25 W1-Load(Here(Stuff), Here(Car)) Loaded]

At the next level of abstraction the situation becomes worse because the DTPA's experience is spread among a larger number of event types. The planning assistant has exactly the same experience with E1-Load(Here(Car)) and W1-Load(Here(Stuff)), so here even the maximum likelihood estimate fails to give a clear choice.

    [occ so-far 25 E1-Load(Here(Car))]
    [occ so-far 10 E1-Load(Here(Car)) Loaded]
    [occ so-far 25 E1-Load(Here(Stuff))]
    [occ so-far 5 E1-Load(Here(Stuff)) Loaded]
    [occ so-far 25 W1-Load(Here(Car))]
    [occ so-far 15 W1-Load(Here(Car)) Loaded]
    [occ so-far 25 W1-Load(Here(Stuff))]
    [occ so-far 10 W1-Load(Here(Stuff)) Loaded]

At the third level of abstraction, we are back to the situation described previously, so we can make a choice. The planning assistant chooses to have the manager load the oranges.

    [occ so-far 100 E1-Load]
    [occ so-far 35 E1-Load Loaded]
    [occ so-far 100 W1-Load]
    [occ so-far 65 W1-Load Loaded]

2.4.3 Causing an Event

The DTPA always chooses an action event type that it can cause. Action event types are event types that occur because some agent runs a program. Action event instances the planning assistant can execute include the program that makes them occur. Other action instances represent other agents running programs. For example, the planning assistant itself cannot load oranges, so the planning assistant's representation of such action event instances has no program associated with it.

Once the planning assistant has chosen an appropriate action event type, it generates a hypothetical random instance of that event type and constrains this newly generated event instance so that all its parameters match the plan. For example, it may specify that this random instance should occur at the time the planner specified. The DTPA often adds further constraints to the event type it chose when its choice was made at an abstract level. Even when a concrete event type is chosen, the DTPA's event types rarely specify the time the program should start, so it uses the planner's temporal specification.

The DTPA generates a random instance of an action event type, which, because it is an action event instance the DTPA can cause, has a program associated with it. Because it knows that executing the program associated with the event type causes an instance of that event type, it immediately updates the statistics on that event type. It has observed its own creation of an event instance.

The current DTPA can only execute requests in which it sends a message to an agent. As usual, the event instances generated by requests are collected into event types to gather statistics. The planning assistant hopes that the right agent will receive the request and act on it appropriately but it does not know this. If the agent does act on the request appropriately, other event instances occur. The planning assistant hopes that one of the event instances that will occur is one of the events specified in the plan. It also hopes that some of the events that will occur are observable so that it can tell that the goal has actually been achieved. The planning assistant maximizes its expectation that its goals will be fulfilled by maximizing the probability that the request will result in the goal event types. It monitors observable event instances to see if the goal does occur.

Once the planning assistant has performed all the tasks associated with one event in the plan, it performs the same process on the next event. It continues processing the events specified in the plan until none are left.

2.5 Monitoring the Execution of a Plan

The planning assistant does not make observations automatically. It can monitor events in one of two ways: it can cause an instance of an event type called an anticipation, or it can track an event type. When the planning assistant chooses an action, it reasons about events that are likely to result from this action. If the possible event is observable, it anticipates that event. It then assumes that when an anticipated event is observed, that event is the result of its actions. Alternatively, the planning assistant can track all occurrences of an event type by performing appropriate actions whenever sufficient evidence for the event type occurs.

The planning assistant anticipates by generating a hypothetical event instance and waiting until it sees a set of conditions that would lead it to assume that such an event instance has occurred. Usually, it cannot directly observe the event instances that represent the success of its actions.
When it cannot observe success directly, it collects the set of possible consequences of success and anticipates the subset of these events that are observable. When it makes an observation that matches the observations that would lead it to conclude that an anticipated event instance has occurred, it changes the hypothetical event instance it anticipated into an actual event instance of the same type. It then removes the event instance from the list of anticipations and updates the statistics for the event types of which the event instance is a member. By anticipating, the planning assistant can update the statistics on the event type in which it tries an action and the event type in which the action succeeds.

Once the planning assistant has chosen an appropriate request and has executed the action associated with that request, it collects the set of events that the request might-cause. It then chooses from this set the subset that is observable.

Example: The Load actions mentioned above are actually requests to the appropriate agent to load oranges into a railroad car. Therefore, the DTPA expects three reports from the agent. A subset of the DTPA's knowledge about these reports appears below. The planning assistant needs new event types to represent the reports being generated and being received.

    Load-Rec
    Load-Started
    Loaded
    Loaded-Rep
    Loaded-Rep-Rec

    [might-cause W1-Load Load-Rec]
    [might-cause Load-Rec Load-Started]
    [might-cause Load-Started Loaded]
    [might-cause Loaded Loaded-Rep]
    [might-cause Loaded-Rep Loaded-Rep-Rec]
    [observable Loaded-Rep-Rec]
    [Loaded-Rep-Rec(ei) > Loaded(ei)]

Using this knowledge, the planning assistant selects all of the event types that might occur given an event instance of the type it just caused, W1-Load. These events are: Load-Rec, Load-Started, Loaded-Rep, Loaded-Rep-Rec. It knows only Loaded-Rep-Rec is observable, so it only anticipates an instance of this event type.
The planning assistant knows that if it receives a report stating that loading has been completed, an instance of Loaded-Rep-Rec occurs. From this it can infer that the loading was successfully completed.

Once the planning assistant has selected a set of observable events that the agent’s action might cause, it then chooses the event type that makes such an observation most likely. Each observable event type has an anticipation action event type. The DTPA chooses the anticipation event type that is most likely to cause the observation the same way that it chooses the actions that cause the events specified directly in the plan. That is, the planning assistant chooses an appropriate anticipation and executes the program associated with the chosen anticipation event type.

2.5.1 Observations

The planning assistant monitors the effects of its actions to update its statistics so that it will be able to make better predictions in the future. Because monitoring the execution of the plan is not specified in the plan itself, the planning assistant must assign utility to monitoring. The planning assistant assumes that the planner will use its services for many subsequent similar activities and that the best possible answers are desired. Unfortunately, the planning assistant cannot compute this utility without knowledge of the future. Instead, it uses a fixed utility.

The planning assistant also runs a process that monitors its sensory inputs. Each time a sensory input occurs, the DTPA checks each of the event instances it is anticipating. If the sensory input matches the input expected from an anticipated event instance, it updates the statistics on all the event types that contain this event instance and all event types implied by the occurrence of this event instance. All the observable event instances that were anticipated as a result of the planning assistant’s action are then removed from the set of event instances that are anticipated.
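The monitoring process just described can be sketched as follows. This is a rough Python illustration, not the DTPA's code: the `Anticipation` and `Monitor` classes, the `action_id` field used to group anticipations that stem from one action, and the plain per-type counters standing in for the statistics are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Anticipation:
    expected_input: str  # sensory input that would confirm the event
    event_type: str      # type of the hypothetical event instance
    action_id: int       # hypothetical tag for the action that led to it


@dataclass
class Monitor:
    # Maps an event type to the event types its occurrence implies,
    # e.g. Loaded-Rep-Rec implies Loaded.
    implies: dict
    anticipations: list = field(default_factory=list)
    counts: dict = field(default_factory=dict)

    def observe(self, sensory_input):
        """Process one sensory input; return True if it matched."""
        match = next(
            (a for a in self.anticipations
             if a.expected_input == sensory_input),
            None,
        )
        if match is None:
            return False
        # Update the counters for the matched type and every implied type.
        implied = self.implies.get(match.event_type, [])
        for event_type in [match.event_type, *implied]:
            self.counts[event_type] = self.counts.get(event_type, 0) + 1
        # Drop every anticipation created by the same action.
        self.anticipations = [
            a for a in self.anticipations
            if a.action_id != match.action_id
        ]
        return True
```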
By removing these anticipations, the planning assistant avoids updating the statistics twice for an event instance that caused more than one observable event instance.

Example: In the previous example, the planning assistant would have chosen the action event type Anticipate-Load-Rep-Rec. The program associated with all instances of this event type puts a pair on the list of anticipations; the pair consists of the expected observation (a report from the agent performing the action) and an instance of the event type Loaded-Rep-Rec. Using the axiom from the previous example, the planning assistant infers that a Loaded-Rep-Rec occurred and therefore that an instance of the Loaded event type occurred. It updates the number of successes for the W1-Load action event type because an instance of that event type led to the loading.

2.5.2 Tracking Events

The planning assistant tracks event types by noting that a certain set of conditions is sufficient cause to assume that an instance of an event type has occurred. When it sees such a set of conditions, it generates a random instance of the event type it is tracking and updates the statistics on the event type. By tracking event types, the planning assistant can collect information about event types when it does not expect a particular instance of that type.

Tracking could also be used for other purposes. For example, it could reason that a likely observation resulting from requesting a particular inept engineer to traverse a segment of track is that the train will wreck. The planning assistant could anticipate an accident and notify the planner if it observes the accident. To be able to use this kind of information, the planner using the DTPA would have to be able to accept reports that its plan is in trouble.

3 Conclusion

The planning assistant described above avoids the three pitfalls of decision theoretic planners. It avoids the first by relying on the planner to ensure that its choices are independent.
It also arranges the probabilities for which it has information into a hierarchy, allowing it to find appropriate information quickly. The planning assistant does not require the person programming it to know the probability of effects given actions. Instead, it infers these probabilities from observed effects of the actions. It gathers these observations by monitoring the plans it executes. Finally, because it assists a planner, it can assume that the choices it makes are also independent in terms of utility. This allows it to treat each event it tries to achieve as a simple goal. As Haddawy and Hanks [Haddawy and Hanks, 1990] show, such simple goals can be captured by constant utilities.

These solutions are not a panacea. Each one introduces new problems into using decision theory for planning. Because probabilities are inferred from statistics, the DTPA places an even higher premium on knowledge of probability. Only when its choices are simple, the alternatives are few, and its experience is extensive can it choose correctly. It attempts to deal with the lack of information by arranging information in a hierarchy and using it as an abstraction hierarchy. A better use of the statistical information it has would improve its performance, but it will still need to focus its attention before applying probability.

Another limitation on the planning assistant’s behavior is its reliance on the simple utility structure. The ability to trade off likely success on a low probability goal against likely failure on a high probability goal would greatly improve its performance. The current DTPA cannot make such tradeoffs because it concentrates on one plan event at a time. It focuses its knowledge this way, but it cannot switch tasks if one proves too difficult. It cannot shift to another part of the plan because presumably all of the steps in the plan are necessary for completion.
One solution to this problem would be to have the planning assistant give up if it cannot find a reasonable way to cause an event specified in the plan and have the planner generate a new one that does not contain this event.

References

[Abadi and Halpern, 1989] M. Abadi and J. Y. Halpern, "Decidability and expressiveness of first-order logics of probability," Computer Science Technical Report RJ 7220 (67987), IBM Research, Almaden Research Center, 1989.

[Allen and Miller, 1991] James Allen and Bradford Miller, "The RHET System: A Sequence of Self-Guided Tutorials," Computer Science 325, University of Rochester, 1991.

[Allen et al., 1991] James F. Allen, Henry A. Kautz, Richard N. Pelavin, and Josh D. Tenenberg, Reasoning About Plans, Morgan Kaufmann Publishing Co., San Mateo, CA, 1991.

[Allen and Schubert, 1991] James F. Allen and Lenhart K. Schubert, "The TRAINS Project," Computer Science 91-1, University of Rochester, 1991, TRAINS Technical Note.

[Cooper, 1987] Gregory F. Cooper, "Probabilistic Inference Using Belief Networks is NP-Hard," Technical Report KSL-87-27, Stanford University, Stanford, California 94305, May 1987.

[Fikes and Nilsson, 1971] R. E. Fikes and N. J. Nilsson, "STRIPS: A new approach to the application of theorem proving to problem solving," Artificial Intelligence, 2:189-205, 1971.

[Haddawy and Hanks, 1990] Peter Haddawy and Steve Hanks, "Issues in Decision-Theoretic Planning: Symbolic Goals and Numeric Utilities," In Proceedings of the DARPA Workshop on Innovative Approaches to Planning, Scheduling and Control, pages 48-58, 1990.

[Martin, 1993] Nathaniel G. Martin, Using Statistical Inference to Plan Under Uncertainty, PhD thesis, University of Rochester, Computer Science Department, Rochester, NY 14627, 1993.

[Pearl, 1988] Judea Pearl, Probabilistic Reasoning In Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, Inc, San Mateo, CA, 1988.

[Wellman, 1990] Michael P. Wellman, "The STRIPS assumption for planning under uncertainty," In American Association of Artificial Intelligence National Conference 1990, pages 198-203, 1990.

[Wellman and Doyle, 1992] Michael P. Wellman and Jon Doyle, "Modular Utility Representation for Decision-Theoretic Planning," In Proceedings of the First International Conference on AI Planning Systems, pages 236-242, 1992.