From AAAI Technical Report SS-96-04. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved.

About Planning under Uncertainty in Dynamic Systems Exploiting Probabilistic Information

Ute C. Sigmund and Thomas Stützle
Intellektik, Informatik, TH Darmstadt
Alexanderstraße 10, D-64283 Darmstadt, Germany
{ute, tom}@intellektik.informatik.th-darmstadt.de

Abstract

To survive in a real-world environment, a reasoning agent has to be equipped with several capabilities in order to deal with incomplete information about its target domain. One aspect of this incomplete information is the uncertainty about the effects of actions and events. Although the concrete outcome of an action or event cannot be predicted, their relative likelihood can often be quantified. In this paper we discuss planning under uncertainty exploiting the available probabilistic information in dynamic worlds, especially considering delayed effects of actions modeled as causally invoked events.

Research Issues

Classical planning systems have largely ignored the issue of uncertainty, because in classical systems it is usually assumed that complete and exact knowledge about the state of the world is available and that the results of the available actions are completely known. Nowadays the research paradigm in planning is shifting towards complex real-world applications, see (12). Thus, the handling of issues like nondeterministic effects of actions, uncertainty, incomplete knowledge about the actual state of the world, reasoning about changing environments, and reasoning about interactions between different agents and the environment has to be incorporated into modern planning systems.

In classical planning approaches like the situation calculus (11) and STRIPS (7), agents act in a static world, i.e. state transitions can only be caused by actions performed by the agent. In contrast, in a dynamic system (9; 14) state transitions occur consecutively while time passes. Possible changes of a system are caused by actions performed by one or more agents as well as by the occurrence of events. An important property of these events is that they can be causally invoked by other events or by actions performed by an agent and may have an impact on subsequent states. This gives us the possibility to model delayed effects of actions and events. As an example, consider a table with a glass of water on it. Here an action lift-table has the immediate effect that the table is lifted. As a delayed effect, the glass standing on the table may drop on the floor, water spills out of the glass, the floor gets slippery, and when stepping towards the door our agent may slip.¹

Considering these issues, a reasoning agent has to be equipped with several capabilities. These include reactive behaviour, conditional planning, replanning, as well as interleaving planning and execution. The philosophy behind these approaches can be seen as follows. In conditional planning, or contingency planning, non-determinism is dealt with by considering each possible alternative explicitly. Sensory information is needed to actually decide which part of the plan has to be executed. In reactive planning it is tried to meet real-time requirements (e.g. getting out of the way of obstacles in order to avoid crashes), and the complexities of planning in complex, uncertain domains are attended to by using a table of condition-action rules, see (13). Reactive planning probably has most utility in situations where real-time reactions are necessary and a classical planning algorithm might not be fast enough. Replanning, on the other side, is usually used to fix a plan when something has gone wrong and the initial plan is not executable any more, see (15). Interleaving planning and execution handles problems with a large number of alternatives, most of which would never be used; this is the reason why conditional planning fails in these situations. On the other hand, replanning cannot be used here because execution failures could be harmful (1).

Yet in all these approaches to dealing with uncertainty in planning systems, one aspect has been completely ignored: the problems associated with delayed effects caused by actions of the agent. In this paper we are going to introduce the notion of delayed effects under the aspect of conditional planning. Furthermore we will shortly discuss the consideration of probabilistic information. In the following, we will introduce the notion of non-determinism, augmented with reasoning about probabilities assigned to the transition relations, into our representation of dynamic systems. Furthermore, the notion of parallelism will be an important feature to represent dynamic systems.

In systems that explicitly handle uncertainty, two different approaches can be distinguished. One may be called (symbolic) goal-directed planning and the other decision-theoretic planning. In symbolic goal-directed planning, which has also been called qualitative planning, see (8), a model of the possible states of the world is maintained and a plan is constructed that tries to achieve a specific goal, considering all the possible outcomes. Here the likelihood of the different states is not quantified. On the other side, in decision-theoretic planning a probabilistic model of the states is maintained, and planners of this kind usually try to maximize the expected utility of the plan, see (4; 2; 6; 3). Another approach is taken e.g. in (10), where imperfect information about initial world states is taken into account by using a probability distribution over the possible initial states and conditional probability distributions over the possible successor states. The objective is to obtain plans that are guaranteed to succeed with a certain probability, see also (5). The main difference to approaches that try to maximize the expected utility is that the latter work directly on a state-space representation of the problem and try to find a policy, i.e. a reaction strategy, to maximize future utilities.

One may argue that the effects of a sequence of events initiated by an action performed by the agent can be combined and represented as one complex state transition; this is what most classical planning systems actually do. But there is one important argument against that: if a sequence of state transitions caused by delayed effects of an action is represented as one complex state transition, the agent has no possibility to influence the world while this complex state transition happens. In our example the agent will never have a chance to catch the glass gliding on the table before it drops on the floor.

We will follow a decision-theoretic approach, i.e. we generalize the notion of reasoning about change by assigning probabilities to the possible alternative effects of actions and events. It may be inherently unpredictable which of these alternative outcomes of actions or events will happen in a concrete scenario. Nevertheless, it often is possible to make use of the probabilistic aspects of these different alternatives in the following way. For complex problems, finding a complete plan considering all possible alternatives leads to a combinatorial explosion of the search space, see also (2; 6). So it is considered necessary to avoid, or at least to reduce, this combinatorial explosion. This can be done by introducing heuristics that guide the search and reduce the search space by using information about the reward function and the probability of the possible outcomes of actions.² When no explicit reward function is available, this reduces to cutting off very unlikely contingencies.³ This yields advantages compared to pure conditional planning, as by using the probabilistic information only a part of the possible states is considered, namely those states that have a high probability of occurrence.

Within the decision-theoretic planning aspect we will especially consider modeling the world as a dynamic system with delayed effects. To model dynamic systems with delayed effects we consider a state to consist not only of the collection of facts that hold in some situation; in addition, a set of events that occur in this situation will also be part of the representation. Consequently, state transitions are no longer transitions from one situation to one or more following situations, but transitions from a pair of a situation and a set of events, (S, E), to one or more possible following pairs {(S', E')}. In this context the planning problem will no longer be the question of how to find a sequence of actions to reach a given goal situation from a given initial state. The reasoning agent will rather influence and direct the development of the system towards his goals by initiating actions at certain states.

¹The effects of spilling out some water might be quite exaggerated, but this should only serve as an example for delayed effects.
²A similar approach was also used e.g. by (8; 3).
³E.g. when throwing a coin, nobody will consider the case that the coin stays on its edge, even though some people claim that this is not impossible.

Acknowledgments

The first author is supported by ESPRIT within basic research action MEDLAR-II under grant no. 6471; the second author is supported by a grant of the DFG (Deutsche Forschungsgemeinschaft). The authors would like to thank Michael Thielscher for valuable comments and technical assistance.

References

[1] Jose A. Ambros-Ingerson and Sam Steel. Integrating Planning, Execution and Monitoring. In Proceedings of the AAAI National Conference on Artificial Intelligence, pages 83-88, 1988.

[2] Craig Boutilier, Thomas Dean, and Steve Hanks. Planning Under Uncertainty: Structural Assumptions and Computational Leverage. In 3rd European Workshop on Planning (EWSP), 1995.
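The probability-of-success objective, combined with the idea of cutting off very unlikely contingencies, can be sketched in a few lines of Python. The tree encoding of a conditional plan, the probabilities, and the cut-off threshold below are illustrative assumptions, not taken from any of the cited systems:

```python
# A contingency tree for a conditional plan: a leaf is True (the goal is
# reached) or False; an inner node is a list of alternative outcomes,
# each a (probability, subtree) pair.
def success_probability(node, threshold=0.0):
    """Probability that the plan reaches the goal; branches whose outcome
    probability falls below `threshold` are cut off (counted as failure)."""
    if isinstance(node, bool):
        return 1.0 if node else 0.0
    return sum(p * success_probability(sub, threshold)
               for p, sub in node
               if p >= threshold)

# An illustrative plan: a likely outcome whose follow-up succeeds 90% of
# the time, and a rare outcome that can almost always be repaired.
plan = [
    (0.95, [(0.9, True), (0.1, False)]),
    (0.05, [(0.99, True), (0.01, False)]),
]

full = success_probability(plan)                   # ~ 0.9045
pruned = success_probability(plan, threshold=0.1)  # ~ 0.855: rare branch cut
```

Pruning the 5% branch here trades a small loss in guaranteed success probability for not having to plan that contingency at all, which is the advantage over pure conditional planning noted above.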
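The (S, E) transition scheme can be illustrated with a small Python sketch. Everything concrete here — the fact and event names, the 0.8 drop probability, the two actions — is a hypothetical encoding of the lift-table example, not the authors' formalism:

```python
# A state of the dynamic system is a pair (S, E): the facts holding in a
# situation plus the set of events occurring in it.

def apply_action(state, action):
    """Actions change facts immediately and may causally invoke events."""
    facts, events = state
    if action == "lift-table":
        # Immediate effect: the table is lifted; the sliding glass is a
        # causally invoked event whose effect is delayed.
        return (facts | {"table-lifted"}, events | {"glass-slides"})
    if action == "catch-glass":
        # Because the delayed effect is a separate transition, the agent
        # can still act before the glass drops.
        return (facts | {"glass-held"}, events - {"glass-slides"})
    return state

def successors(state):
    """Transition from (S, E) to the possible following pairs {(S', E')}
    with probabilities - here an assumed 0.8 chance the glass drops."""
    facts, events = state
    if "glass-slides" in events:
        return [
            ((facts | {"glass-on-floor", "floor-wet"}, frozenset()), 0.8),
            ((facts, frozenset()), 0.2),
        ]
    return [((facts, frozenset()), 1.0)]

s0 = (frozenset({"glass-on-table"}), frozenset())
s1 = apply_action(s0, "lift-table")    # immediate effect + invoked event
s2 = apply_action(s1, "catch-glass")   # acting between transitions
assert "glass-slides" not in s2[1]     # the delayed effect is averted
```

Had lift-table and the dropping glass been folded into one complex state transition, catch-glass could never be applied in between — which is exactly the argument made above against compiling delayed effects away.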
[3] Thomas Dean, Leslie Pack Kaelbling, Jak Kirman, and Ann Nicholson. Planning Under Time Constraints in Stochastic Domains. Artificial Intelligence, 76:35-74, 1995.

[4] Thomas Dean and Michael Wellman. Planning and Control. Morgan Kaufmann, San Mateo, California, 1991.

[5] Mark Drummond and John Bresina. Anytime Synthetic Projection: Maximizing the Probability of Goal Satisfaction. In Proceedings of the AAAI National Conference on Artificial Intelligence, pages 138-143, 1990.

[6] Jak Kirman et al. Using Goals to Find Plans with High Expected Utility. In 2nd European Workshop on Planning, 1993.

[7] Richard E. Fikes and Nils J. Nilsson. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence, 2:189-208, 1971.

[8] Nir Friedman and Daphne Koller. Qualitative Planning under Assumptions: A Preliminary Report. In Working Notes of the 1994 AAAI Spring Symposium, 1994.

[9] Gerd Große. Propositional State-Event Logic. In C. MacNish, D. Pearce, and L. M. Pereira, editors, Proceedings of the European Workshop on Logics in AI (JELIA), volume 838 of LNAI, pages 316-331. Springer, 1994.

[10] Nicholas Kushmerick, Steve Hanks, and Daniel Weld. An Algorithm for Probabilistic Planning. Artificial Intelligence, 76(1-2):239-286, 1995.

[11] J. McCarthy and P. J. Hayes. Some Philosophical Problems from the Standpoint of Artificial Intelligence. In B. Meltzer and D. Michie, editors, Machine Intelligence 4, pages 463-502. Edinburgh University Press, 1969.

[12] Drew McDermott and James Hendler. Planning: What It Is, What It Could Be, An Introduction to the Special Issue on Planning and Scheduling. Artificial Intelligence, 76:1-16, 1995.

[13] Marcel J. Schoppers. Universal Plans for Reactive Robots in Unpredictable Environments. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1039-1046, 1987.

[14] Michael Thielscher. The Logic of Dynamic Systems. In Proceedings of the International Joint Conference on Artificial Intelligence, pages …-1962, 1995.

[15] David E. Wilkins. Practical Planning. Morgan Kaufmann, 1988.