Planning as an Iterative Process David E. Smith

Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Planning as an Iterative Process† David E. Smith NASA Ames Research Center Moffet Field, CA 94035–0001 david.smith@nasa.gov Abstract allow the user to edit a partial or completed plan by moving actions around, locking actions down, or removing actions. The MAPGEN planning system, which was used to do daily planning for the two Mars Exploration Rovers (MER), follows this model (Bresina et al. 2005). Human Tactical Activity Planners (TAPs) used MAPGEN in an interactive mode where they would select and place activities on timelines, and MAPGEN would instantiate details and enforce constraints. The TAP could also remove and reorder activities, and MAPGEN would identify and flag any violated constraints. This approach was quite successful, and has led to similar follow-on systems being adopted for the Phoenix Mars Lander and the recently launched Mars Science Laboratory. However, this approach addresses only a small part of the planning problem, and does not take full advantage of the power of automated planning, as many of us in AI would like. There are a number of technical difficulties that stand in the way of a more fully automated approach to planning for such missions, and I discuss some of these in the next section. However, there is a bigger issue with the planning process – it is still being considered as a separate, isolated component that is used after the scientific team has fully specified their goals and preferences. It is this issue which I would like to bring to the fore – integrating planning into an iterative process where the goals, objectives, and preferences are only partially understood. Activity planning for missions such as the Mars Exploration Rover mission presents many technical challenges, including oversubscription, consideration of time, concurrency, resources, preferences, and uncertainty. These challenges have all been addressed by the research community to varying degrees, but significant technical hurdles still remain. In addition, the integration of these capabilities into a single planning engine remains largely unaddressed. However, I argue that there is a deeper set of issues that needs to be considered – namely the integration of planning into an iterative process that begins before the goals, objectives, and preferences are fully defined. This introduces a number of technical challenges for planning, including the ability to more naturally specify and utilize constraints on the planning process, the ability to generate multiple qualitatively different plans, and the ability to provide deep explanation of plans. Introduction Often, planning systems are regarded as simple isolated components that accept a set of goals, a set of initial conditions, and a description of the possible actions that can be performed, as illustrated in Figure 1. The output is a plan – a program of actions – that can be executed to achieve the goals. Increasingly, planning systems are being applied to real world problems where responsiveness may be important, where replanning is the norm, and where the planning system must interface with humans. When humans actively take part in the decision making and planning process, the process is often referred to as mixed-initiative planning. Much of the work on mixed-initiative planning has focused on low-level guidance of the planning process – allowing the user to choose which goals or subgoals are considered next, to choose which actions should be used to achieve goals or subgoals, to choose when goals or subgoals are achieved, and to choose where actions should be placed in a plan. In addition, mixed initiative systems often c 2012, Association for the Advancement of Artificial Copyright Intelligence (www.aaai.org). All rights reserved. † This paper was invited as a Challenge paper to the AAAI’12 Sub-Area Spotlights track. Figure 1: The traditional view of planning as an independent component. 2180 MER Planning achieve more than is possible, given the time, energy and data storage available. As is typical in such problems there are different values to different goals, and combinations of goals, so that it is not an easy matter to identify an optimal subset of goals to pursue, even if the goals were independent of each other. Temporal Planning This is a temporal planning problem, which means that different actions have different durations, and concurrent actions are necessary. There are also many time constraints on various activities, due to illumination constraints, temperature constraints, solar power availability, atmospheric conditions, and communication windows. Resources There are discrete and continuous resources, such as state of battery charge and data storage, that are temporarily used, consumed, or produced by different activities. Preferences There are preferences involved – scientists may have preferences for one objective over another, but they may also have preferences for time windows, or for the order in which experiments are done. Uncertainty There is uncertainty about the initial state of the rover (battery charge, pose, terrain map), about the exogenous events (atmospheric conditions, dust devils, communication bandwidth, solar radiation), and about the outcomes of actions (pose, energy usage, time taken). All of these issues have received attention over the last ten years and considerable progress has been made. However, there are still some significant shortcomings to this work. For oversubscription planning, work has largely been limited to the special case of net-benefit planning, where actions are augmented with costs, goals are augmented with rewards, and the optimal plan is defined as the one with the greatest sum of rewards less action costs (Benton, Do, and Kambhampati 2009). What is missing is work on the tougher problem of oversubscription planning where actions consume resources and there are limits on the resources available (energy, data storage). For this class of problems, action “costs” (resource usage) are not directly comparable to goal rewards. For temporal planning, relatively little attention has been paid to problems with large numbers of exogenous events and time constraints. For this kind of problem, it is not at all clear that forward state-spaced search with any of the currently popular search heuristics will be very effective. For preferences, there has been little work on dealing with time preferences, namely preferences on the order in which certain activities are performed, or the time windows in which activities are performed.1 For planning under uncertainty, most work has been limited to consideration of instantaneous actions, and sequential plans. Dealing with concurrent actions that have uncertain duration, or have uncertain use of continuous resources such as energy, is particularly problematic. In a state-space approach, one is forced to encode time in the state space, and Figure 2 shows a meeting of the Science Operations Working Group (SOWG) for one of the MER rovers. During the first year of MER operations, a meeting like this took place each Martian night, in order to decide on the science goals and activities for the next day. There are a number of different scientists in the room, including planetary geologists, atmospheric scientists, and biologists. In addition, there are many engineers in the room with specialized knowledge of particular instruments, rover mobility and driving, arm manipulation and placement, thermal characteristics of the rover, the power system, communication systems, and various software systems. As with any group this large you would not expect there to be complete agreement about the goals for the next day. Different scientists have different places they want the rover to go and different measurements they want it to take. These measurements are not entirely independent; a scientist may want multiple different measurements of a specific rock, or might want atmospheric measurements at regular intervals. There are time constraints and preferences as well, due to lighting and temperature considerations. For example, when taking a visible image of a location, good illumination is important. However, when using an infrared spectrometer, the instrument needs to be cold and dark. There are many additional constraints on resources, such as energy and power available throughout the day, data storage available, and available communication windows. Figure 2: Science Operations Working Group (SOWG) meeting for one of the MER rovers. What we’d like to think is that the scientists would produce a nice clean set of goals, which, together with the objective criteria and current rover state (initial conditions), could be fed into a planning engine. We could then turn the crank and get out a detailed plan (like that shown in Figure 3) that could then be uplinked to the rover. Unfortunately, this is not a simple STRIPS-style planning problem, for a number of reasons: Oversubscription This is an oversubscription planning problem, which means that not all the goals can be accomplished, given the resources available. With a diverse group of scientists, it is no surprise that they want to 1 It is generally rather awkward to express many of these preferences in PDDL 3.0 (Gerevini et al. 2009). 2181 consider combinations of actions starting at different times. If actions have uncertain durations, the branching factor is large, and the search space quickly becomes intractable. There have been some notable attempts at addressing these problems (e.g. Younes and Simmons 2004, Aberdeen and Buffet 2007, Mausam and Weld 2008, Meuleau et al. 2009), but, so far, these techniques make many simplifying assumptions, and are far from achieving the kind of performance typical of state-of-the-art classical planning systems. Figure 4: Rover activity planning viewed as an iterative process of plan revision under constraints. while abandoning others. Thus, they might want to say something like: Keep activities A, B, and C, but make sure you do A before 4pm, and B before C. or Don’t do both D and E unless there is extra energy available after doing all the other primary goals. Figure 3: A rover plan. Note the prevalence of concurrency and the range of activity durations. While it may be possible to enforce such constraints by using PDDL 3.0 preferences (Gerevini et al. 2009) or cleverly modifying operator descriptions, it is awkward to do so, and it is unclear whether existing planners are able to efficiently cope with such constraints. Apart from these individual issues, there has been little attempt to integrate all of the above capabilities together. Each one of these problems is hard enough, and the techniques developed so far are not exactly plug-and-play. Multiple Plans In the early stages of the process, the scientists have not yet completely settled on or elucidated their preferences or their optimization criteria. As a result, it is not clear what the best plan is. The planner needs to be able to return multiple solutions, that are “qualitatively different” and somehow reflect the range of possible preferences or optimization criteria that might be considered by the scientists. The Broader Problem Although the above issues are a significant barrier to producing a system capable of tackling the MER planning problem, this alone is not enough. The broader problem is that initially the scientists don’t have a clear idea of what is achievable, of what their goals should be, of the relative value of different goals, or of their preferences. Through a process of proposing and examining different options they eventually arrive at a set of primary and secondary goals that they can hand off to a TAP to produce a detailed plan like that shown in Figure 3. If they could use a planner much earlier in the process it would help them to develop their goals and preferences. In effect, it would allow them to perform trade studies to examine the space of possible goals, preferences, and plans. This makes the planning part of an iterative process, like that shown in Figure 4, which has several interesting implications: Plan Explanation These plans are complicated and the scientists need to be able to ask questions and get back useful answers. Some questions, like: Why is activity A in the plan? would seen to be relatively easy to answer, but others such as: Why is action A done before action B? What would happen if I delayed action A until 4pm? Why wasn’t goal G chosen for inclusion in the plan? Plan Constraints As the process goes on, the scientists increasingly place constraints on the nature of the plan. For example, from an initial run, they might decide they like particular choices or activities, and want to keep them, Why didn’t you satisfy preference P ? are much tougher to answer because they require deeper analysis of the relationship between actions in the plan, 2182 are hypothetical in nature, or are negative questions asking why something wasn’t done differently. (2003), Gough, Fox, and Long (2004), and Foss, Onder, and Smith (2007). Addressing the above three issues would allow us to embed a planning system into an iterative process like that envisioned in Figure 4, and use it to perform trade studies that can help the scientists refine and elucidate their goals, objectives, and preferences. One could regard this overall process as being preference elicitation (Chen and Pu 2004; Brafman and Domshlak 2009). We are in fact, trying to help the scientists converge on the right “product”, in this case a plan to achieve their goals. As with many other instances of preference elicitation, direct questioning of the scientists to try to elicit their preferences would likely be annoying and would converge very slowly. In addition, the scientists may not even be aware of some of their preferences or their implications. The process we have described is more like that of example-critiquing as described by Viappiani, Faltings, and Pu (2006), in which examples (plans) are presented, and criticisms lead to the expression of additional preferences. In our case the preferences are often very complex. The scientists generally don’t select a specific plan or directly indicate a preference for one plan over another – instead, they often zero in on parts of plans that they want preserved or changed, and provide additional constraints that would enforce these preferences. In addition, these preferences tend to be heavily dependent on the the current state and resources available, as well as the set of goals being considered. These preferences also tend to evolve with the process of scientific discovery. • Plan revision under constraints. We need simple ways of expressing constraints on how a planner should be allowed to revise a plan (keep this subset of activities but do A after 4pm, and make sure you do B before C), and good search techniques for producing plans that satisfy those constraints. Being able to place constraints on the nature of a plan could be seen as a special case of a very old idea – McCarthy’s Advice Taker (McCarthy 1990). • Producing multiple, qualitatively different plans. While there has been some preliminary work in this area (Tate, Dalton, and Levine 1998; Myers and Lee 1999), we need to take a deeper look at this problem. The root of the problem is that the planner doesn’t initially have a complete model of the value of different goals or the costs (resource usage) of different actions. A real solution to this problem needs to explicitly consider uncertainty in the valuation of goals and uncertainty in resource usage of actions. A critical part of what it means for two plans to be qualitatively different is for those plans to make different assumptions about goal rewards, and action durations or resource usage. One possible source of inspiration here is work on presenting distinct solutions to users for purposes of preference elicitation (e.g. Viappiani, Faltings, and Pu 2006). • Plan explanation. Literature on the problem of plan explanation appears to be surprisingly sparse. Questions like: Why is activity A in the plan? Carving off Pieces can be answered relatively easily by elucidating the causal structure of the plan, thereby identifying what conditions the action achieves that are needed to support other actions and achieve desired goals. Questions like: The overall problem of building an integrated planning system that addresses all of the issues I raised in the preceding sections is quite daunting. Fortunately, there are some separable components and issues that I believe can be addressed by the research community: Why is this action done before that one? require a deeper analysis of the relationship between two activities, and the constraints that govern their relative positions in the plan. Bresina and Morris (2006) have done some work on explaining temporal inconsistencies in plans. Hypothetical questions, such as: • Oversubscription under resource constraints. It’s time to move beyond the simple net-benefit model and deal with the problem of generating plans for oversubscribed problems where actions use resources, and there are limits on the resources available. Benton, Do, and Kambhampati (2009) have developed effective heuristics for net-benefit problems, but it is not obvious how to extend these heuristics to this more general class of problems. What would happen if I delayed this action until 4pm? would seem to require modifying the plan and simulating it to determine what parts of the plan still work, and what parts will now violate time or resource constraints. A reasonable answer to this type of question might be something like: • Temporal planning with time windows, time constraints, and temporal preferences. While this can be done now, there is little indication that current heuristics will be effective on larger scale problems. What is needed here is the development of more effective search strategies and heuristics for these problems. Delaying this action until 4pm would mean delaying actions A4 A5 and A6 , so the goal of photographing Rock13 could no longer be completed while it is in direct sunlight, violating preference P2 . • Planning under time uncertainty. Here, what is needed is a practical, computationally tractable approach that worries less about constructing complete and optimal policies, and more about simple, partial policies that do things like introducing slack in important places, and making sure that the resulting plan will not result in a dead end with poor reward. Some early forays in this direction are Musliner, Durfee, and Shin (1993), Dearden et al. Questions, such as: Why wasn’t this resource used instead of that one? might require reinvoking the planner with additional constraints, and comparing the resulting plan with the original to determine what is achieved by the two different plans, 2183 References as well as how they differ structurally. Finally, purely negative questions, such as: Aberdeen, D., and Buffet, O. 2007. Concurrent probabilistic temporal planning with policy gradients. In Proc. of the Seventeenth Intl. Conf. on Automated Planning and Scheduling (ICAPS-07). Benton, J.; Do, M.; and Kambhampati, S. 2009. Anytime heuristic search for partial satisfaction planning. Artificial Intelligence 173:562–592. Brafman, R. I., and Domshlak, C. 2009. Preference handling – an introductory tutorial. AI Magazine 30(1):58–86. Bresina, J. L., and Morris, P. H. 2006. Explanations and recommendations for temporal inconsistencies. In Proc. of the Fifth Intl. Workshop on Planning and Scheduling for Space (IWPSS-06). Bresina, J.; Jonsson, A.; Morris, P.; and Rajan, K. 2005. Activity planning for the Mars Exploration Rovers. In Proc. of the Fifteenth Intl. Conf. on Automated Planning and Scheduling (ICAPS-05), 40–49. Chen, L., and Pu, P. 2004. Survey of preference elicitation methods. Technical Report IC/2004/67, Ecole Politechnique Federale de Lausanne (EPFL). Dearden, R.; Meuleau, N.; Ramakrishnan, S.; Smith, D.; and Washington, R. 2003. Incremental contingency planning. In Proc. of ICAPS 2003 Workshop on Planning under Uncertainty and Incomplete Information. Foss, J.; Onder, N.; and Smith, D. 2007. Preventing unrecoverable failures through precautionary planning. In Proc. of ICAPS 2007 Workshop on Moving Planning and Scheduling Systems into the Real World. Gerevini, A.; Haslum, P.; Long, D.; Saettia, A.; and Dimopoulos, Y. 2009. Deterministic planning in the Fifth International Planning Competition: PDDL3 and experimental evaluation of the planners. Artificial Intelligence 173:619– 668. Gough, J.; Fox, M.; and Long, D. 2004. Plan execution under resource consumption uncertainty. In Proc. of ICAPS 2004 Workshop on Connecting Planning Theory with Practice, 24–29. Mausam, and Weld, D. 2008. Planning with durative actions in stochastic domains. Journal of Artificial Intelligence Research 31:33–82. McCarthy, J. 1990. Programs with common sense. In Lifschitz, V., ed., Formalizing Common Sense: Papers by John McCarthy. Ablex. Meuleau, N.; Benazera, E.; Brafman, R.; Hansen, E.; and Mausam. 2009. A heuristic search approach to planning with continuous resources in stochastic domains. Journal of Artificial Intelligence Research 34:27–59. Musliner, D.; Durfee, E.; and Shin, K. 1993. Circa: a cooperative intelligent real-time control architecture. IEEE Transactions on Systems, Man and Cybernetics 23(6):1561–1574. Myers, K., and Lee, T. 1999. Generating qualitatively different plans through metatheoretic biases. In Proc. of the Sixteenth National Conf. on Artificial Intelligence (AAAI-99), 570–576. Why didn’t you satisfy this preference? also seem to require replanning with additional constraints – in this case enforcing the preference. An answer to this type of question would again seem to require comparison of the new plan with the previous plan. Considering this problem in the context of planning appears to give us the ability to actually answer such tougher hypothetical or counterfactual questions, which goes well beyond current work on inferential question answering in other areas of AI. Conclusions Activity planning for the MER rovers presents many technical challenges, including consideration of time, concurrency, resources, preferences, and uncertainty. These have all been addressed by the research community to varying degrees, but significant technical hurdles still remain. The integration of these techniques into a single planning engine also remains largely unaddressed. In addition, I have argued that there is a deeper set of issues that needs to be addressed – namely the integration of planning into an iterative process that begins before the goals, objectives, and preferences are fully defined. This has a number of technical implications for planning, including the need to more naturally specify and utilize constraints on the planning process, the need to generate multiple qualitatively different plans, and the need to provide deep explanation of planning decisions. Although I introduced these challenges in the context of planning for Mars Rovers, the process and issues I’ve outlined are quite typical of the planning that goes on in many complex science missions. In particular, similar processes take place in planning for crew activities aboard the International Space Station (ISS). For the ISS, there is the additional complication that the resulting plans will be executed by astronauts, rather than robots. The plans must therefore be easily understandable by the astronauts, who may want to ask their own questions. As smart executives, astronauts don’t readily tolerate stupidity or obvious inefficiency in the plans. In addition, they may take the liberty of reordering actions, interleaving tasks, collaboration, or substituting resources. This means that any run-time replanning must be able to model and take these deviations into account as well. Acknowledgments This paper grew out of a panel presentation I originally gave at ICAPS 2010. Thanks to John Bresina, Paul Morris, and Tony Barrett for their insights and observations on rover planning. Thanks to Jeremy Frank for insights on ISS crew activity planning. Thanks to Jörg Hoffman for comments on the paper, and for pointing out the connection with preference elicitation Thanks to Ronen Brafman for additional insights on work in preference elicitation. This work has been supported by the NASA Exploration Systems Program and the NASA Aviation Safety Program. 2184 Tate, A.; Dalton, J.; and Levine, J. 1998. Generation of multiple qualitatively different plan options. In Proc. of the Fourth Intl. Conf. on Artificial Intelligence Planning and Scheduling (AIPS-98). Viappiani, P.; Faltings, B.; and Pu, P. 2006. Preferencebased search using example-critiquing with suggestions. Journal of Artificial Intelligence Research 27:465–503. Younes, H., and Simmons, R. 2004. Solving generalized semi-markov decision processes using continuous phasetype distributions. In Proc. of the Nineteenth National Conf. on Artificial Intelligence (AAAI-04), 742. 2185

Planning as an Iterative Process David E. Smith

Related documents

Products

Support

Planning as an Iterative Process David E. Smith

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib