Planning with Local Search
MERS Seminar Lecture
March 6, 2003
Jonathan Kennell

Presentation Outline
Planning Overview
  – What is planning? – 5 mins.
  – Taxonomy of planners – 40 mins. (or everything you ever wanted to know about planning in approximately 40 minutes)
5 minute break
LPG
  – Background information (WalkSAT) – 10 mins.
  – Linear action graphs and precedence graphs – 10 mins.
  – WalkPlan planning algorithm – 10 mins.
  – Example – 10 mins.

What is Planning?
Input
  – Set of world-states
  – Action operators (fn: world-state → world-state)
  – Initial world-state
  – Goal (possibly a partial state / set of world-states)
Output
  – Ordering of actions
From 6.834J POP lecture

World State
Set of facts and their degree of truth
  – Examples:
      (Student Jonathan)            // true
      (Likes Jonathan Golf)         // false
      (Graduating Jonathan June)    // unknown
* Note: lisp notation used extensively in planning community
* Most planners don’t consider unknown facts

Planning Operators
Fn: world-state → world-state
Generally use STRIPS format:
  – Preconditions: facts that must be true before action can occur
  – Effects: facts that become true (or false) after the action occurs
Extra properties:
  – Separate start / invariant / end conditions and effects
  – Durations
  – Resource constraints
(:action Move
  (:params ((robot ?r) (location ?a) (location ?b)))
  (:preconds (at ?r ?a))
  (:effects (and (not (at ?r ?a)) (at ?r ?b))))

Mutual Exclusion
Sometimes planning operators conflict with each other – we call a pair of conflicting operators mutex
Examples of mutex actions:
  – Interference: A deletes precondition or effect of B
  – Competing Needs: A and B have mutex preconditions
Planner must ensure no mutex actions co-occur.

What is a plan?
A plan is an ordering of actions that will transition the system from the initial state to the goal state.
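To make the STRIPS operator and the definition of a plan concrete, here is a minimal Python sketch. It assumes world states are sets of true facts (unknown facts are ignored, as most planners do). The names Action, move, execute_plan and achieves are illustrative, not taken from any planner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A ground STRIPS action (hypothetical helper type for this sketch)."""
    name: str
    preconds: frozenset
    add_effects: frozenset
    del_effects: frozenset


def move(robot, a, b):
    """Ground instance of the Move operator from the slide."""
    return Action(
        name=f"Move({robot},{a},{b})",
        preconds=frozenset({("at", robot, a)}),
        add_effects=frozenset({("at", robot, b)}),
        del_effects=frozenset({("at", robot, a)}),
    )


def apply_action(state, action):
    """Operator as a function world-state -> world-state."""
    if not action.preconds <= state:
        raise ValueError(f"{action.name}: preconditions not satisfied")
    return (state - action.del_effects) | action.add_effects


def execute_plan(initial, plan):
    """Apply the actions in order and return the final world state."""
    state = frozenset(initial)
    for action in plan:
        state = apply_action(state, action)
    return state


def achieves(initial, plan, goal):
    """True if the plan transitions the initial state into a goal state."""
    return frozenset(goal) <= execute_plan(initial, plan)
```

For example, achieves({("at", "r1", "A")}, [move("r1", "A", "B"), move("r1", "B", "C")], {("at", "r1", "C")}) checks a two-step plan against a partial-state goal.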
[Figure: example plan running from Start to End through Activity-A, Activity-B, Activity-C and Activity-D, linked by fact-J through fact-P]

Completeness / Consistency / Minimality
Complete Plan
  – A plan is complete IFF every precondition of every activity is achieved.
  – An activity’s precondition is achieved IFF:
      The precondition is the effect of a preceding activity (support), and
      No intervening step conflicts with the precondition (mutex).
Consistent Plan
  – The plan is consistent IFF the temporal constraints of its activities are consistent (the associated distance graph has no negative cycles), and no conflicting (mutex) activities can co-occur.
Minimal Plan
  – The plan is minimal IFF every constraint serves a purpose, i.e., if we remove any temporal or symbolic constraint from a minimal plan, the new plan is not equivalent to the original plan

Variations on Classical Planning
Temporal planning
  – Actions have durations
Planning with resources
  – Facts can be quantified
Planning with uncertainty
  – Effects / durations of actions not guaranteed

Taxonomy of Planners
[Figure: taxonomy of planners, contrasting Global Search and Local Search]
  – Forward Chaining / Backward Propagation (entire plan-space): TLPlan
  – Plan Graph (condensed plan-space): Graphplan, LPGP
  – Macro Decomposition (restricted plan-space): SHOP2, Kirk TPN Planner
  – Local Search: LPG
  – Also shown: Kirk Deductive Controller

Forward Chaining / Backward Propagation
Searches through entire plan-space by nondeterministically adding actions to plan candidates.
Advantages:
  – generative (does not require strategies)
  – expressive (can handle time, resources, easily)
Disadvantages:
  – Inherently slow (plan-space is enormous)

Forward Chaining Example
[Figure: forward-chaining search tree, "Etc."]
Familiar tradeoff: Efficient pruning methods versus optimality.
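The forward-chaining idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nondeterministic choice of the next action is replaced by a deterministic breadth-first search, states are sets of true facts, and the Action type, forward_chain function and max_depth bound are hypothetical names introduced here.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    """Ground STRIPS-style action (illustrative type, not from any planner)."""
    name: str
    preconds: frozenset
    adds: frozenset
    deletes: frozenset


def forward_chain(initial, goal, actions, max_depth=10):
    """Breadth-first forward chaining; returns a list of action names or None."""
    start = frozenset(initial)
    goal = frozenset(goal)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        if len(plan) >= max_depth:
            continue
        # Extend the plan candidate with every applicable action.
        for act in actions:
            if act.preconds <= state:
                successor = (state - act.deletes) | act.adds
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, plan + [act.name]))
    return None


def move(a, b):
    """Move a single robot between two locations (toy domain for this sketch)."""
    return Action(f"move {a} {b}", frozenset({("at", a)}),
                  frozenset({("at", b)}), frozenset({("at", a)}))


ROUTES = [move("A", "B"), move("B", "A"), move("B", "C"), move("C", "B")]
```

Even in this toy domain the visited set matters: without pruning of repeated states the search would bounce between A and B, which is a small-scale version of why the full plan-space is so expensive to explore.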
Case Study: TLPlan
TLPlan (Temporal Logic Planner) by Fahiem Bacchus and Froduald Kabanza
TLPlan is based on a forward-chaining planner
TLPlan uses domain-dependent temporal logic to prune the search space

TLPlan: First-order Temporal Logic
Definition: First-order linear temporal logic
  – standard first-order logic, plus: U (until), □ (always), ◊ (eventually), ○ (next)
Bounded quantifiers:
  – ∀[x:γ] φ ≡ ∀x . γ(x) → φ(x)
  – ∃[x:γ] φ ≡ ∃x . γ(x) ∧ φ(x)
Example:
  – □(on(B,C) → (on(B,C) U on(A,B)))
  – Asserts that whenever we enter a state in which B is on C it remains on C until A is on B

TLPlan: Formula Progression Algorithm
The Progress algorithm is used to check control strategies as the system searches for a plan.
Inputs: An LTL formula f and a world w (generated by forward-chaining)
Output: A new formula f+, also expressed as an LTL formula, representing the progression of f through the world w.
Algorithm: Progress(f,w)
  – Case
      1. f is atomic: if w entails f, f+ := TRUE, else f+ := FALSE
      2. f = f1 ∧ f2: f+ := Progress(f1,w) ∧ Progress(f2,w)
      3. f = ¬f1: f+ := ¬Progress(f1,w)
      4. … etc. … (see paper for complete algorithm)

TLPlan Example
[Figure: forward-chaining search tree annotated with control rules; labels: "Forward chaining begins…", "Rules:", "(Any color)", "Etc."]
  – This thread is efficiently guided by the rules
  – This thread is not guided well since no rules apply. This results in pure forward-chaining search.

TLPlan Review
TLPlan has been around in various implementations since 1995, although improvements have been made as recently as last year.
TLPlan functions initially as a forward-chaining planner, but can use logical rules to guide its search and prune infeasible threads.
TLPlan was the fastest domain-specific planner in the 2002 AIPS competition.

Domain Knowledge
Planning is hard – the most general planners are extremely slow
To increase speed, some planners sacrifice generality by using domain-specific strategies.
TLPlan encodes the strategy into the goal specification, while other planners decouple the goals and the strategies.
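The Progress procedure from the TLPlan slides can be sketched for the propositional case as follows. This is a hedged illustration, not TLPlan's implementation: formulas are nested tuples, a world is a set of true atoms, and quantifiers are omitted. The slide lists only the atomic, conjunction and negation cases; the or, next, until, always and eventually cases below are the standard LTL progression rules, filled in here as an assumption about what "etc." covers.

```python
TRUE, FALSE = ("true",), ("false",)


def conj(a, b):
    """Conjunction with constant folding."""
    if a == FALSE or b == FALSE:
        return FALSE
    if a == TRUE:
        return b
    if b == TRUE:
        return a
    return ("and", a, b)


def disj(a, b):
    """Disjunction with constant folding."""
    if a == TRUE or b == TRUE:
        return TRUE
    if a == FALSE:
        return b
    if b == FALSE:
        return a
    return ("or", a, b)


def neg(a):
    """Negation with constant folding."""
    if a == TRUE:
        return FALSE
    if a == FALSE:
        return TRUE
    return ("not", a)


def progress(f, w):
    """Return f+, the condition the rest of the state sequence must satisfy."""
    op = f[0]
    if op in ("true", "false"):
        return f
    if op == "atom":                      # case 1 on the slide
        return TRUE if f[1] in w else FALSE
    if op == "and":                       # case 2 on the slide
        return conj(progress(f[1], w), progress(f[2], w))
    if op == "not":                       # case 3 on the slide
        return neg(progress(f[1], w))
    if op == "or":
        return disj(progress(f[1], w), progress(f[2], w))
    if op == "next":
        return f[1]
    if op == "until":                     # f1 U f2
        return disj(progress(f[2], w), conj(progress(f[1], w), f))
    if op == "always":
        return conj(progress(f[1], w), f)
    if op == "eventually":
        return disj(progress(f[1], w), f)
    raise ValueError(f"unknown operator: {op}")


# The slide's example rule, with the implication a -> b written as (not a) or b.
ON_BC, ON_AB = ("atom", "on(B,C)"), ("atom", "on(A,B)")
UNTIL = ("until", ON_BC, ON_AB)
RULE = ("always", ("or", ("not", ON_BC), UNTIL))
```

A forward-chaining planner would call progress on each newly generated world and prune the thread as soon as the result is FALSE, for example when B leaves C before A has been placed on B.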
Forward Chaining Speedup
Many researchers have focused on ways to speed up domain-independent forward-chaining planners.
  – Ex. SAPA by Minh B. Do & Subbarao Kambhampati
Methods focus on estimating plan cost using:
  – Relaxed plan-graphs
  – Estimated remaining cost to goal
Cost metrics
  – Ex. # actions, plan duration, etc.

Plan Graph
Plan-graph based planners first construct a compact representation of the plan-space (the plan-graph), and then search that space.
Plan-graphs contain all possible plans up to a certain size, excluding incomplete plans with co-occurring binary mutex actions.
Plan-graphs do not exclude all invalid plans, and depending on the domain may yield extremely efficient or inefficient results.
Advantages:
  – generative
  – much faster than most forward-chaining planners
  – plan-graph can be generated in polynomial time and space
Disadvantages:
  – plan-graphs are less expressive (resources and time difficult)
  – in certain domains, search of plan-graph can be very inefficient

Forward Chaining vs. Plan Graph
[Figure panels: Forward Chaining; Plan Graph]

Case Study: Graphplan
[Figure: Graphplan plan-graph]
Note the compact structure in this graph – it’s polynomial in size!
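As a rough illustration of how a plan-graph is grown layer by layer, the Python sketch below builds alternating fact and action layers. It is a deliberate simplification and not Graphplan itself: delete effects and mutex relations are ignored (a relaxed plan-graph), and no-ops are modelled implicitly by carrying every fact forward. The three actions are the A0, A1, A2 domain from the LPG example later in this lecture; the function and type names are hypothetical.

```python
from typing import NamedTuple


class Action(NamedTuple):
    """Action with preconditions and add effects only (relaxed, illustrative)."""
    name: str
    preconds: frozenset
    adds: frozenset


def build_relaxed_plan_graph(initial, goal, actions, max_levels=20):
    """Grow fact/action layers until the goal appears or the graph levels off."""
    facts = frozenset(initial)
    goal = frozenset(goal)
    fact_layers = [facts]
    action_layers = []
    while not goal <= facts and len(action_layers) < max_levels:
        # Every action whose preconditions appear in the current fact layer.
        layer = [a for a in actions if a.preconds <= facts]
        # Implicit no-ops: all existing facts persist into the next layer.
        new_facts = facts.union(*(a.adds for a in layer))
        if new_facts == facts:
            break  # levelled off without reaching the goal
        action_layers.append([a.name for a in layer])
        fact_layers.append(new_facts)
        facts = new_facts
    return fact_layers, action_layers


ACTIONS = [
    Action("A0", frozenset(), frozenset({"A"})),
    Action("A1", frozenset({"A"}), frozenset({"A", "B"})),
    Action("A2", frozenset({"A", "B"}), frozenset({"C"})),
]
```

Each layer is bounded by the number of ground facts and actions, which is the intuition behind the polynomial size noted on the slide; the expensive part in a real planner is the search over the graph.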
Mutex Relationships
[Figure]

Case Study: LPGP
Idea:
  – use Graphplan to identify complete plan (action structure)
  – then use Linear Programming to determine plan consistency and perform scheduling (assign durations to actions)
Advantage:
  – Two-phase approach accomplishes temporal planning with the speed of a plan-graph based planner
Disadvantages:
  – Cannot optimize over time (only optimizes over makespan)
  – Two-phase approach is potentially very inefficient
      no temporal conflicts are used to guide Graphplan search
      search not incremental – LP must be started from scratch each time

Macro Decomposition
Operates similarly to a context-free grammar
  – planner non-deterministically expands “macro-activities” until all plan actions are primitive.
  – rules ensure that planner only explores space of complete plans
Planner still must ensure plan consistency.
Advantages
  – Fast
Disadvantages
  – all achieving strategies must be pre-encoded into macros
  – non-optimal: explores restricted plan-space, potentially excluding optimal solutions

Case Study: SHOP2
SHOP2 by Dana Nau, Hector Munoz-Avila, Yue Cao, Amnon Lotem and Steven Mitchell
SHOP2 works similarly to the task-decomposition mechanism in Kirk
SHOP2 problems consist of:
  – Operators (with preconditions, add-effects and delete-effects)
  – Methods (rules for how to progress the plan)
  – Initial conditions and goals
SHOP2 is fairly fast, but all plan happenings must be pre-designed (at some level) by a programmer.
SHOP2 plans do not support concurrency

SHOP2 Example
(defdomain basic-example (
  ; operator fields: preconds, delete-effects, add-effects
  (:operator (pickup ?a) () () ((have ?a)))
  (:operator (drop ?a) ((have ?a)) ((have ?a)) ())
  ; method fields: alternating condition / strategy pairs
  (:method (swap ?x ?y)
    ((have ?x)) ((drop ?x) (pickup ?y))
    ((have ?y)) ((drop ?y) (pickup ?x)))))
; problem fields: initial condition, start strategy
(defproblem problem1 basic-example
  ((have banjo)) ((swap banjo kiwi)))
Condition / strategy pairs allow one method to decompose into multiple possible subplans, depending on the current state

SHOP2 In Action
  – State: (have banjo); task: (swap banjo kiwi)
  – Method swap is instantiated with ?x = banjo, ?y = kiwi:
      ((have banjo)) ((drop banjo) (pickup kiwi))
      ((have kiwi)) ((drop kiwi) (pickup banjo))
  – The first condition, (have banjo), holds, so the task decomposes into (drop banjo) (pickup kiwi)
  – DONE. State: (have kiwi)

Case Study: Kirk TPN Planner
[Figure: Macro-Activity() with temporal bounds [l,u], expanding into Decomposition 1 or Decomposition 2]

5 Minute Break

Local Search: WalkSAT
WalkSAT is a randomized algorithm for solving SAT (propositional satisfiability) problems.
It builds on the GSAT local-search algorithm rather than on systematic DPLL search, and relies on local search and randomness.
WalkSAT
Problem:
  – Find a satisfying assignment to a logic formula
      (A || !B) && (B || !C) && (C || !A) && (A || B || C)
WalkSAT:
  – Pick a random assignment to the variables
  – Until formula satisfied (or up to some max # of iterations),
      Choose an unsatisfied clause and enumerate the ways of adjusting the variables in order to satisfy it
      With probability p
        – Choose the best-utility adjustment
      Else
        – Choose a random adjustment

WalkSAT Example
(A || !B) && (B || !C) && (C || !A) && (A || B || C)
Pick !A, !B, !C
  – Unsatisfied clause: (A || B || C)
  – Options are to switch A, B, or C
Pick A, !B, !C
  – Unsatisfied clause: (C || !A)
  – Options are to switch A or C
Pick A, !B, C
  – Unsatisfied clause: (B || !C)
  – Options are to switch B or C
Pick A, B, C
  – Formula Satisfied!

WalkSAT Discussion
WalkSAT has proven to be very fast at solving complicated SAT problems
  – WalkSAT can solve some problems that systematic algorithms simply can’t handle
Due to randomness, WalkSAT is incomplete
  – WalkSAT may fail to discover a solution

Introduction to LPG
LPG (local search for plan-graphs)
  – by Alfonso Gerevini and Ivan Serina
Blackbox mapped the planning problem to a CSP and solved it using a SAT solver.
LPG unifies the planning and WalkSAT algorithms to create the WalkPlan search algorithm.
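The WalkSAT loop described in the slides above can be sketched in Python as follows. This is a minimal illustration, not the reference WalkSAT implementation. It follows the slide's convention that the greedy ("best-utility") move is taken with probability p and a random move otherwise, and it assumes that "utility" means the number of clauses satisfied after the flip. The clause encoding and the parameter names p, max_flips and seed are choices made for this sketch.

```python
import random


def satisfied(clause, assignment):
    """A clause is a list of (variable, wanted_value) literals."""
    return any(assignment[var] == want for var, want in clause)


def walksat(clauses, p=0.5, max_flips=1000, seed=None):
    """Return a satisfying assignment as a dict, or None if none was found."""
    rng = random.Random(seed)
    variables = sorted({var for clause in clauses for var, _ in clause})
    assignment = {var: rng.choice([True, False]) for var in variables}
    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c, assignment)]
        if not unsat:
            return assignment
        # Flipping any variable of an unsatisfied clause satisfies that clause.
        clause = rng.choice(unsat)
        candidates = [var for var, _ in clause]

        def score(var):
            flipped = dict(assignment)
            flipped[var] = not flipped[var]
            return sum(satisfied(c, flipped) for c in clauses)

        if rng.random() < p:
            var = max(candidates, key=score)   # best-utility adjustment
        else:
            var = rng.choice(candidates)       # random adjustment
        assignment[var] = not assignment[var]
    if all(satisfied(c, assignment) for c in clauses):
        return assignment
    return None


# (A || !B) && (B || !C) && (C || !A) && (A || B || C)
FORMULA = [
    [("A", True), ("B", False)],
    [("B", True), ("C", False)],
    [("C", True), ("A", False)],
    [("A", True), ("B", True), ("C", True)],
]
```

Returning None after max_flips reflects the incompleteness noted in the discussion slide: a failed run says nothing about whether the formula is unsatisfiable.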
LPG Big Idea
Big Idea:
  – Start with a random plan
  – While plan is incorrect / inconsistent
      Identify and repair conflict
Basically the same idea as WalkSAT, but applied to a special form of plan-graph

Temporal Action Graphs
Definitions:
  – Action-graph: the subset of a plan-graph containing the action layers
  – Support: a fact is said to be “supported” if it is achieved by some action in the previous action layer
  – Conflict: a mutex between two actions, or an action with an unsupported precondition

Linearization of Action Graphs
An Action Graph can be made linear by allowing only one action per action layer.
The layers no longer explicitly represent an ordering of time (temporal concurrency is still possible)
The layer ordering simply presents an action sequence for the purposes of establishing fact support relationships.

Example: Linear Action Graph
[Figure: linear action graph over facts A, B, C, with one action (A0, A1, A2) per action layer and no-ops carrying facts between layers]
A plan-graph consists of alternating fact layers and action layers. The actions alone constitute an action graph.
LPG operates directly on the action graph structure, inserting and removing actions from various action layers as it repairs incomplete plans.

Example: Temporal Action Graph
[Figure: temporal action graph]

Conflicts and Repair
An incomplete plan is manifested as an action graph with conflicts.
Example conflicts with resolution (repair) strategies:
  Conflict Description: Permanent mutex between two actions in the same action layer
  Conflict Resolution Strategies:
    – Remove one of the actions
  Conflict Description: Precondition mutex between two actions in the same action layer
  Conflict Resolution Strategies:
    – Remove one of the actions
    – Add support for one of the mutex preconditions
  Conflict Description: Unsupported precondition for an action in an action layer
  Conflict Resolution Strategies:
    – Add an action to the previous action layer that achieves the unsupported precondition
    – Remove the action whose preconditions are not satisfied

LPG’s WalkPlan Planning Algorithm

LPG Algorithm
LPG:
1. Generate an initial dummy plan, P, either…
  – Randomly
  – By adding actions to support all facts ignoring mutexes, or
  – Via some front-end plan generator
  (Step 1 label: Generate Initial Plan)
2. Choose Conflict: randomly choose a conflict in the action-graph, C
3. Resolve & Evaluate: identify all possible ways of resolving C and evaluate them using the action evaluation function
  – Resolution techniques include: removing one of two mutex actions, adding a supporting action for an unsupported fact, or removing an action that has an unsupported precondition
  – If a conflict resolution has cost 0, the plan is complete
  – Note: The action evaluation function uses Lagrange multipliers to dynamically weight the different factors in the action evaluation function
4. Resolution Selection: if a resolution introduces no new conflicts, apply it and go to step (2). Else,
  – with probability p, randomly choose a resolution, apply it and go to step (2)
  – with probability 1-p, choose the lowest cost conflict resolution, apply it and go to step (2)
  – Note: The resolution step includes a mechanism for extending the plan-graph

LPG Example
Initial Conditions: ( nil )
Goals: ( A, B, C )
Actions:
  – A0: preconds ( nil ) effects ( A )
  – A1: preconds ( A ) effects ( A, B )
  – A2: preconds ( A, B ) effects ( C )
[Figure: animated action graph over facts A, B, C with actions A0, A1, A2 and no-ops, stepping through: Initial dummy plan, Identify conflict, Resolve conflict, Plan complete]
Conflicts shown and their repairs:
  – Permanently mutex actions in the same action layer (resolved by removing one of the two actions)
  – Unsupported precondition (resolved by adding achieving action at previous action layer)
  – Unsupported precondition (resolved by removing the conflicting action)
Note: No-ops are propagated during conflict resolution

LPG Analysis
Advantages:
  – LPG is fast – four orders of magnitude faster than the leading optimal planners
  – LPG is domain-independent
  – LPG can easily handle resources and durative actions
Disadvantages:
  – LPG is
randomized, so plans are usually not optimal and often contain extraneous actions
  – LPG includes an option to continue searching for multiple solutions, in the hope of finding better plans
While maintaining expressivity, LPG sacrifices optimality for speed.

AIPS 2002 Results (subset)
Planner                             Problems Solved   Problems Attempted   Success Ratio   Capabilities
SHOP2, 2nd place (hand-coded)       899               904                  99%             Strips, Numeric, HardNumeric, SimpleTime, Time, Complex
TLPlan, 1st place (hand-coded)      894               894                  100%            Strips, Numeric, HardNumeric, SimpleTime, Time, Complex
LPG, 1st place (fully-automated)    372               428                  87%             Strips, Numeric, HardNumeric, SimpleTime, Time

Summary
Planning is hard!
  – We want planners that
      are fast
      are domain-independent
      are optimal
      handle durative actions / resources / uncertainty
Want a speedup?
  – Sacrificing expressivity helps
  – Sacrificing optimality helps more
  – Sacrificing generality helps the most
LPG is today’s best planner that is domain-independent, expressive, and fast
  – to achieve speed, it sacrifices optimality and uses local search.

Planning References
Planning in general:
  – Russell and Norvig, “Artificial Intelligence: A Modern Approach”, section IV, Prentice Hall; 2nd edition (December 20, 2002)
AIPS International Planning Competition, 2002:
  – http://www.dur.ac.uk/d.p.long/competition.html
Graphplan:
  – A. Blum and M. Furst, “Fast Planning Through Planning Graph Analysis”, Artificial Intelligence, 90:281-300 (1997).
  – www.cs.cmu.edu/~avrim/graphplan.html
LPG:
  – A. Gerevini and I. Serina, “Planning through Stochastic Local Search and Temporal Action Graphs”, technical report from Universita degli Studi di Brescia, November, 2002.
  – prometeo.ing.unibs.it/lpg/