Planning Graph Based
Reachability Heuristics
Daniel Bryce & Subbarao Kambhampati
ICAPS’06 Tutorial 6
June 7, 2006
http://rakaposhi.eas.asu.edu/pg-tutorial/
dan.bryce@asu.edu
rao@asu.edu
http://verde.eas.asu.edu
http://rakaposhi.eas.asu.edu
Motivation
- Ways to improve planner scalability:
  - Problem formulation
  - Search space
  - Reachability heuristics
- Reachability heuristics are:
  - Domain (formulation) independent
  - Applicable to many search spaces
  - Flexible: work with most domain features
  - Overall, complementary to other scalability techniques
  - Effective!!
June 7th, 2006
ICAPS'06 Tutorial T6
2
Scalability of Planning
- The problem is search control!!!
- Before: planning algorithms could synthesize only 6-10 action plans in minutes
- Significant scale-up in the last 6-7 years
- Now: we can synthesize 100-action plans in seconds (realistic encodings of the Munich airport!)

The primary revolution in planning in recent years has been domain-independent heuristics to scale up plan synthesis.
Topics
- Classical Planning (Rao)
- Cost-Based Planning (Dan)
- Partial Satisfaction Planning
- Resources (Continuous Quantities) (Rao)
- Temporal Planning
- Non-Deterministic/Probabilistic Planning
- Hybrid Models (Dan)
Classical Planning
Rover Domain
Classical Planning
- Relaxed Reachability Analysis
- Types of Heuristics
  - Level-based
  - Relaxed Plans
  - Mutexes
- Heuristic Search
  - Progression
  - Regression
  - Plan Space
- Exploiting Heuristics
Planning Graph and Search Tree
- Envelope of the progression tree (relaxed progression)
  - Proposition lists: the union of states at the kth level
  - Lower-bound reachability information
Level-Based Heuristics
- The distance of a proposition is the index of the first proposition layer in which it appears
  - Proposition distances change when we propagate cost functions (described later)
- What is the distance of a set of propositions?
  - Set-Level: index of the first proposition layer where all goal propositions appear
    - Admissible
    - Gets better with mutexes; otherwise the same as Max
  - Max: maximum proposition distance
  - Sum: sum of the proposition distances
Example of Level-Based Heuristics
set-level(sI, G) = 3
max(sI, G) = max(2, 3, 3) = 3
sum(sI, G) = 2 + 3 + 3 = 8
Distance of a Set of Literals
[Figure: two-level planning graph for a grid-and-key domain — proposition layers with At(0,0), Key(0,1), At(0,1), At(1,0), At(1,1), Have_key, ~Key(0,1); action layers with Move and Pick_key actions plus noops; mutexes marked with x]

- Sum: h(S) = Σ_{p∈S} lev({p})
- Set-Level (admissible; improves with mutexes): h(S) = lev(S)
- Other variants: Partition-k, Adjusted Sum, Combo, Set-Level with memos

lev(p): index of the first level at which p comes into the planning graph.
lev(S): index of the first level where all propositions in S appear, non-mutexed. If there is no such level:
- If the graph is grown to level-off, then ∞
- Else k+1 (where k is the current length of the graph)
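As a concrete illustration, the level-based heuristics can be sketched in a few lines, assuming the graph has already been built and the level indices lev(p) are available as a dict (a hypothetical layout, not the tutorial's code; lev_of_set stands in for a mutex-aware lookup of lev(S)):

```python
# Minimal sketch of the level-based heuristics, given precomputed levels.

def h_max(lev, goal):
    """Max: distance of the farthest goal proposition."""
    return max(lev[p] for p in goal)

def h_sum(lev, goal):
    """Sum: add proposition distances (inadmissible)."""
    return sum(lev[p] for p in goal)

def h_set_level(lev_of_set, goal):
    """Set-Level: first layer where all goals appear non-mutexed."""
    return lev_of_set(frozenset(goal))

# The tutorial's example: proposition distances 2, 3, 3
lev = {"at(beta)": 2, "have(soil)": 3, "comm(soil)": 3}
goal = ["at(beta)", "have(soil)", "comm(soil)"]
print(h_max(lev, goal))  # 3
print(h_sum(lev, goal))  # 8
```

With mutexes, lev_of_set would return 3 here rather than the member-wise maximum.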
How do Level-Based Heuristics Break?
[Figure: planning graph with initial proposition q; 100 actions B1..B100 each produce one of p1..p100 at level 1; a single action A consumes p1..p100 to produce the goal g at level 2]
The goal g is reached at level 2, but requires 101 actions to support it.
Relaxed Plan Heuristics
- When level does not reflect distance well, we can find a relaxed plan.
- A relaxed plan is a subgraph of the planning graph, where:
  - Every goal proposition is in the relaxed plan at the level where it first appears
  - Every proposition in the relaxed plan has a supporting action in the relaxed plan
  - Every action in the relaxed plan has its preconditions supported
- Relaxed plans are not admissible, but are generally effective.
- Finding the optimal relaxed plan is NP-hard, but finding a greedy one is easy. Later we will see how "greedy" can change.
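The greedy extraction just described can be sketched as follows — a simplified illustration under an assumed data layout (not the tutorial's implementation), extracting backwards from the goals and turning each supporter's preconditions into new subgoals:

```python
# Greedy relaxed-plan extraction over a leveled, delete-free graph.

def relaxed_plan(action_layers, first_level, goals):
    """action_layers[i]: actions between prop layers i and i+1,
    each as (name, preconditions, add_effects)."""
    subgoals = {g: first_level[g] for g in goals}
    chosen = set()                            # (layer, action) pairs
    for lev in range(len(action_layers), 0, -1):
        for p in [q for q, l in list(subgoals.items()) if l == lev]:
            # greedily pick the first supporter in the previous layer
            name, precs, _ = next(a for a in action_layers[lev - 1]
                                  if p in a[2])
            chosen.add((lev - 1, name))
            del subgoals[p]
            for q in precs:                   # preconditions become subgoals
                subgoals.setdefault(q, first_level[q])
    return chosen

# Toy graph: a1 gives p at level 1; a2 needs p and gives g at level 2
action_layers = [[("a1", [], ["p"])],
                 [("a1", [], ["p"]), ("a2", ["p"], ["g"])]]
first_level = {"p": 1, "g": 2}
print(relaxed_plan(action_layers, first_level, ["g"]))
```

The heuristic value is the number of chosen actions (2 in the toy example); a smarter supporter choice is where "greedy" can change.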
Example of Relaxed Plan Heuristic
[Figure: relaxed plan extraction — identify the goal propositions, support each goal proposition individually, then count the actions]
RP(sI, G) = 8
Results
[Figure: runtime comparison of the Relaxed Plan vs. Level-Based heuristics]

Results (cont'd)
[Figure: runtime comparison of the Relaxed Plan vs. Level-Based heuristics, additional domains]
Optimizations in Heuristic Computation
- Taming space/time costs
  - Bi-level planning graph representation
  - Partial expansion of the PG (stop before level-off)
    - It is FINE to cut corners when using the PG for heuristics (instead of search)!!
- Branching factor can still be quite high
  - Use actions appearing in the PG (complete)
  - Select actions in lev(S) vs. levels-off (incomplete)
  - Consider actions appearing in the RP (incomplete)
[Figures: bi-level planning graph example in which goals C and D are present before level-off, so the remaining levels are discarded; log-scale plot of heuristic-extraction time (seconds) across problems for the partial graph (lev(S)) vs. the leveled-off graph — a time/quality trade-off]

Adjusting for Negative Interactions
- Until now we assumed actions only positively interact, but they often conflict
- Mutexes help us capture some negative interactions
- Types:
  - Actions: Interference / Competing Needs
  - Propositions: Inconsistent Support
- Binary mutexes are the most common and practical
- (|A| + 2|P|)-ary mutexes would allow us to solve the planning problem with a backtrack-free Graphplan search
  - An action layer may have |A| actions and 2|P| noops
  - A serial planning graph assumes all non-noop actions are mutex
Binary Mutexes
[Figure: rover-domain planning graph over levels P0-A0-P1-A1-P2 with propositions avail(soil,α), avail(rock,β), avail(image,γ), at(α), at(β), at(γ), have(soil), have(rock), have(image), comm(soil), and actions sample, drive, commun; mutexes marked]
- sample needs at(α), drive negates at(α) — Interference
- have(soil)'s only supporter is mutex with at(β)'s only supporter — Inconsistent Support
- At level 2, have(soil) has a supporter not mutex with a supporter of at(β) — they are feasible together at level 2
Set-Level(sI, {at(β), have(soil)}) = 2
Max(sI, {at(β), have(soil)}) = 1
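The two binary action-mutex conditions can be sketched with hypothetical action triples (preconditions, adds, deletes) as frozensets — illustrative only, not the tutorial's implementation; inconsistent support between two propositions then holds when every pair of their supporters is mutex:

```python
# Binary action-mutex checks over delete-annotated STRIPS-style actions.

def interference(a, b):
    """One action deletes a precondition or add-effect of the other."""
    (pa, aa, da), (pb, ab, db) = a, b
    return bool(da & (pb | ab)) or bool(db & (pa | aa))

def competing_needs(a, b, prop_mutex):
    """Some pair of preconditions was mutex at the previous level."""
    pa, pb = a[0], b[0]
    return any((p, q) in prop_mutex or (q, p) in prop_mutex
               for p in pa for q in pb)

# Rover-flavored toy actions: (preconditions, adds, deletes)
sample = (frozenset({"at(a)"}), frozenset({"have(soil)"}), frozenset())
drive = (frozenset({"at(a)"}), frozenset({"at(b)"}), frozenset({"at(a)"}))
print(interference(sample, drive))            # True: drive deletes at(a)
print(competing_needs(sample, drive, set()))  # False: shared precondition
```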
Adjusting the Relaxed Plans
- Start with the RP heuristic and adjust it to take subgoal interactions into account
- Negative interactions in terms of "degree of interaction"
- Positive interactions in terms of co-achievement links
- Ignore negative interactions when accounting for positive interactions (and vice versa)

HAdjSum2M(S) = length(RelaxedPlan(S)) + max_{p,q ∈ S} δ(p,q)
where δ(p,q) = lev({p,q}) - max{lev(p), lev(q)} /* degree of negative interaction */

Results (plan length / time in seconds; "-" = not solved):

Problem     | Level      | Sum        | AdjSum2M
Gripper-25  | -          | 69/0.98    | 67/1.57
Gripper-30  | -          | 81/1.63    | 77/2.83
Tower-7     | 127/1.28   | 127/0.95   | 127/1.37
Tower-9     | 511/47.91  | 511/16.04  | 511/48.45
8-Puzzle1   | 31/6.25    | 39/0.35    | 31/0.69
8-Puzzle2   | 30/0.74    | 34/0.47    | 30/0.74
Mystery-6   | -          | -          | 16/62.5
Mystery-9   | 8/0.53     | 8/0.66     | 8/0.49
Mprime-3    | 4/1.87     | 4/1.88     | 4/1.67
Mprime-4    | 8/1.83     | 8/2.34     | 10/1.49
Aips-grid1  | 14/1.07    | 14/1.12    | 14/0.88
Aips-grid2  | -          | -          | 34/95.98

[AAAI 2000]
Anatomy of a State-Space Regression Planner
[Diagram: action templates feed a Graphplan graph-extension phase (based on STAN), which builds the planning graph; heuristics are extracted (using the actions in the last level) and passed to a regression planner (based on HSP-R); the problem specification (initial and goal state) feeds both]
Problem: given a set of subgoals (a regressed state), estimate how far they are from the initial state.
[AAAI 2000; AIPS 2000; AIJ 2002; JAIR 2003]
Rover Example in Regression
[Figure: regression from sG = {comm(soil), comm(rock), comm(image)} back through s1..s4 via commun(rock), commun(image), sample(rock,β), and sample(image,γ), accumulating subgoals such as avail(rock,β), at(β), have(image), at(γ)]
Sum(sI, s3) = 0+1+2+2 = 5
Sum(sI, s4) = 0+0+1+1+2 = 4
The value for s4 should be ∞ — s4 is inconsistent. How do we improve the heuristic?
AltAlt Performance
[Figures: runtime (seconds, log scale) across IPC 2000 problem sets — Logistics domain (AIPS-00): STAN3.0, HSP2.0, HSP-r, and AltAlt1.0 with level-based and adjusted-RP heuristics; Schedule domain (AIPS-00): STAN3.0, HSP2.0, AltAlt1.0]
Plan Space Search
In the beginning it was all POP.
Then it was cruelly UnPOPped.
The good times return with Re(vived)POP.
POP Algorithm
1. Plan Selection: select a plan P from the search queue
2. Flaw Selection: choose a flaw f (open condition or unsafe link)
3. Flaw Resolution:
   - If f is an open condition, choose an action S that achieves f
   - If f is an unsafe link, choose promotion or demotion
   - Update P
   - Return NULL if no resolution exists
4. If there is no flaw left, return P

[Figure: 1. the initial plan, S0 → Sinf with open goals g1, g2; 2. plan refinement (flaw selection and resolution) adds steps S1, S2, S3 with causal links and a threat to ~p]

Choice points:
- Flaw selection (open condition? unsafe link? a non-backtrack choice)
- Flaw resolution / plan selection (how to select (rank) a partial plan?)
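The four steps above can be sketched as a plain search loop — a skeleton in Python form with assumed data structures, not code from any real planner; plans come off a queue, a flaw is selected, and each resolution yields a refined plan:

```python
from collections import deque

def pop(initial_plan, select_flaw, resolutions):
    """resolutions(plan, flaw) -> list of refined plans (possibly empty)."""
    queue = deque([initial_plan])
    while queue:
        plan = queue.popleft()                 # 1. plan selection
        flaw = select_flaw(plan)               # 2. flaw selection
        if flaw is None:
            return plan                        # 4. no flaws left: a solution
        queue.extend(resolutions(plan, flaw))  # 3. flaw resolution
    return None                                # no resolution exists

# Toy model: a "plan" is just its set of remaining open conditions
select = lambda p: next(iter(p), None)
resolve = lambda p, f: [p - {f}]               # closing a condition never fails
print(pop(frozenset({"g1", "g2"}), select, resolve))  # frozenset()
```

Replacing the FIFO queue with a priority queue ranked by a PG heuristic gives the heuristic plan selection discussed next.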
PG Heuristics for Partial Order Planning
- Distance heuristics estimate the cost of partially ordered plans (and help select flaws)
- If we ignore negative interactions, the set of open conditions can be seen as a regression state
- Mutexes are used to detect indirect conflicts in partial plans
  - A step threatens a link if there is a mutex between the link condition and the step's effect or precondition
  - Post disjunctive precedences and use propagation to simplify
[Figure: partial plan with steps S0..S5 and a step Sk whose precondition r or effect q conflicts with the link Si —p→ Sj]
If mutex(p, q) or mutex(p, r), then post Sk ≺ Si ∨ Sj ≺ Sk.
Regression and Plan Space
[Figure: the regression trace s4..sG placed alongside two partial plans over S0, sample(rock,β), commun(rock), sample(image,γ), commun(image), and S∞ — the open conditions of each partial plan correspond to a regressed state]
RePOP's Performance
- RePOP is implemented on top of UCPOP
- Dramatically better than any other partial-order planner before it
- Competitive with Graphplan and AltAlt
- VHPOP carried the torch at IPC 2002

Times in seconds; "-" = not solved. Written in Lisp; run on Linux, 500MHz, 250MB.

Problem     | UCPOP | RePOP    | Graphplan | AltAlt
Gripper-8   | -     | 1.01     | 66.82     | .43
Gripper-10  | -     | 2.72     | 47min     | 1.15
Gripper-20  | -     | 81.86    | -         | 15.42
Rocket-a    | -     | 8.36     | 75.12     | 1.02
Rocket-b    | -     | 8.17     | 77.48     | 1.29
Logistics-a | -     | 3.16     | 306.12    | 1.59
Logistics-b | -     | 2.31     | 262.64    | 1.18
Logistics-c | -     | 22.54    | -         | 4.52
Logistics-d | -     | 91.53    | -         | 20.62
Bw-large-a  | 45.78 | (5.23)   | 14.67     | 4.12
Bw-large-b  | -     | (18.86)  | 122.56    | 14.14
Bw-large-c  | -     | (137.84) | -         | 116.34

You see, pop, it is possible to Re-use all the old POP work!
[IJCAI, 2001]
Exploiting Planning Graphs
- Restricting action choice
  - Use actions from:
    - The last level before level-off (complete)
    - The last level before the goals (incomplete)
    - The first level of the relaxed plan (incomplete) — FF's helpful actions
    - Only action sequences in the relaxed plan (incomplete) — YAHSP
- Reducing the state representation
  - Remove static propositions: a static proposition is only ever true or only ever false in the last proposition layer
Classical Planning Conclusions
- Many heuristics: Set-Level, Max, Sum, Relaxed Plans
- Heuristics can be improved by adjustments
  - Mutexes
- Useful for many types of search: progression, regression, POCL
Cost-Based Planning
Cost-based Planning
- Propagating cost functions
- Cost-based heuristics
  - Generalized level-based heuristics
  - Relaxed plan heuristics
Rover Cost Model
Cost Propagation
[Figure: cost-annotated planning graph over levels P0-A0-P1-A1-P2 — action costs such as sample(soil,α)=20, drive(α,β)=10, drive(α,γ)=30, sample(image,γ)=35, sample(rock,β)=35, commun(soil)=25; propagated proposition costs such as have(soil)=20, at(β)=10, comm(soil)=25, and at(γ)=30 at P1 dropping to 25 at P2]
A cost reduces when a different, cheaper supporter appears at a later level.
(1-lookahead)
Cost Propagation (cont'd)
[Figure: propagation continued over levels P2-A2-P3-A3-P4 — e.g., sample(image,γ) drops from 35 to 30, so have(image) drops from 35 to 30 and comm(image) from 40 to 35 at later levels]
Terminating Cost Propagation
- Stop when:
  - the goals are reached (no-lookahead)
  - costs stop changing (∞-lookahead)
  - k levels after the goals are reached (k-lookahead)
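A minimal sketch of cost propagation under an assumed data layout (not the tutorial's code): a proposition costs as much as its cheapest supporter, an action costs its execution cost plus the sum of its precondition costs, and iterating to a fixpoint corresponds to ∞-lookahead:

```python
# Fixpoint cost propagation over a delete-free action set.

def propagate_costs(actions, init_props):
    """actions: list of (name, cost, preconditions, add_effects)."""
    cost = {p: 0 for p in init_props}
    changed = True
    while changed:                       # ∞-lookahead: stop when stable
        changed = False
        for name, c, precs, adds in actions:
            if all(p in cost for p in precs):
                total = c + sum(cost[p] for p in precs)
                for p in adds:           # keep the cheapest supporter
                    if total < cost.get(p, float("inf")):
                        cost[p] = total
                        changed = True
    return cost

# Toy rover-flavored numbers (chosen for illustration)
acts = [("drive(a,b)", 10, ["at(a)"], ["at(b)"]),
        ("sample(soil,a)", 20, ["at(a)", "avail(soil,a)"], ["have(soil)"]),
        ("commun(soil)", 5, ["have(soil)"], ["comm(soil)"])]
print(propagate_costs(acts, ["at(a)", "avail(soil,a)"]))
# comm(soil) costs 25 = 20 (sample) + 5 (commun)
```

Bounding the number of sweeps instead of running to the fixpoint gives the k-lookahead variants.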
Guiding Relaxed Plans with Costs
[Figure: relaxed plan extraction starts at the first level where the goal proposition is cheapest]
Cost-Based Planning Conclusions
- Cost functions:
  - Remove the false assumption that level is correlated with cost
  - Improve planning with non-uniform cost actions
  - Are cheap to compute (constant overhead)
Partial Satisfaction (Over-Subscription)
Planning
Partial Satisfaction Planning
- Selecting goal sets
  - Estimating goal benefit
  - Anytime goal set selection
  - Adjusting for negative interactions between goals
Partial Satisfaction (Oversubscription) Planning
In many real-world planning tasks, the agent often has more goals than it has resources to accomplish.
Example: Rover Mission Planning (MER)
Need automated support for over-subscription/partial satisfaction planning: actions have execution costs, goals have utilities, and the objective is to find the plan that has the highest net benefit.
[Diagram: problem hierarchy relating PLAN EXISTENCE, PLAN LENGTH, and PLAN COST to PSP GOAL, PSP GOAL LENGTH, PSP UTILITY, PSP UTILITY COST, and PSP NET BENEFIT]
Adapting PG Heuristics for PSP
- Challenges:
  - Need to propagate costs on the planning graph
  - The exact set of goals is not clear
  - Interactions between goals
  - The obvious approach of considering all 2^n goal subsets is infeasible
- Idea: select a subset of the top-level goals up front
  - Challenge: goal interactions
  - Approach: estimate the net benefit of each goal in terms of its utility minus the cost of its relaxed plan
  - Bias the relaxed plan extraction to (re)use the actions already chosen for other goals
[Diagram: architecture — action templates and the problem spec (init, goal state) feed a Graphplan plan-extension phase (based on STAN) with cost propagation, yielding a cost-sensitive planning graph; heuristics extracted from it drive the goal-set selection algorithm and cost-sensitive search, producing the solution plan]
Goal Set Selection in Rover Problem
[Figure: iterative goal-set selection — initial relaxed-plan estimates: comm(soil): 50-25 = 25, comm(rock): 60-40 = 20, comm(image): 20-35 = -15; after committing to comm(soil), a biased RP gives {comm(soil), comm(rock)}: 110-65 = 45; adding comm(image) is evaluated by cost propagation (70-60 = 20) and by biased RP (130-100 = 30)]
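The up-front goal-set selection can be sketched as a greedy loop — a hedged illustration with hypothetical numbers shaped like the rover example, where rp_cost stands in for a biased relaxed-plan estimate that reuses actions chosen for earlier goals:

```python
# Greedy goal-set selection by incremental net benefit.

def select_goals(utilities, rp_cost):
    """rp_cost(chosen_set, goal) -> added cost of also achieving goal."""
    chosen, benefit = set(), 0
    while True:
        best, best_gain = None, 0
        for g in utilities:
            if g in chosen:
                continue
            gain = utilities[g] - rp_cost(chosen, g)
            if gain > best_gain:          # only strictly useful goals
                best, best_gain = g, gain
        if best is None:
            return chosen, benefit
        chosen.add(best)
        benefit += best_gain

# Toy utilities/costs loosely shaped like the rover slide
util = {"comm(soil)": 50, "comm(rock)": 60, "comm(image)": 20}
costs = {"comm(soil)": 25, "comm(rock)": 40, "comm(image)": 35}
goals, net = select_goals(util, lambda c, g: costs[g])
print(sorted(goals), net)  # ['comm(rock)', 'comm(soil)'] 45
```

With a real biased RP, rp_cost would shrink as the chosen set grows, which is what makes the reuse bias matter.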
SAPAPS (anytime goal selection)
- A* progression search
  - g-value: net benefit of the plan so far
  - h-value: relaxed-plan estimate of the best goal set
    - A relaxed plan is found for all goals
    - Iterative goal removal, until the net benefit does not increase
- Returns plans with increasing g-values
June 7th, 2006
ICAPS'06 Tutorial T6
44
Some Empirical Results for AltAltps
[Figure: results]
Exact algorithms based on MDPs don't scale at all.
[AAAI 2004]
Adjusting for Negative Interactions (AltWlt)
- Problem:
  - What if the a priori goal set is not achievable because of negative interactions?
  - What if the greedy algorithm gets stuck in a bad local optimum?
- Solution:
  - Do not consider mutex goals
  - Add a penalty for goals whose relaxed plan has mutexes
    - Use an interaction factor to adjust the cost, similar to the adjusted sum heuristic: max_{g1,g2 ∈ G} {lev({g1, g2}) - max(lev(g1), lev(g2))}
  - Find the best goal set for each goal
The Problem with Plangraphs [Smith, ICAPS 04]
[Figure: planning graph with cost estimates on actions and propositions]
- Assume independence between objectives
- For the rover: all estimates are from the starting location
Approach
- Construct an orienteering problem
- Solve it
- Use it as search guidance
Orienteering Problem
A TSP variant:
- Given: a network of cities, rewards in various cities, and a finite amount of gas
- Objective: collect as much reward as possible before running out of gas
Orienteering Graph
[Figure: orienteering graph — nodes Rover, Loc0..Loc3, Sample1, Sample3, Image1, Image2, with travel costs on the edges and rewards at the nodes]
The Big Question:
How do we determine which propositions go in the orienteering graph?
Propositions that:
- are changed in achieving one goal, and
- impact the cost of another goal
Answer: sensitivity analysis.
Recall: Plan Graph
[Figure: plan graph with cost estimates — location propositions L0..L3, sample/image propositions S1, S3, I1, I2; move actions M0,1..M2,3 and actions Sa1, Sa3, Im1, Im2]
Sensitivity Analysis
[Figure: the plan graph with the cost of the assumed net effect set to 0 and the cost of mutex initial conditions (here L0) set to ∞]
Sensitivity Analysis (cont'd)
[Figure: recomputed cost estimates overlaid with the orienteering graph — several estimates change significantly, e.g., 8 ⇒ 1, 7 ⇒ 3, 11 ⇒ 7, identifying propositions that belong in the orienteering graph]
Basis Set Algorithm
For each goal:
  Construct a relaxed plan
  For each net effect of the relaxed plan:
    Reset the costs in the PG
    Set the cost of the net effect to 0
    Set the cost of mutex initial conditions to ∞
    Compute revised cost estimates
    If significantly different, add the net effect to the basis set
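The outer test of this algorithm can be sketched as follows — a hypothetical interface (not Smith's implementation), where cost_given abstracts the whole reset-and-recompute step on the planning graph:

```python
# Basis-set membership test: a net effect joins the orienteering graph
# when assuming it (cost 0) significantly changes another goal's estimate.

def basis_set(goals, net_effects, cost_given, threshold=1.0):
    """cost_given(assumed_effect, goal) -> revised cost estimate;
    cost_given(None, goal) -> baseline estimate."""
    basis = set()
    for g in goals:
        for e in net_effects[g]:
            for g2 in goals:
                if abs(cost_given(None, g2) - cost_given(e, g2)) >= threshold:
                    basis.add(e)
                    break
    return basis

# Toy model: being at loc1 changes the cost of the image goal (11 => 7)
base = {"soil": 5, "image": 11}
with_at1 = {"soil": 5, "image": 7}
cg = lambda e, g: (with_at1 if e == "at(loc1)" else base)[g]
effects = {"soil": ["at(loc1)", "have(soil)"], "image": ["have(image)"]}
print(basis_set(["soil", "image"], effects, cg))  # {'at(loc1)'}
```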
25 Rocks
[Figure: results on a problem with 25 rocks]
PSP Conclusions
- Goal set selection
  - A priori for regression search
  - Anytime for progression search
  - Both types of search use greedy goal insertion/removal to optimize the net benefit of relaxed plans
- Orienteering problem
  - Interactions between goals are apparent in the OP
  - Use the OP solution as a heuristic
  - Planning graphs help define the OP
Planning with Resources
Planning with Resources
- Propagating resource intervals
- Relaxed plans
- Handling resource subgoals
Rover with Power Resource
Resource usage is the same as the costs in this example.
Resource Intervals
[Figure: planning graph with resource intervals for power — [25,25] at P0, [-5,50] at P1, [-115,75] at P2; actions sample(soil,α), drive, recharge, sample(rock,β), commun(soil)]
The resource interval assumes independence among consumers/producers.
At P1: UB = UB(noop) + recharge = 25 + 25 = 50; LB = LB(noop) + sample + drive = 25 + (-20) + (-10) = -5
At P2: UB = UB(noop) + recharge = 50 + 25 = 75; LB = LB(noop) + drive + sample + drive + sample + commun + drive + drive = -5 + (-30) + (-20) + (-10) + (-25) + (-5) + (-15) + (-5) = -115
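The interval update can be sketched in a few lines (an illustrative simplification, not the tutorial's code): the upper bound optimistically applies all producers and the lower bound all consumers, ignoring interactions between them:

```python
# One level of optimistic resource-interval propagation.

def propagate_interval(lb, ub, deltas):
    """deltas: net resource changes of the actions applicable this level."""
    new_ub = ub + sum(d for d in deltas if d > 0)   # apply all producers
    new_lb = lb + sum(d for d in deltas if d < 0)   # apply all consumers
    return new_lb, new_ub

# Power example from the slides, starting at [25, 25]
lb, ub = propagate_interval(25, 25, [+25, -20, -10])  # recharge, sample, drive
print(lb, ub)  # -5 50
lb, ub = propagate_interval(lb, ub, [+25, -30, -20, -10, -25, -5, -15, -5])
print(lb, ub)  # -115 75
```

This is why the bounds widen monotonically: each level adds every producer to UB and every consumer to LB.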
Resource Intervals (cont'd)
[Figure: propagation continued — the power interval widens from [-115,75] at P2 to [-250,100] at P3 and [-390,125] at P4]
Relaxed Plan Extraction with Resources
- Start extraction as before
- Track the "maximum" resource requirements of the actions chosen at each level
- May need more than one supporter for a resource!!

Results
[Figure: results]

Results (cont'd)
[Figure: results, continued]
Planning with Resources Conclusion
- Resource intervals allow us to be optimistic about reachable values
  - Upper/lower bounds can get large
- Relaxed plans may require multiple supporters for subgoals
- Negative interactions are much harder to capture
Temporal Planning
Temporal Planning
- Temporal planning graph
  - From levels to time points
  - Delayed effects
- Estimating makespan
- Relaxed plan extraction
Rover with Durative Actions
SAPA
[Diagram: SAPA architecture — generate the start state from the planning problem and maintain a queue of time-stamped states; select the state with the lowest f-value (f can have both cost and makespan components); if it satisfies the goals, partialize the p.c. plan and return the o.c. and p.c. plans; otherwise expand the state by applying actions; heuristic estimation builds the RTPG, propagates cost functions, extracts a relaxed plan, and adjusts for mutexes and resources]
[ECP 2001; AIPS 2002; ICAPS 2003; JAIR 2003]
Search through Time-Stamped States
A state S = (P, M, Π, Q, t), where:
- P: set of pairs <pi, ti> of predicates pi and the time of their last achievement ti < t
- M: set of functions representing resource values
- Π: set of protected persistent conditions (binary or resource conditions)
- Q: event queue (contains resource as well as binary fluent events)
- t: time stamp of S

Goal satisfaction: S = (P, M, Π, Q, t) ⇒ G if for every <pi, ti> ∈ G, either:
- ∃ <pi, tj> ∈ P with tj < ti and no event in Q deletes pi, or
- ∃ e ∈ Q that adds pi at time te < ti

Action application: an action A is applicable in S if:
- all instantaneous preconditions of A are satisfied by P and M
- A's effects do not interfere with Π and Q
- no event in Q interferes with the persistent preconditions of A
- A does not lead to concurrent resource change

When A is applied to S:
- P is updated according to A's instantaneous effects
- persistent preconditions of A are put in Π
- delayed effects of A are put in Q

[Example: Flying — preconditions (in-city ?airplane ?city1), (fuel ?airplane) > 0; effects ¬(in-city ?airplane ?city1), delayed (in-city ?airplane ?city2), consume (fuel ?airplane)]

Search: pick a state S from the queue. If S satisfies the goals, end. Else non-deterministically do one of:
- advance the clock (by executing the earliest event in Q)
- apply one of the applicable actions to S
Temporal Planning
[Figure: temporal planning graph for the rover domain on a timeline from 0 to 100 — record the first time point at which each action/proposition is first reachable; assume the latest start time for actions in the RP]
SAPA at IPC-2002
[Figures: IPC-2002 results — Satellite (complex setting) and Rover (time setting)]
[JAIR 2003]

Temporal Planning Conclusion
- Levels become time points
- Makespan and plan length/cost are different objectives
  - The Set-Level heuristic measures makespan
  - Relaxed plans measure makespan and plan cost
Non-Deterministic Planning
Non-Deterministic Planning
- Belief state distance
- Multiple planning graphs
- Labelled uncertainty graph (LUG)
- Implicit belief states and the CFF heuristic
Conformant Rover Problem
Search in Belief State Space
[Figure: progression search in belief space — each belief state is a set of states (the soil may be available at α, β, or γ, combined with the rover's location); sample(soil,·) and drive(·,·) map belief states to belief states]
Belief State Distance
- Compute classical planning distance measures between states
- Assume each state can reach the closest goal state
- Aggregate the state distances
[Figure: belief states BS1 and BS2 with goal belief state BS3; individual state distances 5, 10, 15, 20, 30, 40 — Max = 15, 20; Sum = 20, 20; Union = 17, 20]
[ICAPS 2004]

Belief State Distance (cont'd)
- Estimate plans for each state pair
- Capture positive interaction and independence
[Figure: the action sequences for two states overlap; counting the shared actions once gives Union = 17]
[ICAPS 2004]
State Distance Aggregations [Bryce et al., 2005]
- Sum: over-estimates (ignores overlap)
- Union: just right
- Max: under-estimates
[Figure: comparison of the three aggregations]
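The three aggregations can be sketched directly — a hypothetical layout (not the tutorial's code) in which the "union" input is the set of actions in each state's relaxed plan, so that shared actions are counted once:

```python
# Aggregating per-state distances d(s, G) into a belief-state distance.

def agg_max(dists):            # admissible, under-estimates total work
    return max(dists)

def agg_sum(dists):            # ignores overlap, over-estimates
    return sum(dists)

def agg_union(relaxed_plans):  # counts shared actions once
    actions = set()
    for rp in relaxed_plans:
        actions |= rp
    return len(actions)

# Toy numbers shaped like the slide's BS1: distances 15 and 5,
# with 3 of the second plan's 5 actions shared with the first
d = [15, 5]
rps = [set(f"a{i}" for i in range(15)), {"a0", "a1", "a2", "b0", "b1"}]
print(agg_max(d), agg_sum(d), agg_union(rps))  # 15 20 17
```

The union lands between Max and Sum precisely because it credits positive interaction (shared actions) without assuming full independence.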
Extract Relaxed Plans from Multiple Planning Graphs
- Build a planning graph for each state in the belief state
- Extract a relaxed plan from each graph
- Take the step-wise union of the relaxed plans; in the example, h = 7
[Figure: three planning graphs, one per possible soil location (α, β, γ), with their relaxed plans unioned step-wise]
Labelled Uncertainty Graph
- Labels correspond to sets of states and are represented as propositional formulas
- Action labels are the conjunction (intersection) of their precondition labels
- Effect labels are the disjunction (union) of their supporters' labels
- Stop when the goal is labeled with every state
[Figure: LUG over levels P0-A0-P1-A1-P2-A2-P3 — e.g., sample(soil,α) is labeled at(α) ∧ (avail(soil,α) ∨ avail(soil,β) ∨ avail(soil,γ)) ∧ ¬at(β) ∧ …]
Labelled Relaxed Plan
- Must pick enough supporters to cover the (sub)goals
- Subgoals and supporters need not be used for every state where they are reached (labeled)
- In the example, h = 6
[Figure: labelled relaxed plan extracted from the LUG]
Comparison of Planning Graph Types [JAIR, 2006]
- The LUG saves time
- Sampling more worlds increases the cost of multiple graphs (MG)
- Sampling more worlds improves the effectiveness of the LUG
[Figure: runtime comparison]
State Agnostic Planning Graphs (SAG)
- The LUG represents multiple explicit planning graphs
- The SAG uses the LUG machinery to represent a planning graph for every state
- The SAG is built once per search episode, and we can use it for relaxed plans at every search node instead of building a LUG at each node
- Extract relaxed plans from the SAG by ignoring planning-graph components not labeled by the states in our search node
SAG
Build a LUG for all states (the union of all belief states).
- Ignore irrelevant labels
- The largest LUG == all LUGs
[Figure: one labelled graph over states 1..7 with actions o12, o23, o34, o45, o56, o67, and oG; the graph for any individual belief state is read off by restricting labels]
[AAAI, 2005]
CFF (implicit belief states + PG)
[Figure: CFF's planning graph over propositions at(α), at(β), avail(soil, α/β/γ) and actions drive and sample, whose implication structure is handed to a SAT solver]
Belief Space Problems
[Figure: spectrum of problems from classical to conformant and conditional belief-space planning]
More results at the IPC!!!
Conditional Planning
ƒ Actions have observations
ƒ Observations branch the plan because:
ƒ Plan cost is reduced by performing fewer "just in case" actions – each branch performs only the relevant actions
ƒ Sometimes actions conflict, and observing determines which to execute (e.g., medical treatments)
ƒ We are ignoring negative interactions
ƒ We are only forced to use observations to remove negative interactions
ƒ Ignore the observations and use the conformant relaxed plan
ƒ Suitable because the aggregate search effort over all plan branches is related to the conformant relaxed plan cost
Non-Deterministic Planning Conclusions
ƒ Measure positive interaction and independence between states co-transitioning to the goal via overlap
ƒ Labeled planning graphs and the CFF SAT encoding efficiently measure conformant plan distance
ƒ Conformant planning heuristics work for conditional planning without modification
Stochastic Planning
Stochastic Rover Example
[ICAPS 2006]
Search in Probabilistic Belief State Space
[Figure: progression search over probabilistic belief states in the rover domain. The initial belief at at(α) assigns probability 0.4, 0.5, and 0.1 to avail(soil, α), avail(soil, β), and avail(soil, γ); actions such as sample(soil, α), sample(soil, β), drive(α, β), and drive(α, γ) split the belief into successors weighted by outcome probability (e.g., 0.36/0.04 after sampling at α, 0.45/0.05 after sampling at β).]
Handling Uncertain Actions
[ICAPS 2006]
ƒ Extending the LUG to handle uncertain actions requires a label extension that captures:
ƒ State uncertainty (as before)
ƒ Action outcome uncertainty
ƒ Problem: Each action at each level may have a different outcome. The number of uncertain events grows over time – meaning the number of joint outcomes of events grows exponentially with time
ƒ Solution: Not all outcomes are important. Sample some of them – keep the number of joint outcomes constant
Monte Carlo LUG (McLUG)
[ICAPS 2006]
ƒ Use Sequential Monte Carlo in the relaxed planning space
ƒ Build several deterministic planning graphs by sampling states and action outcomes
ƒ Represent the set of planning graphs using LUG techniques
ƒ Labels are sets of particles
ƒ Sample which action outcomes get labeled with particles
ƒ Bias the relaxed plan by picking actions labeled with the most particles, to prefer more probable support
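A sketch of the particle sampling step, under illustrative assumptions (distributions are value→probability dicts; a label is simply the set of particle ids consistent with a graph element):

```python
import random

def build_mclug_labels(particles, init_dist, outcome_dists, num_levels, rng=random):
    """For each particle, sample one initial state and one outcome per
    uncertain action per level. The label of an outcome at a level is
    the set of particle ids that sampled it."""
    labels = {}   # (level, action, outcome) -> set of particle ids
    states = []   # sampled initial state, per particle
    for i in range(particles):
        s, = rng.choices(list(init_dist), weights=init_dist.values())
        states.append(s)
        for level in range(num_levels):
            for action, dist in outcome_dists.items():
                o, = rng.choices(list(dist), weights=dist.values())
                labels.setdefault((level, action, o), set()).add(i)
    return states, labels
```

Relaxed plan extraction then prefers the supporter whose label contains the most particles, biasing toward more probable support while keeping the number of joint outcomes fixed at the particle count.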
McLUG for rover example
[ICAPS 2006]
[Figure: sampled planning graph layers P0/A0/P1/A1/P2/A2/P3 over avail(soil, ·), at(·), have(soil), comm(soil) and the sample/drive/commun actions. Annotations: states are sampled for the initial layer (avail(soil, γ) was not sampled); particles in an action's label must sample that action's outcome; ¾ of the particles support the goal, and it is okay to stop once at least ½ do.]
Logistics Domain Results
[ICAPS 2006]
[Figure: log-scale plots of total time (s) and plan length against probability of goal satisfaction for problems P2-2-2, P2-2-4, and P4-2-2, comparing the McLUG with 16, 32, 64, and 128 particles to CPplan. Takeaway: scalable, with reasonable quality.]
Grid Domain Results
[ICAPS 2006]
[Figure: time (s) and plan length against probability of goal satisfaction for Grid(0.8) and Grid(0.5), comparing 4, 8, 16, and 32 particles to CPplan. Takeaways: again, good scalability and quality; broad beliefs need more particles.]
Direct Probability Propagation
ƒ As an alternative to label propagation, we can propagate numeric probabilities
ƒ Problem: Numeric propagation tends to assume only independence or positive interaction between actions and propositions
ƒ With probability, we can vastly under-estimate the probability of reaching propositions
ƒ Solution: Propagate correlation – a pair-wise measure of independence / positive interaction / negative interaction
ƒ Can be seen as a continuous mutex
Correlation
ƒ C(x, y) = Pr(x, y) / (Pr(x)Pr(y))
ƒ If:
ƒ C(x, y) = 0, then x, y are mutex
ƒ 0 < C(x, y) < 1, then x, y interfere
ƒ C(x, y) = 1, then x, y are independent
ƒ 1 < C(x, y) < 1/Pr(x), then x, y synergize
ƒ C(x, y) = 1/Pr(x) = 1/Pr(y), then x, y are completely correlated
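The definition and the five cases translate directly into code (a sketch; as on the slide, the final case assumes Pr(x) = Pr(y), which is when the upper bound 1/Pr(x) is attainable):

```python
def correlation(pr_xy, pr_x, pr_y):
    """C(x, y) = Pr(x, y) / (Pr(x) Pr(y)); assumes Pr(x), Pr(y) > 0."""
    return pr_xy / (pr_x * pr_y)

def classify(c, pr_x):
    """Map a correlation value to the slide's five qualitative cases."""
    if c == 0:
        return "mutex"
    if c < 1:
        return "interfere"
    if c == 1:
        return "independent"
    if c < 1 / pr_x:
        return "synergize"
    return "completely correlated"   # c == 1/Pr(x) == 1/Pr(y)
```

For example, with Pr(x) = 0.5 and Pr(y) = 0.4, a joint probability of 0.3 gives C = 1.5, so x and y synergize.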
Probability of a Set of Propositions
ƒ Pr(x1, x2, …, xn) = Πi=1..n Pr(xi | x1 … xi-1)   (chain rule)
ƒ Pr(xi | x1 … xi-1)
   = Pr(x1 … xi-1 | xi) Pr(xi) / Pr(x1 … xi-1)   (Bayes rule)
   ≈ Pr(x1 | xi) … Pr(xi-1 | xi) Pr(xi) / [Pr(x1) … Pr(xi-1)]   (assume independence)
   = Pr(xi) Πj=1..i-1 [Pr(xi | xj) / Pr(xi)]   (Bayes rule)
   = Pr(xi) Πj=1..i-1 C(xi, xj)   (correlation)
ƒ Hence Pr(x1, x2, …, xn) = Πi=1..n Pr(xi) Πj=1..i-1 C(xi, xj)
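The final formula is cheap to evaluate from marginals and pairwise correlations alone. A sketch, with illustrative structures (marginals in a list, correlations in a dict keyed by index pairs; missing pairs default to independence):

```python
def joint_probability(pr, corr):
    """Pr(x1..xn) ~= Π_i Pr(xi) Π_{j<i} C(xi, xj).
    `pr` holds marginals; `corr[(i, j)]` the pairwise correlation."""
    total = 1.0
    for i, p in enumerate(pr):
        total *= p
        for j in range(i):
            # Unlisted pairs are assumed independent (C = 1).
            total *= corr.get((i, j), corr.get((j, i), 1.0))
    return total
```

Any zero correlation (a mutex pair) drives the joint probability to zero, which is how this acts as a continuous analogue of mutex propagation.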
Probability Propagation
ƒ The probability of an action being enabled is the probability of its preconditions (a set of propositions)
ƒ The probability of an effect is the product of the action probability and the outcome probability
ƒ A single proposition (or pair of propositions) has probability equal to the probability it is given by the best set of supporters
ƒ The probability that a set of supporters gives a proposition is the sum over the probabilities of all possible executions of those actions
Results
Stochastic Planning Conclusions
ƒ The number of joint action outcomes is too large
ƒ Sampling outcomes to represent in labels is much faster than an exact representation
ƒ SMC gives us a good way to use multiple planning graphs for heuristics, and the McLUG helps keep the representation small
ƒ Numeric propagation of probability can better capture interactions with correlation
ƒ Can extend to cost and resource propagation
Hybrid Planning Models
Hybrid Models
ƒ Metric-Temporal w/ Resources (SAPA)
ƒ Temporal Planning Graph w/ Uncertainty (Prottle)
ƒ PSP w/ Resources (SAPAMPS)
ƒ Cost-based Conditional Planning (CLUG)
Propagating Temporal Cost Functions
[Figure: getting from Tempe to L.A. – Shuttle(Tempe,Phx): cost $20, time 1.0 hour; Helicopter(Tempe,Phx): cost $100, time 0.5 hour; Car(Tempe,LA): cost $100, time 10 hours; Airplane(Phx,LA): cost $200, time 1.0 hour. Cost(At(LA)) is a step function of time, falling from $300 through $220 to $100 between t = 0 and t = 10, and is computed from Cost(At(Phx)) and the cost of Flight(Phx,LA).]
Heuristics based on cost functions
Direct:
ƒ If we want to minimize makespan: h = t0
ƒ If we want to minimize cost: h = CostAggregate(G, t∞)
ƒ If we want to minimize a function f(time, cost) of cost and makespan: h = min f(t, Cost(G, t)) s.t. t0 ≤ t ≤ t∞
ƒ E.g., if f(time, cost) = 100·makespan + cost, then h = 100×2 + 220 = 420, at t0 ≤ t = 2 ≤ t∞
Using Relaxed Plan:
ƒ Extract a relaxed plan using h as the bias
ƒ If the objective function is f(time, cost), then action A (to be added to RP) is selected such that f(t(RP+A), C(RP+A)) + f(t(Gnew), C(Gnew)) is minimal, where Gnew = (G ∪ Precond(A)) \ Effects(A)
[Figure: Cost(At(LA)) steps from $300 down to $220 at t = 2 and to $100 at t∞ = 10; t0 = 1.5 is the time of earliest achievement, and the time of lowest cost comes later.]
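The third "direct" case can be sketched numerically with the slide's own numbers; the breakpoint-list representation of the piecewise-constant cost function is an illustrative assumption, not SAPA's actual data structure:

```python
def direct_h(cost_steps, f, t0, t_inf):
    """h = min f(t, Cost(G, t)) over t0 <= t <= t_inf.
    `cost_steps` lists (time, cost) breakpoints of the piecewise-constant
    cost function; the minimum is attained at one of them."""
    candidates = [(t, c) for t, c in cost_steps if t0 <= t <= t_inf]
    return min(f(t, c) for t, c in candidates)

# Slide example: Cost(At(LA)) is $300 at t=1.5, $220 at t=2, $100 at t=10,
# and f(time, cost) = 100*makespan + cost.
steps = [(1.5, 300), (2, 220), (10, 100)]
h = direct_h(steps, lambda t, c: 100 * t + c, 1.5, 10)   # h == 420, at t = 2
```

Waiting until t = 2 trades 0.5 hours of makespan for an $80 cost reduction, which the weighted objective prefers; waiting until t = 10 does not pay off.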
Phased Relaxation
The relaxed plan can be adjusted to take into account constraints that were originally ignored.
Adjusting for mutexes:
Adjust the makespan estimate of the relaxed plan by marking actions that are mutex (and thus cannot be executed concurrently).
Adjusting for resource interactions:
Estimate the number of additional resource-producing actions needed to make up for any resource shortfall in the relaxed plan:
C = C + ΣR ⌈(Con(R) – (Init(R) + Pro(R))) / ∆R⌉ × C(AR)
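A sketch of the resource adjustment, assuming the shortfall must be covered by a whole number of producing actions (hence the ceiling); the per-resource tuple layout is illustrative:

```python
from math import ceil

def adjust_for_resources(c, resources):
    """C := C + Σ_R ceil((Con(R) - (Init(R) + Pro(R))) / ΔR) * C(A_R).
    `resources` maps each resource R to a tuple
    (consumed, initial, produced, amount produced per action, action cost)."""
    for con, init, pro, delta, a_cost in resources.values():
        shortfall = con - (init + pro)
        if shortfall > 0:
            # Whole producing actions are charged to cover the shortfall.
            c += ceil(shortfall / delta) * a_cost
    return c
```

For instance, consuming 10 units of fuel with 3 initially available and 2 produced by the relaxed plan leaves a shortfall of 5; at 2 units per refuel action, three extra actions are charged.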
Handling Cost/Makespan Tradeoffs
[Figure: the SAPA search loop – generate the start state; keep a queue of time-stamped states; select the state with the lowest f-value (f can have both cost and makespan components); if it satisfies the goals, partialize the p.c. plan and return the o.c. and p.c. plans; otherwise expand the state by applying actions. Heuristic estimation: build the RTPG, propagate cost functions, extract a relaxed plan, and adjust for mutexes and resources.]
[Chart: total cost and makespan variation as α ranges from 0.1 to 1, over 20 randomly generated temporal logistics problems involving moving 4 packages between different locations in 3 cities; O = f(time, cost) = α·Makespan + (1 – α)·TotalCost.]
Prottle
ƒ SAPA-style (time-stamped states and event queues) search for fully-observable conditional plans using L-RTDP
ƒ Optimize probability of goal satisfaction within a given, finite makespan
ƒ Heuristic estimates probability of goal satisfaction in the plan suffix
Prottle planning graph
[Figure: time-stamped propositions p1–p4 and actions a1, a2 with probabilistic outcomes o1–o4 occurring at different times (branch probabilities of 40–60%).]
Probability Back-Propagation
ƒ Each node carries a vector of probability costs, one per top-level goal: 1 – Pr(satisfy gi)
ƒ Outcome cost is the max of the affected propositions' costs
ƒ Action cost is the expectation of its outcomes' costs
ƒ Proposition cost is the product of its supporting actions' costs
ƒ The heuristic is the product of the relevant node costs
[Figure: example back-propagated costs (e.g., ⟨0.4⟩, ⟨0.5⟩, ⟨0.05⟩) over the Prottle planning graph.]
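The back-propagation rules above can be sketched as three tiny functions for a single goal (scalar costs instead of cost vectors; all names are illustrative):

```python
def outcome_cost(affected_prop_costs):
    """Outcome cost: max of the costs of the propositions it affects."""
    return max(affected_prop_costs)

def action_cost(outcomes):
    """Action cost: expectation over (probability, outcome cost) pairs."""
    return sum(p * c for p, c in outcomes)

def proposition_cost(supporter_costs):
    """Proposition cost: product of the supporting actions' costs.
    With cost = 1 - Pr(satisfy g), the product treats supporters as
    independent chances of satisfying the goal."""
    result = 1.0
    for c in supporter_costs:
        result *= c
    return result
```

For example, an action whose two equally likely outcomes cost 0.8 and 0.0 gets cost 0.4, and a proposition supported by actions costing 0.4 and 0.5 gets cost 0.2.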
Prottle Results
PSP w/ Resources
ƒ Utility and cost are based on the values of resources
ƒ Challenges:
ƒ Need to propagate cost for resource intervals
ƒ Need to support resource goals at different levels
Resource Cost Propagation
ƒ Propagate reachable values with cost
[Figure: across the action layers Move(Waypoint1), Sample_Soil, Sample_Soil, Communicate, variable v1 grows through the bound intervals [0,0], [0,1], [0,2] – a range of possible values – with cost(v1) tracking the cost of achieving each value bound.]
Cost Propagation on Variable Bounds
ƒ Bound cost depends upon:
ƒ action cost
ƒ previous bound cost – the current bound cost adds to the next
ƒ the cost of all bounds in the effect expression
ƒ Example (Sample_Soil, effect v1 += 1): v1 goes [0,0] → [0,1] → [0,2], and Cost(v1=2) = C(Sample_Soil) + Cost(v1=1)
ƒ Example (Sample_Soil, effect v1 += v2, with v2: [0,3]): v1 goes [0,0] → [0,3] → [0,6], and Cost(v1=6) = C(Sample_Soil) + Cost(v2=3) + Cost(v1=3)
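A minimal sketch of the bound-cost bookkeeping (the function and argument names are illustrative):

```python
def bound_cost(action_cost, operand_bound_costs):
    """Cost of reaching a new variable bound: the action's cost plus the
    costs of every bound used to compute it (the affected variable's
    previous bound, plus the bounds of variables in the effect expression)."""
    return action_cost + sum(operand_bound_costs)

# Slide's first example, with C(Sample_Soil) = 1.0 and effect v1 += 1:
c1 = bound_cost(1.0, [0.0])   # Cost(v1=1) = C(Sample_Soil) + Cost(v1=0)
c2 = bound_cost(1.0, [c1])    # Cost(v1=2) = C(Sample_Soil) + Cost(v1=1)
```

The second example chains in the cost of the expression bound as well: Cost(v1=6) = bound_cost(C(Sample_Soil), [Cost(v2=3), Cost(v1=3)]).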
Results – Modified Rovers (numeric soil)
Average improvement: 3.06
Anytime A* Search Behavior
Results – Modified Logistics (#of packages)
Average improvement: 2.88
Cost-Based Conditional Planning
ƒ Actions may reduce uncertainty, but cost a lot
ƒ Do we want more "just in case" actions that are cheap, or fewer that are more expensive?
ƒ Propagate costs on the LUG (CLUG)
ƒ Problem: The LUG represents multiple explicit planning graphs, and the costs can be different in each planning graph
ƒ A single cost for every explicit planning graph assumes full positive interaction
ƒ Multiple costs, one for each planning graph, is too costly
ƒ Solution: Propagate cost for partitions of the explicit planning graphs
Cost Propagation on the LUG (CLUG)
[Figure: cost propagation over label partitions of the explicit planning graphs, with propositions s, ¬s, r, ¬r and actions B, C, R; partitioned costs (e.g., 10, 20, 24, 7, 17) annotate the labels ϕ0(B), ϕ0(C), ϕ0(R). The relaxed plan (bold) has cost c(B) + c(R) = 10 + 7 = 17.]
The Medical Specialist
[Bryce & Kambhampati, 2005]
Using propagated costs improves plan quality (average path cost) without much additional time.
[Figure: total time and average path cost on cost-medical1 and cost-medical2, with low-cost and high-cost sensors, comparing coverage- and cost-based relaxed plan heuristics against MBP and GPT.]
Overall Conclusions
ƒ Relaxed reachability analysis
ƒ Concentrates strongly on positive interactions and independence by ignoring negative interactions
ƒ Estimates improve with more negative interactions
ƒ Heuristics can estimate and aggregate costs of goals, or find relaxed plans
ƒ Propagate numeric information to adjust estimates
ƒ Cost, resources, probability, time
ƒ Solving hybrid problems is hard
ƒ Extra approximations
ƒ Phased relaxation
ƒ Adjustments/penalties
Why do we love PG Heuristics?
ƒ They work!
ƒ They are "forgiving"
ƒ You don't like doing mutex? Okay
ƒ You don't like growing the graph all the way? Okay
ƒ Allow propagation of many types of information
ƒ Level, subgoal interaction, time, cost, world support, probability
ƒ Support phased relaxation
ƒ E.g., ignore mutexes and resources and bring them back later…
ƒ Graph structure supports other synergistic uses
ƒ e.g., action selection
ƒ Versatility…
Versatility of PG Heuristics
ƒ PG Variations
ƒ Serial
ƒ Parallel
ƒ Temporal
ƒ Labelled
ƒ Propagation Methods
ƒ Level
ƒ Mutex
ƒ Cost
ƒ Label
ƒ Planning Problems
ƒ Classical
ƒ Resource/Temporal
ƒ Conformant
ƒ Planners
ƒ Regression
ƒ Progression
ƒ Partial Order
ƒ Graphplan-style
Download