Planning Graph Based Reachability Heuristics

Daniel Bryce & Subbarao Kambhampati
ICAPS’06 Tutorial 6
June 7, 2006
http://rakaposhi.eas.asu.edu/pg-tutorial/
dan.bryce@asu.edu
rao@asu.edu
http://verde.eas.asu.edu
http://rakaposhi.eas.asu.edu
Scalability of Planning
Problem is Search Control!!!
 Before, planning algorithms could synthesize about 6-10 action plans in minutes
 Now, we can synthesize 100-action plans in seconds
 Significant scaleup in the last 6-7 years (realistic encodings of the Munich airport!)
The primary revolution in planning in recent years has been domain-independent heuristics to scale up plan synthesis
June 7th, 2006
ICAPS'06 Tutorial T6
3
Motivation
 Ways to improve Planner Scalability
 Problem Formulation
 Search Space
 Reachability Heuristics
 Domain (Formulation) Independent
 Work for many search spaces
 Flexible – work with most domain features
 Complement other scalability techniques
 Effective!!
Topics
 Classical Planning
 Cost Based Planning
 Partial Satisfaction Planning
 <Break>
 Non-Deterministic/Probabilistic Planning
 Resources (Continuous Quantities)
 Temporal Planning
 Hybrid Models
 Wrap-up
Presenters: Dan and Rao
Classical Planning
Rover Domain
Classical Planning
 Relaxed Reachability Analysis
 Types of Heuristics
 Level-based
 Relaxed Plans
 Mutexes
 Heuristic Search
 Progression
 Regression
 Plan Space
 Exploiting Heuristics
Planning Graph and Search Tree
 Envelope of Progression Tree (Relaxed Progression)
 Proposition lists: union of states at the kth level
 Lower-bound reachability information
Level Based Heuristics
 The distance of a proposition is the index of the
first proposition layer in which it appears
 Proposition distance changes when we propagate cost
functions – described later
 What is the distance of a Set of propositions??
 Set-Level: Index of first proposition layer where all
goal propositions appear
 Admissible
 Gets better with mutexes, otherwise same as max
 Max: Maximum proposition distance
 Sum: Summation of proposition distances
Example of Level Based Heuristics
set-level(sI, G) = 3
max(sI, G) = max(2, 3, 3) = 3
sum(sI, G) = 2 + 3 + 3 = 8
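The max and sum heuristics on this slide can be computed directly from the first-appearance levels of the goal propositions (set-level additionally needs mutex information from the full graph). A minimal sketch, with hypothetical level numbers chosen to match the slide:

```python
# Hypothetical first-appearance levels for three goal propositions,
# chosen to reproduce the numbers on this slide (distances 2, 3, 3).
lev = {"g1": 2, "g2": 3, "g3": 3}
goals = ["g1", "g2", "g3"]

def h_max(lev, goals):
    # Maximum proposition distance: admissible, but often weak.
    return max(lev[g] for g in goals)

def h_sum(lev, goals):
    # Sum of proposition distances: inadmissible, but often more informative.
    return sum(lev[g] for g in goals)

assert h_max(lev, goals) == 3
assert h_sum(lev, goals) == 8
```

Set-level would instead scan proposition layers for the first one where all goals appear together non-mutexed, which is why it can only be computed against the graph itself, not a per-proposition table.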
Distance of a Set of Literals
[Figure: two-level planning graph for a key/move grid domain (propositions At(x,y), Key(0,1); actions Move(x,y,x',y'), Pick_key(0,1), noops), with mutexes marked x]
Heuristic families:
 h(S) = Σp∈S lev({p}): Sum, Partition-k, Adjusted Sum
 h(S) = lev(S): Set-Level (admissible; gets better with mutexes), Combo, Set-Level with memos
Definitions:
 lev(p): index of the first level at which p comes into the planning graph
 lev(S): index of the first level where all props in S appear non-mutexed
 If there is no such level: ∞ if the graph is grown to level-off, else k+1 (k is the current length of the graph)
How do Level-Based Heuristics Break?
[Figure: planning graph P0-A0-P1-A1-P2 in which actions B1, B2, B3, ..., B99, B100 each add one of p1, p2, p3, ..., p99, p100 at level 1, and action A (requiring all of them) adds g at level 2]
The goal g is reached at level 2, but requires 101 actions to support it.
Relaxed Plan Heuristics
 When Level does not reflect distance well, we can find a
relaxed plan.
 A relaxed plan is subgraph of the planning graph, where:
 Every goal proposition is in the relaxed plan at the level where it
first appears
 Every proposition in the relaxed plan has a supporting action in the
relaxed plan
 Every action in the relaxed plan has its preconditions supported.
 Relaxed Plans are not admissible, but are generally
effective.
 Finding the optimal relaxed plan is NP-hard, but finding a
greedy one is easy. Later we will see how “greedy” can
change.
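The greedy extraction described above can be sketched as a backward sweep over the leveled graph; the tiny graph (propositions p, q, g and actions a1, a2, a3) is invented for illustration, not taken from the rover domain.

```python
# Minimal greedy relaxed-plan extraction over a leveled planning graph.
# adds/pres describe each action; lev_p/lev_a give first-appearance levels.
adds = {"a1": {"p"}, "a2": {"q"}, "a3": {"g"}}
pres = {"a1": set(), "a2": set(), "a3": {"p", "q"}}
lev_p = {"p": 1, "q": 1, "g": 2}
lev_a = {"a1": 0, "a2": 0, "a3": 1}

def relaxed_plan(goals):
    rp = set()                                   # chosen (action, level) pairs
    # subgoal each goal at the level where it first appears
    agenda = [(g, lev_p[g]) for g in goals]
    while agenda:
        p, level = agenda.pop()
        if level == 0:
            continue                             # holds in the initial state
        # greedily pick any supporter available in the previous action layer
        a = next(a for a in adds if p in adds[a] and lev_a[a] <= level - 1)
        if (a, level - 1) not in rp:
            rp.add((a, level - 1))
            agenda += [(q, lev_p[q]) for q in pres[a]]
    return rp

assert len(relaxed_plan({"g"})) == 3   # a3 plus its supporters a1 and a2
```

The heuristic value is the number of actions in the returned subgraph; changing how the supporter `a` is chosen (e.g. preferring cheap or already-selected actions) is exactly where "greedy" can change, as discussed later.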
Example of Relaxed Plan Heuristic
[Figure: relaxed plan extraction on the rover graph: identify goal propositions, support each goal proposition individually, count actions]
RP(sI, G) = 8
Results
[Chart: Relaxed Plan vs. Level-Based heuristic performance]
Results (cont'd)
[Chart: Relaxed Plan vs. Level-Based heuristic performance]
Optimizations in Heuristic Computation
 Taming space/time costs
 Bi-level planning graph representation
 Partial expansion of the PG (stop before level-off)
 It is FINE to cut corners when using the PG for heuristics (instead of search)!!
 Branching factor can still be quite high
 Use actions appearing in the PG (complete)
 Select actions in lev(S) vs. levels-off (incomplete)
 Consider only actions appearing in the RP (incomplete)
[Figure: bi-level planning graph example with goals C and D present; actions beyond lev(S) are discarded]
[Chart: heuristic extracted from partial graph vs. leveled graph; time in seconds (log scale) per problem, trading off Lev(S) against Levels-off]
Adjusting for Negative Interactions
 Until now we assumed actions only positively
interact. What about negative interactions?
 Mutexes help us capture some negative
interactions
 Types
 Actions: Interference/Competing Needs
 Propositions: Inconsistent Support
 Binary mutexes are the most common and practical
 |A| + 2|P|-ary mutexes would allow us to solve the planning problem with a backtrack-free Graphplan search
 An action layer may have |A| actions and 2|P| noops
 A serial planning graph assumes all non-noop actions are mutex
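The action-level interference test and the proposition-level inconsistent-support test can be sketched on a toy layer (the actions below are invented, not the rover model; competing needs, which checks mutex preconditions at the previous level, is omitted for brevity):

```python
# Sketch of binary mutex detection at one graph layer.
acts = {
    "sample":  {"pre": {"at"}, "add": {"have"}, "del": set()},
    "drive":   {"pre": {"at"}, "add": {"at2"},  "del": {"at"}},
    "noop_at": {"pre": {"at"}, "add": {"at"},   "del": set()},
}

def interfere(a, b):
    # One action deletes a precondition or add-effect of the other.
    return bool(acts[a]["del"] & (acts[b]["pre"] | acts[b]["add"]) or
                acts[b]["del"] & (acts[a]["pre"] | acts[a]["add"]))

action_mutex = {frozenset((a, b))
                for a in acts for b in acts if a != b and interfere(a, b)}

def supporters(p):
    return {a for a in acts if p in acts[a]["add"]}

def inconsistent_support(p, q):
    # Every supporter of p is mutex with every supporter of q.
    return all(frozenset((a, b)) in action_mutex
               for a in supporters(p) for b in supporters(q))

assert frozenset(("sample", "drive")) in action_mutex
assert inconsistent_support("have", "at2")
```

Propagating these marks layer by layer (propositions become non-mutex once they gain a non-mutex pair of supporters) is what lets set-level improve over max, as in the rover example on the next slide.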
Binary Mutexes
[Figure: rover planning graph, levels P0-A0-P1-A1-P2, with actions sample(soil), sample(rock), sample(image), commun(soil), and drives, over propositions avail(soil), avail(rock), avail(image), at(), have(soil), have(rock), have(image), comm(soil)]
 Interference: sample needs at(), drive negates at()
 Inconsistent Support: have(soil)'s only supporter is mutex with at()'s only supporter, drive
 At level 2, have(soil) has a supporter not mutex with a supporter of at(): they are feasible together
Set-Level(sI, {at(), have(soil)}) = 2
Max(sI, {at(), have(soil)}) = 1
Adjusting the Relaxed Plans
 Start with the RP heuristic and adjust it to take subgoal interactions into account
 Negative interactions in terms of "degree of interaction"
 Positive interactions in terms of co-achievement links
 Ignore negative interactions when accounting for positive interactions (and vice versa)

PROBLEM     Level       Sum        AdjSum2M
Gripper-25  -           69/0.98    67/1.57
Gripper-30  -           81/1.63    77/2.83
Tower-7     127/1.28    127/0.95   127/1.37
Tower-9     511/47.91   511/16.04  511/48.45
8-Puzzle1   31/6.25     39/0.35    31/0.69
8-Puzzle2   30/0.74     34/0.47    30/0.74
Mystery-6   -           -          16/62.5
Mystery-9   8/0.53      8/0.66     8/0.49
Mprime-3    4/1.87      4/1.88     4/1.67
Mprime-4    8/1.83      8/2.34     10/1.49
Aips-grid1  14/1.07     14/1.12    14/0.88
Aips-grid2  -           -          34/95.98

HAdjSum2M(S) = length(RelaxedPlan(S)) + max p,q∈S δ(p,q)
where δ(p,q) = lev({p,q}) - max{lev(p), lev(q)}  /* degree of negative interaction */
[AAAI 2000]
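The HAdjSum2M adjustment above can be sketched with a small level table; the level numbers here are invented for illustration:

```python
# Sketch of HAdjSum2M: relaxed-plan length plus the worst pairwise
# degree of negative interaction among the subgoals.
lev = {"p": 1, "q": 1}                   # individual first-appearance levels
lev_pair = {frozenset(("p", "q")): 3}    # first level where both appear non-mutexed

def delta(p, q):
    # Degree of negative interaction between p and q.
    return lev_pair[frozenset((p, q))] - max(lev[p], lev[q])

def h_adjsum2m(rp_length, subgoals):
    pairs = [(p, q) for p in subgoals for q in subgoals if p < q]
    return rp_length + max((delta(p, q) for p, q in pairs), default=0)

assert h_adjsum2m(4, ["p", "q"]) == 6    # 4 + (3 - 1)
```

δ(p,q) is zero when p and q can be achieved together as soon as each is individually reachable, and grows with how long mutexes delay their co-achievement.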
Anatomy of a State-space Regression planner
Problem: Given a set of subgoals (regressed state)
estimate how far they are from the initial state
[AAAI 2000; AIPS 2000; AIJ 2002; JAIR 2003]
Rover Example in Regression
[Figure: regression from sG back through s1, s2, s3, s4 via commun(rock), commun(image), sample(rock), sample(image), accumulating subgoals comm(soil), comm(rock), comm(image), have(rock), have(image), at(), avail(rock), avail(image)]
Sum(sI, s4) = 0+0+1+1+2 = 4
Sum(sI, s3) = 0+1+2+2 = 5
Should be ∞: s4 is inconsistent. How do we improve the heuristic??
AltAlt Performance
[Charts: time in seconds (log scale) per problem on the Schedule domain (AIPS-00) and on the Logistics and Scheduling problem sets from IPC 2000, comparing STAN3.0, HSP2.0, and AltAlt1.0 with Level-based and Adjusted RP heuristics]
Plan Space Search
In the beginning it was all POP.
Then it was cruelly UnPOPped.
The good times return with Re(vived)POP.
POP Algorithm
1. Plan Selection: select a plan P from the search queue
2. Flaw Selection: choose a flaw f (open condition or unsafe link)
3. Flaw Resolution:
   If f is an open condition, choose an action S that achieves f
   If f is an unsafe link, choose promotion or demotion
   Update P; return NULL if no resolution exists
4. If there is no flaw left, return P
[Figure: 1. the initial plan, S0 to Sinf with open goals g1, g2; 2. plan refinement (flaw selection and resolution) adding steps S1, S2, S3, with open conditions oc1, oc2 and a threat between p and ~p]
Choice points:
• Flaw selection (open condition? unsafe link? non-backtrack choice)
• Flaw resolution / plan selection (how to select and rank partial plans?)
PG Heuristics for Partial Order Planning
 Distance heuristics to estimate the cost of partially ordered plans (and to select flaws)
 If we ignore negative interactions, the set of open conditions can be seen as a regression state
 Mutexes are used to detect indirect conflicts in partial plans
 A step threatens a link if there is a mutex between the link condition and the step's effect or precondition
 Post disjunctive precedences and use propagation to simplify
[Figure: partial plan with a causal link Si -p-> Sj and a step Sk with effects q, r]
If mutex(p, q) or mutex(p, r): post the disjunctive precedence Sk ≺ Si ∨ Sj ≺ Sk
Regression and Plan Space
[Figure: the rover regression trace (sG back through s1-s4) placed alongside the corresponding partial-order plans: steps S0 through S4 with causal links for have(rock), comm(rock), have(image), comm(image), and comm(soil)]
RePOP's Performance
 RePOP implemented on top of UCPOP
 Dramatically better than any other partial order planner before it
 Competitive with Graphplan and AltAlt
 VHPOP carried the torch at IPC 2002

Problem      UCPOP   RePOP       Graphplan   AltAlt
Gripper-8    -       1.01        66.82       .43
Gripper-10   -       2.72        47min       1.15
Gripper-20   -       81.86       -           15.42
Rocket-a     -       8.36        75.12       1.02
Rocket-b     -       8.17        77.48       1.29
Logistics-a  -       3.16        306.12      1.59
Logistics-b  -       2.31        262.64      1.18
Logistics-c  -       22.54       -           4.52
Logistics-d  -       91.53       -           20.62
Bw-large-a   45.78   (5.23) -    14.67       4.12
Bw-large-b   -       (18.86) -   122.56      14.14
Bw-large-c   -       (137.84) -  -           116.34

Written in Lisp, runs on Linux, 500MHz, 250MB
You see, pop, it is possible to Re-use all the old POP work!
[IJCAI, 2001]
Exploiting Planning Graphs
 Restricting Action Choice
 Use actions from:
 Last level before level-off (complete)
 Last level before goals (incomplete)
 First level of the relaxed plan (incomplete): FF's helpful actions
 Only action sequences in the relaxed plan (incomplete): YAHSP
 Reducing State Representation
 Remove static propositions. A static proposition is only ever true or false in the last proposition layer.
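The static-proposition reduction can be sketched as a single pass over the action set: any proposition no action adds or deletes never changes, so it can be dropped from state representations. The toy actions below are invented for illustration:

```python
# Sketch: find propositions no action adds or deletes, and drop them
# from states.
actions = [
    {"add": {"at_l2"}, "del": {"at_l1"}},
    {"add": {"have_soil"}, "del": set()},
]
all_props = {"at_l1", "at_l2", "have_soil", "avail_soil"}

changed = set()
for a in actions:
    changed |= a["add"] | a["del"]

static = all_props - changed
assert static == {"avail_soil"}

# States can then be stored without the static part:
state = {"at_l1", "avail_soil"}
compressed = state - static
assert compressed == {"at_l1"}
```

Whether a static proposition is true is fixed by the initial state, so dropping it loses no information for search.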
Classical Planning Conclusions
 Many Heuristics
 Set-Level, Max, Sum, Relaxed Plans
 Heuristics can be improved by adjustments
 Mutexes
 Useful for many types of search
 Progression, Regression, POCL
Cost-Based Planning
Cost-based Planning
 Propagating Cost Functions
 Cost-based Heuristics
 Generalized Level-based heuristics
 Relaxed Plan heuristics
Rover Cost Model
Cost Propagation
[Figure: rover planning graph P0-A0-P1-A1-P2 annotated with propagated costs, e.g. sample(soil) 20, drives 10/15/30, have(soil) 20, have(image) 35, have(rock) 35, comm(soil) 25]
Cost reduces because of a different supporter at a later level
Cost Propagation (cont'd)
[Figure: levels P2 through P4 under 1-lookahead cost propagation; e.g. have(image) drops from 35 to 30 and comm(image) from 40 to 35 as cheaper supporters appear]
Terminating Cost Propagation
 Stop when:
 goals are reached (no-lookahead)
 costs stop changing (∞-lookahead)
 k levels after goals are reached (k-lookahead)
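The termination rules above can be sketched with a fixed-point loop over a tiny relaxed model; the actions and costs below are invented, arranged so that a cheaper supporter of the goal only becomes usable after extra propagation:

```python
# Sketch of cost propagation with no/k/∞-lookahead termination.
import math

pre = {"a_cheap_late": {"p"}, "a_costly_early": set(), "a_p": set()}
adds = {"a_cheap_late": {"g"}, "a_costly_early": {"g"}, "a_p": {"p"}}
act_cost = {"a_cheap_late": 5, "a_costly_early": 30, "a_p": 10}

def propagate(goals, k):
    cost = {"p": math.inf, "g": math.inf}
    goal_level, level = None, 0
    while True:
        new = dict(cost)
        for a, c in act_cost.items():
            c_a = c + sum(cost[q] for q in pre[a])   # cost of executing a
            for p in adds[a]:
                new[p] = min(new[p], c_a)            # keep cheapest supporter
        if goal_level is None and all(new[g] < math.inf for g in goals):
            goal_level = level                       # goals first reached here
        if new == cost:                              # ∞-lookahead: fixed point
            return new
        if goal_level is not None and level >= goal_level + k:   # k-lookahead
            return new
        cost, level = new, level + 1

assert propagate({"g"}, 0)["g"] == 30    # no-lookahead: costly early supporter
assert propagate({"g"}, 2)["g"] == 15    # more lookahead finds a_p + a_cheap_late
```

k = 0 gives the no-lookahead rule, larger k trades extra propagation for tighter cost estimates, and the fixed-point test implements ∞-lookahead.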
Guiding Relaxed Plans with Costs
Start extraction at the first level where the goal proposition is cheapest
Cost-Based Planning Conclusions
 Cost-Functions:
 Remove false assumption that level is
correlated with cost
 Improve planning with non-uniform cost
actions
 Are cheap to compute (constant overhead)
Partial Satisfaction (Over-Subscription)
Planning
Partial Satisfaction Planning
 Selecting Goal Sets
 Estimating goal benefit
 Anytime goal set selection
 Adjusting for negative interactions between
goals
Partial Satisfaction (Oversubscription) Planning
In many real-world planning tasks, the agent often has more goals than it has resources to accomplish.
Example: Rover Mission Planning (MER)
Need automated support for over-subscription/partial satisfaction planning: actions have execution costs, goals have utilities, and the objective is to find the plan that has the highest net benefit.
[Figure: problem-class hierarchy relating PLAN EXISTENCE, PLAN LENGTH, PLAN COST, PSP GOAL, PSP GOAL LENGTH, PSP UTILITY, PSP UTILITY COST, and PSP NET BENEFIT]
SapaPS: Anytime Forward Search
• General idea:
– Search incrementally while looking for better
solutions
• Return solutions as they are found…
– If we reach a node with h value of 0, then we
know we can stop searching (no better
solutions can be found)
Background: Cost Propagation
[Figure: the cost-propagation planning graph from the earlier Cost Propagation slide, repeated]
Cost reduces because of a different supporter at a later level
Adapting PG heuristics for PSP
 Challenges:
 Need to propagate costs on the planning graph
 The exact set of goals is not clear
 Interactions between goals: the obvious approach of considering all 2^n goal subsets is infeasible
 Idea: Select a subset of the top-level goals upfront
 Challenge: goal interactions
 Approach: Estimate the net benefit of each goal in terms of its utility minus the cost of its relaxed plan
 Bias the relaxed-plan extraction to (re)use the actions already chosen for other goals
[Figure: architecture: action templates and the problem spec (init, goal state) feed a Graphplan plan-extension phase (based on STAN) with cost propagation, yielding a cost-sensitive planning graph; a goal set selection algorithm works over the actions in the last level, and extracted heuristics guide cost-sensitive search to a solution plan]
Goal Set Selection in the Rover Problem
[Figure: incremental goal set selection. comm(soil): 50-25 = 25, found by RP; comm(rock): 60-40 = 20 alone, 110-65 = 45 jointly, found by biased RP; comm(image): 20-35 = -15 by cost propagation, 70-60 = 20 by RP, 130-100 = 30 by biased RP]
SapaPS (anytime goal selection)
 A* Progression search
 g-value: net-benefit of plan so far
 h-value: relaxed plan estimate of best goal set
 Relaxed plan found for all goals
 Iterative goal removal, until net benefit does not
increase
 Returns plans with increasing g-values.
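The iterative goal-removal step above can be sketched as follows. The cost model here is a deliberate simplification (each goal is priced by a private relaxed-plan cost, with the slide's rover numbers); a real relaxed plan shares actions across goals, which is why SapaPS extracts one relaxed plan for all goals and removes supporting actions as goals are dropped:

```python
# Sketch of anytime goal removal: drop the worst goal while net benefit improves.
utility = {"comm_soil": 50, "comm_rock": 60, "comm_image": 20}
rp_cost = {"comm_soil": 25, "comm_rock": 40, "comm_image": 35}

def net_benefit(goals):
    return sum(utility[g] - rp_cost[g] for g in goals)

def select_goals(goals):
    goals = set(goals)
    while True:
        # candidate: the goal with the worst (cost - utility) trade-off
        worst = max(goals, key=lambda g: rp_cost[g] - utility[g], default=None)
        if worst is None or net_benefit(goals - {worst}) <= net_benefit(goals):
            return goals                      # removal no longer helps
        goals = goals - {worst}

assert select_goals(utility) == {"comm_soil", "comm_rock"}
```

Here comm(image) costs 35 for only 20 utility, so it is the one removal that increases net benefit; the remaining goal set feeds the h-value of the search node.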
Some Empirical Results for AltAltps
Exact algorithms based on MDPs don’t scale at all
[AAAI 2004]
Adjusting for Negative Interactions (AltWlt)
 Problem:
 What if the a priori goal set is not achievable because of negative interactions?
 What if the greedy algorithm gets a bad local optimum?
 Solution:
 Do not consider mutex goals
 Add a penalty for goals whose relaxed plan has mutexes
 Use an interaction factor to adjust cost, similar to the adjusted sum heuristic:
 max g1,g2 ∈ G {lev({g1, g2}) - max(lev(g1), lev(g2))}
 Find the best goal set for each goal
Goal Utility Dependencies
[Figure: two rover plans. BAD: cost 50, utility 0. GOOD: cost 100, utility 500]
 High-res photo utility: 150; soil sample utility: 100; both goals: additional 200
 High-res photo utility: 150; low-res photo utility: 100; both goals: remove 80
Partial Satisfaction Planning (PSP)
Agent has more goals than it has resources to accomplish.
Objective: Find a plan with the highest net benefit
– Goals have utility
– Actions have cost
(i.e., maximize utility - cost)
Challenges:
 Unclear which goals to achieve
 Interactions exist between goals
Dependencies
Goal interactions exist as two distinct types:
 Cost dependencies: goals share actions in the plan trajectory; defined by the plan. Exists in classical planning, but is a larger issue in PSP. (All PSP planners: AltWlt, Optiplan, SapaPS)
 Utility dependencies: goals may interact in the utility they give; explicitly defined by the user. No need to consider this in classical planning. (SPUDS: our planner based on SapaPS)
Defining Goal Dependencies
Follow the Generalized Additive Independence (GAI) model (Bacchus & Grove):
 Goals listed with additive utility
 Utility defined over goal sets
Bacchus, F., and Grove, A. 1995. Graphical models for preference and utility. In Proc. of UAI-95.
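A GAI utility function can be sketched as a table of local utilities over goal subsets, summed over the subsets that are fully achieved; the goal names are shorthand for the rover propositions used on the following slide:

```python
# Sketch of GAI utility: sum the local utilities of all achieved goal sets.
local_utility = {
    frozenset({"hi_res"}): 150,
    frozenset({"low_res"}): 100,
    frozenset({"soil"}): 100,
    frozenset({"hi_res", "soil"}): 200,     # positive dependency
    frozenset({"hi_res", "low_res"}): -80,  # negative dependency
}

def gai_utility(achieved):
    achieved = frozenset(achieved)
    return sum(u for goal_set, u in local_utility.items()
               if goal_set <= achieved)

assert gai_utility({"hi_res", "soil"}) == 150 + 100 + 200
assert gai_utility({"hi_res", "low_res"}) == 150 + 100 - 80
```

Singleton entries recover plain additive utility, so the model strictly generalizes it while keeping evaluation linear in the number of listed dependencies.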
Defining Goal Dependencies
 (have-image high-res loc1): 150
 (have-image low-res loc1): 100
 (have-soil loc1): 100
 {(have left-shoe), (have right-shoe)}: +500
 {(have-image high-res loc1), (have-soil loc1)}: +200
 {(have-image high-res loc1), (have-image low-res loc1)}: -80
Defining Goal Dependencies
 Mutual dependency (what we've seen so far): the utility of a set differs from the sum of the utilities of its goals
 Conditional dependency (but what about...): the utility of a set of goals depends upon whether other goals are already achieved
 Conditional dependency transforms into mutual dependency
SapaPS
Example of a relaxed plan (RP) in a rovers domain:
 Actions: a1: Move(l1,l2), a2: Calibrate(camera), a3: Sample(l2), a4: Take_high_res(l2), a5: Take_low_res(l2), a6: Sample(l1)
 Goals: g1: Sample(l1), g2: Sample(l2), g3: HighRes(l2), g4: LowRes(l2)
[Figure: the RP as a tree over At(l1), At(l2), and Calibrated, with action costs ca1=50, ca2=20, ca3=40, ca4=40, ca5=25, ca6=40]
Anytime forward search:
• Propagate costs on the relaxed planning graph
• Using the RP as a tree lets us remove supporting actions as we remove goals
• We remove goals whose relaxed cost outweighs their utility
l=2
SPUDS
Example of a relaxed plan (RP) in a rovers domain (same actions, goals, and RP tree as the previous slide)
In SapaPS, goals are removed independently (and in pairs); if cost > utility, the goal is removed.
...but now we have more complicated goal interactions.
New challenge: goals are no longer independent. How do we select the best set of goals?
Idea: use declarative programming
Heuristically Identifying Goals
[Figure: the RP tree again, with action costs]
We can encode all goal sets and actions
Heuristically Identifying Goals
We encode the relaxed plan as an integer program:
 Binary variables: goals, actions, goal utility dependencies
 Constraints: select the actions needed for each selected goal; select a dependency exactly when all of its goals are selected
 Objective function: maximize the utility given by the selected goals and dependencies minus the cost of the selected actions
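SPUDS solves this encoding with an ILP solver; as a sketch of the same objective, a tiny instance can be brute-forced over goal subsets. All numbers and the goal-to-action map below are invented for illustration:

```python
# Brute-force version of the SPUDS objective on a toy instance:
# pick the goal subset maximizing GAI utility minus the cost of
# the relaxed-plan actions those goals need.
from itertools import combinations

act_cost = {"a1": 50, "a6": 40, "a4": 40}
needs = {"g1": {"a6"}, "g3": {"a1", "a4"}}      # goal -> supporting actions
local_utility = {frozenset({"g1"}): 100, frozenset({"g3"}): 150,
                 frozenset({"g1", "g3"}): -30}  # utility dependency

def net_benefit(goals):
    gs = frozenset(goals)
    util = sum(u for s, u in local_utility.items() if s <= gs)
    acts = set().union(*(needs[g] for g in gs)) if gs else set()
    return util - sum(act_cost[a] for a in acts)   # shared actions counted once

best = max((set(c) for r in range(len(needs) + 1)
            for c in combinations(needs, r)), key=net_benefit)
assert best == {"g1", "g3"}
```

Counting each shared action once is exactly the cost-dependency effect the encoding captures; the ILP formulation does the same maximization without enumerating the 2^n subsets.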
Heuristically Identifying Goals
[Figure: the RP tree again, with action costs]
We can now remove combinations of goals that give suboptimal relaxed solutions.
We are finding optimal goals given the (inadmissible) relaxed plan.
Experimental Domains
• Goal utilities randomly generated
• Goals "hard" or "soft"
• Goal dependencies randomly generated: their number, their size, the set of goals involved, and their utility value
• Upper bound on the number of dependencies is 3 x the number of goals
• Compared SPUDS to SapaPS (SapaPS modified to calculate goal utility dependencies at each state)

Domain      Cost
ZenoTravel  Money spent
Satellite   Energy spent
Results
[Chart: ZenoTravel: net benefit achieved by SPUDS vs. SapaPS on 13 problems]
Results
[Chart: Satellite: net benefit achieved by SPUDS vs. SapaPS on 18 problems; only in one instance does SapaPS do slightly better]
Results
Our experiments show that the ILP encoding takes upwards of 200x
PS procedure per node. Yet it is still doing better.
more time
Sapais
Ourthan
newthen
heuristic
better informed about cost dependencies
PS
Sapa heuristic lacks the ability
to take
utility dependency into account.
Why?
between
goals.
Obvious
A little less obvious
June 7th, 2006
ICAPS'06 Tutorial T6
64
Example Run
[Chart: net benefit over time (ms) for SPUDS and SapaPS on a single problem]
Summary
• Goal utility dependency in GAI framework
• Choosing goals at each search node
• ILP-based heuristic enhances search
guidance
– cost dependencies
– utility dependencies
Future Work
• Comparing with other planners (IP)
– Results promising so far (better quality in less
time)
• Include qualitative preference model as in
PrefPlan (Brafman & Chernyavsky)
• Take into account residual cost as in AltWlt
(Sanchez & Kambhampati)
• Extend into PDDL3 preference model
Questions
The Problem with Plangraphs [Smith, ICAPS 04]
[Figure: rover map annotated with plangraph cost estimates]
 Assume independence between objectives
 For the rover: all estimates are from the starting location
Approach
– Construct an orienteering problem
– Solve it
– Use it as search guidance
Orienteering Problem
A TSP variant
– Given:
 a network of cities
 rewards in various cities
 a finite amount of gas
– Objective:
 collect as much reward as possible before running out of gas
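On small graphs the problem can be solved exactly by exhaustive search, which is enough to sketch the objective. The three-node graph below is invented (it is not the rover orienteering graph on the next slide), and edge costs must be positive for the recursion to terminate:

```python
# Tiny orienteering solver: maximize collected reward within a gas budget.
graph = {"L0": {"L1": 3, "L2": 6},
         "L1": {"L0": 3, "L2": 2},
         "L2": {"L0": 6, "L1": 2}}
reward = {"L0": 0, "L1": 5, "L2": 8}

def best_reward(node, gas, seen):
    # Reward is collected only on the first visit to a node.
    r = reward[node] if node not in seen else 0
    best = r
    for nxt, cost in graph[node].items():
        if cost <= gas:   # positive costs make gas strictly decrease
            best = max(best, r + best_reward(nxt, gas - cost, seen | {node}))
    return best

assert best_reward("L0", 5, frozenset()) == 13   # L0 -> L1 -> L2
```

Real instances need heuristic OP solvers, but even an approximate OP solution already accounts for the inter-goal travel structure that independent plangraph estimates miss.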
Orienteering Graph
[Figure: orienteering graph over Rover, Loc0 through Loc3, Sample1, Sample3, Image1, Image2, with travel costs on edges and rewards at nodes]
The Big Question:
How do we determine which propositions go in the orienteering graph?
Propositions that:
 are changed in achieving one goal
 impact the cost of another goal
Answer: sensitivity analysis
Recall: Plan Graph
[Figure: plan graph with location propositions L0 through L3, move actions Mi,j, image actions Im1 and Im2, sample actions Sa1 and Sa3, and propagated cost estimates]
Sensitivity Analysis
[Figure: the plan graph with costs reset: the cost of a chosen net effect is set to 0 and the costs of mutex initial conditions are set to ∞]
Sensitivity Analysis
[Figure: the revised cost estimates in the plan graph induce the orienteering graph over Rover, Loc0 through Loc3, Sample1, Sample3, Image1, Image2]
Basis Set Algorithm
For each goal:
  Construct a relaxed plan
  For each net effect of the relaxed plan:
    Reset costs in the PG
    Set the cost of the net effect to 0
    Set the cost of mutex initial conditions to ∞
    Compute revised cost estimates
    If significantly different, add the net effect to the basis set
25 Rocks
[Figure: results on a 25-rock rover problem]
PSP Conclusions
 Goal Set Selection
 Apriori for Regression Search
 Anytime for Progression Search
 Both types of search use greedy goal insertion/removal
to optimize net-benefit of relaxed plans
 Orienteering Problem
 Interactions between goals apparent in OP
 Use solution to OP as heuristic
 Planning Graphs help define OP
Planning with Resources
Planning with Resources
 Propagating Resource Intervals
 Relaxed Plans
 Handling resource subgoals
Rover with Power Resource
Resource usage is the same as the costs in this example.
Resource Intervals
[Figure: rover planning graph P0-A0-P1-A1-P2 with a power resource; recharge produces 25, sample and drive actions consume 5-30]
The resource interval assumes independence among consumers/producers.
power at P0: [25, 25]
power at P1: [-5, 50]
 UB: UB(noop) + recharge = 25 + 25 = 50
 LB: LB(noop) + sample + drive = 25 + (-20) + (-10) = -5
power at P2: [-115, 75]
 UB: UB(noop) + recharge = 50 + 25 = 75
 LB: LB(noop) + drive + sample + drive + sample + commun + drive + drive = -5 + (-30) + (-20) + (-10) + (-25) + (-5) + (-15) + (-5) = -115
Resource Intervals (cont'd)
[Figure: levels P2 through P4; the power interval widens to [-250, 100] at P3 and [-390, 125] at P4]
Relaxed Plan Extraction with Resources
 Start extraction as before
 Track "maximum" resource requirements for the actions chosen at each level
 May need more than one supporter for a resource!!
Results
[Chart: planning-with-resources results]
Results (cont'd)
[Chart: planning-with-resources results]
Planning With Resources Conclusion
 Resource Intervals allow us to be optimistic
about reachable values
 Upper/Lower bounds can get large
 Relaxed Plans may require multiple
supporters for subgoals
 Negative Interactions are much harder to
capture
Temporal Planning
Temporal Planning
 Temporal Planning Graph
 From Levels to Time Points
 Delayed Effects
 Estimating Makespan
 Relaxed Plan Extraction
Rover with Durative Actions
SAPA
[Architecture: the planning problem generates a start state into a queue of time-stamped states; the state with the lowest f-value is selected (f can have both cost and makespan components); if it satisfies the goals, the p.c. plan is partialized and the o.c. and p.c. plans are returned; otherwise the state is expanded by applying actions. Heuristic estimation: build the RTPG, propagate cost functions, extract a relaxed plan, and adjust for mutexes and resources.]
[ECP 2001; AIPS 2002; ICAPS 2003; JAIR 2003]
Search through Time-Stamped States
A time-stamped state S = (P, M, Π, Q, t):
 P: set of pairs <pi, ti> of predicates pi and the time of their last achievement ti < t
 M: set of functions representing resource values
 Π: set of protected persistent conditions (binary or resource conditions)
 Q: event queue (contains resource as well as binary fluent events)
 t: time stamp of S
Goal satisfaction: S = (P, M, Π, Q, t) satisfies G if for each <pi, ti> in G, either:
 <pi, tj> ∈ P with tj < ti and no event in Q deletes pi, or
 there is an event e ∈ Q that adds pi at time te < ti
Action application: action A is applicable in S if:
 all instantaneous preconditions of A are satisfied by P and M;
 A's effects do not interfere with Π and Q;
 no event in Q interferes with persistent preconditions of A;
 A does not lead to concurrent resource change.
When A is applied to S:
 P is updated according to A's instantaneous effects;
 persistent preconditions of A are put in Π;
 delayed effects of A are put in Q.
Search: pick a state S from the queue. If S satisfies the goals, end. Else non-deterministically do one of:
 – Advance the clock (by executing the earliest event in Q)
 – Apply one of the applicable actions to S
[Example: Flying, with preconditions (in-city ?airplane ?city1) and (fuel ?airplane) > 0; effects (in-city ?airplane ?city2), not (in-city ?airplane ?city1), consume (fuel ?airplane)]
Temporal Planning
[Figure: temporal planning graph as a timeline from 0 to 100, with drive, sample, and commun actions placed at the first time points where they become reachable, together with the propositions they achieve]
 Record the first time point at which each action/proposition is reachable
 Assume the latest start time for actions in the RP
SAPA at IPC-2002
[Figure: IPC-2002 time and plan-quality plots for SAPA on Satellite (complex setting) and Rover (time setting).]
[JAIR 2003]
Temporal Planning Conclusion
 Levels become Time Points
 Makespan and plan length/cost are different
objectives
 Set-Level heuristic measures makespan
 Relaxed Plans measure makespan and plan
cost
Non-Deterministic Planning
Non-Deterministic Planning
 Belief State Distance
 Multiple Planning Graphs
 Labelled Uncertainty Graph
 Implicit Belief states and the CFF heuristic
Conformant Rover Problem
Search in Belief State Space
avail(soil, )
at()
have(soil)
sample(soil, )
avail(soil, )
drive(, )
avail(soil, )
at()
have(soil)
avail(soil, )
at()
avail(soil, )
at()
avail(soil, )
avail(soil, )
at()
at()
at()
avail(soil, )
at()
sample(soil, )
avail(soil, )
at()
have(soil)
avail(soil, )
at()
have(soil)
avail(soil, )
at()
drive(, )
drive(, )
avail(soil, )
avail(soil, )
at()
at()
drive(, )
avail(soil, )
at()
sample(soil, )
avail(soil, )
at()
avail(soil, )
at()
have(soil)
avail(soil, )
avail(soil, )
at()
at()
drive(b, )
June 7th, 2006
ICAPS'06 Tutorial T6
100
Belief State Distance
Compute classical planning distance measures for each state, assuming each state can reach its closest goal state, then aggregate the state distances.
[Figure: two candidate belief states BS1 and BS2 with per-state distances toward goal belief state BS3; aggregating gives Max = 15, 20; Sum = 20, 20; Union = 17, 20.]
[ICAPS 2004]
Belief State Distance
Estimate plans for each state pair.
[Figure: per-state action sequences from BS1 toward BS3; the union of the sequences counts shared actions once, capturing positive interaction and independence — Union = 17.]
[ICAPS 2004]
State Distance Aggregations
[Bryce et al., 2005]
Max under-estimates, Sum over-estimates, and Union is just right.
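The three aggregations can be sketched as follows. This is a toy illustration: the per-state distances and relaxed plans would come from the planning graphs, and all names here are hypothetical.

```python
def max_distance(state_dists):
    """Max aggregation: the hardest single state (under-estimates)."""
    return max(state_dists)

def sum_distance(state_dists):
    """Sum aggregation: treats states as fully independent (over-estimates)."""
    return sum(state_dists)

def union_distance(relaxed_plans):
    """Union aggregation: step-wise union of per-state relaxed plans, so an
    action shared by several states is counted once (positive interaction)."""
    merged = {}  # level -> set of action names
    for plan in relaxed_plans:  # each plan: {level: set of action names}
        for level, actions in plan.items():
            merged.setdefault(level, set()).update(actions)
    return sum(len(actions) for actions in merged.values())

# Two states needing 15 and 5 steps: max -> 15, sum -> 20.
plans = [{0: {"drive-a-b"}, 1: {"sample-soil"}},
         {0: {"drive-a-b"}, 1: {"sample-soil", "commun-soil"}}]
# union -> 3: the shared drive and sample are counted once
```

The union sits between the two extremes, which is why it tracks conformant plan cost more closely than either max or sum.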
Extract Relaxed Plans from each
Multiple Planning Graphs
Build a planning graph for each state in the belief state, extract a relaxed plan from each graph, and step-wise union the relaxed plans.
[Figure: two planning graphs (layers P0, A0, P1, A1, P2, A2, P3), one per state in the belief state, over drive, sample(soil, ·), and commun(soil); unioning their relaxed plans gives h = 7.]
Labelled Uncertainty Graph
[Figure: the labelled uncertainty graph (layers P0, A0, …, P3) for the conformant rover problem. Labels correspond to sets of states and are represented as propositional formulas (e.g. a conjunction of an at(·) literal with a disjunction of avail(soil, ·) literals). Action labels are the conjunction (intersection) of their precondition labels; effect labels are the disjunction (union) of their supporters' labels. Construction stops when the goal is labeled with every state.]
Labelled Relaxed Plan
[Figure: a labelled relaxed plan extracted from the LUG (h = 6). Subgoals and supporters need not be used for every state where they are reached (labeled), but enough supporters must be picked to cover the (sub)goals.]
Comparison of Planning Graph Types
[JAIR 2006]
[Figure: the LUG saves time relative to multiple graphs (MG). Sampling more worlds increases the cost of MG but improves the effectiveness of the LUG.]
State Agnostic Planning Graphs (SAG)
 LUG represents multiple explicit planning graphs
 SAG uses LUG to represent a planning graph for
every state
 The SAG is built once per search episode and we
can use it for relaxed plans for every search node,
instead of building a LUG at every node
 Extract relaxed plans from SAG by ignoring
planning graph components not labeled by states
in our search node.
SAG
Build a LUG for all states (union of all belief states)
[Figure: separate LUGs for progressively larger belief states collapse into the single largest LUG; relaxed plan extraction for a given belief state simply uses that one graph.]
 Ignore irrelevant labels
 Largest LUG == all LUGs
[AAAI, 2005]
CFF (implicit belief states + PG)
at()
drive(, )
sample(soil, )
at()
avail(soil, )
drive(, )
at()
avail(soil, )
at()
SAT Solver
drive(, )
avail(soil, )
at()
sample(soil, )
at()
at()
drive(, )
Planning
Graph
drive(b, )
June 7th, 2006
ICAPS'06 Tutorial T6
110
Belief Space Problems
[Figure: belief space problems span conformant and conditional variants of classical problems. More results at the IPC!!!]
Conditional Planning
 Actions have Observations
 Observations branch the plan because:
 Plan cost is reduced by performing fewer “just in case” actions – each branch performs only the relevant actions
 Sometimes actions conflict and observing determines
which to execute (e.g., medical treatments)
 We are ignoring negative interactions
 We are only forced to use observations to remove
negative interactions
 Ignore the observations and use the conformant relaxed
plan
 Suitable because the aggregate search effort over all plan
branches is related to the conformant relaxed plan cost
Non-Deterministic Planning Conclusions
 Measure positive interaction and independence between states co-transitioning to the goal via overlap
 Labeled planning graphs and CFF SAT
encoding efficiently measure conformant plan
distance
 Conformant planning heuristics work for
conditional planning without modification
Stochastic Planning
Stochastic Rover Example
[ICAPS 2006]
Search in Probabilistic Belief State Space
[Figure: progression search tree in probabilistic belief state space. Each belief state assigns probabilities to its member states (e.g. 0.4, 0.5, 0.1 over avail(soil, ·)/at(·) states); sample(soil, ·) and drive(·, ·) redistribute the probability mass over successor states (e.g. 0.36, 0.04, 0.45, 0.05).]
Handling Uncertain Actions
[ICAPS 2006]
 Extending LUG to handle uncertain actions
requires label extension that captures:
 State uncertainty (as before)
 Action outcome uncertainty
 Problem: Each action at each level may have a
different outcome. The number of uncertain events
grows over time – meaning the number of joint
outcomes of events grows exponentially with time
 Solution: Not all outcomes are important. Sample
some of them – keep number of joint outcomes
constant.
Monte Carlo LUG (McLUG)
[ICAPS 2006]
 Use Sequential Monte Carlo in the Relaxed
Planning Space
 Build several deterministic planning graphs by
sampling states and action outcomes
 Represent set of planning graphs using LUG
techniques
 Labels are sets of particles
 Sample which Action outcomes get labeled with
particles
 Bias relaxed plan by picking actions labeled with
most particles to prefer more probable support
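A toy sketch of the particle-label machinery described above. The names and the representation are hypothetical: labels are shown as plain Python sets of particle indices rather than the LUG's symbolic labels, and only one layer of propagation is shown.

```python
import random

def mclug_layer(prop_labels, actions, rng):
    """Build the next proposition layer of a McLUG. A particle enables an
    action if it labels all the action's preconditions; each enabled
    particle then samples one of the action's probabilistic outcomes."""
    next_labels = {p: set(ps) for p, ps in prop_labels.items()}  # persistence
    for preconds, outcomes in actions:  # outcomes: [(prob, effects), ...]
        enabled = set.intersection(*(prop_labels.get(p, set()) for p in preconds))
        probs, effect_sets = zip(*outcomes)
        for particle in enabled:
            effects = rng.choices(effect_sets, weights=probs)[0]  # sampled outcome
            for e in effects:
                next_labels.setdefault(e, set()).add(particle)
    return next_labels

rng = random.Random(0)
labels = {"at-alpha": {0, 1, 2, 3}, "avail-soil-alpha": {0, 1, 2}}
sample = (("at-alpha", "avail-soil-alpha"), [(0.95, ("have-soil",)), (0.05, ())])
nxt = mclug_layer(labels, [sample], rng)
# have-soil ends up labeled only with particles that both enable sample
# and sampled its successful outcome -- a subset of {0, 1, 2}
```

Keeping one sampled outcome per particle per action is what holds the number of joint outcomes constant as levels grow.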
McLUG for rover example
[ICAPS 2006]
[Figure: a McLUG (layers P0, A0, …, P3) for the rover example. States are sampled for the initial layer (here one avail(soil, ·) state is not sampled), and each particle in an action's label must sample that action's outcome. In the final layer, ¾ of the particles need a further drive to support the goal comm(soil) while ¼ support it already; it is okay to stop once at least ½ of the particles support the goal.]
Monte Carlo LUG (McLUG)
[Figure: relaxed plan extraction from the McLUG. Each chosen action's preconditions are supported for the particles the action supports; persistence is picked by default; the goal must be supported in all particles (here commun covers the remaining particles).]
Logistics Domain Results
[ICAPS 2006]
[Figure: time (s) and plan length on logistics problems P2-2-2, P2-2-4, and P4-2-2 — scalable, with reasonable quality.]
Grid Domain Results
[ICAPS 2006]
[Figure: time (s) and plan length on Grid(0.8) and Grid(0.5) — again, good scalability and quality! Broad beliefs need more particles.]
Direct Probability Propagation
 As an alternative to label propagation, we can propagate numeric probabilities
 Problem: Numeric Propagation tends to assume
only independence or positive interaction
between actions and propositions.
 With probability, we can vastly under-estimate the
probability of reaching propositions
 Solution: Propagate Correlation – measures
pair-wise independence/pos interaction/neg
interaction
 Can be seen as a continuous mutex
Correlation
 C(x, y) = Pr(x, y) / (Pr(x) Pr(y))
 If:
  C(x, y) = 0, then x, y are mutex
  0 < C(x, y) < 1, then x, y interfere
  C(x, y) = 1, then x, y are independent
  1 < C(x, y) < 1/Pr(x), then x, y synergize
  C(x, y) = 1/Pr(x) = 1/Pr(y), then x, y are completely correlated
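The definition and case analysis translate directly into code. A minimal sketch; the example probabilities are chosen so the exact comparisons behave (a real implementation would compare with a tolerance).

```python
def correlation(p_xy, p_x, p_y):
    """C(x, y) = Pr(x, y) / (Pr(x) Pr(y))."""
    return p_xy / (p_x * p_y)

def classify(c, p_x):
    """Map a correlation value to the qualitative cases above."""
    if c == 0:
        return "mutex"
    if c < 1:
        return "interfere"
    if c == 1:
        return "independent"
    if c < 1 / p_x:
        return "synergize"
    return "completely correlated"

# With Pr(x) = Pr(y) = 0.5, C ranges from 0 (mutex, never co-occur)
# up to 1/0.5 = 2 (completely correlated, always co-occur).
```

Because C is a continuous quantity, it behaves like a "continuous mutex": 0 recovers a hard mutex and values near 1 recover independence.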
Probability of a set of Propositions
 Pr(x1, x2, …, xn) = ∏_{i=1..n} Pr(xi | x1 … xi-1)   [Chain Rule]

 Pr(xi | x1 … xi-1)
  = Pr(x1 … xi-1 | xi) Pr(xi) / Pr(x1 … xi-1)   [Bayes Rule]
  ≈ Pr(x1 | xi) … Pr(xi-1 | xi) Pr(xi) / (Pr(x1) … Pr(xi-1))   [Assume Independence]
  = (Pr(xi | x1) / Pr(xi)) … (Pr(xi | xi-1) / Pr(xi)) Pr(xi)   [Bayes Rule]
  = Pr(xi) C(xi, x1) … C(xi, xi-1)   [Correlation]
  = Pr(xi) ∏_{j=1..i-1} C(xi, xj)

 Hence Pr(x1, x2, …, xn) ≈ ∏_{i=1..n} Pr(xi) ∏_{j=1..i-1} C(xi, xj)
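The final formula is straightforward to evaluate given marginals and pairwise correlations. A sketch: correlations missing from the table default to 1, i.e. independence.

```python
def joint_probability(marginals, corr):
    """Pr(x1..xn) ~= prod_i [ Pr(xi) * prod_{j<i} C(xi, xj) ]."""
    p = 1.0
    for i, p_i in enumerate(marginals):
        p *= p_i
        for j in range(i):
            p *= corr.get((i, j), 1.0)  # C(xi, xj); 1.0 means independent
    return p

# Independent (no correlations given): 0.5 * 0.5 = 0.25
# Completely correlated (C = 1/0.5 = 2): 0.5 * 0.5 * 2 = 0.5 = Pr(x1)
# Mutex (C = 0): the joint probability collapses to 0
```

The extremes recover the expected behavior: full correlation makes the joint equal a single marginal, and a zero correlation makes the set unreachable together.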
Probability Propagation
 The probability of an Action being enabled is the
probability of its preconditions (a set of
propositions).
 The probability of an effect is the product of the
action probability and outcome probability
 A single (or pair of) proposition(s) has probability
equal to the probability it is given by the best set
of supporters.
 The probability that a set of supporters gives a
proposition is the sum over the probability of all
possible executions of the actions.
Results
Stochastic Planning Conclusions
 Number of joint action outcomes too large
 Sampling outcomes to represent in labels is
much faster than exact representation
 SMC gives us a good way to use multiple planning graphs for heuristics, and the McLUG helps keep the representation small
 Numeric Propagation of probability can
better capture interactions with correlation
 Can extend to cost and resource propagation
Hybrid Planning Models
Hybrid Models
 Metric-Temporal w/ Resources (SAPA)
 Temporal Planning Graph w/ Uncertainty
(Prottle)
 PSP w/ Resources (SAPAMPS)
 Cost-based Conditional Planning (CLUG)
Propagating Temporal Cost Functions
[Figure: reaching L.A. from Tempe via Phoenix.
 Shuttle(Tempe,Phx): Cost: $20; Time: 1.0 hour
 Helicopter(Tempe,Phx): Cost: $100; Time: 0.5 hour
 Car(Tempe,LA): Cost: $100; Time: 10 hours
 Airplane(Phx,LA): Cost: $200; Time: 1.0 hour
The propagated cost function Cost(At(LA)) over t = 0 … 10 steps down from $300 (helicopter then airplane, t = 1.5) to $220 (shuttle then airplane, t = 2) to $100 (car, t = 10); it combines Cost(At(Phx)) with the cost of Flight(Phx,LA).]
Heuristics based on cost functions
Direct:
 If we want to minimize makespan: h = t0
 If we want to minimize cost: h = CostAggregate(G, t∞)
 If we want to minimize a function f(time, cost) of cost and makespan: h = min f(t, Cost(G, t)) s.t. t0 ≤ t ≤ t∞
 E.g. for f(time, cost) = 100·makespan + Cost, h = 100×2 + 220 at t0 ≤ t = 2 ≤ t∞
Using Relaxed Plan:
 Extract a relaxed plan using h as the bias
 If the objective function is f(time, cost), then action A (to be added to RP) is selected such that f(t(RP+A), C(RP+A)) + f(t(Gnew), C(Gnew)) is minimal, where Gnew = (G ∪ Precond(A)) \ Effects(A)
[Figure: the step cost function Cost(At(LA)) — $300 from the time of earliest achievement t0 = 1.5, $220 from t = 2, $100 from the time of lowest cost t = 10.]
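For a step cost function, the direct heuristic reduces to checking the breakpoints. A sketch using the $300/$220/$100 profile from the example; the breakpoint-list representation is an assumption.

```python
def direct_heuristic(cost_profile, f):
    """h = min over t in [t0, t_inf] of f(t, Cost(G, t)), where the step
    cost function is given by its breakpoints [(t, cost), ...]."""
    return min(f(t, c) for t, c in cost_profile)

# Cost(At(LA)): $300 from t0 = 1.5, $220 from t = 2, $100 from t = 10
profile = [(1.5, 300), (2.0, 220), (10.0, 100)]
h_makespan = direct_heuristic(profile, lambda t, c: t)         # 1.5
h_cost = direct_heuristic(profile, lambda t, c: c)             # 100
h_mixed = direct_heuristic(profile, lambda t, c: 100 * t + c)  # 420, at t = 2
```

The mixed objective reproduces the slide's value: 100×2 + 220 = 420, achieved at t = 2 rather than at the earliest or cheapest time.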
Phased Relaxation
The relaxed plan can be adjusted to take into account constraints that were originally ignored.
Adjusting for Mutexes:
Adjust the make-span estimate of the relaxed plan by marking actions that are mutex (and thus cannot be executed concurrently).
Adjusting for Resource Interactions:
Estimate the number of additional resource-producing actions needed to make up for any resource shortfall in the relaxed plan:
C = C + Σ_R ⌈(Con(R) – (Init(R) + Pro(R))) / ΔR⌉ × C(A_R)
Handling Cost/Makespan Tradeoffs
The objective is O = f(time, cost) = α·Makespan + (1 – α)·TotalCost; f can have both cost and makespan components.
[Figure: cost variation and makespan variation of the returned plans as α ranges from 0.1 to 0.95. Results over 20 randomly generated temporal logistics problems involving moving 4 packages between different locations in 3 cities.]
Prottle
 SAPA-style (time-stamped states and event
queues) search for fully-observable
conditional plans using L-RTDP
 Optimize probability of goal satisfaction
within a given, finite makespan
 Heuristic estimates probability of goal
satisfaction in the plan suffix
Prottle planning graph
[Figure: the Prottle planning graph has time-stamped propositions and probabilistic outcomes at different times.]
Probability Back-Propagation
 A vector of probability costs, one per top-level goal: 1 – Pr(Satisfy gi)
 Outcome cost is the max of the affected proposition costs
 Action cost is the expectation of its outcome costs
 Proposition cost is the product of its supporting action costs
 The heuristic is the product of the relevant node costs
Prottle Results
PSP w/ Resources
 Utility and Cost based on the values of
resources
 Challenges:
 Need to propagate cost for resource intervals
 Need to support resource goals at different
levels
Resource Cost Propagation
 Propagate reachable values with cost
[Figure: applying Move(Waypoint1), Sample_Soil, Sample_Soil, and Communicate grows v1's range of possible values [0,0] → [0,1] → [0,2], and attaches a cost to achieving each value bound (0, 1, 2, 2.5).]
Cost Propagation on Variable Bounds
 Bound cost depends upon:
  the action cost
  the previous bound cost – the current bound cost adds to the next
  the cost of all bounds in effect expressions
 Example (Sample_Soil with effect v1 += 1): v1 goes [0,0] → [0,1] → [0,2], and Cost(v1=2) = C(Sample_Soil) + Cost(v1=1)
 Example (Sample_Soil with effect v1 += v2, where v2: [0,3]): v1 goes [0,0] → [0,3] → [0,6], and Cost(v1=6) = C(Sample_Soil) + Cost(v2=3) + Cost(v1=3)
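The bound-cost rule can be sketched as a one-liner. This is illustrative only: a real implementation tracks costs for both the lower and upper bound of every variable.

```python
def bound_cost(action_cost, prev_bound_cost, operand_bound_costs=()):
    """Cost of reaching a new value bound = cost of the action + cost of
    the previous bound + costs of the bounds in the effect expression."""
    return action_cost + prev_bound_cost + sum(operand_bound_costs)

# Effect v1 += 1: Cost(v1=2) = C(Sample_Soil) + Cost(v1=1)
# Effect v1 += v2: Cost(v1=6) = C(Sample_Soil) + Cost(v2=3) + Cost(v1=3)
c_sample = 1                                 # hypothetical action cost
cost_v1_1 = bound_cost(c_sample, 0)          # Cost(v1=1) = 1
cost_v1_2 = bound_cost(c_sample, cost_v1_1)  # Cost(v1=2) = 2
```

The `operand_bound_costs` argument carries the Cost(v2=3)-style terms when the effect expression references other variables.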
Results – Modified Rovers (numeric soil)
Average improvement: 3.06
Anytime A* Search Behavior
Results – Modified Logistics (#of packages)
Average improvement: 2.88
Cost-Based Conditional Planning
 Actions may reduce uncertainty, but cost a lot
 Do we want more “just in case” actions that are cheap, or fewer that are more expensive?
 Propagate Costs on the LUG (CLUG)
 Problem: the LUG represents multiple explicit planning graphs, and the costs can be different in each planning graph.
 A single cost for every explicit planning graph assumes full positive interaction
 Multiple costs, one for each planning graph, is too costly
 Solution: Propagate cost for partitions of the explicit planning graphs
Cost Propagation on the LUG (CLUG)
[Figure: cost propagation on the LUG for a small example over propositions s, r and actions B, C, R, with costs propagated per label partition. The relaxed plan (bold) has Cost = c(B) + c(R) = 10 + 7 = 17.]
The Medical Specialist
[Bryce & Kambhampati, 2005]
[Figure: the medical specialist domain with high-cost and low-cost sensors. Using propagated costs improves plan quality (average path cost) without much additional total time.]
Overall Conclusions
 Relaxed Reachability Analysis
 Concentrate strongly on positive interactions and independence by ignoring negative interactions
 Estimates improve as more negative interactions are taken into account
 Heuristics can estimate and aggregate costs of
goals or find relaxed plans
 Propagate numeric information to adjust estimates
 Cost, Resources, Probability, Time
 Solving hybrid problems is hard
 Extra Approximations
 Phased Relaxation
 Adjustments/Penalties
Why do we love PG Heuristics?
 They work!
 They are “forgiving”
 You don't like doing mutex? Okay.
 You don't like growing the graph all the way? Okay.
 Allow propagation of many types of information
 Level, subgoal interaction, time, cost, world support, probability
 Support phased relaxation
 E.g. Ignore mutexes and resources and bring them back later…
 Graph structure supports other synergistic uses
 e.g. action selection
 Versatility…
Versatility of PG Heuristics
 PG Variations
  Serial
  Parallel
  Temporal
  Labelled
 Propagation Methods
  Level
  Mutex
  Cost
  Label
 Planning Problems
  Classical
  Resource/Temporal
  Conformant
 Planners
  Regression
  Progression
  Partial Order
  Graphplan-style