A Hybrid Linear Programming and Relaxed Plan Heuristic for Partial Satisfaction Planning Problems
J. Benton, Menkes van den Briel, Subbarao Kambhampati
Arizona State University

PSP UD: Partial Satisfaction Planning with Utility Dependency
(Do et al., IJCAI 2007; Smith, ICAPS 2004; van den Briel et al., AAAI 2004)
- Actions have cost
- Goal sets have utility
- Objective: maximize net benefit (utility - cost)

[Figure: three locations loc1, loc2, loc3 connected by flights with action costs 150, 200, 100, and 101]

Running example, with utilities:
- utility((at plane loc3)) = 1000
- utility((at person loc2)) = 1000
- utility((at plane loc3) & (at person loc2)) = 10   (a goal utility dependency)

S0: (at plane loc1), (in person plane)
    sum cost: 0      util(S0) = 0       net benefit: 0 - 0 = 0
(fly plane loc2), cost: 150
S1: (at plane loc2), (in person plane)
    sum cost: 150    util(S1) = 0       net benefit: 0 - 150 = -150
(debark person loc2), cost: 1
S2: (at plane loc2), (at person loc2)
    sum cost: 151    util(S2) = 1000    net benefit: 1000 - 151 = 849
(fly plane loc3), cost: 100
S3: (at plane loc3), (at person loc2)
    sum cost: 251    util(S3) = 1000 + 1000 + 10 = 2010    net benefit: 2010 - 251 = 1759

Heuristic Search for Soft Goals
- Plan quality hinges on the interaction between action cost and goal achievement (Do & Kambhampati, KBCS 2004; Do et al., IJCAI 2007).
- Relaxed planning graph heuristics: cannot take all complex interactions into account.
- Integer programming (IP) and LP-relaxation heuristics: current encodings do not scale well, and can only be optimal up to some plan step.

BBOP-LP Approach
- Build a network-flow-based IP encoding
  - no time indices
  - uses multi-valued variables
- Use its LP relaxation for a heuristic value
  - gives a second relaxation on the heuristic
- Perform branch and bound search
  - uses the LP solution to extract a relaxed plan (similar to YAHSP; Vidal, 2004)

Building a Heuristic
A network flow model on variable transitions (no time indices):
- capture the relevant transitions with multi-valued fluents
- prevail constraints
- cost on actions
- utility on goals

[Figure: network flow model over the plane and person state variables, annotated with initial states, goal states, action costs (1, 100, 101, 150, 200), and goal utilities (1000, 1000, 10)]

Constraints of this model:
1. If an action executes, then all of its effects and prevail conditions must also.
   action(a) = Σ_{effects e of a in v} effect(a,v,e) + Σ_{prevails f of a in v} prevail(a,v,f)
2. If a fact is deleted, then it must be added to re-achieve its value.
   1{f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) = Σ_{effects that delete f} effect(a,v,e) + endvalue(v,f)
3. If a prevail condition is required, then it must be achieved.
   1{f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) ≥ prevail(a,v,f) / M   (M a sufficiently large constant)
4. A goal utility dependency is achieved iff all of its goals are achieved.
   goaldep(k) ≥ Σ_{f in dependency k} endvalue(v,f) − |Gk| + 1
   goaldep(k) ≤ endvalue(v,f)   ∀ f in dependency k

Variables:
- action(a) ∈ Z+ : the number of times a ∈ A is executed
- effect(a,v,e) ∈ Z+ : the number of times transition e in state variable v is caused by action a
- prevail(a,v,f) ∈ Z+ : the number of times prevail condition f in state variable v is required by action a
- endvalue(v,f) ∈ {0,1} : equal to 1 if value f is the end value in state variable v
- goaldep(k) ∈ {0,1} : equal to 1 if goal utility dependency Gk is achieved

Parameters:
- cost(a) : the cost of executing action a ∈ A
- utility(v,f) : the utility of achieving value f in state variable v
- utility(k) : the utility of achieving goal utility dependency Gk

Objective function (maximize net benefit):
MAX Σ_{v∈V, f∈Dv} utility(v,f)·endvalue(v,f) + Σ_{k∈K} utility(k)·goaldep(k) − Σ_{a∈A} cost(a)·action(a)
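To see why constraint 4 captures the "iff", the linearization can be checked exhaustively over binary endvalue assignments. A minimal sketch (the function name and encoding are our own illustration, not from the paper):

```python
from itertools import product

def feasible_goaldep(endvalues):
    """Return the set of goaldep values in {0,1} that satisfy the
    linearization constraints for one dependency G_k:
      goaldep >= sum(endvalues) - |G_k| + 1
      goaldep <= endvalue(f)   for every f in G_k
    """
    lo = sum(endvalues) - len(endvalues) + 1
    return {g for g in (0, 1) if g >= lo and all(g <= e for e in endvalues)}

# Exhaustive check: goaldep is forced to 1 exactly when every goal in
# the dependency is achieved, and forced to 0 otherwise.
for n in (1, 2, 3):
    for endvalues in product((0, 1), repeat=n):
        assert feasible_goaldep(endvalues) == {int(all(endvalues))}
```

Note that both inequalities are needed: the lower bound alone would let goaldep stay 0 when all goals hold (forfeiting utility(k) under maximization), and the upper bounds alone would not force goaldep up to 1.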
Constraints 2 and 3 are updated at each search node, so that the indicator 1{f ∈ s0[v]} reflects the state of the node being evaluated rather than the initial state.

Search: Branch and Bound
- Branch and bound with a time limit
  - all goals are soft, so all states are goal states
  - returns the best plan found so far (i.e., the best bound)
- Greedy lookahead strategy, similar to YAHSP (Vidal, 2004), to quickly find good bounds
- LP-solution-guided relaxed plan extraction, to add informedness

Getting a Relaxed Plan
[Figure: a sequence of animation frames stepping through LP-guided relaxed plan extraction on a relaxed planning graph over the facts (at plane loc1), (at plane loc2), (at plane loc3), (in person plane), (at person loc2) and the actions (fly loc1 loc2), (fly loc1 loc3), (fly loc2 loc3), (fly loc3 loc2), (drop person loc2)]

Experimental Setup
- Three modified IPC-3 domains: zenotravel, satellite, rovers (maximize net benefit)
  - action costs
  - goal utilities
  - goal utility dependencies
- BBOP-LP: with and without RP lookahead
- Compared with SPUDS, using a relaxed plan-based heuristic and an admissible cost propagation-based heuristic
- Ran with a 600-second time limit

Results
[Figure: net benefit plots for zenotravel, satellite, and rovers; higher net benefit is better; optimal solutions are marked]
- Found the optimal solution in 15 of 60 problems

Summary
- A novel LP-based heuristic for partial satisfaction planning
- Branch and bound with RP lookahead
- A planner that is sensitive to plan quality: BBOP-LP

Future Work
- Improve the encoding
- Explore other lookahead methods
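As a closing illustration of the net benefit objective the planner optimizes, the deck's running example is small enough to search exhaustively (a stand-in for the branch and bound described earlier; the state encoding and function names are our own, while the action costs and utilities come from the example):

```python
# State is a pair (plane_at, person_at); person_at is "plane" while aboard.
# Costs 150, 1, 100 and utilities 1000, 1000, 10 are from the slide example.
ACTIONS = {
    # name: (precondition, effect, cost)
    "fly-loc1-loc2": (lambda s: s[0] == "loc1", lambda s: ("loc2", s[1]), 150),
    "fly-loc2-loc3": (lambda s: s[0] == "loc2", lambda s: ("loc3", s[1]), 100),
    "debark-loc2":   (lambda s: s[0] == "loc2" and s[1] == "plane",
                      lambda s: (s[0], "loc2"), 1),
}

def utility(state):
    plane_at, person_at = state
    u = 0
    if plane_at == "loc3":
        u += 1000
    if person_at == "loc2":
        u += 1000
    if plane_at == "loc3" and person_at == "loc2":  # goal utility dependency
        u += 10
    return u

def best_net_benefit(state, cost=0, used=frozenset()):
    """Every state is a goal state: exhaustively search action sequences
    (each action used at most once) for the maximum net benefit."""
    best = utility(state) - cost
    for name, (pre, eff, c) in ACTIONS.items():
        if name not in used and pre(state):
            best = max(best,
                       best_net_benefit(eff(state), cost + c, used | {name}))
    return best

print(best_net_benefit(("loc1", "plane")))  # 1759, matching state S3 above
```

Flying to loc3 before debarking yields only 1000 - 250 = 750, so the search confirms that the order fly, debark, fly from the example is the best plan.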