Bounds for Oversubscribed Planning Problems
J. Benton, Menkes van den Briel, Subbarao Kambhampati
Arizona State University

Is this plan "good"? (e.g., when we have many soft goals)
How good is a given plan? Especially important when quality may vary widely; admissible heuristics help one-shot use.
Related: how to drive a planner toward a good plan; heuristics help per-node use.
We need a heuristic schema that admits degrees of relaxation.

Challenges
1. Build a strong admissible heuristic: an integer programming (IP) based heuristic.
2. Provide a way to add relaxation for varied use: use the linear programming (LP) relaxation.

PSP UD: Partial Satisfaction Planning with Utility Dependency
Actions have cost; goal sets have utility.

Example: a truck t and a package p1, locations loc1 and loc2, with
utility((at t loc1)) = 10
utility((at p1 loc2)) = 10
utility((at t loc1) & (at p1 loc2)) = 60
S0 = {(at t loc1), (in p1 t)}: sum cost 0, util(S0) = 10, net benefit(S0) = 10 - 0 = 10
S1 = {(at t loc2), (in p1 t)}, after (move t loc2) at cost 20: sum cost 20, util(S1) = 0, net benefit(S1) = 0 - 20 = -20
S2 = {(at t loc2), (at p1 loc2)}, after (unload p1 loc2) at cost 5: sum cost 25, util(S2) = 10, net benefit(S2) = 10 - 25 = -15
S3 = {(at t loc1), (at p1 loc2)}, after (move t loc1) at cost 20: sum cost 45, util(S3) = 10 + 10 + 60 = 80, net benefit(S3) = 80 - 45 = 35

Building a Heuristic
A network flow model on variable transitions: capture relevant transitions with multi-valued fluents; add initial states, prevail constraints, and goal states; add cost on actions and utility on goals.
[Figure: transition networks for the package and truck variables, annotated with action costs (5 and 20) and goal utilities (10, 10, 60).]

Constraints of this model
1. If an action executes, then all of its effects and prevail conditions must also.
2. If a fact is deleted, then it must be added to re-achieve its value.
3. If a prevail condition is required, then it must be achieved.
4. A goal utility dependency is achieved if all of its goals are achieved.
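The net benefit arithmetic in the truck-and-package example above can be checked with a small script (a sketch; the fact and state encodings are illustrative, but the utilities and costs follow the poster's example):

```python
# Net benefit check for the truck-and-package example.
# Single-fact utilities plus one goal utility dependency.
fact_utility = {("at", "t", "loc1"): 10, ("at", "p1", "loc2"): 10}
dep_utility = {frozenset({("at", "t", "loc1"), ("at", "p1", "loc2")}): 60}

def net_benefit(state, total_cost):
    """utility(state) - cost, summing achieved facts and dependencies."""
    u = sum(val for fact, val in fact_utility.items() if fact in state)
    u += sum(val for dep, val in dep_utility.items() if dep <= state)
    return u - total_cost

s0 = {("at", "t", "loc1"), ("in", "p1", "t")}
s3 = {("at", "t", "loc1"), ("at", "p1", "loc2")}  # after move, unload, move back
print(net_benefit(s0, 0))    # S0: 10 - 0 = 10
print(net_benefit(s3, 45))   # S3: 10 + 10 + 60 - 45 = 35
```

Note how the dependency only pays off once both of its facts hold, which is why the plan's net benefit dips negative (S1, S2) before reaching 35 at S3.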
Formulation

Variables
action(a) ∈ Z+ : the number of times action a ∈ A is executed
effect(a,v,e) ∈ Z+ : the number of times transition e in state variable v is caused by action a
prevail(a,v,f) ∈ Z+ : the number of times prevail condition f in state variable v is required by action a
endvalue(v,f) ∈ {0,1} : equal to 1 if value f is the end value of state variable v
goaldep(k) ∈ {0,1} : equal to 1 if goal utility dependency Gk is achieved

Parameters
cost(a) : the cost of executing action a ∈ A
utility(v,f) : the utility of achieving value f in state variable v
utility(k) : the utility of achieving goal dependency Gk

1. If an action executes, then all of its effects and prevail conditions must also:
action(a) = Σ_{effects e of a in v} effect(a,v,e) + Σ_{prevails f of a in v} prevail(a,v,f)
2. If a fact is deleted, then it must be added to re-achieve its value:
1{f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) = Σ_{effects that delete f} effect(a,v,e) + endvalue(v,f)
3. If a prevail condition is required, then it must be achieved (M a sufficiently large constant):
1{f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) ≥ prevail(a,v,f) / M
4. A goal utility dependency is achieved if all of its goals are achieved:
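Constraint 2 above is a flow-balance equation per value of each state variable. A minimal sketch of checking that balance over a candidate set of transition counts (the counts below are read off the truck variable in the running example):

```python
# Flow balance (constraint 2) for one value f of one state variable v:
# [f in s0[v]] + (# transitions adding f) = (# transitions deleting f) + endvalue(v,f)
def flow_balanced(in_initial_state, adds, deletes, endvalue):
    return int(in_initial_state) + adds == deletes + endvalue

# Truck value (at t loc1): holds initially, deleted once by (move t loc2),
# re-added once by (move t loc1), and is the truck's end value.
assert flow_balanced(True, adds=1, deletes=1, endvalue=1)
# Truck value (at t loc2): not initial, added once, deleted once, not the end value.
assert flow_balanced(False, adds=1, deletes=1, endvalue=0)
print("flow balanced")
```

Because the constraint counts transitions rather than ordering them, the IP is itself a relaxation of the planning problem, which is what makes its objective value an admissible bound.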
goaldep(k) ≥ Σ_{f in dependency k} endvalue(v,f) − (|Gk| − 1)
goaldep(k) ≤ endvalue(v,f)  ∀ f in dependency k

Objective Function: maximize net benefit
max Σ_{v∈V, f∈Dv} utility(v,f) · endvalue(v,f) + Σ_{k∈K} utility(k) · goaldep(k) − Σ_{a∈A} cost(a) · action(a)

Experimental Setup
Three modified IPC-3 domains: zenotravel, satellite, rovers (maximize net benefit).
One IPC-5 domain: rovers, simple preferences (minimize goal achievement violations + action cost).
Compared with a cost propagation-based heuristic.
Heuristic value at the initial state versus the optimal plan value, found using a branch-and-bound search.
When maximizing: LP ≥ IP ≥ OPTIMAL; when minimizing: LP ≤ IP ≤ OPTIMAL.

[Results: plots of IP and LP heuristic values against optimal plan values across the four domains.]

Summary
The IP gives a bound on the quality of a plan.
Doubly relaxed (LP) to provide a heuristic for search (Search I Session: Monday at 4:10 pm).

Future Work
Improve the encoding (to give better LP values).
Use fluent merging.
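The two inequalities in constraint 4 linearize the conjunction "goaldep(k) = 1 iff every goal in Gk is achieved", given that a maximizing objective pushes goaldep(k) as high as the constraints allow. A brute-force check of that linearization over all 0/1 assignments (a sketch; `best_goaldep` is a hypothetical helper, not part of the poster's formulation):

```python
from itertools import product

def best_goaldep(endvalues):
    """Largest goaldep in {0,1} satisfying both linear constraints;
    a maximizing objective selects exactly this value."""
    n = len(endvalues)
    feasible = [g for g in (0, 1)
                if g >= sum(endvalues) - (n - 1)      # goaldep >= sum - (|Gk| - 1)
                and all(g <= e for e in endvalues)]   # goaldep <= each endvalue
    return max(feasible)

# goaldep equals the logical AND of the goal values, for any |Gk|.
for n in (1, 2, 3):
    for vals in product((0, 1), repeat=n):
        assert best_goaldep(vals) == min(vals)
print("linearization matches AND")
```

The first inequality forces goaldep(k) up to 1 when all |Gk| goals hold; the second forces it to 0 when any goal is missed.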