Finding Admissible Bounds for Oversubscribed Planning Problems

J. Benton, Menkes van den Briel, Subbarao Kambhampati
Arizona State University
Is this plan “good”?
How good is a given plan? This is especially important when plan quality may vary widely, e.g., when we have many soft goals.
Related: how do we drive a planner to find a good plan?
Admissible heuristics help both one-shot use (bounding the quality of a given plan) and per-node use (guiding search).
We need a heuristic schema that admits degrees of relaxation.
Challenges
1. Build a strong admissible heuristic: an integer programming (IP) based heuristic.
2. Provide a way to add relaxation for varied use: use the linear programming (LP) relaxation.
PSP^UD: Partial Satisfaction Planning with Utility Dependency
Actions have cost
Goal sets have utility
Example: truck t and package p1, locations loc1 and loc2
S0: (at t loc1), (in p1 t); sum cost: 0; util(S0): 10; net benefit(S0): 10-0=10
(move t loc2), cost: 20
S1: (at t loc2), (in p1 t); sum cost: 20; util(S1): 0; net benefit(S1): 0-20=-20
(unload p1 loc2), cost: 5
S2: (at t loc2), (at p1 loc2); sum cost: 25; util(S2): 10; net benefit(S2): 10-25=-15
(move t loc1), cost: 20
S3: (at t loc1), (at p1 loc2); sum cost: 45; util(S3): 10+10+60=80; net benefit(S3): 80-45=35
Goal utilities:
utility((at t loc1)) = 10
utility((at p1 loc2)) = 10
utility((at t loc1) & (at p1 loc2)) = 60
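The net-benefit figures above are plain arithmetic: the utility of every goal set a state satisfies, minus the summed action cost so far. A minimal Python sketch that recomputes them, assuming states are represented as sets of fact strings and reading the utility dependency as (at t loc1) & (at p1 loc2), which is the reading consistent with util(S3) = 80:

```python
# States visited by the example plan, as sets of true facts.
STATES = {
    "S0": {"(at t loc1)", "(in p1 t)"},
    "S1": {"(at t loc2)", "(in p1 t)"},
    "S2": {"(at t loc2)", "(at p1 loc2)"},
    "S3": {"(at t loc1)", "(at p1 loc2)"},
}
# Cumulative action cost to reach each state: move (20), unload (5), move (20).
SUM_COST = {"S0": 0, "S1": 20, "S2": 25, "S3": 45}

# Goal utilities: two single goals plus one utility dependency over a goal set.
UTILITIES = [
    (frozenset({"(at t loc1)"}), 10),
    (frozenset({"(at p1 loc2)"}), 10),
    (frozenset({"(at t loc1)", "(at p1 loc2)"}), 60),
]


def utility(state):
    """Sum the utility of every goal set that is fully contained in the state."""
    return sum(u for goals, u in UTILITIES if goals <= state)


def net_benefit(name):
    """Utility of the state minus the cost of the actions used to reach it."""
    return utility(STATES[name]) - SUM_COST[name]


for name in STATES:
    print(name, utility(STATES[name]), net_benefit(name))
```

Only S3 satisfies the whole dependency, which is why it is the only state with a positive net benefit after acting.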
Building a Heuristic
A network flow model on variable transitions
Capture relevant transitions with multi-valued fluents
add initial states
add prevail constraints
add goal states
add cost on actions
add utility on goals
[Figure: transition graphs for the package and truck state variables over locations loc1 and loc2, labeled with action costs (20, 5) and goal utilities (10, 10, 60)]
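One way to picture the multi-valued encoding is one transition graph per state variable, with prevail conditions recorded separately. The sketch below is hypothetical: the variable names truck and p1, the action names, and the load-p1-loc2 action are illustrative assumptions, with costs taken from the example (20 per move, 5 for the package actions).

```python
from collections import defaultdict

# Hypothetical SAS+-style encoding of the truck/package example.
DOMAINS = {
    "truck": ["loc1", "loc2"],
    "p1": ["loc1", "loc2", "t"],  # "t" means the package is inside the truck
}

# action -> (cost, effects as (variable, from, to), prevails as (variable, value))
ACTIONS = {
    "move-t-loc1-loc2": (20, [("truck", "loc1", "loc2")], []),
    "move-t-loc2-loc1": (20, [("truck", "loc2", "loc1")], []),
    "unload-p1-loc2": (5, [("p1", "t", "loc2")], [("truck", "loc2")]),
    "load-p1-loc2": (5, [("p1", "loc2", "t")], [("truck", "loc2")]),  # assumed
}


def transition_graphs(actions, domains):
    """Build one graph per variable: value -> list of (next value, action)."""
    graphs = {var: defaultdict(list) for var in domains}
    for name, (_cost, effects, _prevails) in actions.items():
        for var, old, new in effects:
            if old not in domains[var] or new not in domains[var]:
                raise ValueError(f"{name}: {var} transition {old}->{new} leaves its domain")
            graphs[var][old].append((new, name))
    return graphs


def prevail_requirements(actions):
    """Map each (variable, value) to the actions that need it as a prevail condition."""
    req = defaultdict(list)
    for name, (_cost, _effects, prevails) in actions.items():
        for var, value in prevails:
            req[(var, value)].append(name)
    return req


graphs = transition_graphs(ACTIONS, DOMAINS)
for var, graph in graphs.items():
    for old, edges in graph.items():
        for new, action in edges:
            print(f"{var}: {old} -> {new} via {action}")
print(dict(prevail_requirements(ACTIONS)))
```

Keeping prevail conditions out of the transition graphs mirrors the model: they do not move a variable along an edge, they only require a value to be reachable.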
Building a Heuristic
Constraints of this model
1. If an action executes, then all of its effects and prevail conditions must also occur.
2. If a fact is deleted, then it must be added again to re-achieve that value.
3. If a prevail condition is required, then it must be achieved.
4. A goal utility dependency is achieved if and only if all of its goals are achieved.
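Constraints 1-3 can be checked directly on integer action counts. A minimal sketch, using a hypothetical SAS+-style encoding of the truck example (the names, the extra load-p1-loc2 action and big_m = 100 are assumptions). Because it looks only at counts, not at an ordered plan, it tests membership in the relaxation rather than plan validity:

```python
from collections import Counter

INITIAL = {"truck": "loc1", "p1": "t"}
# Hypothetical encoding: action -> (cost, effects (var, old, new), prevails (var, value)).
ACTIONS = {
    "move-t-loc1-loc2": (20, [("truck", "loc1", "loc2")], []),
    "move-t-loc2-loc1": (20, [("truck", "loc2", "loc1")], []),
    "unload-p1-loc2": (5, [("p1", "t", "loc2")], [("truck", "loc2")]),
    "load-p1-loc2": (5, [("p1", "loc2", "t")], [("truck", "loc2")]),
}


def check_counts(counts, big_m=100):
    """Apply constraints 1-3; return each variable's end value, or None if infeasible."""
    adds, deletes, prevail = Counter(), Counter(), Counter()
    for name, n in counts.items():
        _cost, effects, prevails = ACTIONS[name]
        # Constraint 1: each effect and prevail occurs exactly as often as its action.
        for var, old, new in effects:
            deletes[(var, old)] += n
            adds[(var, new)] += n
        for var, value in prevails:
            prevail[(var, value)] += n
    end = {}
    for var, f in set(adds) | set(deletes) | set(prevail) | set(INITIAL.items()):
        init = 1 if INITIAL[var] == f else 0
        # Constraint 2: initial + added = deleted + endvalue, with endvalue in {0, 1}.
        endvalue = init + adds[(var, f)] - deletes[(var, f)]
        if endvalue not in (0, 1):
            return None
        # Constraint 3: (initial + added) >= prevail / M, written without division.
        if (init + adds[(var, f)]) * big_m < prevail[(var, f)]:
            return None
        if endvalue == 1:
            end[var] = f
    return end


plan = {"move-t-loc1-loc2": 1, "unload-p1-loc2": 1, "move-t-loc2-loc1": 1}
print(check_counts(plan))
print(check_counts({"unload-p1-loc2": 1}))
```

The unload-only counts fail constraint 3: the prevail value truck = loc2 is neither initial nor ever added.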
Formulation
Variables
action(a) ∈ Z+: the number of times a ∈ A is executed
effect(a,v,e) ∈ Z+: the number of times a transition e in state variable v is caused by action a
prevail(a,v,f) ∈ Z+: the number of times a prevail condition f in state variable v is required by action a
endvalue(v,f) ∈ {0,1}: equal to 1 if value f is the end value in state variable v
goaldep(k) ∈ {0,1}: equal to 1 if goal dependency G_k is achieved
Parameters
cost(a): the cost of executing action a ∈ A
utility(v,f): the utility of achieving value f in state variable v
utility(k): the utility of achieving goal dependency G_k
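1. If an action executes, then all of its effects and prevail conditions must also occur.
$$\text{action}(a) = \sum_{\text{effects of } a \text{ in } v} \text{effect}(a,v,e) + \sum_{\text{prevails of } a \text{ in } v} \text{prevail}(a,v,f)$$
for each action a ∈ A and each state variable v that a changes or requires.
2. If a fact is deleted, then it must be added again to re-achieve that value.
$$\mathbf{1}\{f \in s_0[v]\} + \sum_{\text{effects that add } f} \text{effect}(a,v,e) = \sum_{\text{effects that delete } f} \text{effect}(a,v,e) + \text{endvalue}(v,f) \quad \forall v \in V,\ f \in D_v$$
3. If a prevail condition is required, then it must be achieved.
$$\mathbf{1}\{f \in s_0[v]\} + \sum_{\text{effects that add } f} \text{effect}(a,v,e) \ge \frac{\text{prevail}(a,v,f)}{M}$$
for every prevail condition f in v required by an action a, where M is a large constant, so any positive prevail count forces f to be initially true or added at least once.
4. A goal utility dependency is achieved if and only if all of its goals are achieved.
$$\text{goaldep}(k) \ge \sum_{f \in G_k} \text{endvalue}(v,f) - (|G_k| - 1) \quad \forall k \in K$$
$$\text{goaldep}(k) \le \text{endvalue}(v,f) \quad \forall k \in K,\ f \in G_k$$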
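Constraint 4 is the standard linearization of a logical AND over binary variables. A small Python check, assuming goaldep(k) is binary as its description implies, that for every 0/1 assignment of the end values in a dependency the two inequalities leave exactly one feasible goaldep value, namely the AND of those end values:

```python
from itertools import product


def goaldep_feasible(endvalues):
    """Return the goaldep values in {0, 1} allowed by constraint 4 for these end values."""
    size = len(endvalues)
    feasible = []
    for g in (0, 1):
        lower_ok = g >= sum(endvalues) - (size - 1)   # forces g = 1 when all goals hold
        upper_ok = all(g <= x for x in endvalues)     # forces g = 0 when any goal fails
        if lower_ok and upper_ok:
            feasible.append(g)
    return feasible


# Exhaustive check for dependencies with 1 to 4 goals.
for size in range(1, 5):
    for ev in product((0, 1), repeat=size):
        assert goaldep_feasible(list(ev)) == [int(all(ev))]
print("constraint 4 encodes AND")
```

In the LP relaxation goaldep(k) becomes continuous, so these same inequalities only bound it between the two sides instead of pinning it to 0 or 1.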
Objective Function: Maximize Net Benefit
$$\max \sum_{v \in V,\, f \in D_v} \text{utility}(v,f)\,\text{endvalue}(v,f) + \sum_{k \in K} \text{utility}(k)\,\text{goaldep}(k) - \sum_{a \in A} \text{cost}(a)\,\text{action}(a)$$
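For an instance as small as the truck example, the IP can be solved by enumeration. The sketch below is a brute-force stand-in for a real IP solver, not the authors' implementation: it uses a hypothetical encoding (the names and the extra load-p1-loc2 action are assumptions), caps every action count at 2 (harmless here because extra actions only add cost, but not valid in general), and maximizes the objective subject to constraints 1-4:

```python
from collections import Counter
from itertools import product

INITIAL = {"truck": "loc1", "p1": "t"}
# Hypothetical encoding: action -> (cost, effects (var, old, new), prevails (var, value)).
ACTIONS = {
    "move-t-loc1-loc2": (20, [("truck", "loc1", "loc2")], []),
    "move-t-loc2-loc1": (20, [("truck", "loc2", "loc1")], []),
    "unload-p1-loc2": (5, [("p1", "t", "loc2")], [("truck", "loc2")]),
    "load-p1-loc2": (5, [("p1", "loc2", "t")], [("truck", "loc2")]),
}
GOAL_UTILITY = {("truck", "loc1"): 10, ("p1", "loc2"): 10}   # utility(v, f)
DEPENDENCIES = [({("truck", "loc1"), ("p1", "loc2")}, 60)]    # (G_k, utility(k))


def end_values(counts, big_m=100):
    """Constraints 1-3: return the set of (v, f) with endvalue = 1, or None if infeasible."""
    adds, deletes, prevail = Counter(), Counter(), Counter()
    for name, n in counts.items():
        _cost, effects, prevails = ACTIONS[name]
        for var, old, new in effects:
            deletes[(var, old)] += n
            adds[(var, new)] += n
        for key in prevails:
            prevail[key] += n
    ends = set()
    for key in set(adds) | set(deletes) | set(prevail) | set(INITIAL.items()):
        init = 1 if INITIAL[key[0]] == key[1] else 0
        value = init + adds[key] - deletes[key]
        if value not in (0, 1) or (init + adds[key]) * big_m < prevail[key]:
            return None
        if value == 1:
            ends.add(key)
    return ends


def objective(counts):
    """Net benefit of a count vector, or None if it violates constraints 1-3."""
    ends = end_values(counts)
    if ends is None:
        return None
    utility = sum(u for key, u in GOAL_UTILITY.items() if key in ends)
    # Constraint 4 with binary end values sets goaldep(k) = 1 exactly when G_k holds.
    utility += sum(u for goals, u in DEPENDENCIES if goals <= ends)
    cost = sum(ACTIONS[name][0] * n for name, n in counts.items())
    return utility - cost


def solve(max_count=2):
    """Enumerate all count vectors up to max_count and keep the best feasible one."""
    best = None
    names = list(ACTIONS)
    for combo in product(range(max_count + 1), repeat=len(names)):
        counts = dict(zip(names, combo))
        value = objective(counts)
        if value is not None and (best is None or value > best[0]):
            best = (value, counts)
    return best


value, counts = solve()
print(value, {a: n for a, n in counts.items() if n})
```

It reports 35 with one move in each direction and one unload, matching net benefit(S3). On this instance the IP bound is tight; on larger problems the IP ignores action ordering, so it can overestimate.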
Experimental Setup
Three modified IPC 3 domains (maximize net benefit): zenotravel, satellite, rovers
One IPC 5 domain, Rovers from the simple preferences track (minimize goal achievement violations + action cost)
Compared with a cost propagation-based heuristic
Measured the heuristic value at the initial state versus the optimal plan quality, found using a branch and bound search
Bounds:
maximizing: LP ≥ IP ≥ OPTIMAL
minimizing: LP ≤ IP ≤ OPTIMAL
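The ordering follows from feasible-set inclusion. A sketch for the maximization case, assuming (as the model intends) that every plan π induces an integer solution x_π (its action, effect and prevail counts and its end values) that satisfies constraints 1-4 with objective value z(x_π) equal to the plan's net benefit. Here F_IP is the IP's feasible set and F_LP is the same constraint set with integrality dropped:

```latex
\begin{aligned}
\text{OPTIMAL} &= \max_{\pi} z(x_\pi) \;\le\; \max_{x \in \mathcal{F}_{IP}} z(x) = \text{IP}, \\
\mathcal{F}_{IP} \subseteq \mathcal{F}_{LP} &\;\Longrightarrow\; \text{IP} \;\le\; \max_{x \in \mathcal{F}_{LP}} z(x) = \text{LP}.
\end{aligned}
```

For minimization the same inclusions give LP ≤ IP ≤ OPTIMAL. Equality is possible at each step, so the inequalities are not strict in general.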
Results
[Figures: results plots for the IP and LP heuristics]
Summary
The IP gives a bound on the quality of a plan.
Its LP relaxation (doubly relaxed) provides a heuristic for search (Search I session: Monday at 4:10 pm).
Future Work
Improve the encoding (to give better LP values)
Use fluent merging