A Hybrid Linear Programming and Relaxed Plan Heuristic for Partial Satisfaction Planning Problems

J. Benton, Menkes van den Briel, Subbarao Kambhampati
Arizona State University
PSP^UD: Partial Satisfaction Planning with Utility Dependency
(PSP: Smith, ICAPS 2004; van den Briel, et al., AAAI 2004. PSP^UD: Do, et al., IJCAI 2007)
Actions have cost; goal sets have utility.
Objective: maximize net benefit (utility - cost).
[Figure: map of loc1, loc2, and loc3 with flight costs 150, 200, 100, and 101 between the locations.]
Example plan and its net benefit:
  utility((at plane loc3)) = 1000
  utility((at person loc2)) = 1000
  utility((at plane loc3) & (at person loc2)) = 10

  S0: (at plane loc1), (in person plane)
      util(S0) = 0, sum cost = 0, net benefit(S0) = 0 - 0 = 0
  (fly plane loc2), cost: 150
  S1: (at plane loc2), (in person plane)
      util(S1) = 0, sum cost = 150, net benefit(S1) = 0 - 150 = -150
  (debark person loc2), cost: 1
  S2: (at plane loc2), (at person loc2)
      util(S2) = 1000, sum cost = 151, net benefit(S2) = 1000 - 151 = 849
  (fly plane loc3), cost: 100
  S3: (at plane loc3), (at person loc2)
      util(S3) = 1000 + 1000 + 10 = 2010, sum cost = 251, net benefit(S3) = 2010 - 251 = 1759
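As a minimal sketch of the net benefit computation above (Python; the state, utilities, and utility dependency are taken directly from the example, everything else is illustrative):

    # Net benefit = total utility of the end state minus the summed action cost.
    goal_utility = {
        ("at", "plane", "loc3"): 1000,
        ("at", "person", "loc2"): 1000,
    }
    # Goal utility dependency: extra utility only if *all* facts in the set hold.
    dependency = (frozenset({("at", "plane", "loc3"), ("at", "person", "loc2")}), 10)

    def net_benefit(state, total_cost):
        util = sum(u for fact, u in goal_utility.items() if fact in state)
        facts, dep_util = dependency
        if facts <= state:               # dependency achieved iff all of its goals hold
            util += dep_util
        return util - total_cost

    S3 = {("at", "plane", "loc3"), ("at", "person", "loc2")}
    print(net_benefit(S3, 251))          # 2010 - 251 = 1759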
Heuristic Search for Soft Goals
Trade-off: accounting for action cost and goal achievement interactions vs. plan quality.
- Relaxed planning graph heuristics (Do & Kambhampati, KBCS 2004; Do, et al., IJCAI 2007): cannot take all complex interactions into account.
- Integer programming (IP) / LP-relaxation heuristics: current encodings don't scale well and can only be optimal up to some plan step.
BBOP-LP Approach
- Build a network flow-based IP encoding
  - no time indices
  - uses multi-valued variables
- Use its LP relaxation for a heuristic value
  - gives a second relaxation on the heuristic
- Perform branch and bound search
  - uses the LP solution to find a relaxed plan (similar to YAHSP, Vidal 2004)
Building a Heuristic
A network flow model on variable transitions (no time indices).
Capture the relevant structure with:
- multi-valued fluents
- prevail constraints
- cost on actions
- utility on goals
- initial states and goal states
[Figure: network flow graphs for the plane and person state variables over loc1, loc2, and loc3, with transition costs of 1, 100, 101, 150, and 200 on the edges and goal utilities of 1000, 1000, and 10.]
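A rough sketch of the kind of per-variable transition structure such an encoding ranges over (Python; the class and field names are illustrative assumptions, not the planner's actual data structures):

    from dataclasses import dataclass, field

    @dataclass
    class Transition:
        action: str    # action causing the value change
        src: str       # value before the transition
        dst: str       # value after the transition
        cost: float    # cost of the action

    @dataclass
    class StateVariable:
        name: str
        values: set[str] = field(default_factory=set)
        transitions: list[Transition] = field(default_factory=list)
        # (action, required value): the action needs this value but does not change it
        prevails: list[tuple[str, str]] = field(default_factory=list)

    # Example variables in the style of the zenotravel illustration above.
    plane = StateVariable("plane", {"loc1", "loc2", "loc3"})
    plane.transitions.append(Transition("fly-loc1-loc2", "loc1", "loc2", 150))
    plane.prevails.append(("debark-person-loc2", "loc2"))   # debark keeps the plane at loc2

    person = StateVariable("person", {"in-plane", "loc2"})
    person.transitions.append(Transition("debark-person-loc2", "in-plane", "loc2", 1))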
Building a Heuristic
Constraints of this model:
1. If an action executes, then all of its effects and prevail conditions must also.
   action(a) = Σ_{effects of a in v} effect(a,v,e) + Σ_{prevails of a in v} prevail(a,v,f)
2. If a fact is deleted, then it must be added to re-achieve a value.
   1{if f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) = Σ_{effects that delete f} effect(a,v,e) + endvalue(v,f)
3. If a prevail condition is required, then it must be achieved.
   1{if f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) ≥ prevail(a,v,f) / M
4. A goal utility dependency is achieved iff all of its goals are achieved.
   goaldep(k) ≥ Σ_{f in dependency k} endvalue(v,f) − (|Gk| − 1)
   goaldep(k) ≤ endvalue(v,f)   ∀ f in dependency k

Variables
  action(a) ∈ Z+         the number of times a ∈ A is executed
  effect(a,v,e) ∈ Z+     the number of times transition e in state variable v is caused by action a
  prevail(a,v,f) ∈ Z+    the number of times prevail condition f in state variable v is required by action a
  endvalue(v,f) ∈ {0,1}  equal to 1 if value f is the end value in state variable v
  goaldep(k) ∈ {0,1}     equal to 1 if goal utility dependency Gk is achieved

Parameters
  cost(a)        the cost of executing action a ∈ A
  utility(v,f)   the utility of achieving value f in state variable v
  utility(k)     the utility of achieving goal utility dependency Gk
Objective Function (Maximize Net Benefit)
MAX  Σ_{v∈V, f∈Dv} utility(v,f)·endvalue(v,f) + Σ_{k∈K} utility(k)·goaldep(k) − Σ_{a∈A} cost(a)·action(a)
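A compressed sketch of how the objective and constraint 4 could be written with an off-the-shelf LP modeler (PuLP here; the encoding below is schematic, assumes facts and actions are plain strings, and omits the effect/prevail flow constraints 1-3, which would be added in the same style):

    # Schematic LP relaxation of the objective and constraint 4 using PuLP.
    # All integrality is dropped: every variable is continuous in the relaxation.
    from pulp import LpProblem, LpMaximize, LpVariable, lpSum

    def build_lp_relaxation(actions, cost, goal_values, goal_util, deps, dep_util):
        prob = LpProblem("net_benefit_relaxation", LpMaximize)
        action_v = {a: LpVariable(f"action_{a}", lowBound=0) for a in actions}
        endvalue = {g: LpVariable(f"endvalue_{g}", lowBound=0, upBound=1) for g in goal_values}
        goaldep = {k: LpVariable(f"goaldep_{k}", lowBound=0, upBound=1) for k in deps}

        # Objective: maximize net benefit = goal utility + dependency utility - action cost.
        prob += (lpSum(goal_util[g] * endvalue[g] for g in goal_values)
                 + lpSum(dep_util[k] * goaldep[k] for k in deps)
                 - lpSum(cost[a] * action_v[a] for a in actions))

        # Constraint 4: a goal utility dependency is achieved iff all of its goals are.
        for k, facts in deps.items():
            prob += goaldep[k] >= lpSum(endvalue[f] for f in facts) - (len(facts) - 1)
            for f in facts:
                prob += goaldep[k] <= endvalue[f]
        # Constraints 1-3 (action/effect/prevail flow balance) would be added analogously.
        return prob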
Updated at each search node: constraints 2 and 3 are re-grounded on the state s of the current search node (the indicator 1{if f ∈ s0[v]} becomes 1{if f ∈ s[v]}), and the LP relaxation is re-solved to give that node's heuristic value.
2. If a fact is deleted, then it must be added to re-achieve a value.
   1{if f ∈ s[v]} + Σ_{effects that add f} effect(a,v,e) = Σ_{effects that delete f} effect(a,v,e) + endvalue(v,f)
3. If a prevail condition is required, then it must be achieved.
   1{if f ∈ s[v]} + Σ_{effects that add f} effect(a,v,e) ≥ prevail(a,v,f) / M
Search: Branch and Bound
- Branch and bound with a time limit
  - all goals are soft, so every state is a goal state
  - returns the best plan found (i.e., the best bound)
- Greedy lookahead strategy (similar to YAHSP, Vidal 2004)
  - to quickly find good bounds
- LP-solution guided relaxed plan extraction
  - to add informedness
(A sketch of this search loop follows.)
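A high-level sketch of the branch-and-bound loop described above (Python; successors, lp_bound, lookahead, and net_benefit are caller-supplied stand-ins for the components on these slides, not the planner's actual interfaces):

    # Schematic branch-and-bound with an LP upper bound and relaxed-plan lookahead.
    import heapq, itertools, time

    def branch_and_bound(initial_state, successors, lp_bound, lookahead, net_benefit,
                         time_limit=600):
        best_plan, best_value = [], net_benefit(initial_state)   # all goals soft: every state is a goal state
        tie = itertools.count()
        frontier = [(-lp_bound(initial_state), next(tie), initial_state, [])]
        deadline = time.time() + time_limit
        while frontier and time.time() < deadline:
            neg_bound, _, state, plan = heapq.heappop(frontier)
            if -neg_bound <= best_value:
                continue                                          # LP bound cannot beat the incumbent: prune
            # Greedy lookahead (YAHSP-style): execute the LP-guided relaxed plan's
            # applicable prefix to try to improve the incumbent quickly.
            la_plan, la_state = lookahead(state)
            if net_benefit(la_state) > best_value:
                best_value, best_plan = net_benefit(la_state), plan + la_plan
            for action, child in successors(state):
                bound = lp_bound(child)                           # LP relaxation solved at every node
                if bound > best_value:
                    heapq.heappush(frontier, (-bound, next(tie), child, plan + [action]))
        return best_plan, best_value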
Getting a Relaxed Plan
[Figure: relaxed planning graph for the example, with fact layers over (at plane loc1), (at plane loc2), (at plane loc3), (at person loc2), and (in person plane), and action layers over (fly loc1 loc2), (fly loc1 loc3), (fly loc2 loc3), (fly loc3 loc2), and (drop person loc2). The relaxed plan is extracted backwards from the goals, guided by the LP solution; a sketch of this extraction follows.]
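A rough sketch of LP-guided relaxed plan extraction: when choosing an achiever for each open goal, prefer actions that carry positive value in the LP solution (Python; achievers, preconds, and lp_value are assumed inputs, and the function is illustrative rather than the planner's implementation):

    # Backwards relaxed-plan extraction biased by the LP solution.
    # achievers: fact -> actions that add it; preconds: action -> precondition facts;
    # lp_value: action -> its value in the LP relaxation (0 if absent).
    def extract_relaxed_plan(goals, state, achievers, preconds, lp_value):
        relaxed_plan = []
        supported = set(state)
        needed = [g for g in goals if g not in supported]
        while needed:
            fact = needed.pop()
            if fact in supported:
                continue
            candidates = achievers.get(fact, [])
            if not candidates:
                continue                  # unreachable in the relaxation; leave it unsupported
            # Prefer the achiever the LP solution already "pays for".
            action = max(candidates, key=lambda a: lp_value.get(a, 0.0))
            relaxed_plan.append(action)
            supported.add(fact)
            needed.extend(p for p in preconds[action] if p not in supported)
        return list(reversed(relaxed_plan))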
Experimental Setup
Three modified IPC-3 domains (maximize net benefit): zenotravel, satellite, rovers
- action costs
- goal utilities
- goal utility dependencies
BBOP-LP: run with and without the RP lookahead
Compared with SPUDS, which uses a relaxed plan-based heuristic, and with an admissible cost propagation-based heuristic
Ran with a 600 second time limit
Results
[Plots: net benefit per problem instance for rovers, satellite, and zenotravel; higher net benefit is better; optimal solutions are marked.]
Found the optimal solution in 15 of 60 problems.
Summary
- Novel LP-based heuristic for partial satisfaction planning
- Branch and bound search with RP lookahead
- A planner that is sensitive to plan quality: BBOP-LP
Future Work
- Improve the encoding
- Explore other lookahead methods