An LP-Based Heuristic for Optimal Planning
Menkes van den Briel, Department of Industrial Engineering, Arizona State University, menkes@asu.edu
J. Benton, Department of Computer Science, Arizona State University, bentonj@asu.edu
Subbarao Kambhampati, Department of Computer Science, Arizona State University, rao@asu.edu
Thomas Vossen, Leeds School of Business, University of Colorado at Boulder, vossen@colorado.edu
http://rakaposhi.eas.asu.edu/yochan/
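What is automated planning?
• Initial state: $s_0 \in S$
• Goal: $s_* \in S$
• Action: $a = \langle \text{pre}, \text{post}, \text{prevail} \rangle$
• Plan: $P = \langle a_1, \ldots, a_n \rangle$
[Figure: an initial state and a goal over two locations, loc1 and loc2, with one action shown at loc1.]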
Motivation
• Why heuristics?
– Heuristic state space search has been very successful in
solving automated planning problems
• Why optimal planning?
– Real-world planning applications require optimal or near-optimal
solutions
• The difference between a (near) optimal solution and a feasible
solution may be the difference between winning or losing the
interest of an investor or strategic partner
LP-based heuristic
Relax the ordering of the actions
Set up an integer programming formulation
Solve the LP relaxation and use the objective
function value as an admissible distance estimate
Strengthen the formulation by adding valid
inequalities
Action selection formulation
• Represent the planning problem as a set of loosely
coupled network flow problems
– Each state variable defines one network flow problem
– Nodes correspond to the state variable values
– Arcs correspond to state variable transitions
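A minimal sketch of how one network per state variable could be derived from the actions is shown below. The dictionary encoding of actions, the variable names `truck` and `package`, and the helper `build_dtgs` are assumptions for illustration; the action names follow the logistics example used in the talk.

```python
from collections import defaultdict

# action name -> (pre, post, prevail), each a mapping variable -> value (assumed encoding)
ACTIONS = {
    "Drive(l1,l2)": ({"truck": 1}, {"truck": 2}, {}),
    "Drive(l2,l1)": ({"truck": 2}, {"truck": 1}, {}),
    "Load(p1,t1,l1)": ({"package": 1}, {"package": "T"}, {"truck": 1}),
    "Load(p1,t1,l2)": ({"package": 2}, {"package": "T"}, {"truck": 2}),
    "Unload(p1,t1,l1)": ({"package": "T"}, {"package": 1}, {"truck": 1}),
    "Unload(p1,t1,l2)": ({"package": "T"}, {"package": 2}, {"truck": 2}),
}


def build_dtgs(actions):
    """Return the transition arcs and the prevail labels of every state variable."""
    arcs = defaultdict(lambda: defaultdict(set))      # var -> (from, to) -> action names
    prevails = defaultdict(lambda: defaultdict(set))  # var -> value -> action names
    for name, (pre, post, prevail) in actions.items():
        for var, before in pre.items():
            arcs[var][(before, post[var])].add(name)  # the action moves var along this arc
        for var, value in prevail.items():
            prevails[var][value].add(name)            # the action needs var to hold this value
    return arcs, prevails


arcs, prevails = build_dtgs(ACTIONS)
for var in arcs:
    print(var, dict(arcs[var]))
```

Each variable's arcs form one flow network; the networks are only loosely coupled because an action such as Load appears as an arc in the package network and as a prevail label in the truck network.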
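Simple logistics example
[Figure: two locations, loc1 and loc2, with the domain transition graphs of the example. DTGTruck1 has values 1 and 2 connected by Drive(l1,l2) and Drive(l2,l1), with Load and Unload actions attached to its values. DTGPackage1 has values 1, 2, and T, with arcs Load(p1,t1,l1), Load(p1,t1,l2), Unload(p1,t1,l1), and Unload(p1,t1,l2).]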
Action selection formulation
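• Variables
– $x_a \in \mathbb{Z}_+$ for $a \in A$; $x_a$ is the number of times action $a$ is executed
– No time indices
– No upper bound
• Objective function
– $\min \sum_{a \in A} x_a$
• Constraints, for all $c \in C$, $f \in V_c$
– $\sum_{e \in V_c^+(f):\, a \in A_c^E(e)} x_a - \sum_{e \in V_c^-(f):\, b \in A_c^E(e)} x_b \ge \begin{cases} 1 & \text{if } f \ne s_0[c],\ f = s_*[c] \\ -1 & \text{if } f = s_0[c],\ f \ne s_*[c] \\ 0 & \text{otherwise} \end{cases}$
– $x_a \le M \sum_{e \in V_c^+(f):\, b \in A_c^E(e)} x_b$ for all $f \ne s_0[c]$, $a \in A_c^V(f)$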
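To make the two constraint families concrete, the sketch below checks them for a given vector of action counts on the small logistics task. It assumes the flow constraints are lower bounds (flow in minus flow out is at least the right-hand side), that the truck starts at value 2 and has no goal, and that the package must move from value 1 to value 2; the value of `M` and all Python names are likewise assumptions for illustration.

```python
M = 100  # big-M constant; the concrete value is an assumption

INIT = {"truck": 2, "package": 1}
GOAL = {"truck": None, "package": 2}          # None: no goal value for this variable
VALUES = {"truck": [1, 2], "package": [1, 2, "T"]}

# variable -> (action, from value, to value) for actions that change the variable
EFFECTS = {
    "truck": [("Drive(l1,l2)", 1, 2), ("Drive(l2,l1)", 2, 1)],
    "package": [("Load(p1,t1,l1)", 1, "T"), ("Load(p1,t1,l2)", 2, "T"),
                ("Unload(p1,t1,l1)", "T", 1), ("Unload(p1,t1,l2)", "T", 2)],
}
# variable -> (action, value) for actions with a prevail condition on the variable
PREVAILS = {
    "truck": [("Load(p1,t1,l1)", 1), ("Unload(p1,t1,l1)", 1),
              ("Load(p1,t1,l2)", 2), ("Unload(p1,t1,l2)", 2)],
    "package": [],
}


def rhs(var, f):
    if f != INIT[var] and f == GOAL[var]:
        return 1
    if f == INIT[var] and f != GOAL[var]:
        return -1
    return 0


def inflow(x, var, f):
    return sum(x.get(a, 0) for a, _, to in EFFECTS[var] if to == f)


def outflow(x, var, f):
    return sum(x.get(a, 0) for a, frm, _ in EFFECTS[var] if frm == f)


def satisfies(x):
    """Check the flow and prevail-linking constraints for action counts x."""
    for var, values in VALUES.items():
        for f in values:
            if inflow(x, var, f) - outflow(x, var, f) < rhs(var, f):
                return False
        for a, f in PREVAILS[var]:
            if f != INIT[var] and x.get(a, 0) > M * inflow(x, var, f):
                return False
    return True


plan_counts = {"Drive(l2,l1)": 1, "Load(p1,t1,l1)": 1,
               "Drive(l1,l2)": 1, "Unload(p1,t1,l2)": 1}
no_drive = {"Load(p1,t1,l1)": 1, "Unload(p1,t1,l2)": 1}

print(satisfies(plan_counts), sum(plan_counts.values()))
print(satisfies(no_drive))
```

The second vector fails only the linking constraint: loading at location 1 needs the truck there, so some flow must enter truck value 1 before Load(p1,t1,l1) can be counted.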
Simple logistics example
• Feasible plan: Drive(l2,l1), Load(p1,t1,l1), Drive(l1,l2), Unload(p1,t1,l2)
– xDrive(l2,l1) = 1
– xLoad(p1,t1,l1) = 1
– xDrive(l1,l2) = 1
– xUnload(p1,t1,l2) = 1
– Objective value: 4
[Figure: DTGTruck1 and DTGPackage1.]
Simple logistics example
• Actions: Drive(l2,l1), …, Load(p1,t1,l1), …, Unload(p1,t1,l2)
• LP solution
– xLoad(p1,t1,l1) = 1
– xUnload(p1,t1,l2) = 1
– xDrive(l2,l1) = 1/M
– Objective value: 2 + 1/M
[Figure: DTGTruck1 and DTGPackage1.]
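One way to see where the value 2 + 1/M comes from is sketched below. It assumes the truck starts at value 2 with no goal of its own, the package must move from value 1 to value 2, and the flow constraints are lower bounds. The abbreviations $L_i$ = Load(p1,t1,l$i$), $U_i$ = Unload(p1,t1,l$i$), and $D_{21}$ = Drive(l2,l1) are introduced here only for brevity.

```latex
\begin{align*}
\text{package, goal value 2:}\quad
  & x_{U_2} - x_{L_2} \ge 1 \;\Rightarrow\; x_{U_2} \ge 1 \\
\text{package, value } T\text{:}\quad
  & x_{L_1} + x_{L_2} - x_{U_1} - x_{U_2} \ge 0
    \;\Rightarrow\; x_{L_1} \ge x_{U_1} + (x_{U_2} - x_{L_2}) \ge 1 \\
\text{prevail of } L_1 \text{ on truck value 1:}\quad
  & x_{L_1} \le M\, x_{D_{21}} \;\Rightarrow\; x_{D_{21}} \ge \tfrac{1}{M} \\
\text{objective:}\quad
  & \sum_{a \in A} x_a \;\ge\; x_{L_1} + x_{U_2} + x_{D_{21}} \;\ge\; 2 + \tfrac{1}{M}
\end{align*}
```

Under these assumptions the bound is attained by the LP solution on the slide, which is why the relaxation pays only a fraction 1/M for the drive instead of the two drives that any real plan needs.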
Preliminary results
Each column is listed on one line with its values in problem order; a column with fewer values than problems has empty cells.
• Problems: log4-0, log4-1, log4-2, log5-1, log5-2, log6-1, log6-9, log12-0, log15-1, freecell2-1, freecell2-2, freecell2-3, freecell2-4, freecell2-5, freecell3-5, freecell13-3, freecell13-4, freecell13-5, driverlog1, driverlog2, driverlog3, driverlog4, driverlog6, driverlog7, driverlog13, driverlog19, driverlog20
• LP: (no values shown)
• LP–: 16.0*, 14.0*, 10.0*, 12.0*, 6.0*, 10.0*, 18.0*, 32.0*, 54.0*, 9, 8, 8, 8, 9, 12, 55, 54, 52, 3.0*, 12.0*, 8.0*, 11.0*, 8.0*, 11.0*, 15.0*, 60.0*, 60.0*
• Lplan: 17, 15, 11, 13, 7, 11, 19, 33, 9, 8, 8, 8, 9, 13, 7, 13, 9, 12, 9, 12, 16, -
• h+: 19, 17, 13, 15, 8, 13, 21, 39, 63, 9, 8, 8, 8, 9, 13, 6, 14, 11, 12, 10, 12, 21, 89, 84
• hFF: 19, 17, 13, 15, 8, 13, 21, 39, 66, 9, 8, 9, 9, 9, 14, 95, 94, 94, 8, 15, 11, 15, 10, 15, 26, 93, 106
• Optimal: 20, 19, 15, 17, 8, 14, 24, 9, 8, 8, 8, 9, 7, 19, 12, 16, 11, 13, -
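Preliminary results
Each column is listed on one line with its values in problem order; a column with fewer values than problems has empty cells.
• Problems: zenotravel1, zenotravel2, zenotravel3, zenotravel4, zenotravel5, zenotravel6, zenotravel13, zenotravel19, zenotravel20, tpp1, tpp2, tpp3, tpp4, tpp5, tpp6, tpp28, tpp29, tpp30, bw-sussman, bw-12step, bw-large-a, bw-large-b
• LP: (no values shown)
• LP–: 1, 3.0*, 4.0*, 5.0*, 8.0*, 8.0*, 18.0*, 46.0*, 50.0*, 3.0*, 6.0*, 9.0*, 12.0*, 15.0*, 21.0*, 150.0*, 174.0*, 4, 4, 12, 16
• Lplan: 1, 5, 5, 6, 9, 9, 19, 5, 7, 10, 13, 17, 23, 6, 8, 12, 18
• h+: 1, 4, 5, 6, 11, 11, 23, 62, 4, 7, 10, 13, 17, 21, 5, 4, 12, 16
• hFF: 1, 4, 5, 6, 11, 13, 23, 63, 69, 4, 7, 10, 13, 17, 21, 88, 104, 101, 5, 7, 12, 16
• Optimal: 1, 6, 6, 8, 11, 11, 5, 8, 11, 14, 19, 6, 12, 12, 18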
Strengthening techniques
• Composition of state variables (i.e. fluent merging)
– Given the domain transition graphs (DTGs) of two state variables $c_1$, $c_2$, the composition of $DTG_{c_1}$ and $DTG_{c_2}$ is the domain transition graph $DTG_{c_1 \| c_2} = (V_{c_1 \| c_2}, E_{c_1 \| c_2})$ where
– $V_{c_1 \| c_2} = V_{c_1} \times V_{c_2}$
– $((f_1,g_1),(f_2,g_2)) \in E_{c_1 \| c_2}$ if $f_1,f_2 \in V_{c_1}$, $g_1,g_2 \in V_{c_2}$ and there exists an action $a \in A$ such that one of the following conditions holds
• $\text{pre}[c_1] = f_1$, $\text{post}[c_1] = f_2$, and $\text{pre}[c_2] = g_1$, $\text{post}[c_2] = g_2$
• $\text{pre}[c_1] = f_1$, $\text{post}[c_1] = f_2$, and $\text{prevail}[c_2] = g_1$, $g_1 = g_2$
• $\text{pre}[c_1] = f_1$, $\text{post}[c_1] = f_2$, and $g_1 = g_2$
The term composition is also used in model checking to define the
parallel composition or the synchronized product of automata
[Cassandras & Lafortune, 1999]
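The definition above can be sketched in a few lines of Python. The helper names and the dictionary encoding of actions are assumptions for illustration. The three conditions are stated for actions that change c1; applying them again with the roles of the two variables swapped, as done here, is also an assumption.

```python
from itertools import product

# action name -> (pre, post, prevail), each a mapping variable -> value (assumed encoding)
ACTIONS = {
    "Drive(l1,l2)": ({"truck": 1}, {"truck": 2}, {}),
    "Drive(l2,l1)": ({"truck": 2}, {"truck": 1}, {}),
    "Load(p1,t1,l1)": ({"package": 1}, {"package": "T"}, {"truck": 1}),
    "Load(p1,t1,l2)": ({"package": 2}, {"package": "T"}, {"truck": 2}),
    "Unload(p1,t1,l1)": ({"package": "T"}, {"package": 1}, {"truck": 1}),
    "Unload(p1,t1,l2)": ({"package": "T"}, {"package": 2}, {"truck": 2}),
}


def changing(c1, c2, values2, actions):
    """Yield composed arcs for actions that change c1 (c1 value listed first)."""
    for name, (pre, post, prevail) in actions.items():
        if c1 not in pre:
            continue
        f1, f2 = pre[c1], post[c1]
        if c2 in pre:                 # the action changes both variables
            yield (f1, pre[c2]), (f2, post[c2]), name
        elif c2 in prevail:           # c2 is required but stays unchanged
            yield (f1, prevail[c2]), (f2, prevail[c2]), name
        else:                         # the action does not mention c2
            for g in values2:
                yield (f1, g), (f2, g), name


def compose(c1, values1, c2, values2, actions):
    nodes = set(product(values1, values2))
    arcs = set(changing(c1, c2, values2, actions))
    # Same conditions with the roles swapped; flip the pairs back so c1 stays first.
    for (g1, f1), (g2, f2), name in changing(c2, c1, values1, actions):
        arcs.add(((f1, g1), (f2, g2), name))
    return nodes, arcs


nodes, arcs = compose("truck", [1, 2], "package", [1, 2, "T"], ACTIONS)
for arc in sorted(arcs, key=str):
    print(arc)
```

On this task the composition has six values and ten arcs: each Drive action appears once per package value, while each Load or Unload action appears only at the truck value it requires.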
Example
• Two DTGs and their composition
– Small in-arcs denote the initial state
– Double circles denote the goal
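[Figure: DTGc1 with values f1, f2, f3; DTGc2 with values g1, g2; and DTGc1 || c2 over the value pairs (f1,g1), (f1,g2), (f2,g1), (f2,g2), (f3,g1), (f3,g2). Arcs are labeled with actions a, b, c, d.]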
Simple logistics example
[Figure: two locations, loc1 and loc2, and the composed graph DTGTruck1 || Package1 with values (1,1), (1,2), (2,1), (2,2), (1,T), and (2,T). Arcs are labeled Drive(l1,l2), Drive(l2,l1), Load(p1,t1,l1), Load(p1,t1,l2), Unload(p1,t1,l1), and Unload(p1,t1,l2).]
Simple logistics example
• Plan: Drive(l2,l1), Load(p1,t1,l1), Drive(l1,l2), Unload(p1,t1,l2)
• LP solution
– xDrive(l2,l1) = 1
– xLoad(p1,t1,l1) = 1
– xDrive(l1,l2) = 1
– xUnload(p1,t1,l2) = 1
– Objective value: 4
[Figure: DTGTruck1 || Package1.]
Another example
• Two DTGs and their composition
[Figure: DTGc1 with values f1, f2, f3; DTGc2 with values g1, g2, g3; and DTGc1 || c2 over all nine value pairs from (f1,g1) to (f3,g3).]
Another example
• Two DTGs and their composition
– Solution to the individual state variables
[Figure: DTGc1 with values f1, f2, f3; DTGc2 with values g1, g2, g3; and DTGc1 || c2 over all nine value pairs. Arcs labeled a and b are drawn in DTGc1 and DTGc2.]
Another example
• Two DTGs and their composition
– Solution to the individual state variables represented in the
composed state variable
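[Figure: DTGc1 with values f1, f2, f3; DTGc2 with values g1, g2, g3; and DTGc1 || c2 over all nine value pairs. Arcs labeled a and b are drawn in all three graphs, and an annotation on the composed graph reads "Violates balance of flow constraints".]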
Another example
• Two DTGs and their composition
– Adding new balance of flow constraints strengthens the
formulation
[Figure: DTGc1 with values f1, f2, f3; DTGc2 with values g1, g2, g3; and DTGc1 || c2 over all nine value pairs. Arcs are labeled with actions a, b, c, d, e.]
Identifying mergeable fluents
• When should we create a composition of two or more
state variables?
– Look at the causal graph
– Look at the actions that introduce dependencies in the causal
graph
[Figure: two causal graphs. One has the nodes Person 1, Person 2, Airplane 1, Airplane 2, Fuel 1, and Fuel 2; the other has the nodes Person 1, Person 2, Airplane 1 Fuel1, and Airplane 2 Fuel2.]
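As a sketch of the first step, the code below builds a causal graph from action definitions and lists variable pairs that depend on each other in both directions. Treating such pairs as merge candidates is an assumption made for this example, not a rule stated on the slide; the two Zenotravel-style actions and all names are hypothetical as well.

```python
# action name -> (pre, post, prevail), each a mapping variable -> value (hypothetical actions)
ACTIONS = {
    "Board(person1,airplane1,city1)": (
        {"person1": "city1"}, {"person1": "airplane1"}, {"airplane1": "city1"}),
    "Fly(airplane1,city1,city2)": (
        {"airplane1": "city1", "fuel1": 2}, {"airplane1": "city2", "fuel1": 1}, {}),
}


def causal_graph(actions):
    """Edge (u, v): some action that changes v has a condition on u."""
    edges = set()
    for pre, post, prevail in actions.values():
        required = set(pre) | set(prevail)
        for src in required:
            for dst in post:
                if src != dst:
                    edges.add((src, dst))
    return edges


def mutual_pairs(edges):
    """Pairs of variables connected in both directions (assumed merge candidates)."""
    return {tuple(sorted((u, v))) for (u, v) in edges if (v, u) in edges}


edges = causal_graph(ACTIONS)
candidates = mutual_pairs(edges)
print(sorted(edges))
print(sorted(candidates))
```

Here the Fly action changes the airplane's location and its fuel level together, so the two variables point at each other in the causal graph, while the person only depends on the airplane.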
Experimental setup
• Objective
– Minimize number of actions
• Domains
– Selected domains from the International Planning Competition
• Logistics
• Freecell
• Driverlog
• Zenotravel
• TPP
• Blocksworld
• Resources
– 2.67 GHz Linux machine
– 1 GB memory
– 15 minutes runtime
– CPLEX 10.0
Experimental setup
• Distance estimates
– LP
• Action selection formulation with strengthening
– LP–
• Action selection formulation without strengthening
– Lplan
• Step-based integer programming formulation by Lplan [Bylander, 1997]
– h+
• Optimal relaxed plan when the delete effects are ignored
– hFF
• Inadmissible but efficient relaxed plan heuristic by FF [Hoffmann and
Nebel, 2001]
– Optimal
• Optimal distance estimate given by Satplanner using the –opt flag
[Rintanen, Heljanko, and Niemela, 2005]
Experimental results
Each column is listed on one line with its values in problem order; a column with fewer values than problems has empty cells.
• Problems: log4-0, log4-1, log4-2, log5-1, log5-2, log6-1, log6-9, log12-0, log15-1, freecell2-1, freecell2-2, freecell2-3, freecell2-4, freecell2-5, freecell3-5, freecell13-3, freecell13-4, freecell13-5, driverlog1, driverlog2, driverlog3, driverlog4, driverlog6, driverlog7, driverlog13, driverlog19, driverlog20
• LP: 20, 19, 15, 17, 8, 14, 24, 42, 67, 9, 8, 8, 8, 9, 12, 55, 54, 52, 7, 19, 11, 15.5, 11, 13, 24, 96.6*, 89.5*
• LP–: 16.0*, 14.0*, 10.0*, 12.0*, 6.0*, 10.0*, 18.0*, 32.0*, 54.0*, 9, 8, 8, 8, 9, 12, 55, 54, 52, 3.0*, 12.0*, 8.0*, 11.0*, 8.0*, 11.0*, 15.0*, 60.0*, 60.0*
• Lplan: 17, 15, 11, 13, 7, 11, 19, 33, 9, 8, 8, 8, 9, 13, 7, 13, 9, 12, 9, 12, 16, -
• h+: 19, 17, 13, 15, 8, 13, 21, 39, 63, 9, 8, 8, 8, 9, 13, 6, 14, 11, 12, 10, 12, 21, 89, 84
• hFF: 19, 17, 13, 15, 8, 13, 21, 39, 66, 9, 8, 9, 9, 9, 14, 95, 94, 94, 8, 15, 11, 15, 10, 15, 26, 93, 106
• Optimal: 20, 19, 15, 17, 8, 14, 24, 9, 8, 8, 8, 9, 7, 19, 12, 16, 11, 13, -
Experimental results
Each column is listed on one line with its values in problem order; a column with fewer values than problems has empty cells.
• Problems: zenotravel1, zenotravel2, zenotravel3, zenotravel4, zenotravel5, zenotravel6, zenotravel13, zenotravel19, zenotravel20, tpp1, tpp2, tpp3, tpp4, tpp5, tpp6, tpp28, tpp29, tpp30, bw-sussman, bw-12step, bw-large-a, bw-large-b
• LP: 1, 6, 6, 8, 11, 11, 24, 66.2*, 68.3*, 5, 8, 11, 14, 19, 25, 4, 4, 12, 16
• LP–: 1, 3.0*, 4.0*, 5.0*, 8.0*, 8.0*, 18.0*, 46.0*, 50.0*, 3.0*, 6.0*, 9.0*, 12.0*, 15.0*, 21.0*, 150.0*, 174.0*, 4, 4, 12, 16
• Lplan: 1, 5, 5, 6, 9, 9, 19, 5, 7, 10, 13, 17, 23, 6, 8, 12, 18
• h+: 1, 4, 5, 6, 11, 11, 23, 62, 4, 7, 10, 13, 17, 21, 5, 4, 12, 16
• hFF: 1, 4, 5, 6, 11, 13, 23, 63, 69, 4, 7, 10, 13, 17, 21, 88, 104, 101, 5, 7, 12, 16
• Optimal: 1, 6, 6, 8, 11, 11, 5, 8, 11, 14, 19, 6, 12, 12, 18
Distance estimates from the initial state to the goal (highlighted values
equal the optimal distance)
Experimental results
• Heuristic calculation time
[Figure: heuristic calculation time on a logarithmic axis from 0.01 to 1000 for lp, lp–, lplan, and h+ across the Logistics, Freecell, Driverlog, Zenotravel, TPP, and Blocks domains.]
Conclusions and future work
• An LP-based heuristic that respects delete effects but
ignores action ordering shows very promising results
– Finds the optimal distance estimate in several problem instances
– Can be used to calculate admissible distance estimates for
various optimization problems in planning
– Ongoing work successfully incorporated our LP-based heuristic
in a search algorithm that solves oversubscription planning
• Interesting directions for future work
– Apply fluent merging more aggressively
– Extend the formulation into a complete planning system
LP-based heuristic
Relax the ordering of the actions
Set up an integer programming formulation
Solve the LP relaxation and use the objective
function value as an admissible distance estimate
Strengthen the formulation by adding valid
inequalities