An LP-Based Heuristic for Optimal Planning

Menkes van den Briel, Department of Industrial Engineering, Arizona State University, menkes@asu.edu
J. Benton, Department of Computer Science, Arizona State University, bentonj@asu.edu
Subbarao Kambhampati, Department of Computer Science, Arizona State University, rao@asu.edu
Thomas Vossen, Leeds School of Business, University of Colorado at Boulder, vossen@colorado.edu

http://rakaposhi.eas.asu.edu/yochan/

What is automated planning?

[Figure: two locations, loc1 and loc2, shown in the initial state $s_0$ and in the goal $s_*$, both over the state space $S$]

• Action $a = \langle pre, post, prevail \rangle$
• Plan $P = \langle a_1, \ldots, a_n \rangle$

Motivation

• Why heuristics?
  – Heuristic state space search has been very successful in solving automated planning problems
• Why optimal planning?
  – Real-world planning applications require optimal or near-optimal solutions
• The difference between a (near) optimal solution and a feasible solution may be the difference between winning or losing the interest of an investor or strategic partner

LP-based heuristic

• Relax the ordering of the actions
• Set up an integer programming formulation
• Solve the LP-relaxation and use the objective function value as an admissible distance estimate
• Strengthen the formulation by adding valid inequalities

Action selection formulation

• Represent the planning problem as a set of loosely coupled network flow problems
  – Each state variable defines one network flow problem
  – Nodes correspond to the state variable values
  – Arcs correspond to state variable transitions

Simple logistics example

[Figure: two locations, loc1 and loc2. DTG_Truck1 has values 1 and 2, arcs Drive(l1,l2) and Drive(l2,l1), and Load/Unload labels at each value. DTG_Package1 has values 1, 2, and T, with arcs Load(p1,t1,l1), Load(p1,t1,l2), Unload(p1,t1,l1), and Unload(p1,t1,l2)]

Action selection formulation

• Variables
  – $x_a \in \mathbb{Z}_+$, for $a \in A$; $x_a$ is equal to the number of times action $a$ is executed
  – No time indices
  – No upper bound
• Objective function
  – $\min \sum_{a \in A} x_a$
• Constraints, for all $c \in C$, $f \in V_c$

\[
\sum_{e \in V_c^+(f):\, a \in A_c^E(e)} x_a \;-\; \sum_{e \in V_c^-(f):\, b \in A_c^E(e)} x_b \;=\;
\begin{cases}
1 & \text{if } f \neq s_0[c],\ f = s_*[c] \\
-1 & \text{if } f = s_0[c],\ f \neq s_*[c] \\
0 & \text{otherwise}
\end{cases}
\]

\[
x_a \;\le\; M \sum_{e \in V_c^+(f):\, b \in A_c^E(e)} x_b \qquad \text{for all } f \neq s_0[c],\ a \in A_c^V(f)
\]

Simple logistics example

[Figure: DTG_Truck1 and DTG_Package1 as above]

• Feasible plan: Drive(l2,l1), Load(p1,t1,l1), Drive(l1,l2), Unload(p1,t1,l2)
  – $x_{Drive(l2,l1)} = 1$
  – $x_{Load(p1,t1,l1)} = 1$
  – $x_{Drive(l1,l2)} = 1$
  – $x_{Unload(p1,t1,l2)} = 1$
  – Objective value: 4
• LP solution: Drive(l2,l1) … Load(p1,t1,l1) … Unload(p1,t1,l2)
  – $x_{Load(p1,t1,l1)} = 1$
  – $x_{Unload(p1,t1,l2)} = 1$
  – $x_{Drive(l2,l1)} = 1/M$
  – Objective value: $2 + 1/M$

Preliminary results

Problem        LP-     Lplan   h+      hFF     Optimal
log4-0         16.0*   17      19      19      20
log4-1         14.0*   15      17      17      19
log4-2         10.0*   11      13      13      15
log5-1         12.0*   13      15      15      17
log5-2         6.0*    7       8       8       8
log6-1         10.0*   11      13      13      14
log6-9         18.0*   19      21      21      24
log12-0        32.0*   33      39      39
log15-1        54.0*           63      66
freecell2-1    9       9       9       9       9
freecell2-2    8       8       8       8       8
freecell2-3    8       8       8       9       8
freecell2-4    8       8       8       9       8
freecell2-5    9       9       9       9       9
freecell3-5    12      13      13      14
freecell13-3   55                      95
freecell13-4   54                      94
freecell13-5   52                      94
driverlog1     3.0*    7       6       8       7
driverlog2     12.0*   13      14      15      19
driverlog3     8.0*    9       11      11      12
driverlog4     11.0*   12      12      15      16
driverlog6     8.0*    9       10      10      11
driverlog7     11.0*   12      12      15      13
driverlog13    15.0*   16      21      26      -
driverlog19    60.0*   -       89      93
driverlog20    60.0*           84      106

Preliminary results

Problem        LP-     Lplan   h+      hFF     Optimal
zenotravel1    1       1       1       1       1
zenotravel2    3.0*    5       4       4       6
zenotravel3    4.0*    5       5       5       6
zenotravel4    5.0*    6       6       6       8
zenotravel5    8.0*    9       11      11      11
zenotravel6    8.0*    9       11      13      11
zenotravel13   18.0*   19      23      23
zenotravel19   46.0*           62      63
zenotravel20   50.0*                   69
tpp1           3.0*    5       4       4       5
tpp2           6.0*    7       7       7       8
tpp3           9.0*    10      10      10      11
tpp4           12.0*   13      13      13      14
tpp5           15.0*   17      17      17      19
tpp6           21.0*   23      21      21
tpp28          150.0*                  88
tpp29          174.0*                  104
tpp30                                  101
bw-sussman     4       6       5       5       6
bw-12step      4       8       4       7       12
bw-large-a     12      12      12      12      12
bw-large-b     16      18      16      16      18

Strengthening techniques

• Composition of state variables (i.e. fluent merging)
  – Given the domain transition graphs (DTGs) of two state variables $c_1$, $c_2$, the composition of $DTG_{c_1}$ and $DTG_{c_2}$ is the domain transition graph $DTG_{c_1 \| c_2} = (V_{c_1 \| c_2}, E_{c_1 \| c_2})$ where
  – $V_{c_1 \| c_2} = V_{c_1} \times V_{c_2}$
  – $((f_1,g_1),(f_2,g_2)) \in E_{c_1 \| c_2}$ if $f_1, f_2 \in V_{c_1}$, $g_1, g_2 \in V_{c_2}$ and there exists an action $a \in A$ such that one of the following conditions holds
    • $pre[c_1] = f_1$, $post[c_1] = f_2$, and $pre[c_2] = g_1$, $post[c_2] = g_2$
    • $pre[c_1] = f_1$, $post[c_1] = f_2$, and $prevail[c_2] = g_1$, $g_1 = g_2$
    • $pre[c_1] = f_1$, $post[c_1] = f_2$, and $g_1 = g_2$

The term composition is also used in model checking to define the parallel composition or the synchronized product of automata [Cassandras & Lafortune, 1999]

Example

• Two DTGs and their composition
  – Small in-arcs denote the initial state
  – Double circles denote the goal

[Figure: $DTG_{c_1}$ with values f1, f2, f3; $DTG_{c_2}$ with values g1, g2; and $DTG_{c_1 \| c_2}$ with values (f1,g1), (f1,g2), (f2,g1), (f2,g2), (f3,g1), (f3,g2); arcs are labelled with actions a, b, c, d]

Simple logistics example

[Figure: DTG_Truck1 || Package1 with values (1,1), (1,2), (1,T), (2,1), (2,2), (2,T), where the first component is the truck location and the second the package location; arcs are labelled Drive(l1,l2), Drive(l2,l1), Load(p1,t1,l1), Load(p1,t1,l2), Unload(p1,t1,l1), and Unload(p1,t1,l2)]

• LP solution: Drive(l2,l1), Load(p1,t1,l1), Drive(l1,l2), Unload(p1,t1,l2)
  – $x_{Drive(l2,l1)} = 1$
  – $x_{Load(p1,t1,l1)} = 1$
  – $x_{Drive(l1,l2)} = 1$
  – $x_{Unload(p1,t1,l2)} = 1$
  – Objective value: 4

Another example

• Two DTGs and their composition
  – Solution to the individual state variables
  – Solution to the individual state variables represented in the composed state variable
  – Violates balance of flow constraints
  – Adding new balance of flow constraints strengthens the formulation

[Figure: $DTG_{c_1}$ with values f1, f2, f3; $DTG_{c_2}$ with values g1, g2, g3; and $DTG_{c_1 \| c_2}$ with the nine values (f1,g1) through (f3,g3); arcs are labelled with actions a, b, c, d, e]

Identifying mergeable fluents

• When should we create a composition of two or more state variables?
  – Look at the causal graph
  – Look at the actions that introduce dependencies in the causal graph

[Figure: causal graphs over Person 1, Person 2, Airplane 1, Airplane 2, Fuel 1, and Fuel 2, one with separate airplane and fuel nodes and one with combined Airplane 1/Fuel 1 and Airplane 2/Fuel 2 nodes]

Experimental setup

• Objective
  – Minimize number of actions
• Domains
  – Selected domains from the International Planning Competition
    • Logistics
    • Freecell
    • Driverlog
    • Zenotravel
    • TPP
    • Blocksworld
• Resources
  – 2.67 GHz Linux machine
  – 1 GB memory
  – 15 minutes runtime
  – CPLEX 10.0

Experimental setup

• Distance estimates
  – LP
    • Action selection formulation with strengthening
  – LP-
    • Action selection formulation without strengthening
  – Lplan
    • Step based integer programming formulation by Lplan [Bylander, 1997]
  – h+
    • Optimal relaxed plan when the delete effects are ignored
  – hFF
    • Inadmissible but efficient relaxed plan heuristic by FF [Hoffmann and Nebel, 2001]
  – Optimal
    • Optimal distance estimate given by Satplanner using the -opt flag [Rintanen, Heljanko, and Niemelä, 2005]

Experimental results

Problem        LP      LP-     Lplan   h+      hFF     Optimal
log4-0         20      16.0*   17      19      19      20
log4-1         19      14.0*   15      17      17      19
log4-2         15      10.0*   11      13      13      15
log5-1         17      12.0*   13      15      15      17
log5-2         8       6.0*    7       8       8       8
log6-1         14      10.0*   11      13      13      14
log6-9         24      18.0*   19      21      21      24
log12-0        42      32.0*   33      39      39
log15-1        67      54.0*           63      66
freecell2-1    9       9       9       9       9       9
freecell2-2    8       8       8       8       8       8
freecell2-3    8       8       8       8       9       8
freecell2-4    8       8       8       8       9       8
freecell2-5    9       9       9       9       9       9
freecell3-5    12      12      13      13      14
freecell13-3   55      55                      95
freecell13-4   54      54                      94
freecell13-5   52      52                      94
driverlog1     7       3.0*    7       6       8       7
driverlog2     19      12.0*   13      14      15      19
driverlog3     11      8.0*    9       11      11      12
driverlog4     15.5    11.0*   12      12      15      16
driverlog6     11      8.0*    9       10      10      11
driverlog7     13      11.0*   12      12      15      13
driverlog13    24      15.0*   16      21      26      -
driverlog19    96.6*   60.0*   -       89      93
driverlog20    89.5*   60.0*           84      106

Experimental results

Problem        LP      LP-     Lplan   h+      hFF     Optimal
zenotravel1    1       1       1       1       1       1
zenotravel2    6       3.0*    5       4       4       6
zenotravel3    6       4.0*    5       5       5       6
zenotravel4    8       5.0*    6       6       6       8
zenotravel5    11      8.0*    9       11      11      11
zenotravel6    11      8.0*    9       11      13      11
zenotravel13   24      18.0*   19      23      23
zenotravel19   66.2*   46.0*           62      63
zenotravel20   68.3*   50.0*                   69
tpp1           5       3.0*    5       4       4       5
tpp2           8       6.0*    7       7       7       8
tpp3           11      9.0*    10      10      10      11
tpp4           14      12.0*   13      13      13      14
tpp5           19      15.0*   17      17      17      19
tpp6           25      21.0*   23      21      21
tpp28                  150.0*                  88
tpp29                  174.0*                  104
tpp30                                          101
bw-sussman     4       4       6       5       5       6
bw-12step      4       4       8       4       7       12
bw-large-a     12      12      12      12      12      12
bw-large-b     16      16      18      16      16      18

Distance estimates from the initial state to the goal (highlighted values equal the optimal distance)

Experimental results

• Heuristic calculation time

[Chart: heuristic calculation time on a logarithmic axis from 0.01 to 1000 for the series LP, LP-, Lplan, and h+, grouped by domain: Logistics, Freecell, Driverlog, Zenotravel, TPP, Blocks]

Conclusions and future work

• An LP-based heuristic that respects delete effects but ignores action ordering shows very promising results
  – Finds the optimal distance estimate in several problem instances
  – Can be used to calculate admissible distance estimates for various optimization problems in planning
  – Ongoing work successfully incorporated our LP-based heuristic in a search algorithm that solves oversubscription planning
• Interesting directions for future work
  – Apply fluent merging more aggressively
  – Extend the formulation into a complete planning system
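
The composition of state variables defined under "Strengthening techniques" can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. The dictionary encoding of actions, the helper names `consistent`, `compose`, and `shortest_path`, and the assumed task (truck starting at loc2, package to be moved from loc1 to loc2) are all assumptions made for this example. The three edge conditions, which the slides state for c1 only, are applied here symmetrically to both variables.

```python
from collections import deque
from itertools import product

# Values of each state variable: truck location, package location (T = in truck).
DOMAINS = {"truck": ["1", "2"], "package": ["1", "2", "T"]}

# Hypothetical encoding: per variable, an action either changes it
# ("effect", pre, post) or requires it to hold a value ("prevail", value).
ACTIONS = {
    "Drive(l1,l2)": {"truck": ("effect", "1", "2")},
    "Drive(l2,l1)": {"truck": ("effect", "2", "1")},
    "Load(p1,t1,l1)": {"package": ("effect", "1", "T"), "truck": ("prevail", "1")},
    "Load(p1,t1,l2)": {"package": ("effect", "2", "T"), "truck": ("prevail", "2")},
    "Unload(p1,t1,l1)": {"package": ("effect", "T", "1"), "truck": ("prevail", "1")},
    "Unload(p1,t1,l2)": {"package": ("effect", "T", "2"), "truck": ("prevail", "2")},
}


def consistent(entry, before, after):
    """True if a variable may go from `before` to `after` under this action entry."""
    if entry is None:                 # variable not mentioned: value is kept
        return before == after
    if entry[0] == "prevail":         # value is required and kept
        return before == after == entry[1]
    return (before, after) == (entry[1], entry[2])   # effect: pre -> post


def compose(c1, c2, domains, actions):
    """Return nodes and labelled arcs ((f1, g1), (f2, g2), action) of DTG c1||c2."""
    nodes = list(product(domains[c1], domains[c2]))
    arcs = set()
    for name, spec in actions.items():
        e1, e2 = spec.get(c1), spec.get(c2)
        # At least one of the two variables must actually change.
        if not any(e is not None and e[0] == "effect" for e in (e1, e2)):
            continue
        for (f1, g1), (f2, g2) in product(nodes, repeat=2):
            if consistent(e1, f1, f2) and consistent(e2, g1, g2):
                arcs.add(((f1, g1), (f2, g2), name))
    return nodes, arcs


def shortest_path(arcs, start, is_goal):
    """Breadth-first search; returns the action labels along a shortest path."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, labels = queue.popleft()
        if is_goal(node):
            return labels
        for src, dst, name in sorted(arcs):
            if src == node and dst not in seen:
                seen.add(dst)
                queue.append((dst, labels + [name]))
    return None


nodes, arcs = compose("truck", "package", DOMAINS, ACTIONS)
# Assumed task: truck starts at loc2, package starts at loc1 and must reach loc2.
plan = shortest_path(arcs, ("2", "1"), lambda node: node[1] == "2")
print(len(nodes), len(arcs))
print(plan)
```

Under these assumptions the composed graph has 6 nodes and 10 arcs, and the shortest path uses Drive(l2,l1), Load(p1,t1,l1), Drive(l1,l2), and Unload(p1,t1,l2). That length of 4 agrees with the objective value shown for the composed logistics example. The agreement is specific to this small case, where both state variables are merged into one graph.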