Towards Model-lite Planning: A Proposal for Learning & Planning with Incomplete Domain Models
Sungwook Yoon, Subbarao Kambhampati
Supported by the DARPA Integrated Learning Program

A Planning Problem
• Suppose you have a super-fast planner and a target application. What is the first problem you have to solve? Is it a problem from the application?
• It is domain engineering, and domain engineering is hard. This is the motivation for model-lite planning.

Snapshot of the Talk
• This is a proposal. We formulate learning and planning problems and solution methods for them. We have tested our ideas on some problems, but verification is still an ongoing process.
• We propose:
  – A representation for model-lite planning
    • Probabilistic logic: incompleteness is quantified
    • Explicit consideration of domain invariants
  – Learning of the domain model
    • Updating the probabilities and finding new axioms
  – Planning with the model
    • An incompletely modeled deterministic planning domain needs probabilistic planning
    • Find the most plausible plan that respects the current domain model

Representation
• Precondition axiom: (p_Ai, A → pre_i), which relates an action to current-state facts
• Effect axiom: (e_Ai, A → effect_i), which relates an action to next-state facts
• Uncertainty is quantified as a probability attached to each axiom
• This factored form facilitates learning

Domain Model - Blocksworld
• Precondition axioms:
  – 0.9, Pickup(x) -> armempty()
  – 1, Pickup(x) -> clear(x)
  – 1, Pickup(x) -> ontable(x)
• Effect axioms:
  – 0.8, Pickup(x) -> holding(x)
  – 0.8, Pickup(x) -> not armempty()
  – 0.8, Pickup(x) -> not ontable(x)

Representation: A Modeling Problem
• If the probability of each effect is specified independently, the conjunction of the effects has a different semantics than intended.
• One fix: add a hidden variable O, assert (e, A → O), and then add a deterministic axiom for each effect: (1, O → eff1), (1, O → eff2), …
• We can also alleviate this problem with explicit domain invariant (static) properties, which relate facts within a single state:
  – 0.8, Pickup(x) -> holding(x)        (effect axiom)
  – 1, holding(x) -> not armempty()     (static property)
  – 1, holding(x) -> not ontable(x)     (static property)
  – These static properties replace the separate effect axioms 0.8, Pickup(x) -> not armempty() and 0.8, Pickup(x) -> not ontable(x).
• Writing explicit domain invariant properties is easier than writing an initial-state generator and a set of operators that respect such properties.

Learning the Domain Model
• Given a trajectory of states and actions, S1, A1, S2, A2, …, Sn, An, Sn+1:
  – We can learn precondition axioms from (S1, A1), (S2, A2), …, (Sn, An)
  – We can learn effect axioms from (A1, S2), (A2, S3), …, (An, Sn+1)
  – We can learn domain invariant properties from each state S1, …, Sn+1
  – The weights (probabilities) of the axioms can be updated with a simple perceptron-style update (see the sketch below)
• There are readily available packages for weighted logic learning
  – Alchemy (Markov Logic Networks)
  – ProbLog
• Structure learning
  – Alchemy provides structure learning too
  – We can also enumerate all possible axioms (very costly for planning)
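To make the perceptron-style weight update above concrete, here is a minimal sketch in Python. It is an illustration only, not the Alchemy/MLN learner: the Axiom record, the propositional literal encoding, and the error-driven update rule (nudging each axiom's weight toward the observed truth value on each transition) are assumptions made for this example.

```python
# Sketch: perceptron-style weight updates for precondition/effect axioms from
# a state-action trajectory. Hypothetical encoding; not the Alchemy learner.

from dataclasses import dataclass

@dataclass
class Axiom:
    weight: float   # read as the probability that the axiom holds
    action: str     # grounded action the axiom is about, e.g. "pickup_a"
    literal: str    # state literal, e.g. "armempty" or "not ontable_a"
    kind: str       # "pre": checked in S_i; "eff": checked in S_{i+1}

def literal_holds(literal, state):
    """True if a (possibly negated) literal holds in a state (a set of facts)."""
    if literal.startswith("not "):
        return literal[4:] not in state
    return literal in state

def perceptron_update(axioms, trajectory, lr=0.1):
    """trajectory = [(S1, A1), ..., (Sn, An), (Sn+1, None)]; updates in place."""
    for i in range(len(trajectory) - 1):
        state, action = trajectory[i]
        next_state, _ = trajectory[i + 1]
        for ax in axioms:
            if ax.action != action:
                continue
            observed = literal_holds(ax.literal,
                                     state if ax.kind == "pre" else next_state)
            # Error-driven step toward the observation (1 if the axiom was
            # satisfied on this transition, 0 if it was violated), clipped so
            # the weight can still be read as a probability.
            ax.weight += lr * (float(observed) - ax.weight)
            ax.weight = min(1.0, max(0.0, ax.weight))

# Tiny usage example: one pickup_a transition.
axioms = [
    Axiom(0.5, "pickup_a", "armempty", "pre"),
    Axiom(0.5, "pickup_a", "holding_a", "eff"),
    Axiom(0.5, "pickup_a", "not armempty", "eff"),
]
trajectory = [({"armempty", "clear_a", "ontable_a"}, "pickup_a"),
              ({"holding_a", "clear_a"}, None)]
perceptron_update(axioms, trajectory)
for ax in axioms:
    print(ax.kind, ax.action, "->", ax.literal, round(ax.weight, 2))
```

Running this nudges all three weights from 0.5 toward 1.0, since the single observed transition satisfies each axiom.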
Model-lite Planning → Probabilistic Planning
• As stated before, with incomplete domain knowledge a deterministic planning domain should be treated as a probabilistic domain.
• The resulting plan should be maximally consistent with the current domain model.
• We develop a planning technique for this purpose:
  – Find the plan that is maximally plausible given the probabilistic axioms, the initial state, and the goal
  – This is an MPE solution to a Bayes net problem
  – Built on the plangraph

Probabilistic Plangraph
[Figure: a two-block (a, b) plangraph with proposition layers (clear, ontable, armempty, holding, on) alternating with action layers (pickup, stack, and noop actions); red lines indicate mutexes; probabilistic axioms (e.g., 0.8) label the edges.]
• A domain invariant property can be asserted too.
• How do we generate a weighted clause? For example: 0.95, pickup_b' v holding_b.

Can We View the Probabilistic Plangraph as a Bayes Net?
[Figure: the same plangraph with probabilities (e.g., 0.5, 0.8, 0.9) attached to the axioms; the initial-state and goal propositions are evidence variables; a domain invariant property can be asserted too (0.9).]
• How do we find a solution? Compute the MPE (most probable explanation); there are solvers available for this.

MPE as MaxSat
• There is prior work by James D. Park (AAAI 2002).
• Set -log(P) as the weight of each clause:

    A  B  P      Weighted clause
    T  T  0.7    -log 0.7:  -A v -B
    F  T  0.3    -log 0.3:   A v -B
    T  F  0.2    -log 0.2:  -A v  B
    F  F  0.8    -log 0.8:   A v  B

• Intuitive explanation: violating a clause is cheap for high-probability instantiations, so the MaxSat solution gives you the highest-probability instantiation.
• Deterministic case: for A -> B, the entry (A=T, B=T) has probability 1 and (A=T, B=F) has probability 0, so the clause -A v B gets infinite weight (a hard clause), which complies with our intuitive understanding.

Probabilistic Plangraph to MaxSat
[Figure: the same plangraph with -log weights on the probabilistic clauses (e.g., -log 0.5, -log 0.8, -log 0.9); the initial-state and goal propositions are evidence variables; a domain invariant property can be asserted too (-log 0.9).]
• For each probabilistic axiom with probability p, we give the corresponding clause weight -log(1 - p). That's it.
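To make the weighting scheme concrete, the following sketch turns a handful of probabilistic clauses plus hard evidence (initial state and goal) into a weighted MaxSat instance and recovers the most plausible assignment. It illustrates the -log weighting only, not the generic encoder from the experiments: the clause representation, the specific axioms, and the brute-force enumeration (which a real weighted MaxSat solver would replace) are assumptions made for this example.

```python
# Sketch: weighted MaxSat view of probabilistic planning clauses.
# A soft clause that holds with probability p costs -log(1 - p) to violate;
# evidence (initial state, goal) is encoded as hard clauses. Brute force
# stands in for a real MaxSat solver; the clauses below are illustrative.

import math
from itertools import product

HARD = float("inf")

def clause_satisfied(clause, assignment):
    """A clause is a list of (variable, polarity) literals."""
    return any(assignment[var] == polarity for var, polarity in clause)

def violation_cost(clauses, assignment):
    """Total weight of violated clauses (infinite if a hard clause fails)."""
    return sum(w for w, clause in clauses
               if not clause_satisfied(clause, assignment))

def soft(p, clause):
    """Clause induced by a probabilistic axiom that holds with probability p."""
    return (-math.log(1.0 - p), clause)

def most_plausible(clauses, variables):
    best, best_cost = None, HARD
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        cost = violation_cost(clauses, assignment)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost

# Toy one-step instance: should pickup_a be part of the plan?
variables = ["armempty0", "pickup_a", "holding_a1"]
clauses = [
    (HARD, [("armempty0", True)]),                            # evidence: initial state
    (HARD, [("holding_a1", True)]),                           # evidence: goal
    soft(0.9, [("pickup_a", False), ("armempty0", True)]),    # precondition axiom
    soft(0.8, [("pickup_a", False), ("holding_a1", True)]),   # effect axiom
    soft(0.95, [("pickup_a", True), ("holding_a1", False)]),  # holding_a1 is only
                                                              # explained by pickup_a
]
plan, cost = most_plausible(clauses, variables)
print(plan, "cost =", round(cost, 3))
```

With the evidence fixed by hard clauses, the only remaining choice is whether pickup_a is in the plan; the cheapest assignment sets it to true, mirroring how the MaxSat solution is read off as the most plausible plan.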
Exploding Blocksworld

Current Status (ongoing)
• Learning test
  – Generated Blocksworld random-wandering data and fed it to Alchemy along with correct and incorrect axioms
  – Alchemy assigned higher weights to the correct axioms and lower weights to the incorrect axioms
• Planning test
  – Tested on probabilistic planning problems
  – Hand-tested on a couple of instances of the Slippery Gripper domain
    • Hand-encoded the clauses and assigned the weights
    • Fed the resulting clauses to a MaxSat solver
    • Got the desired results
  – On Exploding Blocksworld
    • Implemented a generic MaxSat encoder for probabilistic planning problems
    • Tested on a couple of problems from Exploding Blocksworld
    • Finds the desired output frequently (not always)

Summary
• We can learn precondition axioms and effect axioms separately
  – A -> Prec, A -> Effect
  – This facilitates learning
• Domain axioms (invariant properties) can be provided, learned, and used explicitly
  – This is easier for the domain modeler
• For planning, we can apply the probabilistic plangraph approach
  – We proposed using MaxSat to solve probabilistic planning problems
  – An interesting parallel to compiling deterministic planning to SAT

Domain Learning - Related Work
• Logical filtering (Chang & Amir, ICAPS'06)
  – Updates the belief state and the domain transition model
  – Experiments involved planning
• Probabilistic operator learning (Zettlemoyer, Pasula and Kaelbling, AAAI'05)
  – Experiments involved planning
• ARMS (Yang, Wu and Jiang, ICAPS'05)
  – No observations besides the initial state and the goal

Probabilistic Planning in the Plangraph - Related Work
• PGraphplan and Paragraph
  – Both search for plans within the Graphplan framework
• PGraphplan searches for a consistent plan that maximizes the goal-reaching probability
  – Forward probability propagation
• Paragraph searches for a plan that minimizes the cost to reach the goal
  – Backward plan search