Decision Making Under Uncertainty
Lec #4: Planning and Sensing
UIUC CS 598: Section EA
Professor: Eyal Amir
Spring Semester 2006

Uses slides by José Luis Ambite, Son Tran, Chitta Baral and… Paolo Traverso’s (http://sra.itc.it/people/traverso/) tutorial: http://prometeo.ing.unibs.it/sschool/slides/traverso/traverso-slides.ps.gz. Some slides from http://www2.cs.cmu.edu/~mmv/planning/handouts/BDDplanning.pdf by Rune Jensen (http://www.itu.dk/people/rmj).

Last Time: Planning by Regression
• OneStepPlan(S) in the regression algorithm is the backward image of the set of states S.
• It can be computed as the QBF formula: $\exists x_{t+1}\,(\mathit{States}_{t+1}(x_{t+1}) \land R(x_t, a, x_{t+1}))$
• Quantified Boolean Formula (QBF): quantifiers are eliminated by expanding over both values of the bound variable:
  – $\exists x\,(x \land y) = (0 \land y) \lor (1 \land y)$
  – $\forall x\,(x \land y) = (0 \land y) \land (1 \land y)$

Last Time
• Planning with no observations:
  – Can be done using belief states (sets of states)
  – Belief states can be encoded as OBDDs
• Complexity? (later today)
• Other approaches:
  – Use model-checking approaches
  – Approximate the belief state, e.g., (Petrick & Bacchus ’02, ’04)

The Model Checking Problem
Determine whether a formula is true in a model:
1. A domain of interest is described by a semantic model.
2. A desired property of the domain is described by a logical formula.
3. Check whether the domain satisfies the desired property by checking whether the formula is true in the model.
Motivation: formal verification of dynamic systems

Now: Sensing Actions
• Current solutions for nondeterministic planning:
  – Conditional planning: condition on the observations that you make now
  – Condition on the belief state

Medication Example (Deterministic)
• Problem
  – A patient is infected. If he is hydrated, he can take the medicine and be cured; if he takes it while not hydrated, he dies. To become hydrated, the patient can drink. The check action determines whether the patient is hydrated.
• Goal: not infected and not dead.
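The medication example is small enough to enumerate explicitly. The sketch below is a minimal, hypothetical encoding (not from the slides): a state is a tuple (hyd, inf, dead), dr makes the patient hydrated, med cures a hydrated patient and kills a dehydrated one, and chk is modeled as a pure sensing action with no effect on the world. It uses explicit state sets rather than OBDDs and illustrates two ideas: `backward_image` computes the one-step regression $\exists x_{t+1}\,(\mathit{States}_{t+1}(x_{t+1}) \land R(x_t, a, x_{t+1}))$ by scanning the transition relation, and `leaves`/`achieves` evaluate a plan tree by requiring the goal at every leaf. The plan syntax `("if", fluent_index, then_plan, else_plan)` is an assumed stand-in for the if(f, …, …) notation used in the plan trees discussed next.

```python
from itertools import product
from typing import FrozenSet, Tuple

# Hypothetical encoding (not from the slides): a state is (hyd, inf, dead).
State = Tuple[bool, bool, bool]
HYD, INF, DEAD = 0, 1, 2
STATES = list(product([False, True], repeat=3))
ACTIONS = ("chk", "dr", "med")


def step(s: State, action: str) -> State:
    """Deterministic effects, as assumed from the problem statement."""
    hyd, inf, dead = s
    if action == "chk":  # sensing only: the world does not change
        return s
    if action == "dr":  # drinking makes the patient hydrated
        return (True, inf, dead)
    if action == "med":  # cures a hydrated patient, kills a dehydrated one
        return (hyd, False, dead) if hyd else (hyd, inf, True)
    raise ValueError(f"unknown action {action!r}")


GOAL = frozenset(s for s in STATES if not s[INF] and not s[DEAD])
TRANS = {(s, a, step(s, a)) for s in STATES for a in ACTIONS}  # R(x_t, a, x_{t+1})


def backward_image(target: FrozenSet[State], action: str) -> FrozenSet[State]:
    """{x_t | exists x_{t+1} in target with (x_t, action, x_{t+1}) in TRANS}."""
    return frozenset(s for (s, a, s2) in TRANS if a == action and s2 in target)


def leaves(plan: list, belief: FrozenSet[State]) -> list:
    """Belief states at the leaves of a plan tree.

    An "if" item splits the belief state on the fluent's true value, which
    models the outcome of a preceding sensing action such as chk; the sketch
    does not check that such a sensing action actually precedes the test.
    """
    if not belief:
        return []  # this branch is never reached from the given belief state
    if not plan:
        return [belief]
    head, rest = plan[0], plan[1:]
    if isinstance(head, tuple):  # ("if", fluent_index, then_plan, else_plan)
        _, fluent, then_plan, else_plan = head
        yes = frozenset(s for s in belief if s[fluent])
        return leaves(then_plan + rest, yes) + leaves(else_plan + rest, belief - yes)
    return leaves(rest, frozenset(step(s, head) for s in belief))


def achieves(plan: list, belief: FrozenSet[State]) -> bool:
    """The goal must hold in every final state, i.e., at every leaf."""
    return all(leaf <= GOAL for leaf in leaves(plan, belief))


# Infected and alive; whether the patient is hydrated is unknown.
INIT = frozenset({(False, True, False), (True, True, False)})
CONDITIONAL = ["chk", ("if", HYD, ["med"], ["dr", "med"])]

print(achieves(["med"], INIT))  # → False
print(achieves(CONDITIONAL, INIT))  # → True
print(sorted(backward_image(GOAL, "med")))  # → [(True, False, False), (True, True, False)]
```

The plan [med] fails because it kills the patient in the dehydrated initial state, while the conditional plan that checks hydration first reaches the goal on both branches. The backward image of the goal under med is exactly the set of hydrated, living states.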
Medication Example (cont.)
• Classical planners cannot solve this kind of problem because
  – it involves incomplete information: we don’t know whether the patient is initially hydrated;
  – it has a sensing action: the check action is required to determine whether he is hydrated.

Planning with Sensing Actions and Incomplete Information
• How do we reason about the knowledge of agents?
• What is a plan?
  – Conditional plans: contain sensing actions and conditionals such as an “if-then-else” structure
• In contrast, conformant plans: a sequence of actions that leads to the goal regardless of the values of the unknown fluents in the initial state

Plan Tree Examples
• [] (nil)
• [a]
• [a;b]
• [a;b;if(f,c,d)]
• [a;if(f,[b1;if(g,c1,c2)],[b2;if(h,d1,d2)])]
[Figure: the plan tree for each plan above; each if node branches on the value of its fluent (f, and g or h in the last example)]

Plan Trees (cont.)
Example
[Figure: plan tree for the medication example, built from the actions chk, dr, and med and branching on hyd; nodes are indexed by (Time, Path) coordinates such as (1,1), (2,1), (2,2), (3,2)]

Why Plan Trees?
• Think of each node as a state that the agent might be in during plan execution.
• The root is the initial state.
• Every leaf is a possible final state.
• The goal is satisfied if it holds in every final state, i.e., at every leaf of the tree.

Limitations of the Approach
• Can condition only on the current sensing result
• No accumulation of knowledge
• Forward-search approach: can we do better?
• Our regression algorithm from last time:
  – Regress, and allow merging of sets/actions A and B when there is a sensing action that can distinguish the members of A from those of B

Conditioning on Belief State
• Planning domain D = <S,A,O,I,T,X>
  – S: set of states
  – A: set of actions
  – O: set of observations
  – $I \subseteq S$: initial belief state
  – $T \subseteq S \times A \times S$: transition relation (transition model)
  – $X \subseteq S \times O$: observation relation (observation model)
Due to (Bertoli & Pistore; ICAPS 2004)

Conditioning on Belief State
• Plan P = <C,c0,act,evolve> for planning domain D (what we need to find)
  – C: set of belief states (belief states = contexts in (Bertoli & Pistore ’04))
  – $c_0 \in C$: initial belief state
  – $act: C \times O \to A$: action function
  – $evolve: C \times O \to C$: belief-state evolution function
• Very similar to belief-state MDPs
• Represents an infinite set of executions

Conditioning on Belief State
• A configuration (s,o,c,a) for planning domain D is a state of the executor:
  – $s \in S$: world state
  – $o \in X(s)$: observation made in state s
  – $c \in C$: belief state that the executor holds
  – $a = act(c,o)$: the action to be taken given this belief state and observation
• How do we evolve a configuration?

Example
A planning problem P for a planning domain D = <S,A,O,I,T,X>:
• $I \subseteq S$ is the set of initial states
• $G \subseteq S$ is the set of goal states

Example: Patient + Wait between Check and Medication
[Figure: plan tree for this variant, with the actions chk, dr, and med, branching on hyd, and the same (Time, Path) node coordinates (1,1), (2,1), (2,2), (3,2) as the earlier medication plan tree]

Left-Over Issues
• Limitations
• Languages for specifying nondeterministic effects and sensing (similar to STRIPS?)
  – Your presentation
• Complexity
• Probabilistic domains: next class

Homework
1. Readings for next time: [Michael Littman; Brown U Thesis 1996] chapter 2 (Markov Decision Processes)
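Returning to the question “How do we evolve a configuration?” from the belief-state slides: the sketch below assumes the rule (s,o,c,a) → (s',o',c',a') whenever $s' \in T(s,a)$, $o' \in X(s')$, $c' = evolve(c,o)$ and $a' = act(c',o')$, with initial configurations $(s_0, o_0, c_0, act(c_0, o_0))$ for $s_0 \in I$, $o_0 \in X(s_0)$. Treat that rule as an assumption rather than a quotation of Bertoli & Pistore. The domain is the same hypothetical medication encoding as before, with observations revealing only hydration; the plan tables `ACT` and `EVOLVE`, the contexts `"start"`/`"done"`, and the `noop` action are illustrative inventions, and the contexts are plain labels standing in for belief states. Enumerating the reachable configurations shows how a finite plan represents a possibly infinite set of executions: each execution is a path through this finite graph.

```python
from collections import deque

# Hypothetical encoding (not from the slides): a state is (hyd, inf, dead).
def T(s, a):
    """Transition relation T, returned as the set of successors of s under a."""
    hyd, inf, dead = s
    if a == "noop":  # hypothetical do-nothing action used once the plan is done
        return {s}
    if a == "dr":
        return {(True, inf, dead)}
    if a == "med":
        return {(hyd, False, dead) if hyd else (hyd, inf, True)}
    raise ValueError(f"unknown action {a!r}")


def X(s):
    """Observation relation X: assume the executor observes hydration only."""
    return {"hyd" if s[0] else "dry"}


# Hypothetical plan P = <C, c0, act, evolve> with contexts C = {"start", "done"}.
C0 = "start"
ACT = {("start", "hyd"): "med", ("start", "dry"): "dr",
       ("done", "hyd"): "noop", ("done", "dry"): "noop"}
EVOLVE = {("start", "hyd"): "done", ("start", "dry"): "start",
          ("done", "hyd"): "done", ("done", "dry"): "done"}


def initial_configurations(init):
    """(s, o, c0, act(c0, o)) for every s in I and every o in X(s)."""
    return {(s, o, C0, ACT[(C0, o)]) for s in init for o in X(s)}


def successors(config):
    """Assumed rule: (s,o,c,a) -> (s',o',c',a') with s' in T(s,a), o' in X(s'),
    c' = evolve(c,o) and a' = act(c',o')."""
    s, o, c, a = config
    c_next = EVOLVE[(c, o)]
    return {(s2, o2, c_next, ACT[(c_next, o2)]) for s2 in T(s, a) for o2 in X(s2)}


def reachable(init):
    """Breadth-first closure of the configuration graph; executions are paths in it."""
    start = initial_configurations(init)
    seen, frontier = set(start), deque(start)
    while frontier:
        for nxt in successors(frontier.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


INIT = {(False, True, False), (True, True, False)}  # infected, alive, hydration unknown
configs = reachable(INIT)
for config in sorted(configs):
    print(config)
# The goal (not infected, not dead) holds whenever the plan has reached "done".
print(all(not s[1] and not s[2] for s, _, c, _ in configs if c == "done"))  # → True
```

Because the dehydrated branch loops back to the "start" context after drinking, the plan conditions on the current observation and on its own context, rather than on a fixed sequence of actions.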