Expressive and Efficient Frameworks for Partial Satisfaction Planning

Subbarao Kambhampati
Arizona State University
(Proposal submitted for consideration to
Behzad Kamgar-Parsi/ONR)
Partial Satisfaction/Over-Subscription Planning
 Traditional planning problems
 Find the (lowest cost) plan that satisfies all the given goals
 PSP Planning
 Find the highest utility plan given the resource constraints
 Goals have utilities and actions have costs
 …arises naturally in many real world planning scenarios
 MARS rovers attempting to maximize scientific return, given resource constraints
 UAVs attempting to maximize reconnaissance returns, given fuel and other constraints
 Logistics problems with resource constraints
 … due to a variety of reasons
 Constraints on agent’s resources
 Conflicting goals
 With complex inter-dependencies between goal utilities
 Soft constraints
 Limited time
Supporting PSP planning
 PSP planning changes planning from a “satisficing” to an “optimizing” problem
 It is trivial to find a plan; hard to find a good one!
 Rich connections to OR(IP)/MDP
 Requires selecting “objectives” in addition to “actions”
 Which subset of goals to achieve
 To what degree to satisfy individual goals
 E.g. Collect as much soil sample as possible; get done as close to 2pm as possible
 Currently, the objective selection is left to humans
 Leads to highly suboptimal plans, since objective selection cannot be done independently of planning
 We propose to develop scalable methods for synthesizing plans in such over-subscribed scenarios
Proposal Overview
 Preliminary work
 Simple formal model: PSP-Net Benefit
 MDP-based, IP-based, and heuristic-planning based approaches
 Proposed directions
 Improving expressiveness of PSP planners
 Handling goals needing degree of satisfaction (e.g. numeric goals)
 Handling goals with soft deadlines (where utility of the delayed goals is reduced)
 Handling complex interactions between objectives
 Interactions between the plans of the goals
 Interactions between the utilities of the goals
 Improving search in PSP planners
 More powerful heuristics for PSP planning (which take interactions into account)
 More flexible search frameworks --non-combinable costs and utilities
 Multi-objective search
 Applications
 Replanning as a PSP planning problem
Formulation
 PSP Net benefit:
 Given a planning problem $P = (F, A, I, G)$, a “cost” $c_a \ge 0$ for each action $a$, a “utility” $u_f \ge 0$ for each goal fluent $f \in G$, and a positive number $k$: is there a finite sequence of actions $\Delta = (a_1, a_2, \ldots, a_n)$ that, starting from $I$, leads to a state $S$ with net benefit $\sum_{f \in (S \cap G)} u_f - \sum_{a \in \Delta} c_a \ge k$?
[Figure: hierarchy of planning problems: PLAN EXISTENCE, PLAN LENGTH, PLAN COST, PSP GOAL, PSP GOAL LENGTH, PSP UTILITY, PSP UTILITY COST, PSP NET BENEFIT]
Maximize the Net Benefit
Actions have execution costs, goals have utilities, and the objective is to find the plan that has the highest net benefit.
 Easy enough to extend to a mixture of soft and hard goals
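The net-benefit objective in the formulation above can be illustrated with a small sketch. The `Action` type, the rover fluents, and all costs and utilities below are illustrative assumptions, not taken from the proposal; the code only evaluates a given plan and does not search for one.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple


class Action(NamedTuple):
    """Hypothetical STRIPS-style action with an execution cost."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]
    cost: float


def net_benefit(init: Iterable[str], plan: List[Action],
                utility: Dict[str, float]) -> float:
    """Utility of the goal fluents true in the final state minus total action cost."""
    state = set(init)
    total_cost = 0.0
    for a in plan:
        if not a.pre <= state:
            raise ValueError(f"{a.name} is not applicable")
        state = (state - a.delete) | a.add
        total_cost += a.cost
    achieved = sum(u for goal, u in utility.items() if goal in state)
    return achieved - total_cost


# Illustrative rover instance (all numbers are assumptions).
navigate = Action("navigate_X_Y", frozenset({"at_X"}), frozenset({"at_Y"}),
                  frozenset({"at_X"}), 3.0)
sample_soil = Action("sample_soil_Y", frozenset({"at_Y"}),
                     frozenset({"have_soil"}), frozenset(), 2.0)
utility = {"have_soil": 10.0, "have_image": 4.0}

full_plan_benefit = net_benefit({"at_X"}, [navigate, sample_soil], utility)
empty_plan_benefit = net_benefit({"at_X"}, [], utility)
```

The empty plan is always a valid PSP plan with net benefit 0, which is why finding some plan is trivial and finding a good one is the hard part.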
A spectrum of approaches for PSP-Net Benefit
[AAAI 2004; KBCS 2004]

EXACT METHODS
 Deterministic MDPs
 Model the problem as a
deterministic MDP with action
costs, where a state has a
reward equal to the utility of the
goals that hold in it.
 A special action “Done” takes
the agent from any state S to a
state Sd which is a sink state
 Guaranteed optimal, but very
slow (using SPUDD, a state of
the art MDP solver)
 Optiplan
 Integer programming based
STRIPS planner
 Optimal for a given plan length
 Equivalent to bounded-horizon
MDP

HEURISTIC METHODS
 AltAlt^ps
 Heuristic planner that selects
the “objectives” up front
heuristically
 Novel use of planning-graph
based reachability analysis to
pick objectives
 Not optimal, but quite fast
 Sapa^ps
 Models PSP as heuristic search.
Can be optimal given admissible
heuristics.
 Can be thought of as a search-based solution to the
deterministic MDP
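The deterministic-MDP view can be sketched as follows. Because the “Done” action may be taken in any state, the optimal net benefit is the maximum, over reachable states S, of the utility of S minus the cheapest cost of reaching S. This is a minimal sketch under assumed names and numbers; it uses plain uniform-cost enumeration rather than SPUDD or an IP encoding, and it visits every reachable state, which hints at why exact methods scale poorly.

```python
import heapq
from itertools import count
from typing import Dict, FrozenSet, Iterable, List, NamedTuple


class Action(NamedTuple):
    """Hypothetical STRIPS-style action with a non-negative cost."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]
    cost: float


def optimal_net_benefit(init: Iterable[str], actions: List[Action],
                        utility: Dict[str, float]) -> float:
    """max over reachable states S of U(S) - cheapest cost to reach S."""
    start = frozenset(init)
    dist = {start: 0.0}
    tie = count()  # tie-breaker so the heap never compares states
    frontier = [(0.0, next(tie), start)]
    best = float("-inf")
    while frontier:
        d, _, s = heapq.heappop(frontier)
        if d > dist[s]:
            continue  # stale queue entry
        # Taking "Done" here collects the reward of the goals holding in s.
        reward = sum(u for goal, u in utility.items() if goal in s)
        best = max(best, reward - d)
        for a in actions:
            if a.pre <= s:
                t = (s - a.delete) | a.add
                nd = d + a.cost
                if nd < dist.get(t, float("inf")):
                    dist[t] = nd
                    heapq.heappush(frontier, (nd, next(tie), t))
    return best


# Illustrative rover instance (all numbers are assumptions).
actions = [
    Action("navigate_X_Y", frozenset({"at_X"}), frozenset({"at_Y"}),
           frozenset({"at_X"}), 3.0),
    Action("sample_soil_Y", frozenset({"at_Y"}), frozenset({"have_soil"}),
           frozenset(), 2.0),
    Action("take_picture_Y", frozenset({"at_Y"}), frozenset({"have_image"}),
           frozenset(), 6.0),
]
utility = {"have_soil": 10.0, "have_image": 4.0}
best_value = optimal_net_benefit({"at_X"}, actions, utility)
```

In this instance the picture costs more than it is worth, so the optimum achieves only the soil goal.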
Source of Strength: Planning graph based
Reachability Heuristics for PSP
Comparison of approaches
Exact algorithms based on MDPs don’t scale at all
[AAAI 2004]
[optional]
Adapting PG heuristics for PSP
 Challenges:
 Need to propagate costs on the planning graph
 The exact set of goals is not clear
 Interactions between goals
 Obvious approach of considering all 2^n goal subsets is infeasible
[Figure: cost propagation on the planning graph across levels l=0, l=1, l=2]
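Cost propagation on a planning graph can be sketched roughly as below. This is a simplified delete-relaxed version that uses sum aggregation over preconditions and iterates level by level to a fixpoint; the `Action` type, the aggregation rule, and the numbers are assumptions and do not reproduce the values in the original figure.

```python
import math
from typing import Dict, FrozenSet, Iterable, List, NamedTuple


class Action(NamedTuple):
    """Hypothetical relaxed action: delete effects are ignored."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    cost: float


def propagate_costs(init: Iterable[str], actions: List[Action],
                    max_levels: int = 50) -> Dict[str, float]:
    """Estimate the cost of achieving each fluent, one graph level at a time."""
    cost = {f: 0.0 for f in init}
    for _ in range(max_levels):
        new_cost = dict(cost)
        changed = False
        for a in actions:
            # An action enters the level once all its preconditions have a cost.
            if all(p in cost for p in a.pre):
                c = a.cost + sum(cost[p] for p in a.pre)
                for f in a.add:
                    if c < new_cost.get(f, math.inf):
                        new_cost[f] = c
                        changed = True
        cost = new_cost
        if not changed:
            break  # the graph has levelled off with respect to costs
    return cost


# Illustrative rover instance (all numbers are assumptions).
actions = [
    Action("navigate_X_Y", frozenset({"at_X"}), frozenset({"at_Y"}), 3.0),
    Action("sample_soil_Y", frozenset({"at_Y"}), frozenset({"have_soil"}), 2.0),
    Action("take_picture_Y", frozenset({"at_Y"}), frozenset({"have_image"}), 6.0),
]
fluent_cost = propagate_costs({"at_X"}, actions)
```

Sum aggregation can overcount shared sub-plans, so it is not admissible in general; taking the maximum over preconditions instead gives a weaker but admissible estimate.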
Idea: Select a subset of the top level
goals upfront
Challenge: Goal interactions
[Figure: planner architecture with components: Action Templates; Problem Spec (Init, Goal state); Graphplan Plan Extension Phase (based on STAN) + Cost Propagation; Cost-sensitive Planning Graph; Extraction of Heuristics; Heuristics; Actions in the Last Level; Goal Set Selection Algorithm; Cost-sensitive Search; Solution Plan]
 Approach: Estimate the net benefit of
each goal in terms of its utility minus
the cost of its relaxed plan
 Bias the relaxed plan extraction
to (re)use the actions already
chosen for other goals
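The goal-selection idea above can be sketched as a greedy loop: repeatedly add the goal whose utility most exceeds the cost of the relaxed-plan actions not already chosen for other goals. This is a minimal sketch, not the planner's actual algorithm; it assumes the relaxed plan of each goal is supplied as a set of action names (hypothetical input), and all numbers are illustrative.

```python
from typing import Dict, List, Set, Tuple


def select_goals(utility: Dict[str, float],
                 relaxed_plan: Dict[str, Set[str]],
                 action_cost: Dict[str, float]) -> Tuple[List[str], Set[str]]:
    """Greedily pick goals with positive marginal net benefit."""
    selected: List[str] = []
    chosen_actions: Set[str] = set()
    remaining = set(utility)

    def marginal(goal: str) -> float:
        # Actions already chosen for other goals are reused at no extra cost.
        extra = relaxed_plan[goal] - chosen_actions
        return utility[goal] - sum(action_cost[a] for a in extra)

    while remaining:
        best = max(sorted(remaining), key=marginal)
        if marginal(best) <= 0:
            break  # no remaining goal pays for itself
        selected.append(best)
        chosen_actions |= relaxed_plan[best]
        remaining.remove(best)
    return selected, chosen_actions


# Illustrative rover instance (all numbers are assumptions).
utility = {"have_soil": 10.0, "have_image": 4.0, "have_rock": 7.0}
relaxed_plan = {
    "have_soil": {"navigate_X_Y", "sample_soil_Y"},
    "have_image": {"navigate_X_Y", "take_picture_Y"},
    "have_rock": {"navigate_X_Y", "sample_rock_Y"},
}
action_cost = {"navigate_X_Y": 3.0, "sample_soil_Y": 2.0,
               "take_picture_Y": 6.0, "sample_rock_Y": 3.0}
goals, plan_actions = select_goals(utility, relaxed_plan, action_cost)
```

Here the rock goal looks barely worthwhile on its own (7 against a cost of 6), but becomes attractive once the navigation action is already paid for by the soil goal, which is the kind of positive interaction the bias toward reuse is meant to capture.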
Sapa^ps: A forward A* Approach for PSP
[optional]
Anytime A* Algorithm: search through the most beneficial nodes first
[Figure: search tree over actions A1: Navigate(X,Y), A2: SampleSoil(Y), A3: TakePicture, A4: Navigate(Y,Z), A5: SampleRock(Y)]
A*: f(S) = g(S) + h(S)
g(S) is the net benefit of the plan that got us from the initial state to S
-- Difference between the utility of the goals holding in S and the cost of the actions that took us from I to S
h*(S) is the additional net benefit of the best plan P starting from S
(If S’ is the result of applying P to S, then we want to maximize [U(S’) – U(S)] – C(P))
h(S) is the estimate of h*(S)
Sapa^ps: Modeling A* search for PSP
[optional]
Many state-of-the-art planners use best-first A* search.
How do we adapt A* search to PSP Net Benefit?
Classical planning:
 Search node evaluation (f = g+h):
 Lowest expected total number of actions
 Candidate Plans:
 Qualifying plans: Achieve all goals
 Search termination criteria:
 Achieving all goals
PSP Net Benefit:
 Search node evaluation (f = g+h):
 Highest expected total “benefit” (goal utility – action cost).
 Candidate Plans:
 “Beneficial” plans: Total achieved goal utility > total action cost.
 Search termination criteria:
 No search node appears to be extendable to be more beneficial than the best beneficial plan found.
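The PSP search scheme above can be sketched as an anytime best-first search. Here g(S) is the net benefit so far and h(S) is a deliberately crude optimistic estimate (the total utility of goals not yet achieved, ignoring action costs); that heuristic, the `Action` type, and the numbers are assumptions and stand in for the planning-graph heuristics the slides refer to.

```python
import heapq
from itertools import count
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Tuple


class Action(NamedTuple):
    """Hypothetical STRIPS-style action with a non-negative cost."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]
    cost: float


def anytime_search(init: Iterable[str], actions: List[Action],
                   utility: Dict[str, float]) -> Tuple[float, List[str]]:
    def achieved(s: FrozenSet[str]) -> float:
        return sum(u for goal, u in utility.items() if goal in s)

    def h(s: FrozenSet[str]) -> float:
        # Optimistic: assume every missing goal can be achieved for free.
        return sum(u for goal, u in utility.items() if goal not in s)

    start = frozenset(init)
    best_benefit, best_plan = achieved(start), []  # the empty plan
    cheapest = {start: 0.0}
    tie = count()
    open_list = [(-(achieved(start) + h(start)), next(tie), start, 0.0, [])]
    while open_list:
        neg_f, _, s, cost, plan = heapq.heappop(open_list)
        if -neg_f <= best_benefit:
            break  # no open node can beat the best plan found so far
        for a in actions:
            if not a.pre <= s:
                continue
            t = (s - a.delete) | a.add
            c = cost + a.cost
            if c >= cheapest.get(t, float("inf")):
                continue  # same state already reached at least as cheaply
            cheapest[t] = c
            g = achieved(t) - c
            new_plan = plan + [a.name]
            if g > best_benefit:  # anytime behaviour: keep the incumbent
                best_benefit, best_plan = g, new_plan
            heapq.heappush(open_list, (-(g + h(t)), next(tie), t, c, new_plan))
    return best_benefit, best_plan


# Illustrative rover instance (all numbers are assumptions).
actions = [
    Action("navigate_X_Y", frozenset({"at_X"}), frozenset({"at_Y"}),
           frozenset({"at_X"}), 3.0),
    Action("sample_soil_Y", frozenset({"at_Y"}), frozenset({"have_soil"}),
           frozenset(), 2.0),
    Action("take_picture_Y", frozenset({"at_Y"}), frozenset({"have_image"}),
           frozenset(), 6.0),
]
utility = {"have_soil": 10.0, "have_image": 4.0}
benefit, plan = anytime_search({"at_X"}, actions, utility)
```

Every search node is itself a candidate plan, so the search can be interrupted at any time and still return the best beneficial plan found so far.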
Search & Heuristic Improvements
 Make objective selection more
sensitive to goal (achievement)
interactions
 Consider group interactions
 Consider negative interactions
 Preliminary work in ICAPS 2005 (with
Sanchez Nigenda)
 Consider faster techniques for
exact methods
 Leverage our recent work on novel
IP encodings
 Based on loosely coupled network flow problems; highly competitive with SAT methods
 ICAPS 2005 (with van den Briel)
 Consider adapting directed and anytime MDP techniques
[Figure: Example: state change flow network. Package1 moves through AT_LOC1, IN_TRUCK1, AT_LOC2 and Truck1 through AT_LOC1, AT_LOC2 over t=1, t=2, t=3, under the plan LOAD(Package1), DRIVE(Truck1,Loc1,Loc2), UNLOAD(Package1). Action effects link multiple networks together]
Degree & Delay of Satisfaction
• In metric temporal domains, PSP will involve
– Partial Degree of satisfaction
• If you can’t give me $1000, give me at least half
• Need to track costs for various intervals of a numeric quantity
– Delayed Satisfaction
• If you submit the homework past the deadline, you will get penalty points
Preliminary work on degree of satisfaction in [IJCAI 2005]
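The two notions above can be made concrete with simple utility functions. This is a minimal sketch that assumes linear partial credit and a linear lateness penalty; both shapes and all numbers are illustrative assumptions, not the models used in the cited preliminary work.

```python
def degree_utility(amount: float, target: float, full_utility: float) -> float:
    """Partial credit proportional to how much of the target quantity was achieved."""
    clipped = min(max(amount, 0.0), target)
    return full_utility * clipped / target


def delayed_utility(finish: float, deadline: float, full_utility: float,
                    penalty_per_unit: float) -> float:
    """Full utility up to the deadline, then a linear penalty, never below zero."""
    lateness = max(0.0, finish - deadline)
    return max(0.0, full_utility - penalty_per_unit * lateness)


# Half of the requested $1000 earns half of the utility.
half_credit = degree_utility(500.0, 1000.0, 20.0)
# Finishing two time units after a soft deadline costs 3 utility units per unit of delay.
late_credit = delayed_utility(16.0, 14.0, 10.0, 3.0)
on_time_credit = delayed_utility(13.0, 14.0, 10.0, 3.0)
```

With goals like these, a planner has to reason about the cost of reaching different intervals of a numeric quantity or different completion times, not just about whether a goal is reachable.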
Utility interactions between goals
• PSP-net benefit considers goal
achievement interactions
• ..but assumes additive model of goal
utilities
– U(G1,G2)= U(G1)+U(G2)
• Additive utility model often unrealistic
– Utility of having two shoes is much more than the sum of the utilities of having either one of them
– Utility of having two cars is less than the sum
of utilities of having either one of them
• Challenges:
– Elicit utility models (preference elicitation)
– Model utility interactions
• Adapt and extend CP-nets for modeling goal utilities
– Can also consider qualitative preference models
– Extend the reachability heuristics to consider
both plan interactions and goal interactions
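One simple way to express non-additive goal utilities is to add a bonus or penalty term for each group of goals that interact. This is a minimal sketch of that idea only; it is not a CP-net, and the goal names and numbers are illustrative assumptions.

```python
from typing import Dict, FrozenSet


def set_utility(achieved: FrozenSet[str], base: Dict[str, float],
                interaction: Dict[FrozenSet[str], float]) -> float:
    """Sum of individual goal utilities plus a term for each fully achieved group."""
    total = sum(base[g] for g in achieved)
    for group, adjustment in interaction.items():
        if group <= achieved:
            total += adjustment
    return total


base = {"left_shoe": 1.0, "right_shoe": 1.0, "car_a": 10.0, "car_b": 10.0}
interaction = {
    frozenset({"left_shoe", "right_shoe"}): 8.0,   # complementary goals
    frozenset({"car_a", "car_b"}): -6.0,           # substitutable goals
}

pair_of_shoes = set_utility(frozenset({"left_shoe", "right_shoe"}), base, interaction)
two_cars = set_utility(frozenset({"car_a", "car_b"}), base, interaction)
one_shoe = set_utility(frozenset({"left_shoe"}), base, interaction)
```

The pair of shoes is worth more than the sum of its parts and the two cars are worth less, matching the two examples on the slide.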
Non-combinable costs/utilities
• PSP Net Benefit assumes costs and
utilities are in same units
• …often does not hold
– E.g. different types of resource costs (fuel,
manpower); different types of utilities
• Solution: Multi-objective search
– Either elicit utility models
• Alpha * manpower + Beta * mission utility
– ..or search for highest utility plans given a specific
resource bound
– ..or provide pareto (non-dominated) set of
solution plans and let the user choose
– We plan to build on our work on multi-objective
temporal planning in SAPA
• Challenge: Need to adapt reachability heuristics to separately track the various types of costs and utilities
[Figure: Cost variation and Makespan variation; Total Cost plotted against Alpha from 0 to 1]
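Two of the options above, a weighted combination and a pareto set, can be sketched for plans described by one cost dimension and one utility dimension. The `Plan` type, the sign convention (utility rewarded, manpower penalized), and the numbers are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    """Hypothetical plan summary: one resource cost and one utility."""
    name: str
    manpower: float
    utility: float


def dominates(p: Plan, q: Plan) -> bool:
    """p is no worse than q on both dimensions and strictly better on one."""
    no_worse = p.manpower <= q.manpower and p.utility >= q.utility
    better = p.manpower < q.manpower or p.utility > q.utility
    return no_worse and better


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Plans that no other plan dominates, in their original order."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


def best_weighted(plans: List[Plan], alpha: float, beta: float) -> Plan:
    """Pick the plan maximizing beta * utility - alpha * manpower."""
    return max(plans, key=lambda p: beta * p.utility - alpha * p.manpower)


plans = [Plan("A", 2.0, 5.0), Plan("B", 4.0, 9.0),
         Plan("C", 5.0, 8.0), Plan("D", 7.0, 12.0)]
front = [p.name for p in pareto_front(plans)]
utility_heavy = best_weighted(plans, alpha=1.0, beta=2.0).name
cost_heavy = best_weighted(plans, alpha=3.0, beta=1.0).name
```

Different weights select different members of the front, which is one reason to present the whole non-dominated set when the user's trade-off is not known in advance.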
Combining uncertainty and partial satisfaction
 Time permitting, we hope to extend
our PSP framework to handle
stochastic domains
 Planning in stochastic domains
already has many natural affinities to
PSP
 If the planner wants to ensure that its plan
reaches goals with higher probability, it
needs to often go for longer (costlier)
plans
 ..Many challenges remain in selecting
objectives in stochastic domains
 We expect to leverage our significant work
in extending reachability heuristics for
stochastic and non-deterministic domains

[UAI 2005; AAAI 2005; ICAPS 2004; JAIR
in review]
Explaining the planner’s decisions in mixed initiative scenarios

In mixed-initiative scenarios, humans would like to get explanations on the selected objectives
 Anecdotal evidence suggests that in military planning applications, human users are not willing to take a plan when the objectives selected by the planner do not match the human’s intuition
Challenge: Explaining the “optimality” of the planner’s decisions is technically hard
 In contrast, explaining correctness is much simpler
Proposed approach: Modify the reachability heuristic computations to leave a trace of their reasoning
 Intent would be to explain at least the pareto-optimality of the selected set of objectives
1. When a subgoal cannot be included because of cost-based or preference-based interactions with other selected subgoals, annotate this fact
2. Summarize the pareto-set (in multi-objective optimization cases) in terms of conditional plans explaining which member of the set is “optimal” under what conditions
3. Support sensitivity analysis on the stability of the selected objectives (i.e., under what conditions will they no longer be optimal)
Modeling Replanning as a PSP problem
 Traditionally, replanning has been
cast as a “procedure” rather than a
problem
 Modify the old plan to handle the new
situations
 ..we take the stance that replanning
is a “problem”
 Achieve the original goals of the agent
from the current initial situation
 Subject to various constraints that were
imposed by the partial execution of the
original plan
 Reservations and commitments; these are, however, soft constraints
 ..Replanning can be best modeled as a
PSP problem!
 We propose to do this..
Three Replanning Scenarios
..that differ in their assumptions about other agents
Either no other agents or the agents are neutral
 E.g. Replanning in robot path planning
 Can focus on going from the current state to goal state (any differences are for computational savings)
Other agents are collaborative
 E.g. Travel planning where we broadcast our plans to our friends
 Must consider commitments made by the announcement/execution of the plan
Other agents are adversarial
 E.g. A naughty child pushing all red block stacks
 Must consider and plan around the disruptions that the other agents can cause
Summary and Impact
 PSP planning problems are
ubiquitous and extend the
modeling power of planning
frameworks

.. By foregrounding user preferences
among different objectives
 They pose interesting technical
challenges to the state of the art

..by emphasizing plan-quality
considerations
 We have already made significant
progress in handling PSP problems

AAAI 2004; ICAPS 2005 (2); IJCAI 2005
 ..and propose to extend our
framework significantly
 ..as well as demonstrate its power
through applications