Effective Approaches for Partial Satisfaction (Over-subscription) Planning Romeo Sanchez *

Effective Approaches for Partial
Satisfaction (Over-subscription) Planning
Romeo Sanchez *
Menkes van den Briel **
Subbarao Kambhampati *
* Department of Computer Science and Engineering
** Department of Industrial Engineering
Arizona State University
Tempe, Arizona
Outline
• Background
• Example
• Approaches
  • Optiplan
  • AltAltps
  • Sapaps
• Planning graph heuristics
• Results
Background
“In one day achieve the following 100 goals: RockData at WP 1, high-res pics at WP 2 & 3, …, SoilData at WP 100”
“No way I can achieve that many goals in one day”
“For all your demands, you could’ve bought me a better flash memory stick at least!”
“It’s hard but here is the best I can do: Goal1, Goal5, Goal99”
Given: actions with costs and goals with utilities, find a plan that has the highest {utility – cost}
Previous approaches:
• Highest utility goal first
• Estimating the set of most beneficial goals
Background
• Complete satisfaction (traditional) planning
  • Goal state G is a conjunction of fluents: $G = g_1 \land g_2 \land \dots \land g_n$
  • A plan that achieves n – 1 goal fluents is as good as a plan that achieves 0 goal fluents
• Partial satisfaction planning (PSP)
  • Goal state G is a set of fluents: $G = \{g_1, g_2, \dots, g_n\}$
  • Goal fluents might have utilities and actions might have costs, so a plan that achieves only some of the goal fluents might be more beneficial than the “null” plan.
• Achieving all goal fluents might be impossible…
  • The goal state G may contain logically conflicting fluents
    (:goal (and (pointing satellite1 moon) (pointing satellite1 mars) ))
  • There might not be enough resources to achieve all fluents in G
    (:goal (and (have_rock rover1 waypoint1) (have_rock rover1 waypoint2) ))
PSP problems
• PSP Net benefit:
  • Given a planning problem $P = (F, A, I, G)$, a “cost” $c_a \ge 0$ for each action $a$, a “utility” $u_f \ge 0$ for each goal fluent $f \in G$, and a positive number $k$: is there a finite sequence of actions $\Delta = (a_1, a_2, \dots, a_n)$ that, starting from $I$, leads to a state $S$ that has net benefit $\sum_{f \in (S \cap G)} u_f - \sum_{a \in \Delta} c_a \ge k$?
[Figure: related planning problems: PLAN EXISTENCE, PLAN LENGTH, PSP GOAL, PSP GOAL LENGTH, PLAN COST, PSP UTILITY, PSP NET BENEFIT, PSP UTILITY COST]
Example
• Getting from Las Vegas (LV) to San Jose (SJ)
[Figure legend: C: action cost; U(G): utility of goal G; G1, G2, G3, G4: goals]
P = {travel(LV,DL), travel(DL,SJ), travel(SJ,SF)} achieves G1, G2, G3
Approaches
• Optiplan
  • Integer programming based STRIPS planner
  • Solves the PSP problem by encoding it as an integer program
• AltAltps
  • Heuristic regression planner
  • Solves the PSP problem through a goal selection heuristic
• Sapaps
  • Heuristic forward state-space planner
  • Solves the PSP problem using an anytime A* algorithm
Optiplan
• Optiplan planning system:
  • Combines Graphplan (Blum & Furst, 1995) with State Change Encoding (Vossen et al., 1999)
  • As in the Blackbox planning system, Graphplan reduces the size of the encoding generated by Optiplan
  • Computes optimal plans for a given parallel length
• Objective:
  • $\sum_{f \in G} U_f \,(x^{\text{add}}_{f,n} + x^{\text{preadd}}_{f,n} + x^{\text{maintain}}_{f,n}) - \sum_{l \in L} \sum_{a \in A} C_a \, y_{a,l}$
  • Sum of goal utilities – sum of action costs
Optiplan and partial satisfaction
Total satisfaction planning: goal satisfaction is treated as a hard constraint
Objective
• 0 / Minimize #actions
Constraints
• Fluent changes
• Satisfy initial state
• Satisfy goal
• Fluent implications
• Action implications
Partial satisfaction planning: goal satisfaction is treated as a soft constraint
Objective
• Maximize net benefit
  • Goal utility – action cost
Constraints
• Fluent changes
• Satisfy initial state
• Fluent implications
• Action implications
Graphplan based cost propagation
AltAltps
• AltAlt planning system
  • Heuristic state-space search planner (Nguyen, Kambhampati & Sanchez, 2002)
  • Combines Graphplan (Blum & Furst, 1995) with heuristic state-space search techniques (Bonet, Loerincs & Geffner, 1997; Bonet & Geffner, 1999; McDermott, 1999)
• AltAltps planning system
  • Total enumeration of the $2^n$ goal subsets is too costly
  • Selects a promising subset of the top-level goals upfront
  • Searches for a plan using a regression state-space search combined with cost-sensitive planning graph heuristics.
AltAltps cost propagation
• Using a planning graph structure
  • Propositions in the initial state come for free (they have zero cost)
  • Other propositions have costs computed as follows:
[Figure: planning graph annotated with propagated costs at levels l = 0, l = 1 and l = 2]
$h_l(p)$ = cost of proposition $p$ at level $l$

$$h_l(p) = \begin{cases} 0 & \text{if } p \in I \\ \min\{h_{l-1}(p),\ cost(a) + C_l(a)\} & \text{if } l > 0 \\ \infty & \text{otherwise} \end{cases}$$

• Propagation procedures
  • Max-propagation: $C_l(a) = \max\{h_{l-1}(q) : q \in prec(a)\}$
  • Sum-propagation: $C_l(a) = \sum_{q \in prec(a)} h_{l-1}(q)$
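The propagation rules can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AltAltps implementation: actions are plain tuples `(name, prec, adds, cost)`, the minimum is taken over the actions that add p and whose preconditions all have finite cost at the previous level, and the function name `propagate_costs` and the toy rover actions are made up for the example.

```python
import math


def propagate_costs(init, actions, levels, mode="max"):
    """Return a list h where h[l][p] is the propagated cost of p at level l.
    Propositions missing from h[l] have infinite cost at that level."""
    combine = max if mode == "max" else sum
    h = [{p: 0.0 for p in init}]          # level 0: initial state is free
    for _ in range(levels):
        prev = h[-1]
        cur = dict(prev)                  # h_l(p) starts from h_{l-1}(p)
        for _name, prec, adds, cost in actions:
            if all(q in prev for q in prec):
                # C_l(a): max or sum over the precondition costs
                c_l = combine([prev[q] for q in prec]) if prec else 0.0
                for p in adds:
                    cur[p] = min(cur.get(p, math.inf), cost + c_l)
        h.append(cur)
    return h


# Toy rover instance with assumed costs.
actions = [
    ("move", {"at_a"}, {"at_b"}, 4.0),
    ("sample", {"at_b"}, {"have_rock"}, 5.0),
    ("communicate", {"have_rock", "at_b"}, {"sent"}, 1.0),
]
h_max = propagate_costs({"at_a"}, actions, levels=3, mode="max")
h_sum = propagate_costs({"at_a"}, actions, levels=3, mode="sum")
```

On this instance the two procedures agree until an action has more than one precondition: `communicate` gets 1 + max(9, 4) under max-propagation and 1 + (9 + 4) under sum-propagation, which counts the cost of `move` twice.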
AltAltps goal set selection
• Main idea
  • Start with the original goal set G and an empty goal set G’
  • Iteratively add goals to G’ as long as the estimated NET BENEFIT increases
  • The cost of adding another goal g to G’ depends on the goals that are already in G’
[Figure: goal set G’ and extended goal set G’ ∪ g. Cost for achieving G’: relaxed plan for G’ (R’p). Residual cost for g: Rp for G’ ∪ g, biased to re-use actions in R’p]
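The greedy selection loop can be sketched as follows. This is a simplified illustration, assuming each goal comes with a fixed set of relaxed-plan actions (`plan_for`) so that the residual cost of a goal is the cost of its actions not already in the relaxed plan for G’; the real planner re-extracts a relaxed plan biased towards re-use, and all names and numbers here are hypothetical.

```python
def select_goals(utility, plan_for, cost):
    """Greedily grow G' while the estimated net benefit increases.
    utility: goal -> utility, plan_for: goal -> set of action names,
    cost: action name -> cost. Returns (selected goals, actions used)."""
    selected, used = [], set()
    remaining = set(utility)

    def gain(g):
        # Utility minus residual cost: actions already in `used` are free.
        residual = sum(cost[a] for a in plan_for[g] - used)
        return utility[g] - residual

    while remaining:
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:      # net benefit would not increase: stop
            break
        selected.append(best)
        used |= plan_for[best]
        remaining.remove(best)
    return selected, used


# Toy instance (assumed numbers): G2 only pays off once G1 is selected,
# because both relaxed plans share action a1.
utility = {"G1": 10.0, "G2": 6.0, "G3": 3.0}
plan_for = {"G1": {"a1", "a2"}, "G2": {"a1", "a3"}, "G3": {"a4"}}
cost = {"a1": 4.0, "a2": 3.0, "a3": 5.0, "a4": 5.0}
selected, used = select_goals(utility, plan_for, cost)
```

Taken alone, G2 has utility 6 against cost 9, but its residual cost drops to 5 once a1 is already paid for by G1. This is the dependence of the cost of g on the goals already in G’.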
AltAltps cost-sensitive relaxed plan heuristic
• General procedure
  • States are ranked during search using the relaxed plan heuristic and the propagated costs
  • The idea is to compute the cost of a relaxed plan Rp in terms of the costs of the actions composing it.
1. Given a state S, remove the (sub)goal g from S that has the highest $h_l(g)$
2. Select the action a that supports g with the lowest cost ($cost(a) + C_l(a)$)
3. Regress S over a to get $S' = S \cup prec(a) \setminus eff(a)$
4. Stop when each proposition $q \in S$ is present in the initial state
• The heuristic value for S is $h(S) = \sum_{a \in R_p} cost(a)$
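The four steps can be sketched as below. This is a minimal illustration with several assumptions: propagated costs are passed in as a single dictionary `h` instead of per-level values, C(a) uses max-propagation, action costs are positive so the loop terminates, and every open subgoal has at least one supporting action. The tuple layout `(name, prec, eff, cost)` and the function name are hypothetical.

```python
def relaxed_plan_cost(state, init, actions, h):
    """Extract a relaxed plan for `state` by regression and return
    (sum of the costs of its actions, action names in selection order)."""
    init = set(init)
    open_goals = set(state) - init
    plan = {}                                    # action name -> cost
    while open_goals:                            # step 4: stop when all in I
        # Step 1: pick the subgoal with the highest propagated cost.
        g = max(sorted(open_goals), key=lambda p: h[p])
        # Step 2: cheapest supporter by cost(a) + C(a), max-propagation.
        supporters = [a for a in actions if g in a[2]]
        name, prec, eff, cost = min(
            supporters,
            key=lambda a: a[3] + max((h[q] for q in a[1]), default=0.0),
        )
        plan[name] = cost                        # each action counted once
        # Step 3: regress over the chosen action.
        open_goals = ((open_goals | set(prec)) - set(eff)) - init
    return sum(plan.values()), list(plan)


# Toy rover instance; h holds assumed max-propagated costs.
actions = [
    ("move", {"at_a"}, {"at_b"}, 4.0),
    ("sample", {"at_b"}, {"have_rock"}, 5.0),
    ("communicate", {"have_rock", "at_b"}, {"sent"}, 1.0),
]
h = {"at_a": 0.0, "at_b": 4.0, "have_rock": 9.0, "sent": 10.0}
value, rp = relaxed_plan_cost({"sent"}, {"at_a"}, actions, h)
```

Because each action enters the plan once, `move` is paid for a single time even though both `sample` and `communicate` need `at_b`.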
Sapaps
SAPAPS: a forward A* approach for PSP
Anytime A* algorithm: search through the best beneficial nodes
[Figure: search tree over the rover actions A1: Navigate(X,Y), A2: SampleSoil(Y), A3: TakePicture, A4: Navigate(Y,Z), A5: SampleRock]
Node evaluation:
• g(S) = U(S) – C(S)
• h(S) = U(RP(S)) – C(RP(S))
• A*: f(S) = g(S) + h(S)
• Example: g(S) = Util(HasSoilData) – Cost(A1,A2), h(S) = Util(Apply(A3,S)) – Cost(A3)
Beneficial node: g(S) > 0, or equivalently U(S) > C(S)
Termination node: ∀ S’: g(S) > f(S’)
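The search loop can be sketched as a best-first search that reports every improved node. This is an illustration, not the Sapaps code: it assumes h(S) ≥ 0, treats states as hashable values supplied by a hypothetical `successors` callback, and stops as soon as the best node found is at least as good as the most promising open node. The toy tree and its g and h values are invented.

```python
import heapq
from itertools import count


def anytime_search(start, successors, g, h):
    """Yield each new best node found; the last one yielded is the answer.
    Nodes are expanded in decreasing order of f = g + h."""
    best = start
    yield best
    tie = count()                       # tie-breaker so states never compare
    frontier = [(-(g(start) + h(start)), next(tie), start)]
    seen = {start}
    while frontier:
        neg_f, _, s = heapq.heappop(frontier)
        if g(best) >= -neg_f:           # no open node promises more: stop
            break
        if g(s) > g(best):              # better beneficial node found
            best = s
            yield best
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                heapq.heappush(frontier, (-(g(t) + h(t)), next(tie), t))


# Toy search tree; states are action sequences, values are assumed.
tree = {
    (): [("A1",)],
    ("A1",): [("A1", "A2"), ("A1", "A5")],
    ("A1", "A2"): [("A1", "A2", "A3")],
}
g_val = {(): 0, ("A1",): -2, ("A1", "A2"): 3, ("A1", "A5"): 1,
         ("A1", "A2", "A3"): 5}
h_val = {(): 6, ("A1",): 8, ("A1", "A2"): 2, ("A1", "A5"): 0,
         ("A1", "A2", "A3"): 0}
found = list(anytime_search((), lambda s: tree.get(s, []),
                            g_val.get, h_val.get))
```

Interrupting the generator early still leaves a valid plan in hand, which is the anytime behaviour; the empty plan is the initial answer with net benefit 0.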
SAPAPS: heuristic
Heuristic: Variation of SAPA’s Approach
Heuristically extract the least-cost relaxed plan using the cost function
Remove “unbeneficial” goals and related actions
[Figure: relaxed plan over actions A1, A2, A3, A4 and goals G1, G2, G3, shown before and after pruning; annotation: C(A1) + C(A2) > U(G3)]
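One way to read the pruning step is sketched below. It is an assumption-laden illustration: each relaxed-plan action is tagged with the set of goals it supports (`supports`, a hypothetical input), and a goal is dropped when the actions that support only that goal cost more than the goal's utility. The instance is invented and does not reproduce the figure.

```python
def prune_unbeneficial(supports, cost, utility):
    """supports: action -> set of goals it supports in the relaxed plan.
    Returns (remaining goals, remaining actions) after pruning."""
    supports = {a: set(gs) for a, gs in supports.items()}   # work on a copy
    goals = set(utility)
    changed = True
    while changed:
        changed = False
        for g in sorted(goals):
            # Actions that exist in the relaxed plan only because of g.
            only_for_g = [a for a, gs in supports.items() if gs == {g}]
            if sum(cost[a] for a in only_for_g) > utility[g]:
                goals.remove(g)
                for a in only_for_g:
                    del supports[a]
                for gs in supports.values():
                    gs.discard(g)
                changed = True
                break                    # re-scan: other goals may change
    return goals, set(supports)


# Toy relaxed plan with assumed costs and utilities.
supports = {"A1": {"G1", "G2", "G3"}, "A2": {"G3"}, "A3": {"G1"}, "A4": {"G2"}}
cost = {"A1": 2.0, "A2": 6.0, "A3": 1.0, "A4": 1.0}
utility = {"G1": 5.0, "G2": 4.0, "G3": 5.0}
kept_goals, kept_actions = prune_unbeneficial(supports, cost, utility)
```

After pruning, h(S) = U(RP(S)) – C(RP(S)) is evaluated on the remaining goals and actions only. Shared actions such as A1 stay in the plan as long as some remaining goal still needs them.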
Empirical results
Future work