Planning Graph Based Reachability Heuristics

Daniel Bryce & Subbarao Kambhampati
ICAPS’06 Tutorial 6
June 7, 2006
http://rakaposhi.eas.asu.edu/pg-tutorial/
dan.bryce@asu.edu
rao@asu.edu
http://verde.eas.asu.edu
http://rakaposhi.eas.asu.edu
Scalability of Planning
Problem is Search Control!!!
 Before, planning algorithms could synthesize about 6-10 action plans in minutes
 Now, we can synthesize 100-action plans in seconds
 Significant scaleup in the last 6-7 years (realistic encodings of the Munich airport!)
The primary revolution in planning in recent years has been domain-independent heuristics to scale up plan synthesis
June 7th, 2006
ICAPS'06 Tutorial T6
3
Motivation
 Ways to improve Planner Scalability
 Problem Formulation
 Search Space
 Reachability Heuristics
 Domain (Formulation) Independent
 Work for many search spaces
 Flexible – work with most domain features
 Complement other scalability techniques
 Effective!!
Topics
 Classical Planning
 Cost Based Planning
 Partial Satisfaction Planning
 <Break>
 Non-Deterministic/Probabilistic Planning
 Resources (Continuous Quantities)
 Temporal Planning
 Hybrid Models
 Wrap-up
Presenters: Dan and Rao
Classical Planning
Rover Domain
Classical Planning
 Relaxed Reachability Analysis
 Types of Heuristics
 Level-based
 Relaxed Plans
 Mutexes
 Heuristic Search
 Progression
 Regression
 Plan Space
 Exploiting Heuristics
Planning Graph and Search Tree
 Envelope of Progression Tree (Relaxed Progression)
 Proposition lists: union of states at the kth level
 Lower-bound reachability information
Level Based Heuristics
 The distance of a proposition is the index of the
first proposition layer in which it appears
 Proposition distance changes when we propagate cost
functions – described later
 What is the distance of a Set of propositions??
 Set-Level: Index of first proposition layer where all
goal propositions appear
 Admissible
 Gets better with mutexes, otherwise same as max
 Max: Maximum proposition distance
 Sum: Summation of proposition distances
Example of Level Based Heuristics
set-level(sI, G) = 3
max(sI, G) = max(2, 3, 3) = 3
sum(sI, G) = 2 + 3 + 3 = 8
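The max and sum heuristics on this slide can be computed directly from the first-appearance levels of the goal propositions (set-level additionally needs mutex information from the full graph). A minimal sketch, with hypothetical level numbers chosen to match the slide:

```python
# Hypothetical first-appearance levels for three goal propositions,
# chosen to reproduce the numbers on this slide (distances 2, 3, 3).
lev = {"g1": 2, "g2": 3, "g3": 3}
goals = ["g1", "g2", "g3"]

def h_max(lev, goals):
    # Maximum proposition distance: admissible, but often weak.
    return max(lev[g] for g in goals)

def h_sum(lev, goals):
    # Sum of proposition distances: inadmissible, but often more informative.
    return sum(lev[g] for g in goals)

assert h_max(lev, goals) == 3
assert h_sum(lev, goals) == 8
```

Set-level would instead scan proposition layers for the first one where all goals appear together non-mutexed, which is why it can only be computed against the graph itself, not a per-proposition table.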
Distance of a Set of Literals
[Figure: two-level planning graph for a key/move grid domain (propositions At(x,y), Key(0,1); actions Move(x,y,x',y'), Pick_key(0,1), noops), with mutexes marked x]
Heuristic families:
 h(S) = Σp∈S lev({p}): Sum, Partition-k, Adjusted Sum
 h(S) = lev(S): Set-Level (admissible; gets better with mutexes), Combo, Set-Level with memos
Definitions:
 lev(p): index of the first level at which p comes into the planning graph
 lev(S): index of the first level where all props in S appear non-mutexed
 If there is no such level: ∞ if the graph is grown to level-off, else k+1 (k is the current length of the graph)
How do Level-Based Heuristics Break?
[Figure: planning graph P0-A0-P1-A1-P2 in which actions B1, B2, B3, ..., B99, B100 each add one of p1, p2, p3, ..., p99, p100 at level 1, and action A (requiring all of them) adds g at level 2]
The goal g is reached at level 2, but requires 101 actions to support it.
Relaxed Plan Heuristics
 When Level does not reflect distance well, we can find a
relaxed plan.
 A relaxed plan is subgraph of the planning graph, where:
 Every goal proposition is in the relaxed plan at the level where it
first appears
 Every proposition in the relaxed plan has a supporting action in the
relaxed plan
 Every action in the relaxed plan has its preconditions supported.
 Relaxed Plans are not admissible, but are generally
effective.
 Finding the optimal relaxed plan is NP-hard, but finding a
greedy one is easy. Later we will see how “greedy” can
change.
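The greedy extraction described above can be sketched as a backward sweep over the leveled graph; the tiny graph (propositions p, q, g and actions a1, a2, a3) is invented for illustration, not taken from the rover domain.

```python
# Minimal greedy relaxed-plan extraction over a leveled planning graph.
# adds/pres describe each action; lev_p/lev_a give first-appearance levels.
adds = {"a1": {"p"}, "a2": {"q"}, "a3": {"g"}}
pres = {"a1": set(), "a2": set(), "a3": {"p", "q"}}
lev_p = {"p": 1, "q": 1, "g": 2}
lev_a = {"a1": 0, "a2": 0, "a3": 1}

def relaxed_plan(goals):
    rp = set()                                   # chosen (action, level) pairs
    # subgoal each goal at the level where it first appears
    agenda = [(g, lev_p[g]) for g in goals]
    while agenda:
        p, level = agenda.pop()
        if level == 0:
            continue                             # holds in the initial state
        # greedily pick any supporter available in the previous action layer
        a = next(a for a in adds if p in adds[a] and lev_a[a] <= level - 1)
        if (a, level - 1) not in rp:
            rp.add((a, level - 1))
            agenda += [(q, lev_p[q]) for q in pres[a]]
    return rp

assert len(relaxed_plan({"g"})) == 3   # a3 plus its supporters a1 and a2
```

The heuristic value is the number of actions in the returned subgraph; changing how the supporter `a` is chosen (e.g. preferring cheap or already-selected actions) is exactly where "greedy" can change, as discussed later.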
Example of Relaxed Plan Heuristic
[Figure: relaxed plan extraction on the rover graph: identify goal propositions, support each goal proposition individually, count actions]
RP(sI, G) = 8
Results
[Chart: Relaxed Plan vs. Level-Based heuristic performance]
Results (cont'd)
[Chart: Relaxed Plan vs. Level-Based heuristic performance]
Optimizations in Heuristic Computation
 Taming space/time costs
 Bi-level planning graph representation
 Partial expansion of the PG (stop before level-off)
 It is FINE to cut corners when using the PG for heuristics (instead of search)!!
 Branching factor can still be quite high
 Use actions appearing in the PG (complete)
 Select actions in lev(S) vs. levels-off (incomplete)
 Consider only actions appearing in the RP (incomplete)
[Figure: bi-level planning graph example with goals C and D present; actions beyond lev(S) are discarded]
[Chart: heuristic extracted from partial graph vs. leveled graph; time in seconds (log scale) per problem, trading off Lev(S) against Levels-off]
Adjusting for Negative Interactions
 Until now we assumed actions only positively
interact. What about negative interactions?
 Mutexes help us capture some negative
interactions
 Types
 Actions: Interference/Competing Needs
 Propositions: Inconsistent Support
 Binary mutexes are the most common and practical
 |A| + 2|P|-ary mutexes would allow us to solve the planning problem with a backtrack-free Graphplan search
 An action layer may have |A| actions and 2|P| noops
 A serial planning graph assumes all non-noop actions are mutex
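The action-level interference test and the proposition-level inconsistent-support test can be sketched on a toy layer (the actions below are invented, not the rover model; competing needs, which checks mutex preconditions at the previous level, is omitted for brevity):

```python
# Sketch of binary mutex detection at one graph layer.
acts = {
    "sample":  {"pre": {"at"}, "add": {"have"}, "del": set()},
    "drive":   {"pre": {"at"}, "add": {"at2"},  "del": {"at"}},
    "noop_at": {"pre": {"at"}, "add": {"at"},   "del": set()},
}

def interfere(a, b):
    # One action deletes a precondition or add-effect of the other.
    return bool(acts[a]["del"] & (acts[b]["pre"] | acts[b]["add"]) or
                acts[b]["del"] & (acts[a]["pre"] | acts[a]["add"]))

action_mutex = {frozenset((a, b))
                for a in acts for b in acts if a != b and interfere(a, b)}

def supporters(p):
    return {a for a in acts if p in acts[a]["add"]}

def inconsistent_support(p, q):
    # Every supporter of p is mutex with every supporter of q.
    return all(frozenset((a, b)) in action_mutex
               for a in supporters(p) for b in supporters(q))

assert frozenset(("sample", "drive")) in action_mutex
assert inconsistent_support("have", "at2")
```

Propagating these marks layer by layer (propositions become non-mutex once they gain a non-mutex pair of supporters) is what lets set-level improve over max, as in the rover example on the next slide.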
Binary Mutexes
[Figure: rover planning graph, levels P0-A0-P1-A1-P2, with actions sample(soil), sample(rock), sample(image), commun(soil), and drives, over propositions avail(soil), avail(rock), avail(image), at(), have(soil), have(rock), have(image), comm(soil)]
 Interference: sample needs at(), drive negates at()
 Inconsistent Support: have(soil)'s only supporter is mutex with at()'s only supporter, drive
 At level 2, have(soil) has a supporter not mutex with a supporter of at(): they are feasible together
Set-Level(sI, {at(), have(soil)}) = 2
Max(sI, {at(), have(soil)}) = 1
Adjusting the Relaxed Plans
 Start with the RP heuristic and adjust it to take subgoal interactions into account
 Negative interactions in terms of "degree of interaction"
 Positive interactions in terms of co-achievement links
 Ignore negative interactions when accounting for positive interactions (and vice versa)

PROBLEM     Level       Sum        AdjSum2M
Gripper-25  -           69/0.98    67/1.57
Gripper-30  -           81/1.63    77/2.83
Tower-7     127/1.28    127/0.95   127/1.37
Tower-9     511/47.91   511/16.04  511/48.45
8-Puzzle1   31/6.25     39/0.35    31/0.69
8-Puzzle2   30/0.74     34/0.47    30/0.74
Mystery-6   -           -          16/62.5
Mystery-9   8/0.53      8/0.66     8/0.49
Mprime-3    4/1.87      4/1.88     4/1.67
Mprime-4    8/1.83      8/2.34     10/1.49
Aips-grid1  14/1.07     14/1.12    14/0.88
Aips-grid2  -           -          34/95.98

HAdjSum2M(S) = length(RelaxedPlan(S)) + max p,q∈S δ(p,q)
where δ(p,q) = lev({p,q}) - max{lev(p), lev(q)}  /* degree of negative interaction */
[AAAI 2000]
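The HAdjSum2M adjustment above can be sketched with a small level table; the level numbers here are invented for illustration:

```python
# Sketch of HAdjSum2M: relaxed-plan length plus the worst pairwise
# degree of negative interaction among the subgoals.
lev = {"p": 1, "q": 1}                   # individual first-appearance levels
lev_pair = {frozenset(("p", "q")): 3}    # first level where both appear non-mutexed

def delta(p, q):
    # Degree of negative interaction between p and q.
    return lev_pair[frozenset((p, q))] - max(lev[p], lev[q])

def h_adjsum2m(rp_length, subgoals):
    pairs = [(p, q) for p in subgoals for q in subgoals if p < q]
    return rp_length + max((delta(p, q) for p, q in pairs), default=0)

assert h_adjsum2m(4, ["p", "q"]) == 6    # 4 + (3 - 1)
```

δ(p,q) is zero when p and q can be achieved together as soon as each is individually reachable, and grows with how long mutexes delay their co-achievement.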
Anatomy of a State-space Regression planner
Problem: Given a set of subgoals (regressed state)
estimate how far they are from the initial state
[AAAI 2000; AIPS 2000; AIJ 2002; JAIR 2003]
Rover Example in Regression
[Figure: regression from sG back through s1, s2, s3, s4 via commun(rock), commun(image), sample(rock), sample(image), accumulating subgoals comm(soil), comm(rock), comm(image), have(rock), have(image), at(), avail(rock), avail(image)]
Sum(sI, s4) = 0+0+1+1+2 = 4
Sum(sI, s3) = 0+1+2+2 = 5
Should be ∞: s4 is inconsistent. How do we improve the heuristic??
AltAlt Performance
[Charts: time in seconds (log scale) per problem on the Schedule domain (AIPS-00) and on the Logistics and Scheduling problem sets from IPC 2000, comparing STAN3.0, HSP2.0, and AltAlt1.0 with Level-based and Adjusted RP heuristics]
Plan Space Search
In the beginning it was all POP.
Then it was cruelly UnPOPped.
The good times return with Re(vived)POP.
POP Algorithm
1. Plan Selection: select a plan P from the search queue
2. Flaw Selection: choose a flaw f (open condition or unsafe link)
3. Flaw Resolution:
   If f is an open condition, choose an action S that achieves f
   If f is an unsafe link, choose promotion or demotion
   Update P; return NULL if no resolution exists
4. If there is no flaw left, return P
[Figure: 1. the initial plan, S0 to Sinf with open goals g1, g2; 2. plan refinement (flaw selection and resolution) adding steps S1, S2, S3, with open conditions oc1, oc2 and a threat between p and ~p]
Choice points:
• Flaw selection (open condition? unsafe link? non-backtrack choice)
• Flaw resolution / plan selection (how to select and rank partial plans?)
PG Heuristics for Partial Order Planning
 Distance heuristics to estimate the cost of partially ordered plans (and to select flaws)
 If we ignore negative interactions, the set of open conditions can be seen as a regression state
 Mutexes are used to detect indirect conflicts in partial plans
 A step threatens a link if there is a mutex between the link condition and the step's effect or precondition
 Post disjunctive precedences and use propagation to simplify
[Figure: partial plan with a causal link Si -p-> Sj and a step Sk with effects q, r]
If mutex(p, q) or mutex(p, r): post the disjunctive precedence Sk ≺ Si ∨ Sj ≺ Sk
Regression and Plan Space
[Figure: the rover regression trace (sG back through s1-s4) placed alongside the corresponding partial-order plans: steps S0 through S4 with causal links for have(rock), comm(rock), have(image), comm(image), and comm(soil)]
RePOP's Performance
 RePOP implemented on top of UCPOP
 Dramatically better than any other partial order planner before it
 Competitive with Graphplan and AltAlt
 VHPOP carried the torch at IPC 2002

Problem      UCPOP   RePOP       Graphplan   AltAlt
Gripper-8    -       1.01        66.82       .43
Gripper-10   -       2.72        47min       1.15
Gripper-20   -       81.86       -           15.42
Rocket-a     -       8.36        75.12       1.02
Rocket-b     -       8.17        77.48       1.29
Logistics-a  -       3.16        306.12      1.59
Logistics-b  -       2.31        262.64      1.18
Logistics-c  -       22.54       -           4.52
Logistics-d  -       91.53       -           20.62
Bw-large-a   45.78   (5.23) -    14.67       4.12
Bw-large-b   -       (18.86) -   122.56      14.14
Bw-large-c   -       (137.84) -  -           116.34

Written in Lisp, runs on Linux, 500MHz, 250MB
You see, pop, it is possible to Re-use all the old POP work!
[IJCAI, 2001]
Exploiting Planning Graphs
 Restricting Action Choice
 Use actions from:
 Last level before level-off (complete)
 Last level before goals (incomplete)
 First level of the relaxed plan (incomplete): FF's helpful actions
 Only action sequences in the relaxed plan (incomplete): YAHSP
 Reducing State Representation
 Remove static propositions. A static proposition is only ever true or false in the last proposition layer.
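The static-proposition reduction can be sketched as a single pass over the action set: any proposition no action adds or deletes never changes, so it can be dropped from state representations. The toy actions below are invented for illustration:

```python
# Sketch: find propositions no action adds or deletes, and drop them
# from states.
actions = [
    {"add": {"at_l2"}, "del": {"at_l1"}},
    {"add": {"have_soil"}, "del": set()},
]
all_props = {"at_l1", "at_l2", "have_soil", "avail_soil"}

changed = set()
for a in actions:
    changed |= a["add"] | a["del"]

static = all_props - changed
assert static == {"avail_soil"}

# States can then be stored without the static part:
state = {"at_l1", "avail_soil"}
compressed = state - static
assert compressed == {"at_l1"}
```

Whether a static proposition is true is fixed by the initial state, so dropping it loses no information for search.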
Classical Planning Conclusions
 Many Heuristics
 Set-Level, Max, Sum, Relaxed Plans
 Heuristics can be improved by adjustments
 Mutexes
 Useful for many types of search
 Progression, Regression, POCL
Cost-Based Planning
Cost-based Planning
 Propagating Cost Functions
 Cost-based Heuristics
 Generalized Level-based heuristics
 Relaxed Plan heuristics
Rover Cost Model
Cost Propagation
[Figure: rover planning graph P0-A0-P1-A1-P2 annotated with propagated costs, e.g. sample(soil) 20, drives 10/15/30, have(soil) 20, have(image) 35, have(rock) 35, comm(soil) 25]
Cost reduces because of a different supporter at a later level
Cost Propagation (cont'd)
[Figure: levels P2 through P4 under 1-lookahead cost propagation; e.g. have(image) drops from 35 to 30 and comm(image) from 40 to 35 as cheaper supporters appear]
Terminating Cost Propagation
 Stop when:
 goals are reached (no-lookahead)
 costs stop changing (∞-lookahead)
 k levels after goals are reached (k-lookahead)
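The termination rules above can be sketched with a fixed-point loop over a tiny relaxed model; the actions and costs below are invented, arranged so that a cheaper supporter of the goal only becomes usable after extra propagation:

```python
# Sketch of cost propagation with no/k/∞-lookahead termination.
import math

pre = {"a_cheap_late": {"p"}, "a_costly_early": set(), "a_p": set()}
adds = {"a_cheap_late": {"g"}, "a_costly_early": {"g"}, "a_p": {"p"}}
act_cost = {"a_cheap_late": 5, "a_costly_early": 30, "a_p": 10}

def propagate(goals, k):
    cost = {"p": math.inf, "g": math.inf}
    goal_level, level = None, 0
    while True:
        new = dict(cost)
        for a, c in act_cost.items():
            c_a = c + sum(cost[q] for q in pre[a])   # cost of executing a
            for p in adds[a]:
                new[p] = min(new[p], c_a)            # keep cheapest supporter
        if goal_level is None and all(new[g] < math.inf for g in goals):
            goal_level = level                       # goals first reached here
        if new == cost:                              # ∞-lookahead: fixed point
            return new
        if goal_level is not None and level >= goal_level + k:   # k-lookahead
            return new
        cost, level = new, level + 1

assert propagate({"g"}, 0)["g"] == 30    # no-lookahead: costly early supporter
assert propagate({"g"}, 2)["g"] == 15    # more lookahead finds a_p + a_cheap_late
```

k = 0 gives the no-lookahead rule, larger k trades extra propagation for tighter cost estimates, and the fixed-point test implements ∞-lookahead.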
Guiding Relaxed Plans with Costs
Start extraction at the first level where the goal proposition is cheapest
Cost-Based Planning Conclusions
 Cost-Functions:
 Remove false assumption that level is
correlated with cost
 Improve planning with non-uniform cost
actions
 Are cheap to compute (constant overhead)
Partial Satisfaction (Over-Subscription)
Planning
Partial Satisfaction Planning
 Selecting Goal Sets
 Estimating goal benefit
 Anytime goal set selection
 Adjusting for negative interactions between
goals
Partial Satisfaction (Oversubscription) Planning
In many real-world planning tasks, the agent often has more goals than it has resources to accomplish.
Example: Rover Mission Planning (MER)
Need automated support for over-subscription/partial satisfaction planning: actions have execution costs, goals have utilities, and the objective is to find the plan that has the highest net benefit.
[Figure: problem-class hierarchy relating PLAN EXISTENCE, PLAN LENGTH, PLAN COST, PSP GOAL, PSP GOAL LENGTH, PSP UTILITY, PSP UTILITY COST, and PSP NET BENEFIT]
SapaPS: Anytime Forward Search
• General idea:
– Search incrementally while looking for better
solutions
• Return solutions as they are found…
– If we reach a node with h value of 0, then we
know we can stop searching (no better
solutions can be found)
Background: Cost Propagation
[Figure: the cost-propagation planning graph from the earlier Cost Propagation slide, repeated]
Cost reduces because of a different supporter at a later level
Adapting PG heuristics for PSP
 Challenges:
 Need to propagate costs on the planning graph
 The exact set of goals is not clear
 Interactions between goals: the obvious approach of considering all 2^n goal subsets is infeasible
 Idea: Select a subset of the top-level goals upfront
 Challenge: goal interactions
 Approach: Estimate the net benefit of each goal in terms of its utility minus the cost of its relaxed plan
 Bias the relaxed-plan extraction to (re)use the actions already chosen for other goals
[Figure: architecture: action templates and the problem spec (init, goal state) feed a Graphplan plan-extension phase (based on STAN) with cost propagation, yielding a cost-sensitive planning graph; a goal set selection algorithm works over the actions in the last level, and extracted heuristics guide cost-sensitive search to a solution plan]
Goal Set Selection in the Rover Problem
[Figure: incremental goal set selection. comm(soil): 50-25 = 25, found by RP; comm(rock): 60-40 = 20 alone, 110-65 = 45 jointly, found by biased RP; comm(image): 20-35 = -15 by cost propagation, 70-60 = 20 by RP, 130-100 = 30 by biased RP]
SapaPS (anytime goal selection)
 A* Progression search
 g-value: net-benefit of plan so far
 h-value: relaxed plan estimate of best goal set
 Relaxed plan found for all goals
 Iterative goal removal, until net benefit does not
increase
 Returns plans with increasing g-values.
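The iterative goal-removal step above can be sketched as follows. The cost model here is a deliberate simplification (each goal is priced by a private relaxed-plan cost, with the slide's rover numbers); a real relaxed plan shares actions across goals, which is why SapaPS extracts one relaxed plan for all goals and removes supporting actions as goals are dropped:

```python
# Sketch of anytime goal removal: drop the worst goal while net benefit improves.
utility = {"comm_soil": 50, "comm_rock": 60, "comm_image": 20}
rp_cost = {"comm_soil": 25, "comm_rock": 40, "comm_image": 35}

def net_benefit(goals):
    return sum(utility[g] - rp_cost[g] for g in goals)

def select_goals(goals):
    goals = set(goals)
    while True:
        # candidate: the goal with the worst (cost - utility) trade-off
        worst = max(goals, key=lambda g: rp_cost[g] - utility[g], default=None)
        if worst is None or net_benefit(goals - {worst}) <= net_benefit(goals):
            return goals                      # removal no longer helps
        goals = goals - {worst}

assert select_goals(utility) == {"comm_soil", "comm_rock"}
```

Here comm(image) costs 35 for only 20 utility, so it is the one removal that increases net benefit; the remaining goal set feeds the h-value of the search node.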
Some Empirical Results for AltAltps
Exact algorithms based on MDPs don’t scale at all
[AAAI 2004]
Adjusting for Negative Interactions (AltWlt)
 Problem:
 What if the a priori goal set is not achievable because of negative interactions?
 What if the greedy algorithm gets a bad local optimum?
 Solution:
 Do not consider mutex goals
 Add a penalty for goals whose relaxed plan has mutexes
 Use an interaction factor to adjust cost, similar to the adjusted sum heuristic:
 max g1,g2 ∈ G {lev({g1, g2}) - max(lev(g1), lev(g2))}
 Find the best goal set for each goal
Goal Utility Dependencies
[Figure: two rover plans. BAD: cost 50, utility 0. GOOD: cost 100, utility 500]
 High-res photo utility: 150; soil sample utility: 100; both goals: additional 200
 High-res photo utility: 150; low-res photo utility: 100; both goals: remove 80
Partial Satisfaction Planning (PSP)
Agent has more goals than it has resources to accomplish.
Objective: Find a plan with the highest net benefit
– Goals have utility
– Actions have cost
(i.e., maximize utility - cost)
Challenges:
 Unclear which goals to achieve
 Interactions exist between goals
Dependencies
Goal interactions exist as two distinct types:
 Cost dependencies: goals share actions in the plan trajectory; defined by the plan. Exists in classical planning, but is a larger issue in PSP. (All PSP planners: AltWlt, Optiplan, SapaPS)
 Utility dependencies: goals may interact in the utility they give; explicitly defined by the user. No need to consider this in classical planning. (SPUDS: our planner based on SapaPS)
Defining Goal Dependencies
Follow the Generalized Additive Independence (GAI) model (Bacchus & Grove):
 Goals listed with additive utility
 Utility defined over goal sets
Bacchus, F., and Grove, A. 1995. Graphical models for preference and utility. In Proc. of UAI-95.
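A GAI utility function can be sketched as a table of local utilities over goal subsets, summed over the subsets that are fully achieved; the goal names are shorthand for the rover propositions used on the following slide:

```python
# Sketch of GAI utility: sum the local utilities of all achieved goal sets.
local_utility = {
    frozenset({"hi_res"}): 150,
    frozenset({"low_res"}): 100,
    frozenset({"soil"}): 100,
    frozenset({"hi_res", "soil"}): 200,     # positive dependency
    frozenset({"hi_res", "low_res"}): -80,  # negative dependency
}

def gai_utility(achieved):
    achieved = frozenset(achieved)
    return sum(u for goal_set, u in local_utility.items()
               if goal_set <= achieved)

assert gai_utility({"hi_res", "soil"}) == 150 + 100 + 200
assert gai_utility({"hi_res", "low_res"}) == 150 + 100 - 80
```

Singleton entries recover plain additive utility, so the model strictly generalizes it while keeping evaluation linear in the number of listed dependencies.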
Defining Goal Dependencies
 (have-image high-res loc1): 150
 (have-image low-res loc1): 100
 (have-soil loc1): 100
 {(have left-shoe), (have right-shoe)}: +500
 {(have-image high-res loc1), (have-soil loc1)}: +200
 {(have-image high-res loc1), (have-image low-res loc1)}: -80
Defining Goal Dependencies
 Mutual dependency (what we've seen so far): the utility of a set differs from the sum of the utilities of its goals
 Conditional dependency (but what about...): the utility of a set of goals depends upon whether other goals are already achieved
 Conditional dependency transforms into mutual dependency
SapaPS
Example of a relaxed plan (RP) in a rovers domain:
 Actions: a1: Move(l1,l2), a2: Calibrate(camera), a3: Sample(l2), a4: Take_high_res(l2), a5: Take_low_res(l2), a6: Sample(l1)
 Goals: g1: Sample(l1), g2: Sample(l2), g3: HighRes(l2), g4: LowRes(l2)
[Figure: the RP as a tree over At(l1), At(l2), and Calibrated, with action costs ca1=50, ca2=20, ca3=40, ca4=40, ca5=25, ca6=40]
Anytime forward search:
• Propagate costs on the relaxed planning graph
• Using the RP as a tree lets us remove supporting actions as we remove goals
• We remove goals whose relaxed cost outweighs their utility
l=2
SPUDS
Example of a relaxed plan (RP) in a rovers domain (same actions, goals, and RP tree as the previous slide)
In SapaPS, goals are removed independently (and in pairs); if cost > utility, the goal is removed.
...but now we have more complicated goal interactions.
New challenge: goals are no longer independent. How do we select the best set of goals?
Idea: use declarative programming
Heuristically Identifying Goals
[Figure: the RP tree again, with action costs]
We can encode all goal sets and actions
Heuristically Identifying Goals
We encode the relaxed plan as an integer program:
 Binary variables: goals, actions, goal utility dependencies
 Constraints: select the actions needed for each selected goal; select a dependency exactly when all of its goals are selected
 Objective function: maximize the utility given by the selected goals and dependencies minus the cost of the selected actions
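SPUDS solves this encoding with an ILP solver; as a sketch of the same objective, a tiny instance can be brute-forced over goal subsets. All numbers and the goal-to-action map below are invented for illustration:

```python
# Brute-force version of the SPUDS objective on a toy instance:
# pick the goal subset maximizing GAI utility minus the cost of
# the relaxed-plan actions those goals need.
from itertools import combinations

act_cost = {"a1": 50, "a6": 40, "a4": 40}
needs = {"g1": {"a6"}, "g3": {"a1", "a4"}}      # goal -> supporting actions
local_utility = {frozenset({"g1"}): 100, frozenset({"g3"}): 150,
                 frozenset({"g1", "g3"}): -30}  # utility dependency

def net_benefit(goals):
    gs = frozenset(goals)
    util = sum(u for s, u in local_utility.items() if s <= gs)
    acts = set().union(*(needs[g] for g in gs)) if gs else set()
    return util - sum(act_cost[a] for a in acts)   # shared actions counted once

best = max((set(c) for r in range(len(needs) + 1)
            for c in combinations(needs, r)), key=net_benefit)
assert best == {"g1", "g3"}
```

Counting each shared action once is exactly the cost-dependency effect the encoding captures; the ILP formulation does the same maximization without enumerating the 2^n subsets.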
Heuristically Identifying Goals
[Figure: the RP tree again, with action costs]
We can now remove combinations of goals that give suboptimal relaxed solutions.
We are finding optimal goals given the (inadmissible) relaxed plan.
Experimental Domains
• Goal utilities randomly generated
• Goals "hard" or "soft"
• Goal dependencies randomly generated: their number, their size, the set of goals involved, and their utility value
• Upper bound on the number of dependencies is 3 x the number of goals
• Compared SPUDS to SapaPS (SapaPS modified to calculate goal utility dependencies at each state)

Domain      Cost
ZenoTravel  Money spent
Satellite   Energy spent
Results
[Chart: ZenoTravel: net benefit achieved by SPUDS vs. SapaPS on 13 problems]
Results
[Chart: Satellite: net benefit achieved by SPUDS vs. SapaPS on 18 problems; only in one instance does SapaPS do slightly better]
Results
Our experiments show that the ILP encoding takes upwards of 200x
PS procedure per node. Yet it is still doing better.
more time
Sapais
Ourthan
newthen
heuristic
better informed about cost dependencies
PS
Sapa heuristic lacks the ability
to take
utility dependency into account.
Why?
between
goals.
Obvious
A little less obvious
June 7th, 2006
ICAPS'06 Tutorial T6
64
Example Run
[Chart: net benefit over time (ms) for SPUDS and SapaPS on a single problem]
Summary
• Goal utility dependency in GAI framework
• Choosing goals at each search node
• ILP-based heuristic enhances search
guidance
– cost dependencies
– utility dependencies
Future Work
• Comparing with other planners (IP)
– Results promising so far (better quality in less
time)
• Include qualitative preference model as in
PrefPlan (Brafman & Chernyavsky)
• Take into account residual cost as in AltWlt
(Sanchez & Kambhampati)
• Extend into PDDL3 preference model
Questions
The Problem with Plangraphs [Smith, ICAPS 04]
[Figure: rover map annotated with plangraph cost estimates]
 Assume independence between objectives
 For the rover: all estimates are from the starting location
Approach
– Construct an orienteering problem
– Solve it
– Use it as search guidance
Orienteering Problem
A TSP variant
– Given:
 a network of cities
 rewards in various cities
 a finite amount of gas
– Objective:
 collect as much reward as possible before running out of gas
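On small graphs the problem can be solved exactly by exhaustive search, which is enough to sketch the objective. The three-node graph below is invented (it is not the rover orienteering graph on the next slide), and edge costs must be positive for the recursion to terminate:

```python
# Tiny orienteering solver: maximize collected reward within a gas budget.
graph = {"L0": {"L1": 3, "L2": 6},
         "L1": {"L0": 3, "L2": 2},
         "L2": {"L0": 6, "L1": 2}}
reward = {"L0": 0, "L1": 5, "L2": 8}

def best_reward(node, gas, seen):
    # Reward is collected only on the first visit to a node.
    r = reward[node] if node not in seen else 0
    best = r
    for nxt, cost in graph[node].items():
        if cost <= gas:   # positive costs make gas strictly decrease
            best = max(best, r + best_reward(nxt, gas - cost, seen | {node}))
    return best

assert best_reward("L0", 5, frozenset()) == 13   # L0 -> L1 -> L2
```

Real instances need heuristic OP solvers, but even an approximate OP solution already accounts for the inter-goal travel structure that independent plangraph estimates miss.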
Orienteering Graph
[Figure: orienteering graph over Rover, Loc0 through Loc3, Sample1, Sample3, Image1, Image2, with travel costs on edges and rewards at nodes]
The Big Question:
How do we determine which propositions go in the orienteering graph?
Propositions that:
 are changed in achieving one goal
 impact the cost of another goal
Answer: sensitivity analysis
Recall: Plan Graph
[Figure: plan graph with location propositions L0 through L3, move actions Mi,j, image actions Im1 and Im2, sample actions Sa1 and Sa3, and propagated cost estimates]
Sensitivity Analysis
[Figure: the plan graph with costs reset: the cost of a chosen net effect is set to 0 and the costs of mutex initial conditions are set to ∞]
Sensitivity Analysis
[Figure: the revised cost estimates in the plan graph induce the orienteering graph over Rover, Loc0 through Loc3, Sample1, Sample3, Image1, Image2]
Basis Set Algorithm
For each goal:
  Construct a relaxed plan
  For each net effect of the relaxed plan:
    Reset costs in the PG
    Set the cost of the net effect to 0
    Set the cost of mutex initial conditions to ∞
    Compute revised cost estimates
    If significantly different, add the net effect to the basis set
25 Rocks
[Figure: results on a 25-rock rover problem]
PSP Conclusions
 Goal Set Selection
 Apriori for Regression Search
 Anytime for Progression Search
 Both types of search use greedy goal insertion/removal
to optimize net-benefit of relaxed plans
 Orienteering Problem
 Interactions between goals apparent in OP
 Use solution to OP as heuristic
 Planning Graphs help define OP
Planning with Resources
Planning with Resources
 Propagating Resource Intervals
 Relaxed Plans
 Handling resource subgoals
Rover with Power Resource
Resource usage is the same as the costs in this example.
Resource Intervals
[Figure: rover planning graph P0-A0-P1-A1-P2 with a power resource; recharge produces 25, sample and drive actions consume 5-30]
The resource interval assumes independence among consumers/producers.
power at P0: [25, 25]
power at P1: [-5, 50]
 UB: UB(noop) + recharge = 25 + 25 = 50
 LB: LB(noop) + sample + drive = 25 + (-20) + (-10) = -5
power at P2: [-115, 75]
 UB: UB(noop) + recharge = 50 + 25 = 75
 LB: LB(noop) + drive + sample + drive + sample + commun + drive + drive = -5 + (-30) + (-20) + (-10) + (-25) + (-5) + (-15) + (-5) = -115
Resource Intervals (cont'd)
[Figure: levels P2 through P4; the power interval widens to [-250, 100] at P3 and [-390, 125] at P4]
Relaxed Plan Extraction with Resources
 Start extraction as before
 Track "maximum" resource requirements for the actions chosen at each level
 May need more than one supporter for a resource!!
Results
[Chart: planning-with-resources results]
Results (cont'd)
[Chart: planning-with-resources results]
Planning With Resources Conclusion
 Resource Intervals allow us to be optimistic
about reachable values
 Upper/Lower bounds can get large
 Relaxed Plans may require multiple
supporters for subgoals
 Negative Interactions are much harder to
capture
Temporal Planning
Temporal Planning
 Temporal Planning Graph
 From Levels to Time Points
 Delayed Effects
 Estimating Makespan
 Relaxed Plan Extraction
Rover with Durative Actions
SAPA
[Architecture: the planning problem generates a start state into a queue of time-stamped states; the state with the lowest f-value is selected (f can have both cost and makespan components); if it satisfies the goals, the p.c. plan is partialized and the o.c. and p.c. plans are returned; otherwise the state is expanded by applying actions. Heuristic estimation: build the RTPG, propagate cost functions, extract a relaxed plan, and adjust for mutexes and resources.]
[ECP 2001; AIPS 2002; ICAPS 2003; JAIR 2003]
Search through Time-Stamped States
A time-stamped state S = (P, M, Π, Q, t):
 P: set of pairs <pi, ti> of predicates pi and the time of their last achievement ti < t
 M: set of functions representing resource values
 Π: set of protected persistent conditions (binary or resource conditions)
 Q: event queue (contains resource as well as binary fluent events)
 t: time stamp of S
Goal satisfaction: S = (P, M, Π, Q, t) satisfies G if for each <pi, ti> in G, either:
 <pi, tj> ∈ P with tj < ti and no event in Q deletes pi, or
 there is an event e ∈ Q that adds pi at time te < ti
Action application: action A is applicable in S if:
 all instantaneous preconditions of A are satisfied by P and M;
 A's effects do not interfere with Π and Q;
 no event in Q interferes with persistent preconditions of A;
 A does not lead to concurrent resource change.
When A is applied to S:
 P is updated according to A's instantaneous effects;
 persistent preconditions of A are put in Π;
 delayed effects of A are put in Q.
Search: pick a state S from the queue. If S satisfies the goals, end. Else non-deterministically do one of:
 – Advance the clock (by executing the earliest event in Q)
 – Apply one of the applicable actions to S
[Example: Flying, with preconditions (in-city ?airplane ?city1) and (fuel ?airplane) > 0; effects (in-city ?airplane ?city2), not (in-city ?airplane ?city1), consume (fuel ?airplane)]
Temporal Planning
[Figure: temporal planning graph as a timeline from 0 to 100, with drive, sample, and commun actions placed at the first time points where they become reachable, together with the propositions they achieve]
 Record the first time point at which each action/proposition is reachable
 Assume the latest start time for actions in the RP
SAPA at IPC-2002
[Figure: IPC-2002 time and plan-quality plots for SAPA on Satellite (complex setting) and Rover (time setting).]
[JAIR 2003]
Temporal Planning Conclusion
 Levels become Time Points
 Makespan and plan length/cost are different
objectives
 Set-Level heuristic measures makespan
 Relaxed Plans measure makespan and plan
cost
Non-Deterministic Planning
Non-Deterministic Planning
 Belief State Distance
 Multiple Planning Graphs
 Labelled Uncertainty Graph
 Implicit Belief states and the CFF heuristic
Conformant Rover Problem
Search in Belief State Space
avail(soil, )
at()
have(soil)
sample(soil, )
avail(soil, )
drive(, )
avail(soil, )
at()
have(soil)
avail(soil, )
at()
avail(soil, )
at()
avail(soil, )
avail(soil, )
at()
at()
at()
avail(soil, )
at()
sample(soil, )
avail(soil, )
at()
have(soil)
avail(soil, )
at()
have(soil)
avail(soil, )
at()
drive(, )
drive(, )
avail(soil, )
avail(soil, )
at()
at()
drive(, )
avail(soil, )
at()
sample(soil, )
avail(soil, )
at()
avail(soil, )
at()
have(soil)
avail(soil, )
avail(soil, )
at()
at()
drive(b, )
June 7th, 2006
ICAPS'06 Tutorial T6
100
Belief State Distance
Compute classical planning distance measures for each state, assuming each state can reach its closest goal state, then aggregate the state distances.
[Figure: two candidate belief states BS1 and BS2 with per-state distances toward goal belief state BS3; aggregating gives Max = 15, 20; Sum = 20, 20; Union = 17, 20.]
[ICAPS 2004]
Belief State Distance
Estimate plans for each state pair.
[Figure: per-state action sequences from BS1 toward BS3; the union of the sequences counts shared actions once, capturing positive interaction and independence — Union = 17.]
[ICAPS 2004]
State Distance Aggregations
[Bryce et al., 2005]
Max under-estimates, Sum over-estimates, and Union is just right.
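The three aggregations can be sketched as follows. This is a toy illustration: the per-state distances and relaxed plans would come from the planning graphs, and all names here are hypothetical.

```python
def max_distance(state_dists):
    """Max aggregation: the hardest single state (under-estimates)."""
    return max(state_dists)

def sum_distance(state_dists):
    """Sum aggregation: treats states as fully independent (over-estimates)."""
    return sum(state_dists)

def union_distance(relaxed_plans):
    """Union aggregation: step-wise union of per-state relaxed plans, so an
    action shared by several states is counted once (positive interaction)."""
    merged = {}  # level -> set of action names
    for plan in relaxed_plans:  # each plan: {level: set of action names}
        for level, actions in plan.items():
            merged.setdefault(level, set()).update(actions)
    return sum(len(actions) for actions in merged.values())

# Two states needing 15 and 5 steps: max -> 15, sum -> 20.
plans = [{0: {"drive-a-b"}, 1: {"sample-soil"}},
         {0: {"drive-a-b"}, 1: {"sample-soil", "commun-soil"}}]
# union -> 3: the shared drive and sample are counted once
```

The union sits between the two extremes, which is why it tracks conformant plan cost more closely than either max or sum.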
Extract Relaxed Plans from each
Multiple Planning Graphs
Build a planning graph for each state in the belief state, extract a relaxed plan from each graph, and step-wise union the relaxed plans.
[Figure: two planning graphs (layers P0, A0, P1, A1, P2, A2, P3), one per state in the belief state, over drive, sample(soil, ·), and commun(soil); unioning their relaxed plans gives h = 7.]
Labelled Uncertainty Graph
[Figure: the labelled uncertainty graph (layers P0, A0, …, P3) for the conformant rover problem. Labels correspond to sets of states and are represented as propositional formulas (e.g. a conjunction of an at(·) literal with a disjunction of avail(soil, ·) literals). Action labels are the conjunction (intersection) of their precondition labels; effect labels are the disjunction (union) of their supporters' labels. Construction stops when the goal is labeled with every state.]
Labelled Relaxed Plan
[Figure: a labelled relaxed plan extracted from the LUG (h = 6). Subgoals and supporters need not be used for every state where they are reached (labeled), but enough supporters must be picked to cover the (sub)goals.]
Comparison of Planning Graph Types
[JAIR 2006]
[Figure: the LUG saves time relative to multiple graphs (MG). Sampling more worlds increases the cost of MG but improves the effectiveness of the LUG.]
State Agnostic Planning Graphs (SAG)
 LUG represents multiple explicit planning graphs
 SAG uses LUG to represent a planning graph for
every state
 The SAG is built once per search episode and we
can use it for relaxed plans for every search node,
instead of building a LUG at every node
 Extract relaxed plans from SAG by ignoring
planning graph components not labeled by states
in our search node.
SAG
Build a LUG for all states (union of all belief states)
[Figure: separate LUGs for progressively larger belief states collapse into the single largest LUG; relaxed plan extraction for a given belief state simply uses that one graph.]
 Ignore irrelevant labels
 Largest LUG == all LUGs
[AAAI, 2005]
CFF (implicit belief states + PG)
at()
drive(, )
sample(soil, )
at()
avail(soil, )
drive(, )
at()
avail(soil, )
at()
SAT Solver
drive(, )
avail(soil, )
at()
sample(soil, )
at()
at()
drive(, )
Planning
Graph
drive(b, )
June 7th, 2006
ICAPS'06 Tutorial T6
110
Belief Space Problems
[Figure: belief space problems span conformant and conditional variants of classical problems. More results at the IPC!!!]
Conditional Planning
 Actions have Observations
 Observations branch the plan because:
 Plan cost is reduced by performing fewer “just in case” actions – each branch performs only the relevant actions
 Sometimes actions conflict and observing determines
which to execute (e.g., medical treatments)
 We are ignoring negative interactions
 We are only forced to use observations to remove
negative interactions
 Ignore the observations and use the conformant relaxed
plan
 Suitable because the aggregate search effort over all plan
branches is related to the conformant relaxed plan cost
Non-Deterministic Planning Conclusions
 Measure positive interaction and independence between states co-transitioning to the goal via overlap
 Labeled planning graphs and CFF SAT
encoding efficiently measure conformant plan
distance
 Conformant planning heuristics work for
conditional planning without modification
Stochastic Planning
Stochastic Rover Example
[ICAPS 2006]
Search in Probabilistic Belief State Space
[Figure: progression search tree in probabilistic belief state space. Each belief state assigns probabilities to its member states (e.g. 0.4, 0.5, 0.1 over avail(soil, ·)/at(·) states); sample(soil, ·) and drive(·, ·) redistribute the probability mass over successor states (e.g. 0.36, 0.04, 0.45, 0.05).]
Handling Uncertain Actions
[ICAPS 2006]
 Extending LUG to handle uncertain actions
requires label extension that captures:
 State uncertainty (as before)
 Action outcome uncertainty
 Problem: Each action at each level may have a
different outcome. The number of uncertain events
grows over time – meaning the number of joint
outcomes of events grows exponentially with time
 Solution: Not all outcomes are important. Sample
some of them – keep number of joint outcomes
constant.
Monte Carlo LUG (McLUG)
[ICAPS 2006]
 Use Sequential Monte Carlo in the Relaxed
Planning Space
 Build several deterministic planning graphs by
sampling states and action outcomes
 Represent set of planning graphs using LUG
techniques
 Labels are sets of particles
 Sample which Action outcomes get labeled with
particles
 Bias relaxed plan by picking actions labeled with
most particles to prefer more probable support
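A toy sketch of the particle-label machinery described above. The names and the representation are hypothetical: labels are shown as plain Python sets of particle indices rather than the LUG's symbolic labels, and only one layer of propagation is shown.

```python
import random

def mclug_layer(prop_labels, actions, rng):
    """Build the next proposition layer of a McLUG. A particle enables an
    action if it labels all the action's preconditions; each enabled
    particle then samples one of the action's probabilistic outcomes."""
    next_labels = {p: set(ps) for p, ps in prop_labels.items()}  # persistence
    for preconds, outcomes in actions:  # outcomes: [(prob, effects), ...]
        enabled = set.intersection(*(prop_labels.get(p, set()) for p in preconds))
        probs, effect_sets = zip(*outcomes)
        for particle in enabled:
            effects = rng.choices(effect_sets, weights=probs)[0]  # sampled outcome
            for e in effects:
                next_labels.setdefault(e, set()).add(particle)
    return next_labels

rng = random.Random(0)
labels = {"at-alpha": {0, 1, 2, 3}, "avail-soil-alpha": {0, 1, 2}}
sample = (("at-alpha", "avail-soil-alpha"), [(0.95, ("have-soil",)), (0.05, ())])
nxt = mclug_layer(labels, [sample], rng)
# have-soil ends up labeled only with particles that both enable sample
# and sampled its successful outcome -- a subset of {0, 1, 2}
```

Keeping one sampled outcome per particle per action is what holds the number of joint outcomes constant as levels grow.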
McLUG for rover example
[ICAPS 2006]
[Figure: a McLUG (layers P0, A0, …, P3) for the rover example. States are sampled for the initial layer (here one avail(soil, ·) state is not sampled), and each particle in an action's label must sample that action's outcome. In the final layer, ¾ of the particles need a further drive to support the goal comm(soil) while ¼ support it already; it is okay to stop once at least ½ of the particles support the goal.]
Monte Carlo LUG (McLUG)
[Figure: relaxed plan extraction from the McLUG. Each chosen action's preconditions are supported for the particles the action supports; persistence is picked by default; the goal must be supported in all particles (here commun covers the remaining particles).]
Logistics Domain Results
[ICAPS 2006]
[Figure: time (s) and plan length on logistics problems P2-2-2, P2-2-4, and P4-2-2 — scalable, with reasonable quality.]
Grid Domain Results
[ICAPS 2006]
[Figure: time (s) and plan length on Grid(0.8) and Grid(0.5) — again, good scalability and quality! Broad beliefs need more particles.]
Direct Probability Propagation
 As an alternative to label propagation, we can propagate numeric probabilities
 Problem: Numeric Propagation tends to assume
only independence or positive interaction
between actions and propositions.
 With probability, we can vastly under-estimate the
probability of reaching propositions
 Solution: Propagate Correlation – measures
pair-wise independence/pos interaction/neg
interaction
 Can be seen as a continuous mutex
Correlation
 C(x, y) = Pr(x, y) / (Pr(x) Pr(y))
 If:
  C(x, y) = 0, then x, y are mutex
  0 < C(x, y) < 1, then x, y interfere
  C(x, y) = 1, then x, y are independent
  1 < C(x, y) < 1/Pr(x), then x, y synergize
  C(x, y) = 1/Pr(x) = 1/Pr(y), then x, y are completely correlated
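The definition and case analysis translate directly into code. A minimal sketch; the example probabilities are chosen so the exact comparisons behave (a real implementation would compare with a tolerance).

```python
def correlation(p_xy, p_x, p_y):
    """C(x, y) = Pr(x, y) / (Pr(x) Pr(y))."""
    return p_xy / (p_x * p_y)

def classify(c, p_x):
    """Map a correlation value to the qualitative cases above."""
    if c == 0:
        return "mutex"
    if c < 1:
        return "interfere"
    if c == 1:
        return "independent"
    if c < 1 / p_x:
        return "synergize"
    return "completely correlated"

# With Pr(x) = Pr(y) = 0.5, C ranges from 0 (mutex, never co-occur)
# up to 1/0.5 = 2 (completely correlated, always co-occur).
```

Because C is a continuous quantity, it behaves like a "continuous mutex": 0 recovers a hard mutex and values near 1 recover independence.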
Probability of a set of Propositions
 Pr(x1, x2, …, xn) = ∏_{i=1..n} Pr(xi | x1 … xi-1)   [Chain Rule]

 Pr(xi | x1 … xi-1)
  = Pr(x1 … xi-1 | xi) Pr(xi) / Pr(x1 … xi-1)   [Bayes Rule]
  ≈ Pr(x1 | xi) … Pr(xi-1 | xi) Pr(xi) / (Pr(x1) … Pr(xi-1))   [Assume Independence]
  = (Pr(xi | x1) / Pr(xi)) … (Pr(xi | xi-1) / Pr(xi)) Pr(xi)   [Bayes Rule]
  = Pr(xi) C(xi, x1) … C(xi, xi-1)   [Correlation]
  = Pr(xi) ∏_{j=1..i-1} C(xi, xj)

 Hence Pr(x1, x2, …, xn) ≈ ∏_{i=1..n} Pr(xi) ∏_{j=1..i-1} C(xi, xj)
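The final formula is straightforward to evaluate given marginals and pairwise correlations. A sketch: correlations missing from the table default to 1, i.e. independence.

```python
def joint_probability(marginals, corr):
    """Pr(x1..xn) ~= prod_i [ Pr(xi) * prod_{j<i} C(xi, xj) ]."""
    p = 1.0
    for i, p_i in enumerate(marginals):
        p *= p_i
        for j in range(i):
            p *= corr.get((i, j), 1.0)  # C(xi, xj); 1.0 means independent
    return p

# Independent (no correlations given): 0.5 * 0.5 = 0.25
# Completely correlated (C = 1/0.5 = 2): 0.5 * 0.5 * 2 = 0.5 = Pr(x1)
# Mutex (C = 0): the joint probability collapses to 0
```

The extremes recover the expected behavior: full correlation makes the joint equal a single marginal, and a zero correlation makes the set unreachable together.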
Probability Propagation
 The probability of an Action being enabled is the
probability of its preconditions (a set of
propositions).
 The probability of an effect is the product of the
action probability and outcome probability
 A single (or pair of) proposition(s) has probability
equal to the probability it is given by the best set
of supporters.
 The probability that a set of supporters gives a
proposition is the sum over the probability of all
possible executions of the actions.
Results
Stochastic Planning Conclusions
 Number of joint action outcomes too large
 Sampling outcomes to represent in labels is
much faster than exact representation
 SMC gives us a good way to use multiple planning graphs for heuristics, and the McLUG helps keep the representation small
 Numeric Propagation of probability can
better capture interactions with correlation
 Can extend to cost and resource propagation
Hybrid Planning Models
Hybrid Models
 Metric-Temporal w/ Resources (SAPA)
 Temporal Planning Graph w/ Uncertainty
(Prottle)
 PSP w/ Resources (SAPAMPS)
 Cost-based Conditional Planning (CLUG)
Propagating Temporal Cost Functions
[Figure: reaching L.A. from Tempe via Phoenix.
 Shuttle(Tempe,Phx): Cost: $20; Time: 1.0 hour
 Helicopter(Tempe,Phx): Cost: $100; Time: 0.5 hour
 Car(Tempe,LA): Cost: $100; Time: 10 hours
 Airplane(Phx,LA): Cost: $200; Time: 1.0 hour
The propagated cost function Cost(At(LA)) over t = 0 … 10 steps down from $300 (helicopter then airplane, t = 1.5) to $220 (shuttle then airplane, t = 2) to $100 (car, t = 10); it combines Cost(At(Phx)) with the cost of Flight(Phx,LA).]
Heuristics based on cost functions
Direct:
 If we want to minimize makespan: h = t0
 If we want to minimize cost: h = CostAggregate(G, t∞)
 If we want to minimize a function f(time, cost) of cost and makespan: h = min f(t, Cost(G, t)) s.t. t0 ≤ t ≤ t∞
 E.g. for f(time, cost) = 100·makespan + Cost, h = 100×2 + 220 at t0 ≤ t = 2 ≤ t∞
Using Relaxed Plan:
 Extract a relaxed plan using h as the bias
 If the objective function is f(time, cost), then action A (to be added to RP) is selected such that f(t(RP+A), C(RP+A)) + f(t(Gnew), C(Gnew)) is minimal, where Gnew = (G ∪ Precond(A)) \ Effects(A)
[Figure: the step cost function Cost(At(LA)) — $300 from the time of earliest achievement t0 = 1.5, $220 from t = 2, $100 from the time of lowest cost t = 10.]
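For a step cost function, the direct heuristic reduces to checking the breakpoints. A sketch using the $300/$220/$100 profile from the example; the breakpoint-list representation is an assumption.

```python
def direct_heuristic(cost_profile, f):
    """h = min over t in [t0, t_inf] of f(t, Cost(G, t)), where the step
    cost function is given by its breakpoints [(t, cost), ...]."""
    return min(f(t, c) for t, c in cost_profile)

# Cost(At(LA)): $300 from t0 = 1.5, $220 from t = 2, $100 from t = 10
profile = [(1.5, 300), (2.0, 220), (10.0, 100)]
h_makespan = direct_heuristic(profile, lambda t, c: t)         # 1.5
h_cost = direct_heuristic(profile, lambda t, c: c)             # 100
h_mixed = direct_heuristic(profile, lambda t, c: 100 * t + c)  # 420, at t = 2
```

The mixed objective reproduces the slide's value: 100×2 + 220 = 420, achieved at t = 2 rather than at the earliest or cheapest time.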
Phased Relaxation
The relaxed plan can be adjusted to take into account constraints that were originally ignored.
Adjusting for Mutexes:
Adjust the make-span estimate of the relaxed plan by marking actions that are mutex (and thus cannot be executed concurrently).
Adjusting for Resource Interactions:
Estimate the number of additional resource-producing actions needed to make up for any resource shortfall in the relaxed plan:
C = C + Σ_R ⌈(Con(R) – (Init(R) + Pro(R))) / ΔR⌉ × C(A_R)
Handling Cost/Makespan Tradeoffs
The objective is O = f(time, cost) = α·Makespan + (1 – α)·TotalCost; f can have both cost and makespan components.
[Figure: cost variation and makespan variation of the returned plans as α ranges from 0.1 to 0.95. Results over 20 randomly generated temporal logistics problems involving moving 4 packages between different locations in 3 cities.]
Prottle
 SAPA-style (time-stamped states and event
queues) search for fully-observable
conditional plans using L-RTDP
 Optimize probability of goal satisfaction
within a given, finite makespan
 Heuristic estimates probability of goal
satisfaction in the plan suffix
Prottle planning graph
[Figure: the Prottle planning graph has time-stamped propositions and probabilistic outcomes at different times.]
Probability Back-Propagation
 A vector of probability costs, one per top-level goal: 1 – Pr(Satisfy gi)
 Outcome cost is the max of the affected proposition costs
 Action cost is the expectation of its outcome costs
 Proposition cost is the product of its supporting action costs
 The heuristic is the product of the relevant node costs
Prottle Results
PSP w/ Resources
 Utility and Cost based on the values of
resources
 Challenges:
 Need to propagate cost for resource intervals
 Need to support resource goals at different
levels
Resource Cost Propagation
 Propagate reachable values with cost
[Figure: applying Move(Waypoint1), Sample_Soil, Sample_Soil, and Communicate grows v1's range of possible values [0,0] → [0,1] → [0,2], and attaches a cost to achieving each value bound (0, 1, 2, 2.5).]
Cost Propagation on Variable Bounds
 Bound cost depends upon:
  the action cost
  the previous bound cost – the current bound cost adds to the next
  the cost of all bounds in effect expressions
 Example (Sample_Soil with effect v1 += 1): v1 goes [0,0] → [0,1] → [0,2], and Cost(v1=2) = C(Sample_Soil) + Cost(v1=1)
 Example (Sample_Soil with effect v1 += v2, where v2: [0,3]): v1 goes [0,0] → [0,3] → [0,6], and Cost(v1=6) = C(Sample_Soil) + Cost(v2=3) + Cost(v1=3)
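The bound-cost rule can be sketched as a one-liner. This is illustrative only: a real implementation tracks costs for both the lower and upper bound of every variable.

```python
def bound_cost(action_cost, prev_bound_cost, operand_bound_costs=()):
    """Cost of reaching a new value bound = cost of the action + cost of
    the previous bound + costs of the bounds in the effect expression."""
    return action_cost + prev_bound_cost + sum(operand_bound_costs)

# Effect v1 += 1: Cost(v1=2) = C(Sample_Soil) + Cost(v1=1)
# Effect v1 += v2: Cost(v1=6) = C(Sample_Soil) + Cost(v2=3) + Cost(v1=3)
c_sample = 1                                 # hypothetical action cost
cost_v1_1 = bound_cost(c_sample, 0)          # Cost(v1=1) = 1
cost_v1_2 = bound_cost(c_sample, cost_v1_1)  # Cost(v1=2) = 2
```

The `operand_bound_costs` argument carries the Cost(v2=3)-style terms when the effect expression references other variables.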
Results – Modified Rovers (numeric soil)
Average improvement: 3.06
Anytime A* Search Behavior
Results – Modified Logistics (#of packages)
Average improvement: 2.88
Cost-Based Conditional Planning
 Actions may reduce uncertainty, but cost a lot
 Do we want more “just in case” actions that are cheap, or fewer that are more expensive?
 Propagate Costs on the LUG (CLUG)
 Problem: the LUG represents multiple explicit planning graphs, and the costs can be different in each planning graph.
 A single cost for every explicit planning graph assumes full positive interaction
 Multiple costs, one for each planning graph, is too costly
 Solution: Propagate cost for partitions of the explicit planning graphs
Cost Propagation on the LUG (CLUG)
[Figure: cost propagation on the LUG for a small example over propositions s, r and actions B, C, R, with costs propagated per label partition. The relaxed plan (bold) has Cost = c(B) + c(R) = 10 + 7 = 17.]
The Medical Specialist
[Bryce & Kambhampati, 2005]
[Figure: the medical specialist domain with high-cost and low-cost sensors. Using propagated costs improves plan quality (average path cost) without much additional total time.]
Overall Conclusions
 Relaxed Reachability Analysis
 Concentrate strongly on positive interactions and independence by ignoring negative interactions
 Estimates improve as more negative interactions are taken into account
 Heuristics can estimate and aggregate costs of
goals or find relaxed plans
 Propagate numeric information to adjust estimates
 Cost, Resources, Probability, Time
 Solving hybrid problems is hard
 Extra Approximations
 Phased Relaxation
 Adjustments/Penalties
Why do we love PG Heuristics?
 They work!
 They are “forgiving”
 You don't like doing mutex? Okay.
 You don't like growing the graph all the way? Okay.
 Allow propagation of many types of information
 Level, subgoal interaction, time, cost, world support, probability
 Support phased relaxation
 E.g. Ignore mutexes and resources and bring them back later…
 Graph structure supports other synergistic uses
 e.g. action selection
 Versatility…
Versatility of PG Heuristics
 PG Variations
  Serial
  Parallel
  Temporal
  Labelled
 Propagation Methods
  Level
  Mutex
  Cost
  Label
 Planning Problems
  Classical
  Resource/Temporal
  Conformant
 Planners
  Regression
  Progression
  Partial Order
  Graphplan-style