Planning Graph Based
Reachability Heuristics
Daniel Bryce & Subbarao Kambhampati
ICAPS’06 Tutorial 6
June 7, 2006
http://rakaposhi.eas.asu.edu/pg-tutorial/
dan.bryce@asu.edu
rao@asu.edu
http://verde.eas.asu.edu
http://rakaposhi.eas.asu.edu
Motivation
- Ways to improve planner scalability:
  - Problem formulation
  - Search space
  - Reachability heuristics
- Reachability heuristics are:
  - Domain (formulation) independent
  - Applicable to many search spaces
  - Flexible: work with most domain features
  - Overall, complementary to other scalability techniques
  - Effective!!
June 7th, 2006
ICAPS'06 Tutorial T6
2
Scalability of Planning
- The problem is search control!!!
- Before: planning algorithms could synthesize only 6-10 action plans in minutes
- Significant scale-up in the last 6-7 years
- Now: we can synthesize 100-action plans in seconds (realistic encodings of the Munich airport!)

The primary revolution in planning in recent years has been domain-independent heuristics to scale up plan synthesis.
Topics
- Classical Planning (Rao)
- Cost-Based Planning (Dan)
- Partial Satisfaction Planning
- Resources (Continuous Quantities) (Rao)
- Temporal Planning
- Non-Deterministic/Probabilistic Planning
- Hybrid Models (Dan)
Classical Planning
Rover Domain
Classical Planning
- Relaxed Reachability Analysis
- Types of Heuristics
  - Level-based
  - Relaxed Plans
  - Mutexes
- Heuristic Search
  - Progression
  - Regression
  - Plan Space
- Exploiting Heuristics
Planning Graph and Search Tree
- Envelope of the progression tree (relaxed progression)
  - Proposition lists: the union of states at the kth level
  - Lower-bound reachability information
Level-Based Heuristics
- The distance of a proposition is the index of the first proposition layer in which it appears
  - Proposition distances change when we propagate cost functions (described later)
- What is the distance of a set of propositions?
  - Set-Level: index of the first proposition layer where all goal propositions appear
    - Admissible
    - Gets better with mutexes; otherwise the same as Max
  - Max: maximum proposition distance
  - Sum: sum of the proposition distances
Example of Level-Based Heuristics
set-level(sI, G) = 3
max(sI, G) = max(2, 3, 3) = 3
sum(sI, G) = 2 + 3 + 3 = 8
Distance of a Set of Literals
[Figure: two-level planning graph for a grid-and-key domain — proposition layers with At(0,0), Key(0,1), At(0,1), At(1,0), At(1,1), Have_key, ~Key(0,1); action layers with Move and Pick_key actions plus noops; mutexes marked with x]

- Sum: h(S) = Σ_{p∈S} lev({p})
- Set-Level (admissible; improves with mutexes): h(S) = lev(S)
- Other variants: Partition-k, Adjusted Sum, Combo, Set-Level with memos

lev(p): index of the first level at which p comes into the planning graph.
lev(S): index of the first level where all propositions in S appear, non-mutexed. If there is no such level:
- If the graph is grown to level-off, then ∞
- Else k+1 (where k is the current length of the graph)
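As a concrete illustration, the level-based heuristics can be sketched in a few lines, assuming the graph has already been built and the level indices lev(p) are available as a dict (a hypothetical layout, not the tutorial's code; lev_of_set stands in for a mutex-aware lookup of lev(S)):

```python
# Minimal sketch of the level-based heuristics, given precomputed levels.

def h_max(lev, goal):
    """Max: distance of the farthest goal proposition."""
    return max(lev[p] for p in goal)

def h_sum(lev, goal):
    """Sum: add proposition distances (inadmissible)."""
    return sum(lev[p] for p in goal)

def h_set_level(lev_of_set, goal):
    """Set-Level: first layer where all goals appear non-mutexed."""
    return lev_of_set(frozenset(goal))

# The tutorial's example: proposition distances 2, 3, 3
lev = {"at(beta)": 2, "have(soil)": 3, "comm(soil)": 3}
goal = ["at(beta)", "have(soil)", "comm(soil)"]
print(h_max(lev, goal))  # 3
print(h_sum(lev, goal))  # 8
```

With mutexes, lev_of_set would return 3 here rather than the member-wise maximum.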
How do Level-Based Heuristics Break?
[Figure: planning graph with initial proposition q; 100 actions B1..B100 each produce one of p1..p100 at level 1; a single action A consumes p1..p100 to produce the goal g at level 2]
The goal g is reached at level 2, but requires 101 actions to support it.
Relaxed Plan Heuristics
- When level does not reflect distance well, we can find a relaxed plan.
- A relaxed plan is a subgraph of the planning graph, where:
  - Every goal proposition is in the relaxed plan at the level where it first appears
  - Every proposition in the relaxed plan has a supporting action in the relaxed plan
  - Every action in the relaxed plan has its preconditions supported
- Relaxed plans are not admissible, but are generally effective.
- Finding the optimal relaxed plan is NP-hard, but finding a greedy one is easy. Later we will see how "greedy" can change.
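The greedy extraction just described can be sketched as follows — a simplified illustration under an assumed data layout (not the tutorial's implementation), extracting backwards from the goals and turning each supporter's preconditions into new subgoals:

```python
# Greedy relaxed-plan extraction over a leveled, delete-free graph.

def relaxed_plan(action_layers, first_level, goals):
    """action_layers[i]: actions between prop layers i and i+1,
    each as (name, preconditions, add_effects)."""
    subgoals = {g: first_level[g] for g in goals}
    chosen = set()                            # (layer, action) pairs
    for lev in range(len(action_layers), 0, -1):
        for p in [q for q, l in list(subgoals.items()) if l == lev]:
            # greedily pick the first supporter in the previous layer
            name, precs, _ = next(a for a in action_layers[lev - 1]
                                  if p in a[2])
            chosen.add((lev - 1, name))
            del subgoals[p]
            for q in precs:                   # preconditions become subgoals
                subgoals.setdefault(q, first_level[q])
    return chosen

# Toy graph: a1 gives p at level 1; a2 needs p and gives g at level 2
action_layers = [[("a1", [], ["p"])],
                 [("a1", [], ["p"]), ("a2", ["p"], ["g"])]]
first_level = {"p": 1, "g": 2}
print(relaxed_plan(action_layers, first_level, ["g"]))
```

The heuristic value is the number of chosen actions (2 in the toy example); a smarter supporter choice is where "greedy" can change.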
Example of Relaxed Plan Heuristic
[Figure: relaxed plan extraction — identify the goal propositions, support each goal proposition individually, then count the actions]
RP(sI, G) = 8
Results
[Figure: runtime comparison of the Relaxed Plan vs. Level-Based heuristics]

Results (cont'd)
[Figure: runtime comparison of the Relaxed Plan vs. Level-Based heuristics, additional domains]
Optimizations in Heuristic Computation
- Taming space/time costs
  - Bi-level planning graph representation
  - Partial expansion of the PG (stop before level-off)
    - It is FINE to cut corners when using the PG for heuristics (instead of search)!!
- Branching factor can still be quite high
  - Use actions appearing in the PG (complete)
  - Select actions in lev(S) vs. levels-off (incomplete)
  - Consider actions appearing in the RP (incomplete)
[Figures: bi-level planning graph example in which goals C and D are present before level-off, so the remaining levels are discarded; log-scale plot of heuristic-extraction time (seconds) across problems for the partial graph (lev(S)) vs. the leveled-off graph — a time/quality trade-off]

Adjusting for Negative Interactions
- Until now we assumed actions only positively interact, but they often conflict
- Mutexes help us capture some negative interactions
- Types:
  - Actions: Interference / Competing Needs
  - Propositions: Inconsistent Support
- Binary mutexes are the most common and practical
- (|A| + 2|P|)-ary mutexes would allow us to solve the planning problem with a backtrack-free Graphplan search
  - An action layer may have |A| actions and 2|P| noops
  - A serial planning graph assumes all non-noop actions are mutex
Binary Mutexes
[Figure: rover-domain planning graph over levels P0-A0-P1-A1-P2 with propositions avail(soil,α), avail(rock,β), avail(image,γ), at(α), at(β), at(γ), have(soil), have(rock), have(image), comm(soil), and actions sample, drive, commun; mutexes marked]
- sample needs at(α), drive negates at(α) — Interference
- have(soil)'s only supporter is mutex with at(β)'s only supporter — Inconsistent Support
- At level 2, have(soil) has a supporter not mutex with a supporter of at(β) — they are feasible together at level 2
Set-Level(sI, {at(β), have(soil)}) = 2
Max(sI, {at(β), have(soil)}) = 1
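The two binary action-mutex conditions can be sketched with hypothetical action triples (preconditions, adds, deletes) as frozensets — illustrative only, not the tutorial's implementation; inconsistent support between two propositions then holds when every pair of their supporters is mutex:

```python
# Binary action-mutex checks over delete-annotated STRIPS-style actions.

def interference(a, b):
    """One action deletes a precondition or add-effect of the other."""
    (pa, aa, da), (pb, ab, db) = a, b
    return bool(da & (pb | ab)) or bool(db & (pa | aa))

def competing_needs(a, b, prop_mutex):
    """Some pair of preconditions was mutex at the previous level."""
    pa, pb = a[0], b[0]
    return any((p, q) in prop_mutex or (q, p) in prop_mutex
               for p in pa for q in pb)

# Rover-flavored toy actions: (preconditions, adds, deletes)
sample = (frozenset({"at(a)"}), frozenset({"have(soil)"}), frozenset())
drive = (frozenset({"at(a)"}), frozenset({"at(b)"}), frozenset({"at(a)"}))
print(interference(sample, drive))            # True: drive deletes at(a)
print(competing_needs(sample, drive, set()))  # False: shared precondition
```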
Adjusting the Relaxed Plans
- Start with the RP heuristic and adjust it to take subgoal interactions into account
- Negative interactions in terms of "degree of interaction"
- Positive interactions in terms of co-achievement links
- Ignore negative interactions when accounting for positive interactions (and vice versa)

HAdjSum2M(S) = length(RelaxedPlan(S)) + max_{p,q ∈ S} δ(p,q)
where δ(p,q) = lev({p,q}) - max{lev(p), lev(q)} /* degree of negative interaction */

Results (plan length / time in seconds; "-" = not solved):

Problem     | Level      | Sum        | AdjSum2M
Gripper-25  | -          | 69/0.98    | 67/1.57
Gripper-30  | -          | 81/1.63    | 77/2.83
Tower-7     | 127/1.28   | 127/0.95   | 127/1.37
Tower-9     | 511/47.91  | 511/16.04  | 511/48.45
8-Puzzle1   | 31/6.25    | 39/0.35    | 31/0.69
8-Puzzle2   | 30/0.74    | 34/0.47    | 30/0.74
Mystery-6   | -          | -          | 16/62.5
Mystery-9   | 8/0.53     | 8/0.66     | 8/0.49
Mprime-3    | 4/1.87     | 4/1.88     | 4/1.67
Mprime-4    | 8/1.83     | 8/2.34     | 10/1.49
Aips-grid1  | 14/1.07    | 14/1.12    | 14/0.88
Aips-grid2  | -          | -          | 34/95.98

[AAAI 2000]
Anatomy of a State-Space Regression Planner
[Diagram: action templates feed a Graphplan graph-extension phase (based on STAN), which builds the planning graph; heuristics are extracted (using the actions in the last level) and passed to a regression planner (based on HSP-R); the problem specification (initial and goal state) feeds both]
Problem: given a set of subgoals (a regressed state), estimate how far they are from the initial state.
[AAAI 2000; AIPS 2000; AIJ 2002; JAIR 2003]
Rover Example in Regression
[Figure: regression from sG = {comm(soil), comm(rock), comm(image)} back through s1..s4 via commun(rock), commun(image), sample(rock,β), and sample(image,γ), accumulating subgoals such as avail(rock,β), at(β), have(image), at(γ)]
Sum(sI, s3) = 0+1+2+2 = 5
Sum(sI, s4) = 0+0+1+1+2 = 4
The value for s4 should be ∞ — s4 is inconsistent. How do we improve the heuristic?
AltAlt Performance
[Figures: runtime (seconds, log scale) across IPC 2000 problem sets — Logistics domain (AIPS-00): STAN3.0, HSP2.0, HSP-r, and AltAlt1.0 with level-based and adjusted-RP heuristics; Schedule domain (AIPS-00): STAN3.0, HSP2.0, AltAlt1.0]
Plan Space Search
In the beginning it was all POP.
Then it was cruelly UnPOPped.
The good times return with Re(vived)POP.
POP Algorithm
1. Plan Selection: select a plan P from the search queue
2. Flaw Selection: choose a flaw f (open condition or unsafe link)
3. Flaw Resolution:
   - If f is an open condition, choose an action S that achieves f
   - If f is an unsafe link, choose promotion or demotion
   - Update P
   - Return NULL if no resolution exists
4. If there is no flaw left, return P

[Figure: 1. the initial plan, S0 → Sinf with open goals g1, g2; 2. plan refinement (flaw selection and resolution) adds steps S1, S2, S3 with causal links and a threat to ~p]

Choice points:
- Flaw selection (open condition? unsafe link? a non-backtrack choice)
- Flaw resolution / plan selection (how to select (rank) a partial plan?)
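The four steps above can be sketched as a plain search loop — a skeleton in Python form with assumed data structures, not code from any real planner; plans come off a queue, a flaw is selected, and each resolution yields a refined plan:

```python
from collections import deque

def pop(initial_plan, select_flaw, resolutions):
    """resolutions(plan, flaw) -> list of refined plans (possibly empty)."""
    queue = deque([initial_plan])
    while queue:
        plan = queue.popleft()                 # 1. plan selection
        flaw = select_flaw(plan)               # 2. flaw selection
        if flaw is None:
            return plan                        # 4. no flaws left: a solution
        queue.extend(resolutions(plan, flaw))  # 3. flaw resolution
    return None                                # no resolution exists

# Toy model: a "plan" is just its set of remaining open conditions
select = lambda p: next(iter(p), None)
resolve = lambda p, f: [p - {f}]               # closing a condition never fails
print(pop(frozenset({"g1", "g2"}), select, resolve))  # frozenset()
```

Replacing the FIFO queue with a priority queue ranked by a PG heuristic gives the heuristic plan selection discussed next.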
PG Heuristics for Partial Order Planning
- Distance heuristics estimate the cost of partially ordered plans (and help select flaws)
- If we ignore negative interactions, the set of open conditions can be seen as a regression state
- Mutexes are used to detect indirect conflicts in partial plans
  - A step threatens a link if there is a mutex between the link condition and the step's effect or precondition
  - Post disjunctive precedences and use propagation to simplify
[Figure: partial plan with steps S0..S5 and a step Sk whose precondition r or effect q conflicts with the link Si —p→ Sj]
If mutex(p, q) or mutex(p, r), then post Sk ≺ Si ∨ Sj ≺ Sk.
Regression and Plan Space
[Figure: the regression trace s4..sG placed alongside two partial plans over S0, sample(rock,β), commun(rock), sample(image,γ), commun(image), and S∞ — the open conditions of each partial plan correspond to a regressed state]
RePOP's Performance
- RePOP is implemented on top of UCPOP
- Dramatically better than any other partial-order planner before it
- Competitive with Graphplan and AltAlt
- VHPOP carried the torch at IPC 2002

Times in seconds; "-" = not solved. Written in Lisp; run on Linux, 500MHz, 250MB.

Problem     | UCPOP | RePOP    | Graphplan | AltAlt
Gripper-8   | -     | 1.01     | 66.82     | .43
Gripper-10  | -     | 2.72     | 47min     | 1.15
Gripper-20  | -     | 81.86    | -         | 15.42
Rocket-a    | -     | 8.36     | 75.12     | 1.02
Rocket-b    | -     | 8.17     | 77.48     | 1.29
Logistics-a | -     | 3.16     | 306.12    | 1.59
Logistics-b | -     | 2.31     | 262.64    | 1.18
Logistics-c | -     | 22.54    | -         | 4.52
Logistics-d | -     | 91.53    | -         | 20.62
Bw-large-a  | 45.78 | (5.23)   | 14.67     | 4.12
Bw-large-b  | -     | (18.86)  | 122.56    | 14.14
Bw-large-c  | -     | (137.84) | -         | 116.34

You see, pop, it is possible to Re-use all the old POP work!
[IJCAI, 2001]
Exploiting Planning Graphs
- Restricting action choice
  - Use actions from:
    - The last level before level-off (complete)
    - The last level before the goals (incomplete)
    - The first level of the relaxed plan (incomplete) — FF's helpful actions
    - Only action sequences in the relaxed plan (incomplete) — YAHSP
- Reducing the state representation
  - Remove static propositions: a static proposition is only ever true or only ever false in the last proposition layer
Classical Planning Conclusions
- Many heuristics: Set-Level, Max, Sum, Relaxed Plans
- Heuristics can be improved by adjustments
  - Mutexes
- Useful for many types of search: progression, regression, POCL
Cost-Based Planning
Cost-based Planning
- Propagating cost functions
- Cost-based heuristics
  - Generalized level-based heuristics
  - Relaxed plan heuristics
Rover Cost Model
Cost Propagation
[Figure: cost-annotated planning graph over levels P0-A0-P1-A1-P2 — action costs such as sample(soil,α)=20, drive(α,β)=10, drive(α,γ)=30, sample(image,γ)=35, sample(rock,β)=35, commun(soil)=25; propagated proposition costs such as have(soil)=20, at(β)=10, comm(soil)=25, and at(γ)=30 at P1 dropping to 25 at P2]
A cost reduces when a different, cheaper supporter appears at a later level.
(1-lookahead)
Cost Propagation (cont'd)
[Figure: propagation continued over levels P2-A2-P3-A3-P4 — e.g., sample(image,γ) drops from 35 to 30, so have(image) drops from 35 to 30 and comm(image) from 40 to 35 at later levels]
Terminating Cost Propagation
- Stop when:
  - the goals are reached (no-lookahead)
  - costs stop changing (∞-lookahead)
  - k levels after the goals are reached (k-lookahead)
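A minimal sketch of cost propagation under an assumed data layout (not the tutorial's code): a proposition costs as much as its cheapest supporter, an action costs its execution cost plus the sum of its precondition costs, and iterating to a fixpoint corresponds to ∞-lookahead:

```python
# Fixpoint cost propagation over a delete-free action set.

def propagate_costs(actions, init_props):
    """actions: list of (name, cost, preconditions, add_effects)."""
    cost = {p: 0 for p in init_props}
    changed = True
    while changed:                       # ∞-lookahead: stop when stable
        changed = False
        for name, c, precs, adds in actions:
            if all(p in cost for p in precs):
                total = c + sum(cost[p] for p in precs)
                for p in adds:           # keep the cheapest supporter
                    if total < cost.get(p, float("inf")):
                        cost[p] = total
                        changed = True
    return cost

# Toy rover-flavored numbers (chosen for illustration)
acts = [("drive(a,b)", 10, ["at(a)"], ["at(b)"]),
        ("sample(soil,a)", 20, ["at(a)", "avail(soil,a)"], ["have(soil)"]),
        ("commun(soil)", 5, ["have(soil)"], ["comm(soil)"])]
print(propagate_costs(acts, ["at(a)", "avail(soil,a)"]))
# comm(soil) costs 25 = 20 (sample) + 5 (commun)
```

Bounding the number of sweeps instead of running to the fixpoint gives the k-lookahead variants.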
Guiding Relaxed Plans with Costs
[Figure: relaxed plan extraction starts at the first level where the goal proposition is cheapest]
Cost-Based Planning Conclusions
- Cost functions:
  - Remove the false assumption that level is correlated with cost
  - Improve planning with non-uniform cost actions
  - Are cheap to compute (constant overhead)
Partial Satisfaction (Over-Subscription)
Planning
Partial Satisfaction Planning
- Selecting goal sets
  - Estimating goal benefit
  - Anytime goal set selection
  - Adjusting for negative interactions between goals
Partial Satisfaction (Oversubscription) Planning
In many real-world planning tasks, the agent often has more goals than it has resources to accomplish.
Example: Rover Mission Planning (MER)
Need automated support for over-subscription/partial satisfaction planning: actions have execution costs, goals have utilities, and the objective is to find the plan that has the highest net benefit.
[Diagram: problem hierarchy relating PLAN EXISTENCE, PLAN LENGTH, and PLAN COST to PSP GOAL, PSP GOAL LENGTH, PSP UTILITY, PSP UTILITY COST, and PSP NET BENEFIT]
Adapting PG Heuristics for PSP
- Challenges:
  - Need to propagate costs on the planning graph
  - The exact set of goals is not clear
  - Interactions between goals
  - The obvious approach of considering all 2^n goal subsets is infeasible
- Idea: select a subset of the top-level goals up front
  - Challenge: goal interactions
  - Approach: estimate the net benefit of each goal in terms of its utility minus the cost of its relaxed plan
  - Bias the relaxed plan extraction to (re)use the actions already chosen for other goals
[Diagram: architecture — action templates and the problem spec (init, goal state) feed a Graphplan plan-extension phase (based on STAN) with cost propagation, yielding a cost-sensitive planning graph; heuristics extracted from it drive the goal-set selection algorithm and cost-sensitive search, producing the solution plan]
Goal Set Selection in Rover Problem
[Figure: iterative goal-set selection — initial relaxed-plan estimates: comm(soil): 50-25 = 25, comm(rock): 60-40 = 20, comm(image): 20-35 = -15; after committing to comm(soil), a biased RP gives {comm(soil), comm(rock)}: 110-65 = 45; adding comm(image) is evaluated by cost propagation (70-60 = 20) and by biased RP (130-100 = 30)]
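The up-front goal-set selection can be sketched as a greedy loop — a hedged illustration with hypothetical numbers shaped like the rover example, where rp_cost stands in for a biased relaxed-plan estimate that reuses actions chosen for earlier goals:

```python
# Greedy goal-set selection by incremental net benefit.

def select_goals(utilities, rp_cost):
    """rp_cost(chosen_set, goal) -> added cost of also achieving goal."""
    chosen, benefit = set(), 0
    while True:
        best, best_gain = None, 0
        for g in utilities:
            if g in chosen:
                continue
            gain = utilities[g] - rp_cost(chosen, g)
            if gain > best_gain:          # only strictly useful goals
                best, best_gain = g, gain
        if best is None:
            return chosen, benefit
        chosen.add(best)
        benefit += best_gain

# Toy utilities/costs loosely shaped like the rover slide
util = {"comm(soil)": 50, "comm(rock)": 60, "comm(image)": 20}
costs = {"comm(soil)": 25, "comm(rock)": 40, "comm(image)": 35}
goals, net = select_goals(util, lambda c, g: costs[g])
print(sorted(goals), net)  # ['comm(rock)', 'comm(soil)'] 45
```

With a real biased RP, rp_cost would shrink as the chosen set grows, which is what makes the reuse bias matter.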
SAPAPS (anytime goal selection)
- A* progression search
  - g-value: net benefit of the plan so far
  - h-value: relaxed-plan estimate of the best goal set
    - A relaxed plan is found for all goals
    - Iterative goal removal, until the net benefit does not increase
- Returns plans with increasing g-values
June 7th, 2006
ICAPS'06 Tutorial T6
44
Some Empirical Results for AltAltps
[Figure: results]
Exact algorithms based on MDPs don't scale at all.
[AAAI 2004]
Adjusting for Negative Interactions (AltWlt)
- Problem:
  - What if the a priori goal set is not achievable because of negative interactions?
  - What if the greedy algorithm gets stuck in a bad local optimum?
- Solution:
  - Do not consider mutex goals
  - Add a penalty for goals whose relaxed plan has mutexes
    - Use an interaction factor to adjust the cost, similar to the adjusted sum heuristic: max_{g1,g2 ∈ G} {lev({g1, g2}) - max(lev(g1), lev(g2))}
  - Find the best goal set for each goal
The Problem with Plangraphs [Smith, ICAPS 04]
[Figure: planning graph with cost estimates on actions and propositions]
- Assume independence between objectives
- For the rover: all estimates are from the starting location
Approach
- Construct an orienteering problem
- Solve it
- Use it as search guidance
Orienteering Problem
A TSP variant:
- Given: a network of cities, rewards in various cities, and a finite amount of gas
- Objective: collect as much reward as possible before running out of gas
Orienteering Graph
[Figure: orienteering graph — nodes Rover, Loc0..Loc3, Sample1, Sample3, Image1, Image2, with travel costs on the edges and rewards at the nodes]
The Big Question:
How do we determine which propositions go in the orienteering graph?
Propositions that:
- are changed in achieving one goal, and
- impact the cost of another goal
Answer: sensitivity analysis.
Recall: Plan Graph
[Figure: plan graph with cost estimates — location propositions L0..L3, sample/image propositions S1, S3, I1, I2; move actions M0,1..M2,3 and actions Sa1, Sa3, Im1, Im2]
Sensitivity Analysis
[Figure: the plan graph with the cost of the assumed net effect set to 0 and the cost of mutex initial conditions (here L0) set to ∞]
Sensitivity Analysis (cont'd)
[Figure: recomputed cost estimates overlaid with the orienteering graph — several estimates change significantly, e.g., 8 ⇒ 1, 7 ⇒ 3, 11 ⇒ 7, identifying propositions that belong in the orienteering graph]
Basis Set Algorithm
For each goal:
  Construct a relaxed plan
  For each net effect of the relaxed plan:
    Reset the costs in the PG
    Set the cost of the net effect to 0
    Set the cost of mutex initial conditions to ∞
    Compute revised cost estimates
    If significantly different, add the net effect to the basis set
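The outer test of this algorithm can be sketched as follows — a hypothetical interface (not Smith's implementation), where cost_given abstracts the whole reset-and-recompute step on the planning graph:

```python
# Basis-set membership test: a net effect joins the orienteering graph
# when assuming it (cost 0) significantly changes another goal's estimate.

def basis_set(goals, net_effects, cost_given, threshold=1.0):
    """cost_given(assumed_effect, goal) -> revised cost estimate;
    cost_given(None, goal) -> baseline estimate."""
    basis = set()
    for g in goals:
        for e in net_effects[g]:
            for g2 in goals:
                if abs(cost_given(None, g2) - cost_given(e, g2)) >= threshold:
                    basis.add(e)
                    break
    return basis

# Toy model: being at loc1 changes the cost of the image goal (11 => 7)
base = {"soil": 5, "image": 11}
with_at1 = {"soil": 5, "image": 7}
cg = lambda e, g: (with_at1 if e == "at(loc1)" else base)[g]
effects = {"soil": ["at(loc1)", "have(soil)"], "image": ["have(image)"]}
print(basis_set(["soil", "image"], effects, cg))  # {'at(loc1)'}
```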
25 Rocks
[Figure: results on a problem with 25 rocks]
PSP Conclusions
- Goal set selection
  - A priori for regression search
  - Anytime for progression search
  - Both types of search use greedy goal insertion/removal to optimize the net benefit of relaxed plans
- Orienteering problem
  - Interactions between goals are apparent in the OP
  - Use the OP solution as a heuristic
  - Planning graphs help define the OP
Planning with Resources
Planning with Resources
- Propagating resource intervals
- Relaxed plans
- Handling resource subgoals
Rover with Power Resource
Resource usage is the same as the costs in this example.
Resource Intervals
[Figure: planning graph with resource intervals for power — [25,25] at P0, [-5,50] at P1, [-115,75] at P2; actions sample(soil,α), drive, recharge, sample(rock,β), commun(soil)]
The resource interval assumes independence among consumers/producers.
At P1: UB = UB(noop) + recharge = 25 + 25 = 50; LB = LB(noop) + sample + drive = 25 + (-20) + (-10) = -5
At P2: UB = UB(noop) + recharge = 50 + 25 = 75; LB = LB(noop) + drive + sample + drive + sample + commun + drive + drive = -5 + (-30) + (-20) + (-10) + (-25) + (-5) + (-15) + (-5) = -115
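The interval update can be sketched in a few lines (an illustrative simplification, not the tutorial's code): the upper bound optimistically applies all producers and the lower bound all consumers, ignoring interactions between them:

```python
# One level of optimistic resource-interval propagation.

def propagate_interval(lb, ub, deltas):
    """deltas: net resource changes of the actions applicable this level."""
    new_ub = ub + sum(d for d in deltas if d > 0)   # apply all producers
    new_lb = lb + sum(d for d in deltas if d < 0)   # apply all consumers
    return new_lb, new_ub

# Power example from the slides, starting at [25, 25]
lb, ub = propagate_interval(25, 25, [+25, -20, -10])  # recharge, sample, drive
print(lb, ub)  # -5 50
lb, ub = propagate_interval(lb, ub, [+25, -30, -20, -10, -25, -5, -15, -5])
print(lb, ub)  # -115 75
```

This is why the bounds widen monotonically: each level adds every producer to UB and every consumer to LB.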
Resource Intervals (cont'd)
[Figure: propagation continued — the power interval widens from [-115,75] at P2 to [-250,100] at P3 and [-390,125] at P4]
Relaxed Plan Extraction with Resources
- Start extraction as before
- Track the "maximum" resource requirements of the actions chosen at each level
- May need more than one supporter for a resource!!

Results
[Figure: results]

Results (cont'd)
[Figure: results, continued]
Planning with Resources Conclusion
- Resource intervals allow us to be optimistic about reachable values
  - Upper/lower bounds can get large
- Relaxed plans may require multiple supporters for subgoals
- Negative interactions are much harder to capture
Temporal Planning
Temporal Planning
- Temporal planning graph
  - From levels to time points
  - Delayed effects
- Estimating makespan
- Relaxed plan extraction
Rover with Durative Actions
SAPA
[Diagram: SAPA architecture — generate the start state from the planning problem and maintain a queue of time-stamped states; select the state with the lowest f-value (f can have both cost and makespan components); if it satisfies the goals, partialize the p.c. plan and return the o.c. and p.c. plans; otherwise expand the state by applying actions; heuristic estimation builds the RTPG, propagates cost functions, extracts a relaxed plan, and adjusts for mutexes and resources]
[ECP 2001; AIPS 2002; ICAPS 2003; JAIR 2003]
Search through Time-Stamped States
A state S = (P, M, Π, Q, t), where:
- P: set of pairs <pi, ti> of predicates pi and the time of their last achievement ti < t
- M: set of functions representing resource values
- Π: set of protected persistent conditions (binary or resource conditions)
- Q: event queue (contains resource as well as binary fluent events)
- t: time stamp of S

Goal satisfaction: S = (P, M, Π, Q, t) ⇒ G if for every <pi, ti> ∈ G, either:
- ∃ <pi, tj> ∈ P with tj < ti and no event in Q deletes pi, or
- ∃ e ∈ Q that adds pi at time te < ti

Action application: an action A is applicable in S if:
- all instantaneous preconditions of A are satisfied by P and M
- A's effects do not interfere with Π and Q
- no event in Q interferes with the persistent preconditions of A
- A does not lead to concurrent resource change

When A is applied to S:
- P is updated according to A's instantaneous effects
- persistent preconditions of A are put in Π
- delayed effects of A are put in Q

[Example: Flying — preconditions (in-city ?airplane ?city1), (fuel ?airplane) > 0; effects ¬(in-city ?airplane ?city1), delayed (in-city ?airplane ?city2), consume (fuel ?airplane)]

Search: pick a state S from the queue. If S satisfies the goals, end. Else non-deterministically do one of:
- advance the clock (by executing the earliest event in Q)
- apply one of the applicable actions to S
Temporal Planning
[Figure: temporal planning graph for the rover domain on a timeline from 0 to 100 — record the first time point at which each action/proposition is first reachable; assume the latest start time for actions in the RP]
SAPA at IPC-2002
[Figures: IPC-2002 results — Satellite (complex setting) and Rover (time setting)]
[JAIR 2003]

Temporal Planning Conclusion
- Levels become time points
- Makespan and plan length/cost are different objectives
  - The Set-Level heuristic measures makespan
  - Relaxed plans measure makespan and plan cost
Non-Deterministic Planning
Non-Deterministic Planning
- Belief state distance
- Multiple planning graphs
- Labelled uncertainty graph (LUG)
- Implicit belief states and the CFF heuristic
Conformant Rover Problem
Search in Belief State Space
[Figure: progression search in belief space — each belief state is a set of states (the soil may be available at α, β, or γ, combined with the rover's location); sample(soil,·) and drive(·,·) map belief states to belief states]
Belief State Distance
- Compute classical planning distance measures between states
- Assume each state can reach the closest goal state
- Aggregate the state distances
[Figure: belief states BS1 and BS2 with goal belief state BS3; individual state distances 5, 10, 15, 20, 30, 40 — Max = 15, 20; Sum = 20, 20; Union = 17, 20]
[ICAPS 2004]

Belief State Distance (cont'd)
- Estimate plans for each state pair
- Capture positive interaction and independence
[Figure: the action sequences for two states overlap; counting the shared actions once gives Union = 17]
[ICAPS 2004]
State Distance Aggregations [Bryce et al., 2005]
- Sum: over-estimates (ignores overlap)
- Union: just right
- Max: under-estimates
[Figure: comparison of the three aggregations]
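The three aggregations can be sketched directly — a hypothetical layout (not the tutorial's code) in which the "union" input is the set of actions in each state's relaxed plan, so that shared actions are counted once:

```python
# Aggregating per-state distances d(s, G) into a belief-state distance.

def agg_max(dists):            # admissible, under-estimates total work
    return max(dists)

def agg_sum(dists):            # ignores overlap, over-estimates
    return sum(dists)

def agg_union(relaxed_plans):  # counts shared actions once
    actions = set()
    for rp in relaxed_plans:
        actions |= rp
    return len(actions)

# Toy numbers shaped like the slide's BS1: distances 15 and 5,
# with 3 of the second plan's 5 actions shared with the first
d = [15, 5]
rps = [set(f"a{i}" for i in range(15)), {"a0", "a1", "a2", "b0", "b1"}]
print(agg_max(d), agg_sum(d), agg_union(rps))  # 15 20 17
```

The union lands between Max and Sum precisely because it credits positive interaction (shared actions) without assuming full independence.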
Extract Relaxed Plans from Multiple Planning Graphs
- Build a planning graph for each state in the belief state
- Extract a relaxed plan from each graph
- Take the step-wise union of the relaxed plans; in the example, h = 7
[Figure: three planning graphs, one per possible soil location (α, β, γ), with their relaxed plans unioned step-wise]
Labelled Uncertainty Graph
- Labels correspond to sets of states and are represented as propositional formulas
- Action labels are the conjunction (intersection) of their precondition labels
- Effect labels are the disjunction (union) of their supporters' labels
- Stop when the goal is labeled with every state
[Figure: LUG over levels P0-A0-P1-A1-P2-A2-P3 — e.g., sample(soil,α) is labeled at(α) ∧ (avail(soil,α) ∨ avail(soil,β) ∨ avail(soil,γ)) ∧ ¬at(β) ∧ …]
Labelled Relaxed Plan
- Must pick enough supporters to cover the (sub)goals
- Subgoals and supporters need not be used for every state where they are reached (labeled)
- In the example, h = 6
[Figure: labelled relaxed plan extracted from the LUG]
Comparison of Planning Graph Types [JAIR, 2006]
- The LUG saves time
- Sampling more worlds increases the cost of multiple graphs (MG)
- Sampling more worlds improves the effectiveness of the LUG
[Figure: runtime comparison]
State Agnostic Planning Graphs (SAG)
- The LUG represents multiple explicit planning graphs
- The SAG uses the LUG machinery to represent a planning graph for every state
- The SAG is built once per search episode, and we can use it for relaxed plans at every search node instead of building a LUG at each node
- Extract relaxed plans from the SAG by ignoring planning-graph components not labeled by the states in our search node
SAG
Build a LUG for all states (the union of all belief states).
- Ignore irrelevant labels
- The largest LUG == all LUGs
[Figure: one labelled graph over states 1..7 with actions o12, o23, o34, o45, o56, o67, and oG; the graph for any individual belief state is read off by restricting labels]
[AAAI, 2005]
CFF (implicit belief states + PG)
[Figure: CFF's planning graph over propositions at(α), at(β), avail(soil, α/β/γ) and actions drive and sample, whose implication structure is handed to a SAT solver]
Belief Space Problems
[Figure: spectrum of problems from classical to conformant and conditional belief-space planning]
More results at the IPC!!!
Conditional Planning
ƒ Actions have observations
ƒ Observations branch the plan because:
ƒ Plan cost is reduced by performing fewer "just in case" actions – each branch performs only the relevant actions
ƒ Sometimes actions conflict, and observing determines which to execute (e.g., medical treatments)
ƒ We are ignoring negative interactions
ƒ We are only forced to use observations to remove negative interactions
ƒ Ignore the observations and use the conformant relaxed plan
ƒ Suitable because the aggregate search effort over all plan branches is related to the conformant relaxed plan cost
Non-Deterministic Planning Conclusions
ƒ Measure positive interaction and independence between states co-transitioning to the goal via overlap
ƒ Labeled planning graphs and the CFF SAT encoding efficiently measure conformant plan distance
ƒ Conformant planning heuristics work for conditional planning without modification
Stochastic Planning
Stochastic Rover Example
[ICAPS 2006]
Search in Probabilistic Belief State Space
[Figure: progression search over probabilistic belief states in the rover domain. The initial belief at at(α) assigns probability 0.4, 0.5, and 0.1 to avail(soil, α), avail(soil, β), and avail(soil, γ); actions such as sample(soil, α), sample(soil, β), drive(α, β), and drive(α, γ) split the belief into successors weighted by outcome probability (e.g., 0.36/0.04 after sampling at α, 0.45/0.05 after sampling at β).]
Handling Uncertain Actions
[ICAPS 2006]
ƒ Extending the LUG to handle uncertain actions requires a label extension that captures:
ƒ State uncertainty (as before)
ƒ Action outcome uncertainty
ƒ Problem: Each action at each level may have a different outcome. The number of uncertain events grows over time – meaning the number of joint outcomes of events grows exponentially with time
ƒ Solution: Not all outcomes are important. Sample some of them – keep the number of joint outcomes constant
Monte Carlo LUG (McLUG)
[ICAPS 2006]
ƒ Use Sequential Monte Carlo in the relaxed planning space
ƒ Build several deterministic planning graphs by sampling states and action outcomes
ƒ Represent the set of planning graphs using LUG techniques
ƒ Labels are sets of particles
ƒ Sample which action outcomes get labeled with particles
ƒ Bias the relaxed plan by picking actions labeled with the most particles, to prefer more probable support
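A sketch of the particle sampling step, under illustrative assumptions (distributions are value→probability dicts; a label is simply the set of particle ids consistent with a graph element):

```python
import random

def build_mclug_labels(particles, init_dist, outcome_dists, num_levels, rng=random):
    """For each particle, sample one initial state and one outcome per
    uncertain action per level. The label of an outcome at a level is
    the set of particle ids that sampled it."""
    labels = {}   # (level, action, outcome) -> set of particle ids
    states = []   # sampled initial state, per particle
    for i in range(particles):
        s, = rng.choices(list(init_dist), weights=init_dist.values())
        states.append(s)
        for level in range(num_levels):
            for action, dist in outcome_dists.items():
                o, = rng.choices(list(dist), weights=dist.values())
                labels.setdefault((level, action, o), set()).add(i)
    return states, labels
```

Relaxed plan extraction then prefers the supporter whose label contains the most particles, biasing toward more probable support while keeping the number of joint outcomes fixed at the particle count.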
McLUG for rover example
[ICAPS 2006]
[Figure: sampled planning graph layers P0/A0/P1/A1/P2/A2/P3 over avail(soil, ·), at(·), have(soil), comm(soil) and the sample/drive/commun actions. Annotations: states are sampled for the initial layer (avail(soil, γ) was not sampled); particles in an action's label must sample that action's outcome; ¾ of the particles support the goal, and it is okay to stop once at least ½ do.]
Logistics Domain Results
[ICAPS 2006]
[Figure: log-scale plots of total time (s) and plan length against probability of goal satisfaction for problems P2-2-2, P2-2-4, and P4-2-2, comparing the McLUG with 16, 32, 64, and 128 particles to CPplan. Takeaway: scalable, with reasonable quality.]
Grid Domain Results
[ICAPS 2006]
[Figure: time (s) and plan length against probability of goal satisfaction for Grid(0.8) and Grid(0.5), comparing 4, 8, 16, and 32 particles to CPplan. Takeaways: again, good scalability and quality; broad beliefs need more particles.]
Direct Probability Propagation
ƒ As an alternative to label propagation, we can propagate numeric probabilities
ƒ Problem: Numeric propagation tends to assume only independence or positive interaction between actions and propositions
ƒ With probability, we can vastly under-estimate the probability of reaching propositions
ƒ Solution: Propagate correlation – a pair-wise measure of independence / positive interaction / negative interaction
ƒ Can be seen as a continuous mutex
Correlation
ƒ C(x, y) = Pr(x, y) / (Pr(x)Pr(y))
ƒ If:
ƒ C(x, y) = 0, then x, y are mutex
ƒ 0 < C(x, y) < 1, then x, y interfere
ƒ C(x, y) = 1, then x, y are independent
ƒ 1 < C(x, y) < 1/Pr(x), then x, y synergize
ƒ C(x, y) = 1/Pr(x) = 1/Pr(y), then x, y are completely correlated
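The definition and the five cases translate directly into code (a sketch; as on the slide, the final case assumes Pr(x) = Pr(y), which is when the upper bound 1/Pr(x) is attainable):

```python
def correlation(pr_xy, pr_x, pr_y):
    """C(x, y) = Pr(x, y) / (Pr(x) Pr(y)); assumes Pr(x), Pr(y) > 0."""
    return pr_xy / (pr_x * pr_y)

def classify(c, pr_x):
    """Map a correlation value to the slide's five qualitative cases."""
    if c == 0:
        return "mutex"
    if c < 1:
        return "interfere"
    if c == 1:
        return "independent"
    if c < 1 / pr_x:
        return "synergize"
    return "completely correlated"   # c == 1/Pr(x) == 1/Pr(y)
```

For example, with Pr(x) = 0.5 and Pr(y) = 0.4, a joint probability of 0.3 gives C = 1.5, so x and y synergize.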
Probability of a Set of Propositions
ƒ Pr(x1, x2, …, xn) = Πi=1..n Pr(xi | x1 … xi-1)   (chain rule)
ƒ Pr(xi | x1 … xi-1)
   = Pr(x1 … xi-1 | xi) Pr(xi) / Pr(x1 … xi-1)   (Bayes rule)
   ≈ Pr(x1 | xi) … Pr(xi-1 | xi) Pr(xi) / [Pr(x1) … Pr(xi-1)]   (assume independence)
   = Pr(xi) Πj=1..i-1 [Pr(xi | xj) / Pr(xi)]   (Bayes rule)
   = Pr(xi) Πj=1..i-1 C(xi, xj)   (correlation)
ƒ Hence Pr(x1, x2, …, xn) = Πi=1..n Pr(xi) Πj=1..i-1 C(xi, xj)
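The final formula is cheap to evaluate from marginals and pairwise correlations alone. A sketch, with illustrative structures (marginals in a list, correlations in a dict keyed by index pairs; missing pairs default to independence):

```python
def joint_probability(pr, corr):
    """Pr(x1..xn) ~= Π_i Pr(xi) Π_{j<i} C(xi, xj).
    `pr` holds marginals; `corr[(i, j)]` the pairwise correlation."""
    total = 1.0
    for i, p in enumerate(pr):
        total *= p
        for j in range(i):
            # Unlisted pairs are assumed independent (C = 1).
            total *= corr.get((i, j), corr.get((j, i), 1.0))
    return total
```

Any zero correlation (a mutex pair) drives the joint probability to zero, which is how this acts as a continuous analogue of mutex propagation.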
Probability Propagation
ƒ The probability of an action being enabled is the probability of its preconditions (a set of propositions)
ƒ The probability of an effect is the product of the action probability and the outcome probability
ƒ A single proposition (or pair of propositions) has probability equal to the probability it is given by the best set of supporters
ƒ The probability that a set of supporters gives a proposition is the sum over the probabilities of all possible executions of those actions
Results
Stochastic Planning Conclusions
ƒ The number of joint action outcomes is too large
ƒ Sampling outcomes to represent in labels is much faster than an exact representation
ƒ SMC gives us a good way to use multiple planning graphs for heuristics, and the McLUG helps keep the representation small
ƒ Numeric propagation of probability can better capture interactions with correlation
ƒ Can extend to cost and resource propagation
Hybrid Planning Models
Hybrid Models
ƒ Metric-Temporal w/ Resources (SAPA)
ƒ Temporal Planning Graph w/ Uncertainty (Prottle)
ƒ PSP w/ Resources (SAPAMPS)
ƒ Cost-based Conditional Planning (CLUG)
Propagating Temporal Cost Functions
[Figure: getting from Tempe to L.A. – Shuttle(Tempe,Phx): cost $20, time 1.0 hour; Helicopter(Tempe,Phx): cost $100, time 0.5 hour; Car(Tempe,LA): cost $100, time 10 hours; Airplane(Phx,LA): cost $200, time 1.0 hour. Cost(At(LA)) is a step function of time, falling from $300 through $220 to $100 between t = 0 and t = 10, and is computed from Cost(At(Phx)) and the cost of Flight(Phx,LA).]
Heuristics based on cost functions
Direct:
ƒ If we want to minimize makespan: h = t0
ƒ If we want to minimize cost: h = CostAggregate(G, t∞)
ƒ If we want to minimize a function f(time, cost) of cost and makespan: h = min f(t, Cost(G, t)) s.t. t0 ≤ t ≤ t∞
ƒ E.g., if f(time, cost) = 100·makespan + cost, then h = 100×2 + 220 = 420, at t0 ≤ t = 2 ≤ t∞
Using Relaxed Plan:
ƒ Extract a relaxed plan using h as the bias
ƒ If the objective function is f(time, cost), then action A (to be added to RP) is selected such that f(t(RP+A), C(RP+A)) + f(t(Gnew), C(Gnew)) is minimal, where Gnew = (G ∪ Precond(A)) \ Effects(A)
[Figure: Cost(At(LA)) steps from $300 down to $220 at t = 2 and to $100 at t∞ = 10; t0 = 1.5 is the time of earliest achievement, and the time of lowest cost comes later.]
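The third "direct" case can be sketched numerically with the slide's own numbers; the breakpoint-list representation of the piecewise-constant cost function is an illustrative assumption, not SAPA's actual data structure:

```python
def direct_h(cost_steps, f, t0, t_inf):
    """h = min f(t, Cost(G, t)) over t0 <= t <= t_inf.
    `cost_steps` lists (time, cost) breakpoints of the piecewise-constant
    cost function; the minimum is attained at one of them."""
    candidates = [(t, c) for t, c in cost_steps if t0 <= t <= t_inf]
    return min(f(t, c) for t, c in candidates)

# Slide example: Cost(At(LA)) is $300 at t=1.5, $220 at t=2, $100 at t=10,
# and f(time, cost) = 100*makespan + cost.
steps = [(1.5, 300), (2, 220), (10, 100)]
h = direct_h(steps, lambda t, c: 100 * t + c, 1.5, 10)   # h == 420, at t = 2
```

Waiting until t = 2 trades 0.5 hours of makespan for an $80 cost reduction, which the weighted objective prefers; waiting until t = 10 does not pay off.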
Phased Relaxation
The relaxed plan can be adjusted to take into account constraints that were originally ignored.
Adjusting for mutexes:
Adjust the makespan estimate of the relaxed plan by marking actions that are mutex (and thus cannot be executed concurrently).
Adjusting for resource interactions:
Estimate the number of additional resource-producing actions needed to make up for any resource shortfall in the relaxed plan:
C = C + ΣR ⌈(Con(R) – (Init(R) + Pro(R))) / ∆R⌉ × C(AR)
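A sketch of the resource adjustment, assuming the shortfall must be covered by a whole number of producing actions (hence the ceiling); the per-resource tuple layout is illustrative:

```python
from math import ceil

def adjust_for_resources(c, resources):
    """C := C + Σ_R ceil((Con(R) - (Init(R) + Pro(R))) / ΔR) * C(A_R).
    `resources` maps each resource R to a tuple
    (consumed, initial, produced, amount produced per action, action cost)."""
    for con, init, pro, delta, a_cost in resources.values():
        shortfall = con - (init + pro)
        if shortfall > 0:
            # Whole producing actions are charged to cover the shortfall.
            c += ceil(shortfall / delta) * a_cost
    return c
```

For instance, consuming 10 units of fuel with 3 initially available and 2 produced by the relaxed plan leaves a shortfall of 5; at 2 units per refuel action, three extra actions are charged.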
Handling Cost/Makespan Tradeoffs
[Figure: the SAPA search loop – generate the start state; keep a queue of time-stamped states; select the state with the lowest f-value (f can have both cost and makespan components); if it satisfies the goals, partialize the p.c. plan and return the o.c. and p.c. plans; otherwise expand the state by applying actions. Heuristic estimation: build the RTPG, propagate cost functions, extract a relaxed plan, and adjust for mutexes and resources.]
[Chart: total cost and makespan variation as α ranges from 0.1 to 1, over 20 randomly generated temporal logistics problems involving moving 4 packages between different locations in 3 cities; O = f(time, cost) = α·Makespan + (1 – α)·TotalCost.]
Prottle
ƒ SAPA-style (time-stamped states and event queues) search for fully-observable conditional plans using L-RTDP
ƒ Optimize probability of goal satisfaction within a given, finite makespan
ƒ Heuristic estimates probability of goal satisfaction in the plan suffix
Prottle planning graph
[Figure: time-stamped propositions p1–p4 and actions a1, a2 with probabilistic outcomes o1–o4 occurring at different times (branch probabilities of 40–60%).]
Probability Back-Propagation
ƒ Each node carries a vector of probability costs, one per top-level goal: 1 – Pr(satisfy gi)
ƒ Outcome cost is the max of the affected propositions' costs
ƒ Action cost is the expectation of its outcomes' costs
ƒ Proposition cost is the product of its supporting actions' costs
ƒ The heuristic is the product of the relevant node costs
[Figure: example back-propagated costs (e.g., ⟨0.4⟩, ⟨0.5⟩, ⟨0.05⟩) over the Prottle planning graph.]
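The back-propagation rules above can be sketched as three tiny functions for a single goal (scalar costs instead of cost vectors; all names are illustrative):

```python
def outcome_cost(affected_prop_costs):
    """Outcome cost: max of the costs of the propositions it affects."""
    return max(affected_prop_costs)

def action_cost(outcomes):
    """Action cost: expectation over (probability, outcome cost) pairs."""
    return sum(p * c for p, c in outcomes)

def proposition_cost(supporter_costs):
    """Proposition cost: product of the supporting actions' costs.
    With cost = 1 - Pr(satisfy g), the product treats supporters as
    independent chances of satisfying the goal."""
    result = 1.0
    for c in supporter_costs:
        result *= c
    return result
```

For example, an action whose two equally likely outcomes cost 0.8 and 0.0 gets cost 0.4, and a proposition supported by actions costing 0.4 and 0.5 gets cost 0.2.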
Prottle Results
PSP w/ Resources
ƒ Utility and cost are based on the values of resources
ƒ Challenges:
ƒ Need to propagate cost for resource intervals
ƒ Need to support resource goals at different levels
Resource Cost Propagation
ƒ Propagate reachable values with cost
[Figure: across the action layers Move(Waypoint1), Sample_Soil, Sample_Soil, Communicate, variable v1 grows through the bound intervals [0,0], [0,1], [0,2] – a range of possible values – with cost(v1) tracking the cost of achieving each value bound.]
Cost Propagation on Variable Bounds
ƒ Bound cost depends upon:
ƒ action cost
ƒ previous bound cost – the current bound cost adds to the next
ƒ the cost of all bounds in the effect expression
ƒ Example (Sample_Soil, effect v1 += 1): v1 goes [0,0] → [0,1] → [0,2], and Cost(v1=2) = C(Sample_Soil) + Cost(v1=1)
ƒ Example (Sample_Soil, effect v1 += v2, with v2: [0,3]): v1 goes [0,0] → [0,3] → [0,6], and Cost(v1=6) = C(Sample_Soil) + Cost(v2=3) + Cost(v1=3)
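A minimal sketch of the bound-cost bookkeeping (the function and argument names are illustrative):

```python
def bound_cost(action_cost, operand_bound_costs):
    """Cost of reaching a new variable bound: the action's cost plus the
    costs of every bound used to compute it (the affected variable's
    previous bound, plus the bounds of variables in the effect expression)."""
    return action_cost + sum(operand_bound_costs)

# Slide's first example, with C(Sample_Soil) = 1.0 and effect v1 += 1:
c1 = bound_cost(1.0, [0.0])   # Cost(v1=1) = C(Sample_Soil) + Cost(v1=0)
c2 = bound_cost(1.0, [c1])    # Cost(v1=2) = C(Sample_Soil) + Cost(v1=1)
```

The second example chains in the cost of the expression bound as well: Cost(v1=6) = bound_cost(C(Sample_Soil), [Cost(v2=3), Cost(v1=3)]).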
Results – Modified Rovers (numeric soil)
Average improvement: 3.06
Anytime A* Search Behavior
Results – Modified Logistics (#of packages)
Average improvement: 2.88
Cost-Based Conditional Planning
ƒ Actions may reduce uncertainty, but cost a lot
ƒ Do we want more "just in case" actions that are cheap, or fewer that are more expensive?
ƒ Propagate costs on the LUG (CLUG)
ƒ Problem: The LUG represents multiple explicit planning graphs, and the costs can be different in each planning graph
ƒ A single cost for every explicit planning graph assumes full positive interaction
ƒ Multiple costs, one for each planning graph, is too costly
ƒ Solution: Propagate cost for partitions of the explicit planning graphs
Cost Propagation on the LUG (CLUG)
[Figure: cost propagation over label partitions of the explicit planning graphs, with propositions s, ¬s, r, ¬r and actions B, C, R; partitioned costs (e.g., 10, 20, 24, 7, 17) annotate the labels ϕ0(B), ϕ0(C), ϕ0(R). The relaxed plan (bold) has cost c(B) + c(R) = 10 + 7 = 17.]
The Medical Specialist
[Bryce & Kambhampati, 2005]
Using propagated costs improves plan quality (average path cost) without much additional time.
[Figure: total time and average path cost on cost-medical1 and cost-medical2, with low-cost and high-cost sensors, comparing coverage- and cost-based relaxed plan heuristics against MBP and GPT.]
Overall Conclusions
ƒ Relaxed reachability analysis
ƒ Concentrates strongly on positive interactions and independence by ignoring negative interactions
ƒ Estimates improve with more negative interactions
ƒ Heuristics can estimate and aggregate costs of goals, or find relaxed plans
ƒ Propagate numeric information to adjust estimates
ƒ Cost, resources, probability, time
ƒ Solving hybrid problems is hard
ƒ Extra approximations
ƒ Phased relaxation
ƒ Adjustments/penalties
Why do we love PG Heuristics?
ƒ They work!
ƒ They are "forgiving"
ƒ You don't like doing mutex? Okay
ƒ You don't like growing the graph all the way? Okay
ƒ Allow propagation of many types of information
ƒ Level, subgoal interaction, time, cost, world support, probability
ƒ Support phased relaxation
ƒ E.g., ignore mutexes and resources and bring them back later…
ƒ Graph structure supports other synergistic uses
ƒ e.g., action selection
ƒ Versatility…
Versatility of PG Heuristics
ƒ PG Variations
ƒ Serial
ƒ Parallel
ƒ Temporal
ƒ Labelled
ƒ Propagation Methods
ƒ Level
ƒ Mutex
ƒ Cost
ƒ Label
ƒ Planning Problems
ƒ Classical
ƒ Resource/Temporal
ƒ Conformant
ƒ Planners
ƒ Regression
ƒ Progression
ƒ Partial Order
ƒ Graphplan-style
Download