Dan’s Multi-Option Talk

advertisement
• Option 1: HUMIDRIDE: Dan’s Trip to the East Coast
– Whining: High
– Duration: Med
– Viruses: Low
• Option 2: T-Cell: Attacking Dan’s Cold Virus
– Whining: Med
– Duration: Low
– Viruses: High
• Option 3: Model-Lite Planning: Diverse Multi-Option Plans and Dynamic
Objectives
– Whining: Low
– Duration: High
– Viruses: Low
Model-Lite Planning: Diverse Multi-Option Plans and Dynamic Objectives
Daniel Bryce
William Cushing
Subbarao Kambhampati
Questions
• When must the plan executor decide on their planning objective?
  – Before synthesis? → Traditional model
  – Before execution? → Similar to the IR model: select a plan from a set of diverse but relevant plans
  – During execution? → Multi-Option Plans (subsumes the previous)
  – At all? → “Keep your options open”
• Can the executor change their planning objective without replanning?
• Can the executor start acting without committing to an objective?
Overview
• Diverse Multi-Option Plans
– Diversity
– Representation
– Connection to Conditional Plans
– Execution
• Synthesizing Multi-Option Plans
– Example
– Speed-ups
• Analysis
– Synthesis
– Execution
• Conclusion
Diverse Multi-Option Plans
• Each plan step presents several diverse choices:
  – Option 1: Train(MP, SFO), Fly(SFO, BOS), Car(BOS, Prov.)
  – Option 1a: Train(MP, SFO), Fly(SFO, BOS), Fly(BOS, PVD), Cab(PVD, Prov.)
  – Option 2: Shuttle(MP, SFO), Fly(SFO, BOS), Car(BOS, Prov.)
  – Option 2a: Shuttle(MP, SFO), Fly(SFO, BOS), Fly(BOS, PVD), Cab(PVD, Prov.)
• Diversity is reliant on Pareto optimality (sketched below):
  – Each option is non-dominated
  – Diversity through a Pareto front with high spread
[Figure: plan graph of options O1, O1a, O2, O2a and the corresponding Cost/Duration Pareto front]
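To make the Pareto reasoning concrete, here is a minimal sketch (not the talk's implementation) of filtering candidate options, each scored on cost and duration, down to the non-dominated set; the numeric values are illustrative assumptions.

```python
# Minimal sketch: keep only the non-dominated (Pareto-optimal) options.
# Both objectives (cost, duration) are minimized; the numbers are illustrative.

def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(options):
    """Return the non-dominated subset of a {name: (cost, duration)} dict."""
    return {name: score for name, score in options.items()
            if not any(dominates(other, score) for other in options.values())}

options = {
    "O1":  (300, 9.0),   # Train, Fly, Car
    "O1a": (350, 8.0),   # Train, Fly, Fly, Cab
    "O2":  (280, 10.0),  # Shuttle, Fly, Car
    "O2a": (330, 8.5),   # Shuttle, Fly, Fly, Cab
}
print(pareto_front(options))   # with these values, all four options are mutually non-dominated
```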
Dynamic Objectives
[Figure: the same plan graph of options O1, O1a, O2, O2a]
• Multi-Option Plans are a type of conditional plan
  – Conditional on the user’s objective function
  – Allow the objective function to change (see the sketch below)
  – Ensure that, irrespective of their objective function, the executor will have non-dominated options
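As a rough illustration of a dynamic objective, the sketch below re-ranks the same hypothetical Pareto set under two different linear weightings of cost and duration, so a change of objective function selects a different non-dominated option without replanning; the weights and scores are assumptions.

```python
# Sketch: the same (hypothetical) Pareto set re-ranked as the executor's
# objective function changes, here a simple linear weighting of cost and duration.

pareto_set = {
    "O1":  (300, 9.0),
    "O1a": (350, 8.0),
    "O2":  (280, 10.0),
    "O2a": (330, 8.5),
}

def best_option(pareto_set, w_cost, w_duration):
    """Pick the option minimizing the current weighted sum of the objectives."""
    return min(pareto_set,
               key=lambda o: w_cost * pareto_set[o][0] + w_duration * pareto_set[o][1])

print(best_option(pareto_set, w_cost=1.0,  w_duration=1.0))   # cost matters most -> O2
print(best_option(pareto_set, w_cost=0.01, w_duration=1.0))   # time matters most -> O1a
```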
Executing Multi-Option Plans
• A local action choice corresponds to multiple options
• Option values change at each step (see the sketch below)
[Figure: plan graph annotated with the Cost/Duration Pareto set of options remaining after each executed step]
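The following sketch illustrates this execution model under an assumed node/option representation: each plan node carries a Pareto set of options, executing an action moves to a successor node, and the successor's Pareto set becomes the new option menu. Names and values are hypothetical.

```python
# Sketch of executing a multi-option plan: each node carries a Pareto set of
# options, each option prescribes a next action and successor node, and the
# executor's (possibly changing) objective picks among them at every step.

class Node:
    def __init__(self, options):
        # options: {option_name: (action, successor_node_or_None, (cost, duration))}
        self.options = options

def execute(node, choose_option):
    """Walk the plan, consulting the current objective at every step."""
    trace = []
    while node is not None and node.options:
        name = choose_option(node.options)            # objective may change between steps
        action, successor, _value = node.options[name]
        trace.append(action)
        node = successor                              # option values are re-read at the new node
    return trace

# Example policy: always prefer the cheapest remaining option.
cheapest = lambda opts: min(opts, key=lambda o: opts[o][2][0])

goal = Node({})                                        # nothing left to do
bos  = Node({"O1":  ("Car(BOS, Prov.)", goal, (100, 1.5)),
             "O1a": ("Fly(BOS, PVD)",   goal, (150, 1.0))})
start = Node({"O1": ("Train(MP, SFO)",   bos, (300, 9.0)),
              "O2": ("Shuttle(MP, SFO)", bos, (280, 10.0))})
print(execute(start, cheapest))   # ['Shuttle(MP, SFO)', 'Car(BOS, Prov.)']
```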
Multi-Option Conditional Probabilistic Planning
• (PO)MDP setting: (Belief) State Space Search
  – Stochastic Actions, Observations, Uncertain Initial State, Loops
  – Two Objectives: Expected Plan Cost, Probability of Plan Success
    → Traditional reward functions are a linear combination of the above and assume a fixed objective fn.
• Extend LAO* to multiple objectives (Multi-Option LAO*)
  – Each generated (belief) state has an associated Pareto set of “best” sub-plans
  – Dynamic programming (state backup) combines successor state Pareto sets
    → Yes, it’s exponential time per backup per state
      ♦ There are approximations
  – Basic Algorithm (sketched below)
    → While we do not have a good plan:
      ♦ ExpandPlan
      ♦ RevisePlan
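A hedged sketch of that basic loop follows; expand_plan, revise_plan, and good_enough are hypothetical stand-ins for the real search operators, not the authors' code.

```python
# Hedged sketch of the basic Multi-Option LAO* loop from this slide.
# expand_plan, revise_plan, and good_enough are hypothetical stand-ins.

def multi_option_lao_star(root, expand_plan, revise_plan, good_enough):
    """Alternate expansion and Pareto-set revision until the root's
    Pareto set of (partial) plans is judged good enough."""
    while not good_enough(root.pareto_set):
        expanded = expand_plan(root)     # grow the most promising partial plans
        revise_plan(expanded)            # back up Pareto sets from the expanded
                                         # nodes to their ancestors (up to the root)
    return root.pareto_set
```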
Example of State Backup
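The backup illustrated on this slide can be sketched roughly as follows, under the assumption that each successor carries a Pareto set of (Pr(G), expected cost) points and that an action's candidates come from the cross product of one point per successor (the source of the exponential cost per backup noted on the previous slide); all names and numbers are illustrative.

```python
from itertools import product

# Rough sketch of a multi-objective state backup: for each applicable action,
# combine one Pareto point per successor (the cross product is why a backup is
# exponential), weight by outcome probability, add the action cost, and keep
# only the non-dominated (Pr(G), expected cost) results.

def dominates(a, b):
    """a dominates b: goal probability no lower, expected cost no higher, not equal."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def prune(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

def backup(actions):
    """actions: {name: (cost, [(prob, successor_pareto_set), ...])}
    where each Pareto set holds (Pr(G), expected cost) points."""
    candidates = []
    for cost, outcomes in actions.values():
        probs = [p for p, _ in outcomes]
        for picks in product(*[pareto for _, pareto in outcomes]):
            pr_goal = sum(p * pick[0] for p, pick in zip(probs, picks))
            exp_cost = cost + sum(p * pick[1] for p, pick in zip(probs, picks))
            candidates.append((pr_goal, exp_cost))
    return prune(candidates)

# One action of cost 1 with two equally likely outcomes.
succ1 = [(0.9, 3.0), (0.5, 1.0)]       # two non-dominated sub-plans at successor 1
succ2 = [(1.0, 4.0)]
print(backup({"a1": (1.0, [(0.5, succ1), (0.5, succ2)])}))
# -> [(0.95, 4.5), (0.75, 3.5)]: both combinations survive, neither dominates the other
```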
Search Example – Initially
Initialize the root Pareto set with the null plan and a heuristic estimate.
[Figure: root node with its initial Pr(G)/cost Pareto set]
Search Example – 1st Expansion
Expand the root node and initialize the Pareto sets of its children with the null plan and a heuristic estimate.
[Figure: root expanded through actions a1 and a2; each child is shown with its initial Pr(G)/cost Pareto set]
Search Example – 1st Revision
Recompute the Pareto set for the root; the best heuristic point is through a1.
[Figure: root Pareto set revised, with its best heuristic point labeled a1]
Search Example – 2nd Expansion
Expand the children of a1 and initialize their Pareto sets with the null plan and a heuristic estimate; both children satisfy the goal with non-zero probability.
[Figure: children of a1 expanded through a4 and a3, each with its own Pr(G)/cost Pareto set]
Search Example – 2nd Revision
Recompute the Pareto sets of both expanded nodes and the root node. There is now a feasible plan a1,[a4|a3] that satisfies the goal with probability 0.66 and cost 2; the heuristic estimate indicates that extending a1,[a4|a3] will lead to a plan that satisfies the goal with probability 1.0.
[Figure: Pareto sets updated along the path through a1, with the option a1,[a4|a3] marked at the root]
Search Example – 3rd Expansion
Expand the plan to include a7; there is no applicable action after a3.
[Figure: the a4 branch extended through a7, while the a3 branch has no applicable action]
Search Example – 3rd Revision
Recompute all Pareto sets that are ancestors of the expanded nodes. The heuristic for plans extended through a3 is higher because no action is applicable there, and the best heuristic point at the root now goes through a2.
[Figure: revised Pareto sets, with the root showing the option a1,[a4, a7|a3] and a heuristic point through a2]
Search Example – 4th Expansion
Expand the plan through a2; one of the expanded children satisfies the goal with probability 0.1.
[Figure: a2 expanded through a5 and a6, with Pareto sets initialized at the new children]
Search Example – 4th Revision
Recompute the Pareto sets of the expanded nodes’ ancestors; the plan a2, a5 is dominated at the root.
[Figure: revised Pareto sets, with a2, a5 shown as dominated at the root]
Search Example – 5th Expansion
Expand the plan through a6.
[Figure: a6 expanded through a8, with Pareto sets initialized at the new child]
Search Example – 5th Revision
Recompute the Pareto sets; the plans a2, a6, a8 and a2, a5 are dominated at the root.
[Figure: revised Pareto sets, with a2, a6, a8 and a2, a5 shown as dominated at the root]
Search Example – Final
[Figure: final search graph with converged Pareto sets at every node; the root’s options include a1,[a4, a7|a3] and a1,[a4|a3]]
Speed-ups
• ε-domination [Papadimitriou & Yannakakis, 2003]
• Randomized Node Expansions
– Simulate Partial Plan to Expand a single node
• Reachability Heuristics
– Use the McLUG (CSSAG)
ε-domination
• Check domination after multiplying each objective by (1+ε) (sketched below)
• Each hyper-rectangle keeps a single point; successive grid points satisfy x′/x = 1+ε
[Figure: Cost vs. 1−Pr(G) plot showing dominated and non-dominated regions and two points x, x′ with x′/x = 1+ε]
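A minimal sketch of an ε-domination test of this flavor, with both objectives minimized (e.g., cost and 1−Pr(G)); the tolerance and points are illustrative, and this is one common formulation rather than the paper's exact procedure.

```python
# Sketch of an epsilon-domination check with both objectives minimized
# (e.g. cost and 1 - Pr(G)). Inflating one point by (1 + eps) before the
# comparison lets "almost dominated" points be discarded, which bounds the
# size of the stored Pareto set. Values are illustrative.

def eps_dominates(p, q, eps=0.1):
    """True if p epsilon-dominates q: p is within a (1+eps) factor of beating
    q on every (minimized) objective."""
    return all(pi <= (1 + eps) * qi for pi, qi in zip(p, q))

def eps_prune(points, eps=0.1):
    """Keep a subset such that every discarded point is epsilon-dominated
    by some kept point."""
    kept = []
    for p in sorted(points):
        if not any(eps_dominates(k, p, eps) for k in kept):
            kept.append(p)
    return kept

# (cost, 1 - Pr(G)) pairs: the two nearly identical points collapse to one.
print(eps_prune([(10.0, 0.30), (10.4, 0.31), (20.0, 0.10)], eps=0.1))
# -> [(10.0, 0.3), (20.0, 0.1)]
```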
Synthesis Results
Execution Results
• Random Option: sample an option, execute its action
• Keep Options Open (sketched below)
  – Most Options: execute the action that appears in the most options
  – Diverse Options: execute the action that appears in the most diverse set of options
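A hedged sketch of the Random Option and Most Options strategies, assuming options are represented simply by the next action each prescribes at the current step; the mapping shown is illustrative.

```python
import random
from collections import Counter

# Sketch of two of the execution strategies above; an option is represented
# here just by the next action it prescribes at the current step.

def random_option(option_to_action, rng=random):
    """Random Option: sample an option, execute its action."""
    return option_to_action[rng.choice(list(option_to_action))]

def most_options(option_to_action):
    """Most Options: execute the action shared by the largest number of
    options, keeping as many options open as possible."""
    return Counter(option_to_action.values()).most_common(1)[0][0]

step = {"O1": "Car(BOS, Prov.)", "O2": "Car(BOS, Prov.)", "O1a": "Fly(BOS, PVD)"}
print(most_options(step))   # 'Car(BOS, Prov.)' keeps options O1 and O2 open
```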
Summary & Future Work
• Summary
  – Multi-Option Plans let the executor delay/change commitments to objective functions
  – Multi-Option Plans help the executor understand alternatives
  – Multi-Option Plans passively enforce diversity through Pareto set approximation
• Future Work
  – Synthesis
    ♦ Proactive Diversity: guide search to broaden the Pareto set
    ♦ Speedups: alternative Pareto set representations, standard MDP tricks
  – Execution
    ♦ Option Lookahead: how will the set of options change?
    ♦ Meta-Objectives: diversity, decision delay
  – Model-Lite Planning
    ♦ Unspecified objectives (not just an unspecified objective function)
    ♦ Objective function preference elicitation
Final Options
• Option 1: Questions
• Option 2: Criticisms
• Option 3: Next Talk!