Planning with Local Search
MERS Seminar Lecture
March 6, 2003
Jonathan Kennell
Presentation Outline

Planning Overview
– What is planning? – 5 mins.
– Taxonomy of planners – 40 mins.
  (or everything you ever wanted to know about planning in approximately 40 minutes)

5 minute break

LPG
– Background information (WalkSAT) – 10 mins.
– Linear action graphs and precedence graphs – 10 mins.
– WalkPlan planning algorithm – 10 mins.
– Example – 10 mins.
What is Planning?

Input
– Set of world-states
– Action operators (fn: world-state → world-state)
– Initial world-state
– Goal (possibly a partial state / set of world-states)
Output
– Ordering of actions
From 6.834J POP lecture
World State

Set of facts and their degree of truth
– Examples:
    (Student Jonathan)            // true
    (Likes Jonathan Golf)         // false
    (Graduating Jonathan June)    // unknown *
Note: Lisp notation is used extensively in the planning community
* Most planners don’t consider unknown facts
Planning Operators

Fn: world-state → world-state

Generally use STRIPS format:
– Preconditions: facts that must be true before action can occur
– Effects: facts that become true (or false) after the action occurs
Extra properties:
– Separate start / invariant / end conditions and effects
– Durations
– Resource constraints
(:action Move
  (:params ((robot ?r) (location ?a) (location ?b)))
  (:preconds (at ?r ?a))
  (:effects (and (not (at ?r ?a)) (at ?r ?b))))
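As a rough illustration of how a STRIPS operator acts as a function from world-state to world-state, the Move action above can be sketched in Python. The `Action` class, the `move` helper, and the robot and location names are assumptions made for this example and are not part of any planner's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A grounded STRIPS action (hypothetical helper type)."""
    name: str
    preconds: frozenset
    add_effects: frozenset
    del_effects: frozenset


def move(robot, a, b):
    """Ground the Move operator for one robot and two locations."""
    return Action(
        name=f"Move({robot},{a},{b})",
        preconds=frozenset({("at", robot, a)}),
        add_effects=frozenset({("at", robot, b)}),
        del_effects=frozenset({("at", robot, a)}),
    )


def applicable(state, action):
    """An action may occur only if all of its preconditions hold."""
    return action.preconds <= state


def apply(state, action):
    """Return the successor world-state: remove deleted facts, add new ones."""
    if not applicable(state, action):
        raise ValueError(f"{action.name} is not applicable")
    return (state - action.del_effects) | action.add_effects


state = frozenset({("at", "r1", "dock")})
next_state = apply(state, move("r1", "dock", "lab"))
print(sorted(next_state))
```

The world-state is modeled as a set of true facts, which matches the note that most planners do not consider unknown facts.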
Mutual Exclusion

Sometimes planning operators conflict with each other – we call a pair of conflicting operators mutex

Examples of mutex actions:
– Interference: A deletes precondition or effect of B
– Competing Needs: A and B have mutex preconditions

Planner must ensure no mutex actions co-occur.
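The two mutex tests above can be sketched as simple set checks. This is a minimal sketch: the `Action` tuple, the fact names, and the explicit `mutex_facts` argument (pairs of facts assumed to be mutually exclusive) are illustrative assumptions.

```python
from typing import FrozenSet, NamedTuple


class Action(NamedTuple):
    """A grounded action (hypothetical helper type)."""
    name: str
    preconds: FrozenSet[str]
    adds: FrozenSet[str]
    dels: FrozenSet[str]


def interferes(a, b):
    """Interference: a deletes a precondition or an effect of b."""
    return bool(a.dels & (b.preconds | b.adds))


def competing_needs(a, b, mutex_facts):
    """Competing needs: some precondition of a is mutex with one of b."""
    return any(frozenset({p, q}) in mutex_facts
               for p in a.preconds for q in b.preconds)


def mutex(a, b, mutex_facts=frozenset()):
    """Two actions are mutex if either test fires in either direction."""
    return (interferes(a, b) or interferes(b, a)
            or competing_needs(a, b, mutex_facts))


move_ab = Action("Move(r,a,b)", frozenset({"at_r_a"}),
                 frozenset({"at_r_b"}), frozenset({"at_r_a"}))
move_ac = Action("Move(r,a,c)", frozenset({"at_r_a"}),
                 frozenset({"at_r_c"}), frozenset({"at_r_a"}))
charge = Action("Charge(r2)", frozenset({"at_r2_dock"}),
                frozenset({"charged_r2"}), frozenset())

print(mutex(move_ab, move_ac))  # each deletes the other's precondition
print(mutex(move_ab, charge))   # unrelated facts, no conflict
```

In a plan-graph planner the mutex fact pairs would be derived from the previous fact layer; here they are passed in by hand to keep the sketch short.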
What is a plan?

A plan is an ordering of actions that will
transition the system from the initial state to
the goal state.
[Figure: example plan in which Activity-A through Activity-D, linked by fact-J through fact-P, lead from Start to End]
Completeness / Consistency / Minimality

Complete Plan
– A plan is complete IFF every precondition of every activity is achieved.
– An activity’s precondition is achieved IFF:
    The precondition is the effect of a preceding activity (support), and
    No intervening step conflicts with the precondition (mutex).
Consistent Plan
– The plan is consistent IFF the temporal constraints of its activities are consistent (the associated distance graph has no negative cycles), and no conflicting (mutex) activities can co-occur.
Minimal Plan
– The plan is minimal IFF every constraint serves a purpose, i.e.,
    If we remove any temporal or symbolic constraint from a minimal plan, the new plan is not equivalent to the original plan
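The temporal half of the consistency test can be illustrated with a negative-cycle check on the distance graph. The sketch below assumes constraints of the form lo ≤ t_j − t_i ≤ hi between numbered events and uses Bellman-Ford; the function names and the sample constraints are made up for the example.

```python
def distance_graph(constraints):
    """Turn (i, j, lo, hi) constraints, meaning lo <= t_j - t_i <= hi,
    into weighted edges of the distance graph."""
    edges = []
    for i, j, lo, hi in constraints:
        edges.append((i, j, hi))    # t_j - t_i <= hi
        edges.append((j, i, -lo))   # t_i - t_j <= -lo
    return edges


def has_negative_cycle(num_events, edges):
    """Bellman-Ford started from a virtual source that reaches every event."""
    if num_events == 0:
        return False
    dist = [0] * num_events
    for _ in range(num_events):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances settled, so no negative cycle exists
    return True  # still relaxing after num_events passes


# Events: 0 = start, 1 = end of first activity, 2 = end of second activity.
tight = [(0, 1, 5, 10), (1, 2, 5, 10), (0, 2, 0, 8)]    # needs >= 10, allows <= 8
loose = [(0, 1, 5, 10), (1, 2, 5, 10), (0, 2, 0, 20)]

print(has_negative_cycle(3, distance_graph(tight)))
print(has_negative_cycle(3, distance_graph(loose)))
```

A negative cycle means the constraints cannot all be met, so the first constraint set is temporally inconsistent and the second is consistent.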
Variations on Classical Planning

Temporal planning
– Actions have durations
Planning with resources
– Facts can be quantified
Planning with uncertainty
– Effects / durations of actions not guaranteed
Taxonomy of Planners

[Figure: taxonomy of planners. Node labels: Planners; Global Search; Local Search; Forward Chaining / Backward Propagation (entire plan-space); Plan Graph (condensed plan-space); Macro Decomposition (restricted plan-space). Planners shown: TLPlan, Kirk Deductive Controller, Graphplan, LPGP, SHOP2, Kirk TPN Planner, LPG.]
Forward Chaining / Backward Propagation

Searches through entire plan-space by nondeterministically adding actions to plan candidates.

Advantages:
– generative (does not require strategies)
– expressive (can easily handle time and resources)
Disadvantages:
– Inherently slow (plan-space is enormous)
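A minimal forward-chaining planner can be sketched as a search over world-states. The slide describes nondeterministic choice; the sketch below substitutes breadth-first search as a deterministic stand-in, and the three Move actions and fact names are assumptions made for illustration.

```python
from collections import deque


def forward_chain(initial, goal, actions):
    """Breadth-first forward search over world-states.

    actions maps a name to (preconditions, add-effects, delete-effects).
    Returns a shortest list of action names, or None if no plan exists.
    """
    start = frozenset(initial)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for name, (pre, add, dele) in actions.items():
            if pre <= state:
                nxt = (state - dele) | add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None


actions = {
    "Move(a,b)": ({"at_a"}, {"at_b"}, {"at_a"}),
    "Move(b,a)": ({"at_b"}, {"at_a"}, {"at_b"}),
    "Move(b,c)": ({"at_b"}, {"at_c"}, {"at_b"}),
}
plan = forward_chain({"at_a"}, {"at_c"}, actions)
print(plan)
```

Even in this tiny domain the search visits every reachable state before giving up, which hints at why the approach is slow when the plan-space is enormous.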
Forward Chaining Example
[Figure: forward chaining example search tree]
Familiar tradeoff: Efficient pruning methods versus optimality.
Case Study: TLPlan

TLPlan (Temporal Logic Planner) by Fahiem
Bacchus and Froduald Kabanza

TLPlan is based on a forward-chaining
planner

TLPlan uses domain-dependent temporal
logic to prune the search space
TLPlan: First-order Temporal Logic

Definition: First-order linear temporal logic
– standard first-order logic, plus:
    U (until), □ (always), ◊ (eventually), ○ (next)
    Bounded quantifiers:
    – ∀[x:y] φ ≡ ∀x . y(x) → φ(x)
    – ∃[x:y] φ ≡ ∃x . y(x) ∧ φ(x)

Example:
– □(on(B,C) → (on(B,C) U on(A,B)))
– Asserts that whenever we enter a state in which B is on C it remains on C until A is on B
TLPlan: Formula Progression Algorithm

The Progress algorithm is used to check control strategies as the system searches for a plan.

Inputs: An LTL formula f and a world w (generated by forward chaining)

Output: A new formula f+, also expressed as an LTL formula, representing the progression of f through the world w.

Algorithm: Progress(f,w)
– Case
    1. f = φ is atomic: if w entails f, f+ := TRUE, else f+ := FALSE
    2. f = f1 ∧ f2: f+ := Progress(f1,w) ∧ Progress(f2,w)
    3. f = ¬f1: f+ := ¬Progress(f1,w)
    4. … etc. … (see paper for complete algorithm)
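The progression idea can be sketched for the propositional case. The slide gives only the atomic, conjunction, and negation cases; the temporal cases below follow the standard LTL progression rules and are not taken from the slide. The tuple encoding of formulas and all names are assumptions for illustration, and quantifiers are left out.

```python
TRUE, FALSE = ("true",), ("false",)


def conj(a, b):
    """Conjunction with TRUE/FALSE simplification."""
    if FALSE in (a, b):
        return FALSE
    if a == TRUE:
        return b
    if b == TRUE:
        return a
    return ("and", a, b)


def disj(a, b):
    """Disjunction with TRUE/FALSE simplification."""
    if TRUE in (a, b):
        return TRUE
    if a == FALSE:
        return b
    if b == FALSE:
        return a
    return ("or", a, b)


def neg(a):
    """Negation with TRUE/FALSE simplification."""
    if a == TRUE:
        return FALSE
    if a == FALSE:
        return TRUE
    return ("not", a)


def progress(f, w):
    """Progress formula f through world w (a set of true atoms)."""
    op = f[0]
    if op in ("true", "false"):
        return f
    if op == "atom":
        return TRUE if f[1] in w else FALSE
    if op == "and":
        return conj(progress(f[1], w), progress(f[2], w))
    if op == "or":
        return disj(progress(f[1], w), progress(f[2], w))
    if op == "not":
        return neg(progress(f[1], w))
    if op == "next":
        return f[1]
    if op == "always":
        return conj(progress(f[1], w), f)
    if op == "eventually":
        return disj(progress(f[1], w), f)
    if op == "until":
        return disj(progress(f[2], w), conj(progress(f[1], w), f))
    raise ValueError(f"unknown operator: {op}")


# The blocks-world rule from the previous slide, with a -> b written as (not a) or b.
on_bc, on_ab = ("atom", "on(B,C)"), ("atom", "on(A,B)")
hold = ("until", on_bc, on_ab)
rule = ("always", ("or", ("not", on_bc), hold))

after_first = progress(rule, {"on(B,C)"})   # B is now on C
violated = progress(after_first, set())     # B left C before A was on B
print(violated)
```

When the progressed formula collapses to FALSE, the world that produced it violates the control rule, so a forward-chaining planner can prune that branch.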
TLPlan Example
[Figure: forward-chaining search tree annotated with the control rules]
– Forward chaining begins…
– One thread is efficiently guided by the rules.
– Another thread is not guided well since no rules apply. This results in pure forward-chaining search.
TLPlan Review

TLPlan has been around in various implementations
since 1995, although improvements have been
made as recently as last year.

TLPlan functions initially as a forward-chaining
planner, but can use logical rules to guide its search
and prune unfeasible threads.

TLPlan was the fastest domain-specific planner in
the 2002 AIPS competition.
Domain Knowledge

Planning is hard – the most general planners are
extremely slow

To increase speed, some planners sacrifice
generality by using domain-specific strategies.

TLPlan encodes the strategy into the goal
specification, while other planners decouple the
goals and the strategies.
Forward Chaining Speedup

Many researchers have focused on discovering ways to help speed up domain-independent forward-chaining planners.
– Ex. SAPA by Minh B. Do & Subbarao Kambhampati

Methods focus on estimating plan cost using:
– Relaxed plan-graphs
    Estimated remaining cost to goal
– Cost metrics
    Ex. # actions, plan duration, etc.
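One common way to estimate the remaining cost to the goal is to grow a relaxed plan-graph that ignores delete-effects and count the layers needed before all goal facts appear. The sketch below is a generic version of that idea, not SAPA's actual heuristic; the function name is an assumption, and the three actions mirror the A0 / A1 / A2 example used later in the lecture.

```python
def relaxed_plan_levels(state, goal, actions):
    """Count relaxed plan-graph layers until every goal fact appears.

    Delete-effects are ignored, so facts only accumulate. Returns None
    when the goal is unreachable even in this relaxation.
    """
    facts = set(state)
    level = 0
    while not goal <= facts:
        new_facts = set(facts)
        for pre, add in actions:
            if pre <= facts:
                new_facts |= add
        if new_facts == facts:
            return None  # fixpoint reached without the goal
        facts = new_facts
        level += 1
    return level


# (preconditions, add-effects) for A0, A1, A2
actions = [
    (set(), {"A"}),
    ({"A"}, {"A", "B"}),
    ({"A", "B"}, {"C"}),
]
print(relaxed_plan_levels(set(), {"A", "B", "C"}, actions))
```

Because the relaxation only drops constraints, the layer count never overestimates the number of parallel steps a real plan needs.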
Plan Graph

Plan-graph based planners first construct a compact representation of the plan-space (the plan-graph), and then search that space.

Plan-graphs contain all possible plans up to a certain size, excluding incomplete plans with co-occurring binary mutex actions.

Plan-graphs do not exclude all invalid plans, and depending on the domain may yield extremely efficient or inefficient results.

Advantages:
– generative
– much faster than most forward-chaining planners
– plan-graph can be generated in polynomial time and space
Disadvantages:
– plan-graphs are less expressive (resources and time difficult)
– in certain domains, search of plan-graph can be very inefficient
Forward Chaining vs. Plan Graph
[Figure: side-by-side panels labeled Forward Chaining and Plan Graph]
Case Study: Graphplan
Note the compact structure in this graph – it’s polynomial in size!
Mutex Relationships
Case Study: LPGP

Idea:
– use Graphplan to identify complete plan (action structure)
– then use Linear Programming to determine plan consistency and perform scheduling (assign durations to actions)

Advantage:
– Two-phase approach accomplishes temporal planning with the speed of a plan-graph based planner
Disadvantages:
– Cannot optimize over time (only optimizes over makespan)
– Two-phase approach is potentially very inefficient
    no temporal conflicts are used to guide Graphplan search
    search not incremental – LP must be started from scratch each time
Macro Decomposition

Operates similarly to a context-free grammar
– planner non-deterministically expands “macro-activities” until all plan actions are primitive.
– rules ensure that planner only explores space of complete plans

Planner still must ensure plan consistency.

Advantages
– Fast
Disadvantages
– all achieving strategies must be pre-encoded into macros
– non-optimal: explores restricted plan-space, potentially excluding optimal solutions
Case Study: SHOP2

SHOP2 by Dana Nau, Hector Munoz-Avila, Yue Cao, Amnon Lotem and Steven Mitchell

SHOP2 works similarly to the task-decomposition mechanism in Kirk

SHOP2 problems consist of:
– Operators (with preconditions, add-effects and delete-effects)
– Methods (rules for how to progress the plan)
– Initial conditions and goals

SHOP2 is fairly fast, but all plan happenings must be pre-designed (at some level) by a programmer.

SHOP2 plans do not support concurrency
SHOP2 Example
(defdomain basic-example (
  ; operator form: (name) (preconds) (delete-effects) (add-effects)
  (:operator (pickup ?a) () () ((have ?a)))
  (:operator (drop ?a) ((have ?a)) ((have ?a)) ())
  ; method form: (name) followed by (condition) (strategy) pairs
  (:method (swap ?x ?y)
    ((have ?x))
    ((drop ?x) (pickup ?y))
    ((have ?y))
    ((drop ?y) (pickup ?x)))))
; problem form: (initial condition) (start strategy)
(defproblem problem1 basic-example
  ((have banjo)) ((swap banjo kiwi)))
The condition / strategy pairs allow one method to decompose into multiple possible subplans, depending on the current state.
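The decomposition behind this example can be sketched in Python. This is an illustration of the idea only and does not use SHOP2 itself: the function names are assumptions, and the sketch commits to the first method branch whose condition holds, without the backtracking a real planner would need.

```python
def pickup(state, a):
    """Operator: no preconditions, adds (have a)."""
    return state | {("have", a)}


def drop(state, a):
    """Operator: requires (have a) and deletes it; None if not applicable."""
    if ("have", a) not in state:
        return None
    return state - {("have", a)}


def swap(state, x, y):
    """Method: pick the first branch whose condition holds in the state."""
    if ("have", x) in state:
        return [("drop", x), ("pickup", y)]
    if ("have", y) in state:
        return [("drop", y), ("pickup", x)]
    return None


OPERATORS = {"pickup": pickup, "drop": drop}
METHODS = {"swap": swap}


def decompose(state, tasks):
    """Expand tasks left to right until only primitive operators remain.

    Returns (plan, final_state), or None if a task cannot be carried out.
    """
    plan, tasks = [], list(tasks)
    while tasks:
        name, *args = tasks.pop(0)
        if name in OPERATORS:
            state = OPERATORS[name](state, *args)
            if state is None:
                return None
            plan.append((name, *args))
        else:
            subtasks = METHODS[name](state, *args)
            if subtasks is None:
                return None
            tasks = subtasks + tasks
    return plan, state


result = decompose(frozenset({("have", "banjo")}), [("swap", "banjo", "kiwi")])
print(result)
```

Starting from a state containing only (have kiwi), the same task would take the second branch of swap instead.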
SHOP2 In Action
[Animation: SHOP2 solving problem1 by binding ?x = banjo and ?y = kiwi in the swap method]
– Initial condition: (have banjo); start strategy: (swap banjo kiwi)
– Condition (have banjo) holds, so swap decomposes into (drop banjo) (pickup kiwi)
– DONE
State: (have kiwi)
Case Study: Kirk TPN Planner
[Figure: Macro-Activity() labeled [l,u], expanding into Decomposition 1 or Decomposition 2]
5 Minute Break
Local Search: WalkSAT

WalkSAT is a randomized algorithm for solving SAT (propositional satisfiability) problems.

Unlike systematic solvers such as DPLL, it relies on local search and randomness.
WalkSAT

Problem:
– Find a satisfying assignment to a logic formula
    (A || !B) && (B || !C) && (C || !A) && (A || B || C)
WalkSAT:
– Pick a random assignment to the variables
– Until formula satisfied (or up to some max # of iterations),
    Choose an unsatisfied clause and enumerate the ways of adjusting the variables in order to satisfy it
    With probability p
    – Choose the best-utility adjustment
    Else
    – Choose a random adjustment
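The loop above can be sketched as follows, using the slide's convention that the best-utility move is taken with probability p. The clause encoding, the parameter defaults, and the choice of "fewest unsatisfied clauses after the flip" as the utility are assumptions made for this sketch.

```python
import random


def walksat(clauses, variables, p=0.5, max_flips=10_000, rng=None):
    """Each clause is a list of (variable, wanted_value) literals.

    Returns a satisfying assignment as a dict, or None if none was found.
    """
    rng = rng or random.Random()
    assign = {v: rng.choice([True, False]) for v in variables}

    def satisfied(clause):
        return any(assign[v] == want for v, want in clause)

    def cost_after_flip(v):
        """Number of unsatisfied clauses if v were flipped."""
        assign[v] = not assign[v]
        cost = sum(not satisfied(c) for c in clauses)
        assign[v] = not assign[v]
        return cost

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign
        # Every literal in an unsatisfied clause is false, so flipping
        # any one of its variables satisfies that clause.
        candidates = [v for v, _ in rng.choice(unsat)]
        if rng.random() < p:
            var = min(candidates, key=cost_after_flip)
        else:
            var = rng.choice(candidates)
        assign[var] = not assign[var]
    return assign if all(satisfied(c) for c in clauses) else None


formula = [
    [("A", True), ("B", False)],
    [("B", True), ("C", False)],
    [("C", True), ("A", False)],
    [("A", True), ("B", True), ("C", True)],
]
solution = walksat(formula, ["A", "B", "C"], rng=random.Random(0))
print(solution)
```

The example formula has exactly one satisfying assignment (all three variables true), so any successful run ends there.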
WalkSAT Example

(A || !B) && (B || !C) && (C || !A) && (A || B || C)

Pick !A, !B, !C
– (A || !B) && (B || !C) && (C || !A) && (A || B || C)    unsatisfied: (A || B || C)
– Options are to switch A, B, or C
Pick A, !B, !C
– (A || !B) && (B || !C) && (C || !A) && (A || B || C)    unsatisfied: (C || !A)
– Options are to switch A or C
Pick A, !B, C
– (A || !B) && (B || !C) && (C || !A) && (A || B || C)    unsatisfied: (B || !C)
– Options are to switch B or C
Pick A, B, C
– (A || !B) && (B || !C) && (C || !A) && (A || B || C)
– Formula Satisfied!
WalkSAT Discussion

WalkSAT has proven to be very fast at solving complicated SAT problems
– WalkSAT can solve some problems that systematic algorithms simply can’t handle

Due to randomness, WalkSAT is incomplete
– WalkSAT may fail to discover a solution
Introduction to LPG

LPG (local search for plan-graphs) – by Alfonso
Gerevini and Ivan Serina

Blackbox mapped the planning problem to a CSP
and solved it using a SAT solver.

LPG unifies the planning and WalkSAT algorithms to
create the WalkPlan search algorithm.
LPG Big Idea

Big Idea:
– Start with a random plan
– While plan is incorrect / inconsistent
    Identify and repair conflict

Basically the same idea as WalkSAT, but applied to a special form of plan-graph
Temporal Action Graphs

Definitions:
– Action-graph: the subset of a plan-graph containing the action layers
– Support: a fact is said to be “supported” if it is achieved by some action in the previous action layer
– Conflict:
    a mutex between two actions
    an action with an unsupported precondition
Linearization of Action Graphs

An Action Graph can be made linear by allowing only
one action per action layer.

The layers no longer explicitly represent an ordering
of time (temporal concurrency is still possible)

The layer ordering simply presents an action
sequence for the purposes of establishing fact
support relationships.
Example: Linear Action Graph
[Figure: linear action graph over facts A, B, and C, with one action (A0, A1, or A2) per action layer and no-ops between fact layers]
A plan-graph consists of alternating fact layers and action layers.
The actions alone constitute an action graph.
LPG operates directly on the action graph structure, inserting and removing
actions from various action layers as it repairs incomplete plans.
Example: Temporal Action Graph
Conflicts and Repair

An incomplete plan is manifested as an action graph with conflicts.

Example conflicts with resolution (repair) strategies:

Conflict: Permanent mutex between two actions in the same action layer
Resolution strategies:
– Remove one of the actions

Conflict: Precondition mutex between two actions in the same action layer
Resolution strategies:
– Remove one of the actions
– Add support for one of the mutex preconditions

Conflict: Unsupported precondition for an action in an action layer
Resolution strategies:
– Add an action to the previous action layer that achieves the unsupported precondition
– Remove the action whose preconditions are not satisfied
LPG’s WalkPlan Planning Algorithm
LPG Algorithm
LPG:

1. Generate Initial Plan
   Generate an initial dummy plan, P, either…
   – Randomly
   – By adding actions to support all facts ignoring mutexes, or
   – Via some front-end plan generator
2. Choose Conflict
   Randomly choose a conflict in the action-graph, C
3. Resolve & Evaluate
   Identify all possible ways of resolving C and evaluate them using the action evaluation function
   – Resolution techniques include: removing one of two mutex actions, adding a supporting action for an unsupported fact, or removing an action that has an unsupported precondition
   – If a conflict resolution has cost 0, the plan is complete
   – Note: The action evaluation function uses Lagrange multipliers to dynamically weight its different factors
4. Resolution Selection
   If a resolution introduces no new conflicts, apply it and go to step (2)
   Else,
   – with probability p, randomly choose a resolution, apply it and go to step (2)
   – with probability 1-p, choose the lowest cost conflict resolution, apply it and go to step (2)
   – Note: The resolution step includes a mechanism for extending the plan-graph
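The repair loop can be sketched on a linear action graph, represented here as a plain list of action names. This is a heavily simplified illustration and not LPG's implementation: the only conflicts modeled are unsupported preconditions and unmet goals, the cost of a repair is just the number of conflicts left (standing in for the Lagrange-weighted evaluation function), and the search starts from an empty plan. The three actions are the ones from the lecture's LPG example; the function names are assumptions.

```python
import random

# name -> (preconditions, effects)
ACTIONS = {
    "A0": (set(), {"A"}),
    "A1": ({"A"}, {"A", "B"}),
    "A2": ({"A", "B"}, {"C"}),
}
GOALS = {"A", "B", "C"}


def conflicts(plan):
    """Unsupported preconditions as (position, fact) pairs.

    Unmet goals are reported at position len(plan).
    """
    facts, found = set(), []
    for i, name in enumerate(plan):
        pre, eff = ACTIONS[name]
        found += [(i, f) for f in sorted(pre - facts)]
        facts |= eff
    found += [(len(plan), f) for f in sorted(GOALS - facts)]
    return found


def repairs(plan, conflict):
    """Candidate plans that address one conflict."""
    i, fact = conflict
    # Insert an achieving action just before the position that needs the fact.
    options = [plan[:i] + [name] + plan[i:]
               for name, (_, eff) in ACTIONS.items() if fact in eff]
    # Or remove the action whose precondition is unsupported.
    if i < len(plan):
        options.append(plan[:i] + plan[i + 1:])
    return options


def walkplan(p=0.2, max_steps=500, rng=None):
    """Repair a plan until it has no conflicts; None if the budget runs out."""
    rng = rng or random.Random()
    plan = []
    for _ in range(max_steps):
        found = conflicts(plan)
        if not found:
            return plan
        options = repairs(plan, rng.choice(found))
        if rng.random() < p:
            plan = rng.choice(options)
        else:
            plan = min(options, key=lambda q: len(conflicts(q)))
    return plan if not conflicts(plan) else None


print(walkplan(rng=random.Random(1)))
```

Plans found this way may contain redundant actions, since nothing in the loop removes an action that is merely unnecessary.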
LPG Example
Initial Conditions: ( nil )
Goals: ( A, B, C )
Actions:
– A0: preconds ( nil ) effects ( A )
– A1: preconds ( A ) effects ( A, B )
– A2: preconds ( A, B ) effects ( C )

[Animation: a linear action graph is repaired step by step: initial dummy plan, identify conflict, resolve conflict, plan complete]
Conflicts shown and their repairs:
– Permanently mutex actions in the same action layer (resolved by removing one of the two actions)
– Unsupported precondition (resolved by adding achieving action at previous action layer)
– Unsupported precondition (resolved by removing the conflicting action)
Note: No-ops are propagated during conflict resolution
LPG Analysis

Advantages:
– LPG is fast – four orders of magnitude faster than the leading optimal planners
– LPG is domain-independent
– LPG can easily handle resources and durative actions
Disadvantages:
– LPG is randomized, so plans are not usually optimal and often contain extraneous actions
    LPG includes option to continue searching for multiple solutions, in the hope of finding better plans

While maintaining expressivity, LPG sacrifices optimality for speed.
AIPS 2002 Results (subset)

Planner                              Problems Solved   Problems Attempted   Success Ratio   Capabilities
SHOP2, 2nd place (hand-coded)        899               904                  99%             Strips, Numeric, HardNumeric, SimpleTime, Time, Complex
TLPlan, 1st place (hand-coded)       894               894                  100%            Strips, Numeric, HardNumeric, SimpleTime, Time, Complex
LPG, 1st place (fully-automated)     372               428                  87%             Strips, Numeric, HardNumeric, SimpleTime, Time
Summary

Planning is hard!
– We want planners that
    are fast
    are domain-independent
    are optimal
    handle durative actions / resources / uncertainty
Want a speedup?
– Sacrificing expressivity helps
– Sacrificing optimality helps more
– Sacrificing generality helps the most

LPG is today’s best planner that is domain-independent, expressive, and fast – to achieve speed, it sacrifices optimality and uses local search.
Planning References

Planning in general:
– Russell and Norvig, “Artificial Intelligence: A Modern Approach”, section IV, Prentice Hall; 2nd edition (December 20, 2002)
AIPS International Planning Competition, 2002:
– http://www.dur.ac.uk/d.p.long/competition.html
Graphplan:
– A. Blum and M. Furst, “Fast Planning Through Planning Graph Analysis”, Artificial Intelligence, 90:281-300 (1997).
– www.cs.cmu.edu/~avrim/graphplan.html
LPG:
– A. Gerevini and I. Serina, “Planning through Stochastic Local Search and Temporal Action Graphs”, technical report from Universita degli Studi di Brescia, November, 2002.
– prometeo.ing.unibs.it/lpg/