Satisfiability and State-Transition Systems: An AI Perspective
Henry Kautz
University of Washington
Introduction
Both the AI and CADE/CAV communities have
long been concerned with reasoning about
state-transition systems
• AI – Planning
• CADE/CAV – Hardware and software verification
Recently, propositional satisfiability testing has
turned out to be a surprisingly powerful tool
• Planning – SATPLAN (Kautz & Selman)
• Verification – Bounded model checking (Clarke),
Debugging relational specifications (Jackson)
Shift in KR&R
Traditional approach: specialized languages /
specialized reasoning algorithms
New direction:
• Compile combinatorial reasoning problems into a
common propositional form (SAT)
• Apply new, highly efficient general search engines
Pipeline: Combinatorial Task → SAT Encoding → SAT Solver → Decoder
Advantages
Rapid evolution of fast solvers
• 1990: 100 variable hard SAT problems
• 2000: 100,000 variables
Sharing of algorithms and implementations from
different fields of computer science
AI, theory, CAD, OR, CADE, CAV, …
Competitions – Germany 91 / China 96 / DIMACS 93/97/98
JAR Special Issues – SAT 2000
RISC vs CISC
Can compile control knowledge into encodings
OUTLINE
1. Planning ≈ Model Checking
2. Planning as Satisfiability
3. SAT + Petri Nets + Randomization = Blackbox
4. State of the Art
5. Using Domain-Specific Control Knowledge
6. Learning Domain-Specific Control Knowledge
GOAL: Overview of recent advances in planning that
may (or may not!) be relevant to the CADE
community!
1. Planning ≈ Model Checking
The AI Planning Problem
Given a world description, set of
primitive actions, and goal description
(utility function), synthesize a control
program to achieve those goals
(maximize utility)
most general case covers huge area of
computer science, OR, economics
program synthesis, control theory, decision theory,
optimization …
STRIPS Style Planning
“Classic” work in AI has concentrated on STRIPS
style planning (“state space”)
• Open loop – no sensing
• Deterministic actions
• Sequential (straight line) plans
• SHAKEY THE ROBOT (Fikes & Nilsson 1971)
Terminology
• Fluent – a time varying proposition, e.g. “on(A,B)”
• State – complete truth assignment to a set of fluents
• Goal – partial truth assignment (set of states)
• Action – a partial function State → State,
specified by operator schemas
Operator Schemas
Each yields set of primitive actions, when
instantiated over a given finite set of objects
(constants)
Pickup(x, y)
• precondition: on(x,y), clear(x), handempty
• delete: on(x,y), clear(x), handempty
• add: holding(x), clear(y)
Plan: A (shortest) sequence of actions that
transforms the initial state into a goal state
• E.g.: Pickup(A,B); Putdown(A,C)
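The semantics above can be sketched in a few lines of Python (all names here are illustrative, not SATPLAN's actual code): a state is a set of ground fluents, and a ground action, built from the Pickup(x, y) schema, is a partial function on states.

```python
# A state is a frozenset of ground fluents (strings).
# An action has precondition, delete, and add sets, per the
# Pickup(x, y) schema above; it is a partial function on states.

def make_pickup(x, y):
    return {
        "name": f"Pickup({x},{y})",
        "pre":  {f"on({x},{y})", f"clear({x})", "handempty"},
        "del":  {f"on({x},{y})", f"clear({x})", "handempty"},
        "add":  {f"holding({x})", f"clear({y})"},
    }

def apply_action(state, act):
    # Partial function: undefined (None) when preconditions fail.
    if not act["pre"] <= state:
        return None
    return (state - act["del"]) | act["add"]

s0 = frozenset({"on(A,B)", "clear(A)", "on(B,Table)", "handempty"})
s1 = apply_action(s0, make_pickup("A", "B"))
```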
Parallelism
Useful extension: parallel composition of
primitive actions
• Only allowed when all orderings are well defined
and equivalent – no shared pre / effects
(act1 || act2)(s) = act2(act1(s)) = act1(act2(s))
• Can dramatically reduce size of search space
• Easy to serialize
• Distinguish:
– number of actions in a plan – “sequential length”
– number of sequentially composition operators in a
plan – “parallel length”, “horizon”
(a1 || a2); (a3 || a4 || a5) ; a6
- sequential length 6, parallel length 3
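The "no shared pre / effects" condition can be sketched as a set-intersection test; the actions below are illustrative logistics-style examples, with each action a dict of fluent sets.

```python
def independent(a1, a2):
    # Parallel composition (a1 || a2) is allowed only when neither
    # action's add/delete effects touch the other's preconditions
    # or effects, so both orderings are defined and equivalent.
    e1, e2 = a1["add"] | a1["del"], a2["add"] | a2["del"]
    return not (e1 & (a2["pre"] | e2)) and not (e2 & a1["pre"])

load1 = {"pre": {"at(T,L1)"}, "del": {"out(P1)"}, "add": {"in(P1,T)"}}
load2 = {"pre": {"at(T,L1)"}, "del": {"out(P2)"}, "add": {"in(P2,T)"}}
drive = {"pre": {"at(T,L1)"}, "del": {"at(T,L1)"}, "add": {"at(T,L2)"}}
```

Loading two different packages commutes, so the two loads may share a time step; driving deletes a precondition of loading, so those two cannot.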
Some Applications of STRIPS-Style Planning
Autonomous systems
• Deep Space One Remote Agent (Williams & Nayak 1997)
Natural language understanding
• TRAINS (Allen 1998)
Internet agents
• Rodney (Etzioni 1994)
Manufacturing
• Supply chain management (Crawford 1998)
Abundance of Negative
Complexity Results
Unbounded STRIPS planning: PSPACE-complete
• Exponentially long solutions
(Bylander 1991; Backstrom 1993)
Bounded STRIPS planning: NP-complete
• Is there a solution of (sequential/parallel) length N?
(Chenoweth 1991; Gupta and Nau 1992)
Domain-specific planning: may depend on whether
solutions must be the shortest such plan
• Blocks world –
– Shortest plan – NP-hard
– Approximately shortest plan – NP-hard
(Selman 1994)
– Plan of length 2 × number of blocks – linear time
Approaches to AI Planning
Three main paradigms:
• Forward-chaining heuristic search over state space
– original STRIPS system
– recent resurgence – TLPlan, FF, …
• “Causal link” Planning
– search in “plan space”
– Much work in 1990’s (UCPOP, NONLIN, …), little now
• Constraint based planning
– view planning as solving a large set of constraints
– constraints specify relationships between actions and
their preconditions / effects
– SATPLAN (Kautz & Selman), Graphplan (Blum & Furst)
Relationship to Model Checking
Model checking – determine whether a formula
in temporal logic evaluates to “true” in a
Kripke structure described by a finite state
machine
• FSM may be represented explicitly or symbolically
STRIPS planning – special case where
• Finite state machine (transition relation) specified
by STRIPS operators
– Very compact
– Expressive – can translate many other
representations of FSM’s into STRIPS with little or no
blowup
Relationship, continued
• Formula to be checked is of the form
“exists path . eventually . GOAL”
– Reachability
– Distinctions between linear / branching temporal
logics not important
Difference:
• Concentration on finding shortest plans
• Emphasis on efficiently finding single witness
(plan) as opposed to verifying a property holds in
all states
– NP vs co-NP
Why Not Use OBDD’s?
Size of OBDD explodes for typical AI
benchmark domains
• Overkill – need not / cannot check all states, even if
they are represented symbolically!
O(2^(n^2)) states
(But see recent work by M. Veloso on using OBDD’s for
non-deterministic variant of STRIPS)
Verification using SAT
Similar phenomena occur in some verification
domains
• Hardware multipliers
Has led to interest in using SAT techniques for
verification and bug finding
• Bounded – fixed horizon
• Under certain conditions can prove that only
considering a fixed horizon is adequate
– Empirically, most bugs found with small bounds
• E. Clarke – Bounded Model Checking
– LTL specifications, FSM in SMV language
• D. Jackson – Nitpick
– Debugging relational specifications in Z
2. Planning as Satisfiability
Planning as Satisfiability
SAT encodings are designed so that plans
correspond to satisfying assignments
Use recent efficient satisfiability procedures
(systematic and stochastic) to solve
Evaluate performance on benchmark
instances
SATPLAN

problem description + axiom schemas + plan length
→ instantiate → instantiated propositional clauses
→ SAT engine(s) → satisfying model
→ interpret → plan
SAT Encodings
Target: Propositional conjunctive normal form
Sets of clauses specified by axiom schemas
1. Create model by hand
2. Compile STRIPS operators
Discrete time, modeled by integers
• upper bound on number of time steps
• predicates indexed by time at which fluent holds /
action begins
– each action takes 1 time step
– many actions may occur at the same step
fly(Plane, City1, City2, i) → at(Plane, City2, i+1)
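As a toy illustration of how such an axiom schema is instantiated over a finite set of objects and time steps into CNF clauses (the object names and the signed-atom literal representation are assumptions, not SATPLAN's actual encoding):

```python
from itertools import product

# Instantiate the effect schema
#   fly(Plane, C1, C2, i) -> at(Plane, C2, i+1)
# over finite object sets and a bounded horizon. The implication
# becomes the clause (-fly ... OR +at ...); a clause is a list of
# signed atoms.
planes, cities, horizon = ["P1"], ["SEA", "SFO"], 2

clauses = []
for p, c1, c2, i in product(planes, cities, cities, range(horizon)):
    if c1 != c2:
        clauses.append([("-", f"fly({p},{c1},{c2},{i})"),
                        ("+", f"at({p},{c2},{i+1})")])
```

With one plane, two cities, and two time steps this yields four ground clauses, showing why an upper bound on the number of steps is essential to keep the instantiation finite.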
Solution to a Planning Problem
A solution is specified by any model (satisfying
truth assignment) of the conjunction of the
axioms describing the initial state, goal state,
and operators
Easy to convert back to a STRIPS-style plan
Complete SAT Algorithms
Davis-Putnam-Logemann-Loveland (DPLL)
• Depth-first backtrack search on partial truth assignments
• Basis of nearly all practical complete SAT algorithms
– Exception: “Stålmarck’s method”
• Key to efficiency: good variable choice at branch points
– 1961 – unit propagation, pure literal rule
– 1993 – explosion of improved heuristics and
implementations
+ MOM’s heuristic
+ satz (Chu Min Li) – lookahead to maximize rate of
creation of binary clauses
• Dependency-directed backtracking – derive new clauses
during search – rel_sat (Bayardo), GRASP (Marques-Silva)
– See SATLIB 1998 / Hoos & Stützle
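A minimal DPLL sketch with unit propagation, using DIMACS-style integer literals; the branching choice here is deliberately naive, since (as noted above) practical solvers differ chiefly in their branching heuristics.

```python
def unit_propagate(clauses, assignment):
    # Repeatedly drop satisfied clauses and falsified literals;
    # a unit clause forces its literal (the 1961 rule).
    changed = True
    while changed:
        changed = False
        remaining = []
        for c in clauses:
            if any(l in assignment for l in c):
                continue                      # clause satisfied
            live = [l for l in c if -l not in assignment]
            if not live:
                return None, assignment       # empty clause: conflict
            if len(live) == 1:
                assignment = assignment | {live[0]}
                changed = True
            else:
                remaining.append(live)
        clauses = remaining
    return clauses, assignment

def dpll(clauses, assignment=frozenset()):
    clauses, assignment = unit_propagate(clauses, assignment)
    if clauses is None:
        return None                           # conflict: backtrack
    if not clauses:
        return assignment                     # all clauses satisfied
    v = abs(clauses[0][0])                    # naive branch choice
    for lit in (v, -v):
        model = dpll(clauses, assignment | {lit})
        if model is not None:
            return model
    return None
```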
Incomplete SAT Algorithms
GSAT and Walksat (Kautz, Selman & Cohen 1993)
• Randomized local search over space of complete
truth assignments
• Heuristic function: flip variables to minimize
number of unsatisfied clauses
• Noisy “random walk” moves to escape local
minima
• Provably solves 2CNF, empirically successful on a
broad class of problems
– random CNF, graph coloring, circuit synthesis
encodings (DIMACS 1993, 1997)
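The flip loop described above can be sketched compactly (parameter names and the 0.5 noise value are illustrative defaults, not the published Walksat settings):

```python
import random

def walksat(clauses, n_vars, p=0.5, max_flips=10000, seed=0):
    # Local search over complete assignments: pick an unsatisfied
    # clause; with probability p make a noisy random-walk flip,
    # else flip the variable in it that minimizes the number of
    # unsatisfied clauses.
    rng = random.Random(seed)
    a = {v: rng.choice([True, False]) for v in range(1, n_vars + 1)}
    sat = lambda c: any(a[abs(l)] == (l > 0) for l in c)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not sat(c)]
        if not unsat:
            return a
        c = rng.choice(unsat)
        if rng.random() < p:
            v = abs(rng.choice(c))            # noise move
        else:
            def unsat_after_flip(v):
                a[v] = not a[v]
                n = sum(1 for cl in clauses if not sat(cl))
                a[v] = not a[v]               # undo trial flip
                return n
            v = min((abs(l) for l in c), key=unsat_after_flip)
        a[v] = not a[v]
    return None
```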
Planning Benchmark Test Set
Extension of Graphplan benchmark set
logistics – transportation domain, ranging up to
• 14 time slots, unlimited parallelism
• 2,165 possible actions per time slot
• optimal solutions containing 74 primitive actions
• 2^2000 legal states (60,000 Boolean variables)
Problems of this size not previously handled by
any domain-independent planning system
Initial SATPLAN Results

problem    horizon / actions   Graphplan   naïve SAT encoding   hand SAT encoding
rocket-b   7 / 30              9 min       16 min               41 sec
log-a      11 / 47             13 min      58 min               1.2 min
log-b      13 / 54             32 min      *                    1.3 min
log-c      13 / 63             *           *                    1.7 min
log-d      14 / 74             *           *                    3.5 min

SAT solver: Walksat (local search)
* indicates no solution found after 24 hours
How SATPLAN Spent its Time

problem    instantiation   walksat    DPLL      satz
rocket-b   41 sec          0.04 sec   1.8 sec   0.3 sec
log-a      1.2 min         2.2 sec    *         1.7 min
log-b      1.3 min         3.4 sec    *         0.6 sec
log-c      1.7 min         2.1 sec    *         4.3 sec
log-d      3.5 min         7.2 sec    *         1.8 hours

Hand created SAT encodings
* indicates no solution found after 24 hours
3. SAT + Petri Nets +
Randomization = Blackbox
Automating Encodings
While SATPLAN proved the feasibility of
planning using satisfiability, modeling the
transition function was problematic
• Direct naïve encoding of STRIPS operators as
axiom schemas gave poor performance
• Handcrafted encodings gave good performance,
but were labor intensive to create
– similar issues arise in work in verification – division
of labor between user and model checker!
GOAL: fully automatic generation and solution
of planning problems from STRIPS
specifications
Graphplan
Graphplan (Blum & Furst 1995)
Set new paradigm for planning
Like SATPLAN...
• Two phases: instantiation of propositional structure,
followed by search
Unlike SATPLAN...
• Efficient instantiation algorithm based on Petri-net
type reachability analysis
• Employs specialized search engine
Neither approach best for all domains
• Can we combine advantages of both?
Blackbox
STRIPS
Simplifier
CNF
Petri Net
Analysis
CNF
General
SAT
engines
Plan
Graph
Translator
Solution
Component 1: Petri-Net Analysis
Graphplan instantiates a “plan graph” in a
forward direction, pruning (some) unreachable
nodes
• plan graph ≈ unfolded Petri net (McMillan 1992)
Polynomial-time propagation of mutual-exclusion relationships between nodes
• Incomplete – must be followed by search to
determine if all goals can be simultaneously reached
Growing the Plan Graph

[Figure: animation frames growing the plan graph layer by layer –
fact level {P0}; action level {A1, B1}; fact level {P2, Q2, R2};
a candidate action C3 is added and then pruned.]
Component 2: Translation

[Figure: the example plan graph – facts P0; actions A1, B1; facts P2, Q2, R2]

Action implies preconditions: A1 ⊃ P0 , B1 ⊃ P0
Mutual exclusion: ¬A1 ∨ ¬B1 , ¬P2 ∨ ¬Q2
Initial facts hold at time 0
Goals hold at time n
Component 3: Simplification
Generated wff can be further simplified by more
general consistency propagation techniques
• unit propagation: is wff inconsistent by resolution
against unit clauses? O(n)
• failed literal rule: is wff + { P } inconsistent by unit
propagation? O(n²)
• binary failed literal rule: is wff + { P ∨ Q } inconsistent by
unit propagation? O(n³)
General simplification techniques complement Petri net
analysis
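The unit-propagation and failed-literal tests can be sketched as follows, again with DIMACS-style integer literals; this is a simple quadratic illustration of the rule, not Blackbox's implementation.

```python
def propagate(clauses, assigned):
    # Unit propagation: returns None on an empty-clause conflict,
    # otherwise the closure of forced literals.
    assigned = set(assigned)
    while True:
        unit = None
        for c in clauses:
            if any(l in assigned for l in c):
                continue                  # clause already satisfied
            live = [l for l in c if -l not in assigned]
            if not live:
                return None               # inconsistent
            if len(live) == 1:
                unit = live[0]
                break
        if unit is None:
            return assigned
        assigned.add(unit)

def failed_literals(clauses, n_vars):
    # Failed-literal rule: if wff + { P } is inconsistent under
    # unit propagation, then -P is forced and can be set.
    forced = set()
    for v in range(1, n_vars + 1):
        for lit in (v, -v):
            if propagate(clauses, {lit}) is None:
                forced.add(-lit)
    return forced
```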
Effectiveness of Simplification

                   Percent vars set by
Problem   Vars     unit prop   failed lit   binary failed
bw.a      2452     10%         100%         100%
bw.b      6358     5%          43%          99%
bw.c      19158    2%          33%          99%
log.a     2709     2%          36%          45%
log.b     3287     2%          24%          30%
log.c     4197     2%          23%          27%
log.d     6151     1%          25%          33%
Component 4: Randomized
Systematic Solvers
Background
Combinatorial search methods often exhibit
a remarkable variability in performance. It is
common to observe significant differences
between:
• different heuristics
• same heuristic on different instances
• different runs of same heuristic with different
random seeds
How SATPLAN Spent its Time (table repeated from above)
Preview of Strategy
We’ll put variability / unpredictability to our
advantage via randomization / averaging.
Cost Distributions
Consider distribution of running times of backtrack
search on a large set of “equivalent” problem
instances
• renumber variables
• change random seed used to break ties
Observation (Gomes 1996): distributions often have heavy
tails
• infinite variance
• mean increases without limit
• probability of long runs decays by power law (Pareto-Levy),
rather than exponentially (Normal)
Heavy Tails
Bad scaling of systematic solvers can be
caused by heavy tailed distributions
Deterministic algorithms get stuck on particular
instances
• but that same instance might be easy for a different
deterministic algorithm!
• Expected (mean) solution time increases without
limit over large distributions
• Log-log plot of distribution of running times
approximately linear
Heavy-Tailed Distributions
… infinite variance … infinite mean
Introduced by Pareto in the 1920’s
“probabilistic curiosity”
Mandelbrot established the use of heavy-tailed
distributions to model real-world fractal
phenomena
• stock-market, Internet traffic delays, weather
New discovery: good model for backtrack search
algorithms
• formal statement of “folk wisdom” of theorem proving
community
Randomized Restarts
Solution: randomize the systematic solver
• Add noise to the heuristic branching (variable choice)
function
• Cutoff and restart search after a fixed number of
backtracks
Provably eliminates heavy tails
In practice: rapid restarts with low cutoff can
dramatically improve performance
(Gomes, Kautz, and Selman 1997, 1998)
• Related analysis: Luby & Zuckerman 1993; Alt & Karp 1996
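The restart policy itself is simple to sketch; here `try_solve` stands for any randomized backtrack solver that either returns a result or gives up when the backtrack cutoff is exceeded, and `toy_solver` is a hypothetical stand-in (not an actual SAT solver) whose success depends heavily on its random seed, mimicking the run-to-run variability described above.

```python
import random

def solve_with_restarts(try_solve, cutoff, max_restarts=100, seed=0):
    # Run the randomized solver with a fixed backtrack cutoff; on
    # failure, restart with a fresh seed. Rapid restarts with a low
    # cutoff truncate the heavy tail of the runtime distribution.
    rng = random.Random(seed)
    for _ in range(max_restarts):
        result = try_solve(cutoff=cutoff, seed=rng.random())
        if result is not None:
            return result
    return None

def toy_solver(cutoff, seed):
    # Toy stand-in: only some seeds "finish" within the cutoff,
    # so most single runs fail but restarting soon succeeds.
    return "model" if random.Random(seed).random() < 0.3 else None
```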
Rapid Restart on LOG.D

[Figure: log-log plot of number of backtracks vs. restart cutoff,
both axes spanning 1 to 1,000,000.]

Note log scale: exponential speedup!
Overall insight:
Randomized tie-breaking with
rapid restarts can boost
systematic search algorithms
• Speed-up demonstrated in many versions of Davis-Putnam
– basic DPLL, satz, rel_sat, …
• Related analysis: Luby & Zuckerman 1993; Alt & Karp 1996
Blackbox Results

problem    naïve SAT encoding   hand SAT encoding   blackbox walksat   blackbox satz-rand
rocket-b   16 min               41 sec              2.5 sec            4.9 sec
log-a      58 min               1.2 min             7.4 sec            5.2 sec
log-b      *                    1.3 min             1.7 min            7.1 sec
log-c      *                    1.7 min             15 min             9.3 sec
log-d      *                    3.5 min             *                  52 sec

Naïve/Hand SAT solver: Walksat (local search)
* indicates no solution found after 24 hours
4. State of the Art
Which Strategies Work Best?
Causal-link planning
• <5 primitive actions in solutions
• Works best if few interactions between goals
Constraint-based planning
• Graphplan, SATPLAN, + descendents
• 100+ primitive actions in solutions
• Moderate time horizon <30 time steps
• Handles interacting goals well
1995 – 1999 Constraint-based approaches
dominate
• AIPS 1996, AIPS 1998
Graph Search vs. SAT

[Figure: runtime vs. problem size / complexity for Graphplan,
SATPLAN, and Blackbox with a solver schedule.]
Caveat: on some domains SAT approach can exhaust
memory even though direct graph search is easy
Resurgence of A* Search
For most of the 1980s and 1990s, forward-chaining A*
search was considered a non-starter for planning
Voices in the wilderness:
• TLPlan (Bacchus) – hand-tuned heuristic function
could make approach feasible
• LRTA (Geffner) – can automatically derive good
heuristic functions
Surprise – AIPS-2000 planning competition
dominated by A* planners!
• What happened?
Solution Length vs Hardness
Key issue: relationship between solution length
and problem hardness
• RECALL: In many domains, finding solutions that
minimize the number of time steps is NP-hard,
while finding an arbitrary solution is in P
– Put all the blocks on the table first
– Deliver packages one at a time
• Long solutions minimize goal interactions, so little
or no backtracking required by forward-chaining
search
• AIPS-2000 Planning Competition did not consider
plan length criteria!
Non-Optimal Planning

[Figure: log-scale runtimes (0.01 to 100,000) of blackbox, hsp,
and ff on the easy, rocket-a, and rocket-b instances.]
Optimal-Length Planning

[Figure: log-scale runtimes (0.01 to 100,000) of blackbox, hsp,
and ff on the same instances when optimal-length plans are required.]
Which Works Best, Continued
Constraint-based planning
• Short parallel solutions desired
• Many interactions between goals
• SAT translation a win for larger problems where
time is dominated by search (as opposed to
instantiation and Petri net analysis)
Forward-chaining search
• Long sequential solutions okay
• Few interactions between goals
Much recent progress in domain-independent
planning…
but further scaling to large real-world problems
requires domain-dependent techniques!
5. Using Domain-Specific
Control Knowledge
Kinds of Domain-Specific
Knowledge
Invariants true in every state
• A truck is only in one location
Implicit constraints on optimal plans
• Do not remove a package from its destination location
Simplifying assumptions
• Do not unload a package from an airplane, if the
airplane is not at the package’s destination city
– eliminates connecting flights
Expressing Knowledge
Such information is traditionally incorporated in
the planning algorithm itself
Instead: use additional declarative axioms
(Bacchus 1995; Kautz 1998; Huang, Kautz, & Selman 1999)
• Problem instance: operator axioms + initial and
goal axioms + control axioms
• Control knowledge constraints on search and
solution spaces
• Independent of any search engine strategy
Axiomatic Form
State Invariant:
at(truck,loc1,i) & loc1 ≠ loc2 →
¬at(truck,loc2,i)
Optimality:
at(pkg,loc,i) & ¬at(pkg,loc,i+1) & i<j →
¬at(pkg,loc,j)
Simplifying Assumption:
incity(airport,city) & at(pkg,loc,goal) &
¬incity(loc,city) →
¬unload(pkg,plane,airport)
Adding Control Knowledge

Problem Specification Axioms + Domain-specific Control Axioms
→ Instantiated Clauses → SAT Simplifier → SAT “Core” → SAT Engine

As control knowledge increases, the Core shrinks!
Effect of Domain Knowledge

problem    walksat    walksat + Kx   DPLL      DPLL + Kx
rocket-b   0.04 sec   0.04 sec       1.8 sec   0.13 sec
log-a      2.2 sec    0.11 sec       *         1.8 min
log-b      3.4 sec    0.08 sec       *         11 sec
log-c      2.1 sec    0.12 sec       *         7.8 min
log-d      7.2 sec    1.1 sec        *         *

Hand created SAT encodings
* indicates no solution found after 24 hours
6. Learning Domain-Specific
Control Knowledge
Learning Control Rules
Axiomatizing domain-specific control
knowledge by hand is a time consuming art…
• Certain kinds of knowledge can be efficiently
deduced
– simple classes of invariants (Fox & Long; Gerevini &
Schubert)
• Can more powerful control knowledge be
automatically learned, by watching planner solve
small instances?
Form of Rules
We will learn two kinds of control rules,
specified as temporal logic programs
– (Huang, Selman, & Kautz 2000)
• Select rule: conditions under which an action must
be performed at the current time instance
• Reject rule: conditions under which an action must
not be performed at the current time instance
incity(airport,city) & GOAL(at(pkg,loc)) &
¬incity(loc,city) →
¬unload(pkg,plane,airport)
Training Examples
Blackbox initially solves a few small problem
instances
Each instance yields
• POSITIVE training examples – states at which
actions occur in the solution
• NEGATIVE training examples – states at which an
action does NOT occur, even though its
preconditions hold in that state
Note that this data is very noisy!
Rule Induction
Rules are induced using a version of Quinlan’s
FOIL inductive logic programming algorithm
• Generates rules one literal at a time
• Select rules: maximize coverage of positive
examples, but do not cover negative examples
• Reject rules: maximize coverage of negative
examples, but do not cover positive examples
• Prune rules that are inconsistent with any of the
problem instances
– For details, see “Learning Declarative Control Rules
for Constraint-Based Planning”, Huang, Selman, &
Kautz, ICML 2000
Logical Status of Induced Rules
Some of the learned rules could in principle be
deduced from the domain operators together
with a bound on the length of the plan
• Reject rules for unnecessary actions
But in general: rules are not deductive
consequences
• Could rule out some feasible solutions
• In worst case: could rule out all solutions to some
instances
– not a problem in practice: such rules are usually
quickly pruned in the training phase
Effect of Learning

problem      horizon   blackbox   learning blackbox
grid-a       13        21         4.8
grid-b       18        74         16.6
gripper-3    15        >7200      7.2
gripper-4    19        >7200      260
log-d        14        15.8       5.7
log-e        15        3522       291
mystery-10   8         >7200      47.2
mystery-13   8         161        12.2

AIPS-98 competition benchmarks
Summary
• Close connections between much work in AI
Planning and CADE/CAV work on model
checking
• Remarkable recent success of general
satisfiability testing programs on hard
benchmark problems
• The success of Blackbox and Graphplan in
combining ideas from planning and
verification suggests many more synergies
exist
• Techniques for learning and applying domain
specific control knowledge dramatically boost
performance for planning – could ideas also
be applied to verification?