Toward a Universal Inference
Engine
Henry Kautz
University of Washington
With Fahiem Bacchus, Paul Beame,
Toni Pitassi, Ashish Sabharwal, & Tian Sang
Universal Inference Engine
• Old dream of AI:
  – General Problem Solver – Newell & Simon
  – Logic + Inference – McCarthy & Hayes
• Reality:
  – 1962: 50-variable toy SAT problems
  – 1992: 300-variable non-trivial problems
  – 1996: 1,000-variable difficult problems
  – 2002: 1,000,000-variable real-world problems
Pieces of the Puzzle
• Good old Davis–Putnam–Logemann–Loveland (DPLL)
• Clause learning (nogood caching)
• Randomized restarts
• Component analysis
• Formula caching
• Learning domain-specific heuristics
Generality
• SAT – NP-complete
• #SAT, Bayesian networks – #P-complete
• Bounded-alternation quantified Boolean formulas
• Quantified Boolean formulas, stochastic SAT – PSPACE-complete
1. Clause Learning
with Paul Beame &
Ashish Sabharwal
DPLL Algorithm

DPLL(F)
  // Perform unit propagation
  while there exists a unit clause (y) ∈ F
    F ← F|y
      // remove all clauses containing y;
      // shrink all clauses containing ¬y
  if F is empty, report satisfiable and halt
  if F contains the empty clause Λ, return
  else choose a literal x
    DPLL(F|x)
    DPLL(F|¬x)
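For concreteness, a minimal runnable sketch of this procedure in Python (a hypothetical illustration, not the deck's code): clauses are tuples of signed integers, so -2 stands for ¬x2, and condition() computes F|y.

def condition(clauses, lit):
    out = []
    for c in clauses:
        if lit in c:
            continue                                  # remove all clauses containing y
        out.append(tuple(l for l in c if l != -lit))  # shrink all clauses containing ¬y
    return out

def dpll(clauses):
    # unit propagation
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        clauses = condition(clauses, unit)
    if not clauses:
        return True                                   # F is empty: satisfiable
    if any(len(c) == 0 for c in clauses):
        return False                                  # F contains the empty clause
    x = clauses[0][0]                                 # choose a literal x
    return dpll(condition(clauses, x)) or dpll(condition(clauses, -x))

# Example: (x1 ∨ x2) ∧ (¬x1 ∨ x2) ∧ (¬x2) is unsatisfiable
print(dpll([(1, 2), (-1, 2), (-2,)]))                 # prints False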
Extending DPLL: Clause Learning
When backtracking in DPLL, add new clauses
corresponding to the causes of failure of the search:
  EBL [Stallman & Sussman 77; de Kleer & Williams 87]
  CSP [Dechter 90]
  CL [Bayardo & Schrag 97; Marques-Silva & Sakallah 96;
      Zhang 97; Moskewicz et al. 01; Zhang et al. 01]
Added conflict clauses
• Capture the reasons of conflicts
• Obtained via unit propagation from known clauses
• Reduce future search by producing conflicts sooner
Conflict Graphs

Known clauses:
  (p ∨ q ∨ a)
  (¬a ∨ ¬b ∨ ¬t)
  (t ∨ ¬x1)
  (t ∨ ¬x2)
  (t ∨ ¬x3)
  (x1 ∨ x2 ∨ x3 ∨ y)
  (x2 ∨ ¬y)

Current decisions:
  p = false, q = false, b = true

[Figure: the conflict graph. The decisions imply a, then ¬t, then ¬x1, ¬x2, ¬x3,
then y, which falsifies (x2 ∨ ¬y). Different cuts of the graph yield different
learned clauses.]

Learning schemes:
• Decision scheme: (p ∨ q ∨ ¬b)
• 1-UIP scheme
• FirstNewCut scheme: (x1 ∨ x2 ∨ x3)
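To illustrate how a learned clause falls out of the implication graph, here is a rough Python sketch (hypothetical, not any solver's code) using the clause polarities reconstructed above: unit propagation records an antecedent clause for each implied literal, and the decision scheme learns the negation of the decisions that led to the conflict.

def propagate(clauses, decisions):
    """Unit propagation that records, for each implied literal, the clause that
    implied it. Returns (assignment, antecedents, conflict clause or None)."""
    assign = {abs(l): l for l in decisions}       # var -> the literal made true
    antecedent = {}                               # var -> clause that implied it
    changed = True
    while changed:
        changed = False
        for c in clauses:
            if any(assign.get(abs(l)) == l for l in c):
                continue                          # clause already satisfied
            unassigned = [l for l in c if abs(l) not in assign]
            if not unassigned:
                return assign, antecedent, c      # every literal false: conflict
            if len(unassigned) == 1:
                lit = unassigned[0]
                assign[abs(lit)] = lit            # implied literal
                antecedent[abs(lit)] = c
                changed = True
    return assign, antecedent, None

def decision_scheme_clause(conflict, antecedent, decisions):
    """Decision learning scheme: walk the implication graph backwards from the
    conflict to the decisions and learn the negation of those decisions.
    (The 1-UIP and FirstNewCut schemes instead cut the graph closer to the conflict.)"""
    decision_vars = {abs(l) for l in decisions}
    seen, frontier, reasons = set(), [abs(l) for l in conflict], set()
    while frontier:
        v = frontier.pop()
        if v in seen:
            continue
        seen.add(v)
        if v in decision_vars:
            reasons.add(v)
        elif v in antecedent:
            frontier.extend(abs(l) for l in antecedent[v])
    return tuple(-l for l in decisions if abs(l) in reasons)

# The slide's example, numbering p=1, q=2, a=3, b=4, t=5, x1=6, x2=7, x3=8, y=9:
known = [(1, 2, 3), (-3, -4, -5), (5, -6), (5, -7), (5, -8), (6, 7, 8, 9), (7, -9)]
decisions = [-1, -2, 4]                           # p = false, q = false, b = true
assign, ante, conflict = propagate(known, decisions)
print(decision_scheme_clause(conflict, ante, decisions))  # (1, 2, -4) = (p ∨ q ∨ ¬b)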
CL Critical to Performance
The best current SAT algorithms rely heavily on
CL for good behavior on real-world problems:
  GRASP [Marques-Silva & Sakallah 96], SATO [H. Zhang 97],
  zChaff [Moskewicz et al. 01], BerkMin [Goldberg & Novikov 02]
However,
• No good understanding of the strengths and weaknesses of CL
• Not much insight on why it works well when it does
Harnessing the Power of Clause Learning
(Beame, Kautz, & Sabharwal 2003)
• A mathematical framework for analyzing clause learning
• A characterization of its power in relation to
  well-studied topics in proof complexity theory
• Ways to improve solver performance based on formal analysis
Proofs of Unsatisfiability
When F is unsatisfiable:
• A trace of DPLL on F is a proof of its unsatisfiability
• A bound on the shortest proof of F gives a bound on the
  best possible implementation
• Upper bound – "There is a proof no larger than K"
  – potential for finding proofs quickly
  – best possible branching heuristic, backtracking, etc.
• Lower bound – "Shortest proof is at least size K"
  – inherent limitations of the algorithm or proof system
Proof System: Resolution
Resolution rule: from clauses (A ∨ x) and (B ∨ ¬x), derive the resolvent (A ∨ B).
F = an unsatisfiable CNF formula over the variables a, b, c
[Figure: a resolution refutation of F, combining its input clauses through
intermediate resolvents down to the empty clause Λ; proof size = 9.]
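A tiny concrete illustration of the resolution rule in Python (hypothetical; the refuted formula here is an illustrative one, not the formula in the figure):

def resolve(c1, c2, var):
    # Resolution rule: from (A ∨ x) and (B ∨ ¬x), derive the resolvent (A ∨ B)
    assert var in c1 and -var in c2
    return tuple(sorted(set(l for l in c1 if l != var) | set(l for l in c2 if l != -var)))

# Refuting a ∧ (¬a ∨ b) ∧ ¬b, with a = 1 and b = 2:
c1, c2, c3 = (1,), (-1, 2), (-2,)
r1 = resolve(c2, c3, 2)     # (-1,)  i.e. ¬a
r2 = resolve(c1, r1, 1)     # ()     the empty clause Λ: the formula is unsatisfiable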
Special Cases of Resolution
Tree-like resolution
• Graph of inferences forms a tree ≡ DPLL
Regular resolution
• A variable can be resolved on only once on any
  path from an input clause to the empty clause
• Directed acyclic graph analog of the DPLL tree:
  natural not to branch on a variable once it has been eliminated
• Used in the original DP procedure [Davis-Putnam 60]
Proof System Hierarchy
[Figure: proof systems nested by the space of formulas with poly-size proofs:
Tree-like RES ⊂ Regular RES ⊂ General RES ⊂ … ⊂ Frege systems.
Separations are witnessed by, e.g., the pigeonhole principle [Haken 85],
[Bonet et al. 00], and [Alekhnovich et al. 02].]
Thm 1. CL can beat Regular RES
• Start with a formula f that has a poly-size (general) RES proof π
  but only exp-size Regular RES proofs.
  Example formulas: GTn (ordering principle), Peb (pebbling formulas)
  [Alekhnovich et al. 02]
• Its proof trace extension PT(f,π) has a poly-size CL proof,
  but still only exp-size Regular RES proofs.
[Figure: f separates General RES from Regular RES; PT(f,π) separates CL
from Regular RES and DPLL.]
PT(f,π): Proof Trace Extension
Start with an unsatisfiable formula f with a poly-size RES proof π.
PT(f,π) contains:
• all clauses of f
• for each derived clause Q = (a ∨ b ∨ c) in π:
  – a trace variable tQ
  – new clauses (tQ ∨ ¬a), (tQ ∨ ¬b), (tQ ∨ ¬c)
The CL proof of PT(f,π) works by branching negatively on the tQ's,
in bottom-up order of the clauses of π.
[Figure, built over three slides: a derived clause Q = (a ∨ b ∨ c) of π,
obtained by resolving (a ∨ b ∨ x) with (c ∨ ¬x), is augmented in PT(f,π)
with the trace variable tQ and the clauses (tQ ∨ ¬a), (tQ ∨ ¬b), (tQ ∨ ¬c).
Branching on ¬tQ unit-propagates ¬a, ¬b, ¬c, which force both x and ¬x,
a conflict; the FirstNewCut scheme then learns exactly (a ∨ b ∨ c).]
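A small Python sketch of the construction (hypothetical; the trace-clause polarity follows the reconstruction above, and the proof π is given as its derived clauses in bottom-up order):

def proof_trace_extension(f, pi, num_vars):
    """Return the clauses of PT(f, pi) and the trace literals in branching order."""
    pt = list(f)                         # all clauses of f
    branch_order = []
    next_var = num_vars + 1
    for q in pi:                         # each derived clause Q = (a ∨ b ∨ ...)
        t_q = next_var                   # fresh trace variable tQ
        next_var += 1
        for lit in q:
            pt.append((t_q, -lit))       # clause (tQ ∨ ¬a) for each literal a of Q
        branch_order.append(-t_q)        # the CL proof branches on ¬tQ,
    return pt, branch_order              # in bottom-up order of pi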
How hard is PT(f,π)?
Easy for CL: by construction
• CL branches exactly once on each trace variable
  ⇒ # branches = size(π) = poly
Hard for Regular RES: reduction argument
• Fact 1: PT(f,π) restricted to all trace variables = true is exactly f
• Fact 2: if ρ is a Regular RES proof of g, then ρ|x is a Regular RES proof of g|x
• Fact 3: f does not have small Regular RES proofs!
Implications?
DPLL algorithms w/o clause learning are
hopeless for certain formula classes
CL algorithms have
potential for small proofs
Can we use such analysis to
harness this potential?
Pebbling Formulas
fG = Pebbling(G)
[Figure: a small pyramid DAG with source nodes A, B, C, internal nodes D, E, F,
and target node T; each node X carries the clause (x1 ∨ x2).]
A node X is "pebbled" if (x1 ∨ x2) holds.
• Source axioms: A, B, C are pebbled
• Pebbling axioms: e.g., A and B are pebbled ⇒ D is pebbled
• Target axioms: T is not pebbled
Pebbling Formulas (continued)
The pebbling axiom [(a1 ∨ a2) ∧ (b1 ∨ b2)] ⇒ (d1 ∨ d2) expands into four clauses:
  (¬a1 ∨ ¬b1 ∨ d1 ∨ d2)
  (¬a1 ∨ ¬b2 ∨ d1 ∨ d2)
  (¬a2 ∨ ¬b1 ∨ d1 ∨ d2)
  (¬a2 ∨ ¬b2 ∨ d1 ∨ d2)
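A hypothetical Python generator for these formulas, assuming two variables per node and a pyramid graph like the one sketched above (node and variable names are illustrative):

from itertools import product

def pebbling_formula(preds, sources, target):
    """preds maps each node to its list of predecessors (empty for sources);
    clauses are tuples of signed integers over variables x1, x2 per node."""
    var = {}
    for n in preds:
        for i in (1, 2):
            var[(n, i)] = len(var) + 1
    pebbled = lambda n: (var[(n, 1)], var[(n, 2)])        # "n is pebbled" clause
    clauses = [pebbled(s) for s in sources]               # source axioms
    for n, ps in preds.items():
        if not ps:
            continue
        # pebbling axiom "all predecessors pebbled => n pebbled", expanded into
        # one clause per choice of one variable from each predecessor
        for choice in product(*(pebbled(p) for p in ps)):
            clauses.append(tuple(-v for v in choice) + pebbled(n))
    clauses += [(-var[(target, 1)],), (-var[(target, 2)],)]  # target axioms
    return clauses

# Example: a small pyramid (sources A, B, C; target T)
pyramid = {'A': [], 'B': [], 'C': [], 'D': ['A', 'B'], 'E': ['B', 'C'], 'T': ['D', 'E']}
print(len(pebbling_formula(pyramid, ['A', 'B', 'C'], 'T')))   # 3 + 3*4 + 2 = 17 clauses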
Grid vs. Randomized Pebbling
[Figure: two example pebbling graphs. Left: a regular grid graph with two
variables per node. Right: a randomized graph with varying in-degree and one
to four variables per node, e.g. (d1 ∨ d2 ∨ d3) and (i1 ∨ i2 ∨ i3 ∨ i4).]
Branching Sequence
• B = (x1, x4, ¬x3, x1, ¬x8, ¬x2, ¬x4, x7, ¬x1, x2)
• OLD rule: "Pick an unassigned variable x"
• NEW rule: "Pick the next literal y from B and delete it from B;
  if y is already assigned, repeat"
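A minimal sketch of the new rule in Python (hypothetical; default_pick stands in for the solver's usual branching heuristic):

def next_branch_literal(seq, assignment, default_pick):
    """Pop literals from the precomputed branching sequence B, skipping any whose
    variable is already assigned; fall back to the default heuristic when B runs out."""
    while seq:
        lit = seq.pop(0)                # pick the next literal y from B; delete it from B
        if abs(lit) not in assignment:
            return lit                  # branch on y
    return default_pick()               # B exhausted: behave like the old rule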
Statement of Results
Given a pebbling graph G, we can efficiently generate a branching sequence BG
such that DPLL-Learn*(fG, BG) is empirically exponentially faster than DPLL-Learn*(fG)
• DPLL-Learn*: any clause learner with the 1-UIP learning scheme and
  fast backtracking, e.g. zChaff [Moskewicz et al. 01]
• Efficient: Θ(|fG|) time to generate BG
• Effective: Θ(|fG|) branching steps to solve fG using BG
Genseq on Grid Pebbling Graphs
[Figure: a grid pebbling graph with nodes A through I and target T,
each labeled with its clause (x1 ∨ x2).]
Results: Grid Pebbling
Max formula size solved (zChaff, 24 hours, 512 MB memory):
• Naive DPLL (learning OFF, branching sequence OFF): 45 vars unsat, 55 vars sat
• Learning OFF, branching sequence ON: 45 vars unsat, 55 vars sat
• Original zChaff (learning ON, branching sequence OFF): 2,000 vars unsat, 4,500 vars sat
• Modified zChaff (learning ON, branching sequence ON): 2,500,000 vars unsat, 1,000,000 vars sat
Results: Randomized Pebbling
Max formula size solved (zChaff, 24 hours, 512 MB memory):
• Naive DPLL (learning OFF, branching sequence OFF): 35 vars unsat, 35 vars sat
• Learning OFF, branching sequence ON: 45 vars unsat, 45 vars sat
• Original zChaff (learning ON, branching sequence OFF): 350 vars unsat, 350 vars sat
• Modified zChaff (learning ON, branching sequence ON): 45,000 vars unsat, 20,000 vars sat
2. Randomized Restarts

Restarts
• Run-time distribution typically has high variance across instances
  or random seeds
  – e.g., from tie-breaking in the branching heuristic
  – often heavy-tailed: infinite mean & variance!
• Leverage this with restart strategies
  – heavy-tailed ⇒ exponential distribution
[Figure: run-time distribution over runs, from short to long.]
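A schematic restart wrapper in Python (a hypothetical sketch, not a specific solver's policy): rerun a randomized solver under a decision cutoff and restart with a fresh seed if the cutoff is hit, growing the cutoff so the strategy stays complete.

import random

def solve_with_restarts(formula, solve_with_cutoff, cutoff=100, growth=1.5):
    # solve_with_cutoff is an assumed stand-in for a randomized DPLL-style solver
    # that returns a result (model / UNSAT) or None if the decision cutoff was hit
    while True:
        result = solve_with_cutoff(formula, cutoff, seed=random.random())
        if result is not None:
            return result
        cutoff = int(cutoff * growth)    # restart: cut off the long runs of the heavy tail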
Generalized Restarts
• At a conflict, backtrack to an arbitrary point in the search tree
  – lowest conflict decision variable = backjumping
  – root = restart
  – other = partial restart
• Adding clause learning makes almost any restart scheme complete
  (J. Marques-Silva 2002)
Aggressive Backtracking
• zChaff: at a conflict, backtrack to above the highest conflict variable
  – not traditional backjumping!
• Wasteful?
  – the learned clause saves "most" of the work
  – the learned clause provides new evidence about the best
    branching variable and value!
4. Component Analysis

#SAT – Model Counting
Why #SAT?
• Prototypical #P-complete problem
• Can encode probabilistic inference
• Natural encoding for counting problems
Bayesian Nets to Weighted Counting
• Introduce new vars so all internal vars are deterministic
[Figure: a two-node network A → B with Pr(A) = .1, Pr(B | A) = .2, Pr(B | ¬A) = .6.
New chance variables P (weight .2) and Q (weight .6) replace B's conditional
probability table, and B becomes deterministic:]
  B ⇔ (A ∧ P) ∨ (¬A ∧ Q)
Bayesian Nets to Weighted Counting
• Weight of a model is the product of its variable weights
• Weight of a formula is the sum of the weights of its models
[Figure: the same network, with weights A = .1, P = .2, Q = .6 and
  B ⇔ (A ∧ P) ∨ (¬A ∧ Q).]
Bayesian Nets to Weighted Counting
• Let F be the formula defining all internal variables
• Pr(query) = weight(F ∧ query)
[Figure: the same network and definition B ⇔ (A ∧ P) ∨ (¬A ∧ Q).]
Bayesian Nets to Counting
• Unweighted counting is the case where all non-defined variables have weight 0.5
• Introduce sets of variables to define other probabilities to the desired accuracy
• In practice: just modify the #SAT algorithm to do weighted #SAT
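To make the weighted-counting semantics concrete, here is a brute-force check in Python for the small network above (a hypothetical sketch, not the paper's encoding or tool); giving the defined variable B weight 1 for both polarities is an assumption chosen so that formula weights equal probabilities.

from itertools import product

weight = {'A': 0.1, 'P': 0.2, 'Q': 0.6}        # chance variables and their weights

def model_weight(m):
    w = 1.0
    for v, p in weight.items():                # defined variable B contributes weight 1
        w *= p if m[v] else (1.0 - p)
    return w

def F(m):                                      # B <=> (A and P) or (not A and Q)
    return m['B'] == ((m['A'] and m['P']) or (not m['A'] and m['Q']))

def weight_of(formula):
    total = 0.0
    for bits in product([False, True], repeat=4):
        m = dict(zip(['A', 'P', 'Q', 'B'], bits))
        if formula(m):
            total += model_weight(m)
    return total

print(weight_of(F))                            # 1.0: the weights of F's models sum to 1
print(weight_of(lambda m: F(m) and m['B']))    # 0.56 = Pr(B) = .1*.2 + .9*.6

Running it gives weight(F) = 1.0 and weight(F ∧ B) = 0.56, matching Pr(B) computed directly from the network.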
Component Analysis
• Can use DPLL to count models
  – just don't stop when the first assignment is found
• If the formula breaks into separate components (no shared variables),
  can count each separately and multiply the results:
  #SAT(C1 ∧ C2) = #SAT(C1) × #SAT(C2)
• RelSat (Bayardo) – CL + component analysis at each node in the search tree
  – 50-variable #SAT
  – state of the art circa 2000
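A minimal sketch of the component split in Python (hypothetical): variables are connected whenever they share a clause, and each connected group of clauses can be counted independently and the counts multiplied.

def to_components(clauses):
    # union-find over variables; two variables are connected if some clause
    # mentions both (assumes no empty clauses, for simplicity)
    parent = {}
    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]      # path halving
            v = parent[v]
        return v
    for c in clauses:
        vs = [abs(l) for l in c]
        for v in vs[1:]:
            parent[find(v)] = find(vs[0])      # union variables sharing a clause
    groups = {}
    for c in clauses:
        groups.setdefault(find(abs(c[0])), []).append(c)
    return list(groups.values())

# (x1 ∨ ¬x2) ∧ (x2 ∨ x3) and (x4 ∨ x5) split into two independent components:
print(to_components([(1, -2), (2, 3), (4, 5)]))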
5. Formula Caching
with Fahiem Bacchus, Paul Beame,
Toni Pitassi, & Tian Sang

Formula Caching
• New idea: cache counts of residual formulas at each node
  – Bacchus, Dalmao & Pitassi 2003
  – Beame, Impagliazzo, Pitassi, & Segerlind 2003
• Matches the time/space tradeoffs of the best known exact
  probabilistic inference algorithms:
  – n^O(1) · 2^O(w), where w is the tree-width of the formula
  – 2^O(w log n) if only linear space is used for the cache
#SAT with Component Caching

#SAT(F)
  a = 1;
  for each G ∈ to_components(F) {
    if (G == ∅) m = 1;
    else if (Λ ∈ G) m = 0;
    else if (in_cache(G)) m = cache_value(G);
    else { select v ∈ G;
           m = ½ * #SAT(G|v) + ½ * #SAT(G|¬v);
           insert_cache(G, m); }
    a = a * m; }
  return a;
The recursion computes the probability m that a random truth assignment
satisfies the formula: # models = 2^n · m, where n is the number of variables.
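A runnable Python sketch mirroring this recursion (a simplified illustration, not the authors' implementation): sat_prob returns the probability m that a random assignment satisfies the clauses, so the model count is m · 2^n.

def condition(clauses, lit):
    # G|lit: drop satisfied clauses, shrink clauses containing the opposite literal
    return tuple(tuple(l for l in c if l != -lit) for c in clauses if lit not in c)

def components(clauses):
    groups = []                                   # (variable set, clause list) pairs
    for c in clauses:
        vs, cls, rest = {abs(l) for l in c}, [c], []
        for g_vars, g_cls in groups:
            if g_vars & vs:                       # shares a variable: merge groups
                vs |= g_vars
                cls += g_cls
            else:
                rest.append((g_vars, g_cls))
        groups = rest + [(vs, cls)]
    return [tuple(cls) for _, cls in groups]

cache = {}                                        # residual component -> probability

def sat_prob(clauses):
    a = 1.0
    for g in components(clauses):
        key = frozenset(g)
        if () in g:
            m = 0.0                               # component contains the empty clause
        elif key in cache:
            m = cache[key]
        else:
            v = abs(g[0][0])                      # branch on some variable of g
            m = 0.5 * sat_prob(condition(g, v)) + 0.5 * sat_prob(condition(g, -v))
            cache[key] = m
        a *= m
    return a

f = ((1, 2), (-1, 3), (4, 5))                     # (x1 ∨ x2)(¬x1 ∨ x3)(x4 ∨ x5), 5 vars
print(sat_prob(f) * 2**5)                         # 12.0 models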
Putting it All Together
• Goal: combine
  – clause learning
  – component analysis
  – formula caching
  to create a practical #SAT algorithm
• Not quite as straightforward as it looks!

Issue 1: How Much to Cache?
• Everything?
  – infeasible: 10^50+ nodes
• Only sub-formulas on the current branch?
  – linear space
  – fixed variable ordering + no clause learning
    == Recursive Conditioning (Darwiche 2002)
• Surely we can do better...
Efficient Cache Management
• Ideal: make maximum use of RAM, but not one bit more
• Space- & age-bounded caching
  – separate-chaining hash table
  – lazy deletion of entries older than K when searching chains
  – constant amortized time
  – if the sum of all chains becomes too large, do a global cleanup
    (rare in practice)
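A sketch of such a cache in Python (hypothetical; the bucket count, age bound K, and size bound are illustrative parameters, not the solver's actual settings):

class BoundedCache:
    def __init__(self, num_buckets=1 << 16, max_age=100_000, max_entries=1 << 20):
        self.buckets = [[] for _ in range(num_buckets)]   # separate chaining
        self.max_age, self.max_entries = max_age, max_entries
        self.clock, self.size = 0, 0

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        self.clock += 1
        self._chain(key).append((key, value, self.clock))
        self.size += 1
        if self.size > self.max_entries:
            self._global_cleanup()                        # rare in practice

    def lookup(self, key):
        self.clock += 1
        chain = self._chain(key)
        kept, hit = [], None
        for k, v, born in chain:                          # lazy deletion while scanning
            if self.clock - born > self.max_age:
                self.size -= 1                            # entry too old: drop it
                continue
            kept.append((k, v, born))
            if k == key:
                hit = v
        chain[:] = kept
        return hit

    def _global_cleanup(self):
        for chain in self.buckets:
            chain[:] = [e for e in chain if self.clock - e[2] <= self.max_age]
        self.size = sum(len(c) for c in self.buckets)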
Issue 2: Interaction of Component Analysis & Clause Learning
• Without CL, sub-formulas decrease in size
  [Figure: F splits into smaller residual formulas F|p and F|¬p.]
• With CL, sub-formulas may become huge
  – 1,000 clauses ⇒ 1,000,000 learned clauses

Why this is a Problem
• Finding connected components at each node requires linear time
  – way too costly for the learned clauses
• Components using learned clauses are unlikely to reoccur
  – defeats the purpose of formula caching
Suggestion
• Use only clauses derived from the original formula for
  – component analysis
  – the "keys" for cached entries
• Use all the learned clauses for unit propagation
• Can this possibly be sound?
  Almost!
Main Theorem
[Figure: a component G|σ of the residual formula F|σ, with assignments A1, A2, A3.]
• Therefore: for SAT sub-formulas it is safe to use learned clauses
  for unit propagation!
UNSAT Sub-formulas
• But if F|σ is unsatisfiable, all bets are off...
  – without component caching, there is still no problem,
    because the final value is 0 in any case
  – with component caching, this could cause incorrect values to be cached
• Solution
  – flush siblings (& their descendants) of unsat components from the cache
#SAT CC+CL

#SAT(F)
  a = 1; s = ∅;
  for each G ∈ to_components(F) {
    if (in_cache(G)) m = cache_value(G);
    else { m = split(G);
           insert_cache(G, m); }
    a = a * m;
    if (m == 0) { flush_cache(s); break; }
    else s = s ∪ {G};
  }
  return a;

#SAT CC+CL continued

split(G)
  if (G == ∅) return 1;
  if (Λ ∈ G) {
    learn_new_clause();
    return 0; }
  select v ∈ G;
  return ½ * #SAT(G|v) + ½ * #SAT(G|¬v);
Results: Pebbling Formulas
30 layers = 930 variables, 1771 clauses
Results: Planning Problems
Results: Circuit Synthesis
Random 3-SAT
[Plot: runtime in seconds (log scale, 0.1 to 100,000) vs. clause/variable ratio
(0.8 to 2.2) on 75-variable random 3-SAT, comparing relsat and CC+CL.]
Summary
• Dramatic progress in automating propositional inference
  over the last decade
• Progress due to the careful refinement of a handful of ideas:
  DPLL, clause learning, restarts, component analysis, formula caching
• The successful unification of these elements for #SAT gives
  renewed hope for a universal reasoning engine!
What’s Next?
 Evaluation of weighted-#SAT version on
Bayesian networks
 Better component ordering and
component-aware variable branching
heuristics
 Optimal restart policies for #SAT CC+CL
 Adapt techniques for sampling methods –
approximate inference???
Download