Caching in Backtracking Search

Fahiem Bacchus
University of Toronto
Introduction
Backtracking search needs only space linear in the number of variables (modulo the size of the problem representation). However, its efficiency can greatly benefit from using more space to cache information computed during search.
- Caching can provably yield exponential improvements in the efficiency of backtracking search.
- Caching is an any-space feature: we can use as much or as little space for caching as we want without affecting soundness or completeness.
- Unfortunately, caching can also be time consuming.
- How do we exploit the theoretical potential of caching in practice?
Introduction
We will examine this question for
- the problem of finding a single solution, and
- problems that require considering all solutions:
  - counting the number of solutions / computing probabilities
  - finding optimal solutions.
We will look at
- the theoretical advantages offered by caching,
- some of the practical issues involved with realizing these theoretical advantages, and
- some of the practical benefits obtained so far.
Outline
1. Caching when searching for a single solution.
   - Clause learning in SAT.
     - Theoretical results.
     - Its practical application and impact.
   - Clause learning in CSPs.
2. Caching when considering all solutions.
   - Formula caching for sum-of-products problems.
     - Theoretical results.
     - Practical application.
1. Caching when searching for a single solution
1.1 Clause Learning in SAT
Clause Learning in SAT (DPLL)
Clause learning is the most successful form of caching when searching for a single solution [Marques-Silva and Sakallah, 1996; Zhang et al., 2001]. It has revolutionized DPLL SAT solvers (i.e., backtracking SAT solvers).
Clause Learning in SAT
1. Branch on a variable:
   Assumption → X
2. Perform propagation (Unit Propagation) over the clauses
   (¬X, A), (¬X, B), (¬X, C), (¬A, ¬B, ¬C, D):
   (¬X, A) ⇒ A
   (¬X, B) ⇒ B
   (¬X, C) ⇒ C
   (¬A, ¬B, ¬C, D) ⇒ D
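To make the mechanics concrete, here is a minimal unit-propagation sketch in Python (an illustration of the idea, not any particular solver's implementation; all names are ours, and literals are signed integers):

# Minimal unit propagation sketch. A literal is a signed int: X is 1, ¬X is -1.
# assignment is a set of literals currently true; reasons maps each forced
# literal to the clause that forced it. Returns the (possibly extended)
# assignment, the reasons, and a falsified clause if a conflict was found.
def unit_propagate(clauses, assignment):
    reasons = {}
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue                      # clause already satisfied
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:                # every literal false: conflict clause
                return assignment, reasons, clause
            if len(unassigned) == 1:          # unit: force the remaining literal
                lit = unassigned[0]
                assignment.add(lit)
                reasons[lit] = clause
                changed = True
    return assignment, reasons, None

# The slide's example with X=1, A=2, B=3, C=4, D=5: assuming X forces A, B, C, then D.
clauses = [[-1, 2], [-1, 3], [-1, 4], [-2, -3, -4, 5]]
print(unit_propagate(clauses, {1}))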
Clause Learning in SAT
Assumption → X
(¬X, A) ⇒ A
(¬X, B) ⇒ B
(¬X, C) ⇒ C
(¬A, ¬B, ¬C, D) ⇒ D

- Every inferred literal is labeled with a clausal reason.
- The clausal reason for a literal is a subset of the previous literals on the path whose setting implies the literal:
  (¬A, ¬B, ¬C, D) ≡ A ∧ B ∧ C → D
Clause Learning in SAT
Assumption → X
(¬X, A) ⇒ A
(¬X, B) ⇒ B
(¬X, C) ⇒ C
(¬A, ¬B, ¬C, D) ⇒ D
Assumption → Y
(¬Y, P) ⇒ P
(¬Y, Q) ⇒ Q
(¬Q, ¬P, ¬D) ⇒ ¬D
Contradiction:
1. D is forced to be both true and false.
2. The clause (¬Q, ¬P, ¬D) has been falsified.
Falsified clauses are called conflict clauses.
Clause Learning in SAT
Assumption → X
(¬X, A) ⇒ A
(¬X, B) ⇒ B
(¬X, C) ⇒ C
(¬A, ¬B, ¬C, D) ⇒ D
Assumption → Y
(¬Y, P) ⇒ P
(¬Y, Q) ⇒ Q
(¬Q, ¬P, ¬D) ⇒ ¬D
- Clause learning occurs when a contradiction is reached.
- This involves a sequence of resolution steps.
- Any implied literal in a clausal reason can be resolved away by resolving the clause with the clausal reason for the implied literal:
  (¬Q, ¬P, ¬D), (¬Y, Q) ⊢ (¬P, ¬D, ¬Y)
  (¬Q, ¬P, ¬D), (¬Y, P) ⊢ (¬Q, ¬D, ¬Y)
  (¬A, ¬B, ¬C, D), (¬X, C) ⊢ (¬A, ¬B, D, ¬X)
Clause Learning in SAT
Assumption → X
(¬X, A) ⇒ A
(¬X, B) ⇒ B
(¬X, C) ⇒ C
(¬A, ¬B, ¬C, D) ⇒ D
Assumption → Y
(¬Y, P) ⇒ P
(¬Y, Q) ⇒ Q
(¬Q, ¬P, ¬D) ⇒ ¬D
- SAT solvers utilize a particular sequence of resolutions against the conflict clause.
- 1-UIP learning [Zhang et al., 2001]: iteratively resolve away the deepest implied literal in the clause until the clause contains only one literal from the level at which the contradiction was generated:
  (¬Q, ¬P, ¬D), (¬Y, Q) ⊢ (¬P, ¬D, ¬Y)
  (¬P, ¬D, ¬Y), (¬Y, P) ⊢ (¬D, ¬Y)
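The 1-UIP loop itself is a short computation. Here is a sketch under the same signed-integer representation as the propagation sketch above (the trail, level, and reasons structures are assumed to be maintained by the solver; this simplifies what production solvers actually do):

def resolve(c1, c2, pivot):
    # Resolve c1 (containing pivot) with c2 (containing -pivot).
    return list(set(l for l in c1 if l != pivot) | set(l for l in c2 if l != -pivot))

def one_uip(conflict_clause, trail, reasons, level):
    # trail: literals in the order they were set; level[lit]: decision level of
    # a set literal; reasons[lit]: the clause that forced lit (decisions have none).
    clause = list(conflict_clause)
    conflict_level = level[trail[-1]]
    while True:
        # literals of the clause that were falsified at the conflict level
        current = [l for l in clause if level[-l] == conflict_level]
        if len(current) <= 1:
            return clause                 # one literal from that level: 1-UIP reached
        # resolve away the most recently forced such literal using its reason
        lit = max(current, key=lambda l: trail.index(-l))
        clause = resolve(reasons[-lit], clause, -lit)

On the slide's example (conflict clause (¬Q, ¬P, ¬D)), the loop performs exactly the two resolutions shown above and returns (¬D, ¬Y).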
Far Backtracking in SAT
Assumption → X
(¬X, A) ⇒ A
(¬X, B) ⇒ B
(¬X, C) ⇒ C
(¬A, ¬B, ¬C, D) ⇒ D
(¬D, ¬Y) ⇒ ¬Y

1-UIP clause: (¬D, ¬Y)
- Once the 1-UIP clause is learnt, the SAT solver backtracks to the level at which this clause became unit.
- It then uses the clause to force a new literal: (¬D, ¬Y) ⇒ ¬Y.
- Performs UP.
- Continues its search.
(The deeper level—Assumption → Y and everything it implied—is undone.)
Theoretical Power of Clause Learning
- The power of clause learning has been examined from the point of view of the theory of proof complexity [Cook & Reckhow 1977].
- This area looks at the question of how large proofs can become, and at the relative sizes of proofs in different propositional proof systems.
- DPLL with clause learning performs resolution (a particular restricted type of resolution).
- Various restricted versions of resolution have been well studied.
- [Buresh-Oppenheim, Pitassi 2003] contains a nice review of previous results and a number of new results in this area.
Theoretical Power of Clause Learning
- Every DPLL search tree refuting an UNSAT instance contains a TREE-resolution proof.
- TREE-resolution proofs can be exponentially larger than REGULAR-resolution proofs.
- REGULAR-resolution proofs can be exponentially larger than general (unrestricted) resolution proofs.

For UNSAT formulas:
min_size(DPLL Search Tree)
  ≥ min_size(TREE-Resolution)
  >> min_size(REGULAR-Resolution)
  >> min_size(general resolution)
Theoretical Power of Clause Learning
- Furthermore, every TREE-resolution proof is a REGULAR-resolution proof, and every REGULAR-resolution proof is a general resolution proof.

For UNSAT formulas:
min_size(DPLL Search Tree)
  ≥ min_size(TREE-Resolution)
  ≥ min_size(REGULAR-Resolution)
  ≥ min_size(general resolution)
Theoretical Power of Clause Learning
- [Beame, Kautz, and Sabharwal 2003] showed that clause learning can SOMETIMES yield exponentially smaller proofs than REGULAR-resolution.
- It is unknown whether general resolution proofs are sometimes smaller than clause learning proofs.

For UNSAT formulas:
min_size(DPLL Search Tree)
  ≥ min_size(TREE-Resolution)
  >> min_size(REGULAR-Resolution)
  >> min_size(Clause Learning DPLL Search Tree)
  ≥ min_size(general resolution)
Theoretical Power of Clause Learning
- It is still unknown whether REGULAR- or even TREE-resolution proofs can sometimes be smaller than the smallest clause learning DPLL search tree.
Theoretical Power of Clause Learning
- It is also easily observed [Beame, Kautz, and Sabharwal 2003] that with restarts, clause learning can make the DPLL search tree as small as the smallest general resolution proof on any formula.

For UNSAT formulas:
min_size(Clause Learning + Restarts DPLL Search Tree) = min_size(general resolution)
Theoretical Power of Clause Learning
- In sum: clause learning, especially with restarts, has the potential to yield exponential reductions in the size of the DPLL search tree.
- With clause learning, DPLL can potentially solve problems exponentially faster.
- That this can happen in practice has been irrefutably demonstrated by modern SAT solvers: modern SAT solvers have been able to exploit the theoretical potential of clause learning.
Theoretical Power of Clause Learning
- The theoretical advantages of clause learning also hold for CSP backtracking search.
- So the question that arises is: can the theoretical potential of clause learning also be exploited in CSP solvers?
1.2 Clause Learning in CSPs
Clause Learning in CSPs
- Joint work with George Katsirelos, who just completed his PhD with me: "NoGood Processing in CSPs".
- Learning has been used in CSPs, but it has not had the kind of impact clause learning has had in SAT [Dechter 1990; T. Schiex & G. Verfaillie 1993; Frost & Dechter 1994; Jussien & Barichard 2000].
- This work has investigated NoGood learning. A NoGood is a set of variable assignments that cannot be extended to a solution.
NoGood Learning
- NoGood learning is NOT clause learning. It is strictly less powerful.
- To illustrate this, let us consider encoding a CSP as a SAT problem, and compare what clause learning will do on the SAT encoding to what NoGood learning would do.
Propositional Encoding of a CSP—the propositions.
- A CSP consists of a set of variables Vi and constraints Cj. Each variable has a domain of values Dom[Vi] = {d1, …, dm}.
- Consider the set of propositions Vi=dj, one for each value of each variable.
  - Vi=dj means that Vi has been assigned the value dj: true when the assignment has been made.
  - ¬(Vi=dj) means that Vi has not been assigned the value dj: true when dj has been pruned from Vi's domain. (If Vi has been assigned a different value, all other values, including dj, are pruned from its domain.)
  - We usually write Vi≠dj instead of ¬(Vi=dj).
- We encode the CSP using clauses over these assignment propositions.
Propositional Encoding of a CSP—the clauses.
- For each variable V with Dom[V] = {d1, …, dk} we have the following clauses:
  - (V=d1, V=d2, …, V=dk) (must have a value)
  - for every pair of values (di, dj), the clause (V≠di, V≠dj) (has a unique value)
- For each constraint C(X1, …, Xk) over some set of variables we have, for each assignment to its variables that falsifies the constraint, a clause blocking that assignment:
  - if C(a, b, …, k) = FALSE then we have the clause (X1≠a, X2≠b, …, Xk≠k)
- This is the direct encoding of [Walsh 2000].
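Generating the direct encoding is mechanical. A small Python sketch (an illustrative representation, not a library API: a literal (v, d, True) stands for V=d, (v, d, False) for V≠d, and constraints are assumed to be given extensionally by their falsifying tuples):

from itertools import combinations, product

def direct_encoding(domains, constraints):
    clauses = []
    for v, dom in domains.items():
        clauses.append([(v, d, True) for d in dom])           # must have a value
        for d1, d2 in combinations(dom, 2):                    # has a unique value
            clauses.append([(v, d1, False), (v, d2, False)])
    for scope, falsifying_tuples in constraints:
        for tup in falsifying_tuples:                          # block each falsifying assignment
            clauses.append([(v, d, False) for v, d in zip(scope, tup)])
    return clauses

# Example: X + Y >= 3 with Dom[X] = Dom[Y] = {1, 2}; (1, 1) is the only falsifying tuple.
domains = {'X': [1, 2], 'Y': [1, 2]}
bad = [t for t in product([1, 2], repeat=2) if sum(t) < 3]
print(direct_encoding(domains, [(('X', 'Y'), bad)]))

The loop over falsifying tuples is exactly where the blow-up for large-arity constraints, noted on a later slide, comes from.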
DPLL on this Encoded CSP.
- Unit propagation on this encoding is essentially equivalent to Forward Checking on the original CSP.
DPLL on the encoded CSP
- Variables Q, X, Y, Z, …
- Dom[Q] = {0,1}; Dom[X] = Dom[Y] = Dom[Z] = {1,2,3}
- Constraints:
  - Q + X + Y ≥ 3
  - Q + X + Z ≥ 3
  - Q + Y + Z ≤ 3

Assumption → Q=0
(Q≠0, Q≠1) ⇒ Q≠1
Assumption → X=1
(X≠1, X≠2) ⇒ X≠2
(X≠1, X≠3) ⇒ X≠3
(Q≠0, X≠1, Y≠1) ⇒ Y≠1
(Q≠0, X≠1, Z≠1) ⇒ Z≠1
Assumption → Y=2
(Y≠2, Y≠3) ⇒ Y≠3
(Q≠0, Y≠2, Z≠2) ⇒ Z≠2
(Q≠0, Y≠2, Z≠3) ⇒ Z≠3
(Z=1, Z=2, Z=3) falsified — a domain wipe-out on Z (contradiction)
DPLL on the encoded CSP
Clause learning:
Assumption → Q=0
(Q≠0, Q≠1) ⇒ Q≠1
Assumption → X=1
(X≠1, X≠2) ⇒ X≠2
(X≠1, X≠3) ⇒ X≠3
(Q≠0, X≠1, Y≠1) ⇒ Y≠1
(Q≠0, X≠1, Z≠1) ⇒ Z≠1
Assumption → Y=2
(Y≠2, Y≠3) ⇒ Y≠3
(Q≠0, Y≠2, Z≠2) ⇒ Z≠2
(Q≠0, Y≠2, Z≠3) ⇒ Z≠3

(Z=1, Z=2, Z=3), (Q≠0, Y≠2, Z≠3) ⊢ (Q≠0, Y≠2, Z=1, Z=2)
DPLL on the encoded CSP
Clause learning (continued):
Assumption → Q=0
(Q≠0, Q≠1) ⇒ Q≠1
Assumption → X=1
(X≠1, X≠2) ⇒ X≠2
(X≠1, X≠3) ⇒ X≠3
(Q≠0, X≠1, Y≠1) ⇒ Y≠1
(Q≠0, X≠1, Z≠1) ⇒ Z≠1
Assumption → Y=2
(Y≠2, Y≠3) ⇒ Y≠3
(Q≠0, Y≠2, Z≠2) ⇒ Z≠2

(Q≠0, Y≠2, Z=1, Z=2), (Q≠0, Y≠2, Z≠2) ⊢ (Q≠0, Y≠2, Z=1)   — a 1-UIP clause
DPLL on the encoded CSP
(Q≠0, Y≠2, Z=1)
- This clause is not a NoGood!
- It asserts that we cannot have Q = 0, Y = 2, and Z ≠ 1 simultaneously.
- This is a set of assignments and domain prunings that cannot lead to a solution.
- A NoGood is only a set of assignments.
- To obtain a NoGood we have to further resolve away Z = 1 from the clause.
DPLL on the encoded CSP
NoGood learning:
Assumption → Q=0
(Q≠0, Q≠1) ⇒ Q≠1
Assumption → X=1
(X≠1, X≠2) ⇒ X≠2
(X≠1, X≠3) ⇒ X≠3
(Q≠0, X≠1, Y≠1) ⇒ Y≠1
(Q≠0, X≠1, Z≠1) ⇒ Z≠1
Assumption → Y=2

(Q≠0, Y≠2, Z=1), (Q≠0, X≠1, Z≠1) ⊢ (Q≠0, X≠1, Y≠2)

- This clause is a NoGood: it says that we cannot have the set of assignments Q = 0, X = 1, Y = 2.
- NoGood learning requires resolving the conflicts back to the decision literals.
NoGoods vs. Clauses (Generalized NoGoods)
1. Unit propagation over a collection of learnt NoGoods is ineffective.
   - NoGoods are clauses containing negated literals only, e.g., (Z≠1, Y≠0, X≠3). If one of these clauses becomes unit, e.g., forcing X≠3, the forced literal can only satisfy other NoGood clauses; it can never reduce the length of those clauses.
2. A single clause can represent an exponential number of NoGoods.
   - With Dom = {1, 2, 3}, the clause (Q≠1, Z=1, Y=1) is equivalent to the NoGoods:
     (Q≠1, Z≠2, Y≠2)
     (Q≠1, Z≠3, Y≠2)
     (Q≠1, Z≠2, Y≠3)
     (Q≠1, Z≠3, Y≠3)
NoGoods vs. Clauses (Generalized NoGoods)
3. The 1-UIP clause can prune more branches during the future search than the NoGood clause [Katsirelos 2007].
4. Clause learning can yield super-polynomially smaller search trees than NoGood learning [Katsirelos 2007].
Encoding to SAT
- With all of these benefits of clause learning over NoGood learning, the natural question is: why not encode CSPs to SAT and immediately obtain the benefits of clause learning already implemented in modern SAT solvers?
Encoding to SAT
1. The SAT theory produced by the direct encoding is not very effective: unit propagation on this encoding only achieves Forward Checking (a weak form of propagation).
2. Under the direct encoding, constraints of arity k yield 2^O(k) clauses. Hence the resultant SAT theory is too large.
3. There is no direct way of exploiting propagators—specialized polynomial-time algorithms for doing propagation on constraints of large arity.
Encoding to SAT
- Some of these issues can be addressed by better encodings, e.g., [Bacchus 2007; Katsirelos & Walsh 2007; Quimper & Walsh 2007]. But overall, complete conversion to SAT is currently impractical.
Clause Learning in CSPs without encoding
- We can perform clause learning in a CSP solver by the following steps:
1. The CSP solver must keep track of the chronological sequence of variable assignments and value prunings made as we descend each path in the search tree, e.g.:
   Q=0, Q≠1, X=1, X≠2, X≠3, Y≠1, …
Clause Learning in CSPs without encoding
2. Each item must be labeled with a clausal reason consisting of items previously falsified along the path:
   Assumption → Q=0
   (Q≠0, Q≠1) ⇒ Q≠1
   Assumption → X=1
   (X≠1, X≠2) ⇒ X≠2
   (X≠1, X≠3) ⇒ X≠3
   (Q≠0, X≠1, Y≠1) ⇒ Y≠1
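A minimal sketch of such a trail in Python (the representation is ours for illustration, not [Katsirelos 2007]'s actual data structures): each entry records an assignment V=d or a pruning V≠d together with its clausal reason, with None marking decisions.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Lit = Tuple[str, int, bool]    # (variable, value, True for V=d / False for V≠d)

@dataclass
class TrailEntry:
    lit: Lit                          # the assignment or pruning made
    reason: Optional[List[Lit]]       # clausal reason; None for an assumption

@dataclass
class Trail:
    entries: List[TrailEntry] = field(default_factory=list)

    def decide(self, lit: Lit) -> None:
        self.entries.append(TrailEntry(lit, None))

    def infer(self, lit: Lit, reason: List[Lit]) -> None:
        # the reason clause must mention only items already falsified on the path
        self.entries.append(TrailEntry(lit, reason))

# The path above:
t = Trail()
t.decide(('Q', 0, True))                                        # Assumption → Q=0
t.infer(('Q', 1, False), [('Q', 0, False), ('Q', 1, False)])    # (Q≠0, Q≠1) ⇒ Q≠1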
Clause Learning in CSPs without encoding
3. Contradictions are labeled by falsified clauses; e.g., domain wipe-outs can be labeled by the must-have-a-value clause:
   (Z=1, Z=2, Z=3)
- From this information, clause learning can be performed whenever a contradiction is reached.
- These clauses can be stored in a clausal database.
- Unit propagation can be run on this database as new value assignments or value prunings are performed.
  - The inferences of unit propagation augment the other constraint propagation done by the CSP solver.
Higher Levels of Local Consistency
- Note that this technique works irrespective of the kinds of inference performed during search.
  - That is, we can use any kind of inference we want to infer a new value pruning or new variable assignment—as long as we can label the inference with a clausal reason.
- This raises the question of how we generate clausal reasons for other forms of inference.
- [Katsirelos 2007] answers this question for the most commonly used form of inference, Generalized Arc Consistency (GAC), including ways of obtaining clausal reasons from various types of GAC propagators, e.g., ALL-DIFF and GCC.
Some Empirical Data [Katsirelos 2007]
- GAC with NoGood learning helps a bit.
- GAC with clause learning, but where GAC labels its inferences with NoGoods, offers only minor improvements.
- To get significant improvements one must do clause learning as well as derive proper clausal reasons from GAC.
Observations
- Caching techniques have great potential, but making them effective in practice can sometimes require resolving a number of different issues.
- This work goes a long way towards achieving the goal of exploiting the theoretical potential of clause learning.

Prediction: clause learning will play a fundamental role in the next generation of CSP solvers, and these solvers will often be orders of magnitude more effective than current solvers.
Open Issues
- Many issues remain open. Here we mention only one: restarts.
- As previously pointed out, clause learning gains a great deal more power with restarts. With restarts it can be as powerful as unrestricted resolution.
- Restarts play an essential role in the performance of SAT solvers—both full restarts and partial restarts.
Search vs. Inference
- With restarts and clause learning, the distinction of search vs. inference is turned on its head: now search is performing inference.
- Instead, the distinction becomes systematic vs. opportunistic inference.
  - Enforcing a high level of consistency during search is performing systematic inference.
  - Searching until we learn a good clause is opportunistic.
- SAT solvers perform very little systematic inference—only unit propagation—but they perform lots of opportunistic inference.
- CSP solvers essentially do the opposite.
One Open Question
- In SAT solvers, opportunistic inference is feasible: if a learnt clause turns out not to be useful, it doesn't matter much, as the search to learn that clause did not take much time. Search (the nodes/second rate) is very fast.
- In CSP solvers, the enforcement of higher levels of local consistency makes restarts and opportunistic inference very expensive. Search (the nodes/second rate) is very slow.

Are high levels of consistency really the most effective approach for solving CSPs once clause learning is available?
2. Formula Caching when considering all solutions
Considering All Solutions?
One such class of problems are those that can be expressed as Sum-of-Products problems [Dechter 1999]:
1. A finite set of variables, V1, V2, …, Vn.
2. A finite domain of values for each variable, Dom[Vi].
3. A finite set of real-valued local functions f1, f2, …, fm.
   - Each function is local in the sense that it only depends on a subset of the variables, e.g., f1(V1, V2), f2(V2, V4, V6), …
   - The locality of the functions can be exploited algorithmically.
Sum of Products
- The sum-of-products problem is to compute from this representation
  Σ_{V1} Σ_{V2} … Σ_{Vn} f1() × f2() × … × fm()
- The local functions assign a value to every complete instantiation of the variables (the product), and we want to compute some amalgamation of these values.
- A number of different problems can be cast as instances of sum-of-products [Dechter 1999].
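As a reference point, the defining computation can be written down directly (a brute-force sketch in Python, exponential in n; the structure-exploiting algorithms below improve on exactly this):

from itertools import product as cartesian

def sum_of_products(domains, functions):
    # domains: {var: list of values}; functions: list of (scope, f) pairs where
    # f takes the values of the variables in its scope. Sums, over all complete
    # instantiations, the product of the local function values.
    variables = list(domains)
    total = 0
    for values in cartesian(*(domains[v] for v in variables)):
        inst = dict(zip(variables, values))
        prod = 1
        for scope, f in functions:
            prod *= f(*(inst[v] for v in scope))
        total += prod
    return total

# Example: with 0/1-valued functions this is #CSP — here X ≠ Y over {0,1} has 2 solutions.
domains = {'X': [0, 1], 'Y': [0, 1]}
functions = [(('X', 'Y'), lambda x, y: 1 if x != y else 0)]
print(sum_of_products(domains, functions))   # 2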
Sum of Products—Examples
- #CSP: count the number of solutions.
- Inference in Bayes Nets.
- Optimization: the functions are sub-objective functions returning real values and the global objective is to maximize the sum of the sub-objectives (cf. soft constraints, generalized additive utility).
Algorithms: Brief History [Arnborg et al. 1988]
- It had long been noted that various NP-complete problems on graphs were easy on trees.
- With the characterization of NP-completeness, systematic study of how to extend these techniques beyond trees started in the 1970s.
- A number of dynamic programming algorithms were developed for partial k-trees, which could solve many hard problems in time linear in the size of the graph (but exponential in k).
Algorithms: Brief History [Arnborg et al. 1988]
- These ideas were made systematic by Robertson & Seymour, who wrote a series of 20 articles to prove Wagner's conjecture [1983].
- Along the way they defined the concepts of tree and branch decompositions and the graph parameters tree-width and branch-width.
- It was subsequently noted that partial k-trees are equivalent to the class of graphs with tree-width ≤ k, so all of the dynamic programming algorithms developed for partial k-trees work for graphs of tree-width k.
- The notion of tree-width has been exploited in many areas of computer science and in combinatorics & optimization.
Three Types of Algorithms
- These algorithms all take one of three basic forms, all of which achieve the same kinds of tree-width complexity guarantees.
- To understand these forms, we first introduce the notion of a branch decomposition (which is somewhat easier to utilize than tree decompositions when dealing with local functions of arity greater than 2).
Branch Decomposition
- Start with m leaf nodes, one for each of the local functions.
- Map each local function to some leaf node.
[Figure: seven leaf nodes, one for each of f1, …, f7]
Branch Decomposition
- Label each leaf node with the variables in the scope of the associated local function.
[Figure: the leaves labeled with their scopes — f3: {V5,V2}, f6: {V4,V5}, f1: {V5,V6,V7}, f4: {V3,V7}, f2: {V1,V3}, f7: {V4,V6}, f5: {V8,V6}]
Branch Decomposition
- Build a binary tree on top of these nodes.
[Figure: a binary tree whose leaves are the labeled function nodes]
Branch Decomposition
- Then label the rest of the nodes of the tree.
[Figure: the same tree with its internal nodes labeled, e.g., {V4,V5}, {V5,V6,V3}, {V3,V4,V6}, and {V4,V6,V3} at the root]
Internal Labels
[Figure: an internal node splits the tree into A, the variables in the subtree below the node, and B, the variables in the rest of the tree (not in the subtree under the node)]
Internal Labels
AB
v
59
v
Fahiem Bacchus, University of Toronto
3/22/2016
Branch Width
- The width of a particular decomposition is the size of its largest label.
- The branch width is the minimal width over all possible branch decompositions.
- Branch width is no more than the tree-width.
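These definitions translate directly into code. A small sketch for computing the width of a given branch decomposition (the nested-tuple encoding of the tree is our own, for illustration):

def branch_width(tree):
    # tree: a leaf is a frozenset of variables (the scope of a local function);
    # an internal node is a pair (left, right). A node's label is the set of
    # variables occurring both in its subtree and in the rest of the tree.
    def vars_below(t):
        return t if isinstance(t, frozenset) else vars_below(t[0]) | vars_below(t[1])
    def walk(t, outside):
        label = vars_below(t) & outside
        if isinstance(t, frozenset):
            return len(label)
        left, right = t
        return max(len(label),
                   walk(left, outside | vars_below(right)),
                   walk(right, outside | vars_below(left)))
    return walk(tree, frozenset())

# f1(V1,V2) and f2(V2,V3) joined at the root: each leaf's label is {V2}, so width 1.
print(branch_width((frozenset({'V1', 'V2'}), frozenset({'V2', 'V3'}))))   # 1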
Algorithms: Dynamic Programming
- Bottom-up dynamic programming, e.g., join tree algorithms in Bayesian inference.
[Figure: the labeled branch decomposition, processed bottom-up from the leaves to the root]
Algorithms: Variable Elimination
- Linearize the bottom-up process: Variable Elimination.
[Figure: the same labeled branch decomposition, processed in a linearized (elimination) order]
Algorithms: Instantiation and Decomposition
- Instantiate variables starting at the top (V4, V6 and V3) and decompose the problem.
[Figure: once V4, V6 and V3 are instantiated, the decomposition splits into independent parts over {V5,V2,V7} and {V8,V1}]
Instantiation and Decomposition
- A number of works have used this approach:
  - Pseudo Tree Search [Freuder & Quinn 1985]
  - Counting Solutions [Bayardo & Pehoushek 2000]
  - Recursive Conditioning [Darwiche 2001]
  - Tour Merging [Cook & Seymour 2003]
  - AND-OR Search [Dechter & Mateescu 2004]
  - …
Instantiation and Decomposition
- Solved by AND/OR search: as we instantiate variables we examine the residual sub-problem.
- If the sub-problem consists of disjoint parts that share no variables (components), we solve each component in a separate recursion.
Theoretical Results
- With the right ordering, this approach can solve the problem in time 2^O(w log n) and linear space, where w is the branch (tree) width of the instance.
- If the solved components are cached so that they do not have to be solved again, the approach can solve the problem in time n^O(1) 2^O(w). But now we need n^O(1) 2^O(w) space.
Solving Sum-Of-Products with Backtracking
- In joint work with Toniann Pitassi & Shannon Dalmao, we showed that caching is in fact sufficient to achieve these bounds with standard backtracking search [Bacchus et al. 2003].
  - AND/OR decomposition of the search tree is not necessary (and may be harmful). Instead, an ordinary decision tree can be searched.
  - Once again, caching provides a significant increase in the theoretical power of backtracking.
Simple Formula Caching
- As assumptions are made during search, the problem is reduced.
- In Simple Formula Caching we cache every solved residual formula, and if we encounter the same residual formula again we utilize its cached value instead of solving the same sub-problem again.
- Two residual formulas are the same if:
  - they contain the same (unassigned) variables, and
  - all instantiated variables in the remaining constraints (constraints with at least one unassigned variable) are instantiated to the same values.
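A sketch of a cache key implementing this equality test (illustrative only; real solvers use carefully engineered signatures, and for counting, variables that appear in no remaining constraint would also need to be accounted for):

def cache_key(constraints, assignment):
    # constraints: list of (name, scope) pairs; assignment: {var: value} for the
    # instantiated variables. Keep only constraints with at least one unassigned
    # variable, recording the values their assigned variables were given.
    remaining = []
    for name, scope in constraints:
        if any(v not in assignment for v in scope):
            remaining.append((name, tuple((v, assignment.get(v)) for v in scope)))
    return frozenset(remaining)

# Example (cf. the next slide): [X=a, Y=b] and [X=b, Y=b] produce the same key,
# since X no longer appears in any remaining constraint once C1(X,Y) is fully assigned.
cs = [('C1', ('X', 'Y')), ('C2', ('Y', 'Z')), ('C3', ('Y', 'Q'))]
print(cache_key(cs, {'X': 'a', 'Y': 'b'}) == cache_key(cs, {'X': 'b', 'Y': 'b'}))   # True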
Simple Formula Caching
C1(X,Y), C2(Y,Z), C3(Y,Q) [X=a, Y=b]
  → C2(Y=b, Z), C3(Y=b, Q)
C1(X,Y), C2(Y,Z), C3(Y,Q) [X=b, Y=b]
  → C2(Y=b, Z), C3(Y=b, Q)
- These residual formulas are the same even though we obtained them from different instantiations.
Simple Formula Caching
BTSimpleCache(Φ)
  if InCache(Φ): return CachedValue(Φ)
  else:
    pick a variable V in Φ
    value = 0
    for d in Domain[V]:
      value = value + BTSimpleCache(Φ|V=d)
    AddToCache(Φ, value)
    return value

- Runs in time and space 2^O(w log n).
Component Caching
- We can achieve the same performance as AND/OR decomposition, i.e., 2^O(w log n) time with linear space, or n^O(1) 2^O(w) time with n^O(1) 2^O(w) space, by examining the residual formula for disjoint components.
- We cache these disjoint components as they are solved.
- We remove any solved component from the residual formula.
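Finding the disjoint components is a connected-components computation over the residual constraint graph; a minimal sketch (same illustrative representation as the cache-key sketch above):

def components(constraints, assignment):
    # Build adjacency between unassigned variables that co-occur in some
    # remaining constraint, then collect connected components by DFS.
    adj = {}
    for _, scope in constraints:
        free = [v for v in scope if v not in assignment]
        for v in free:
            adj.setdefault(v, set()).update(u for u in free if u != v)
    seen, comps = set(), []
    for v in adj:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Two independent components, {X, Y, Z} and {Q, W}; each can be solved and cached separately.
cs = [('C1', ('X', 'Y')), ('C2', ('Y', 'Z')), ('C3', ('Q', 'W'))]
print(components(cs, {}))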
Component Caching
- Since components are no longer solved in a separate recursion, we have to be a bit cleverer about identifying the value of these components from the search computation.
- This can be accomplished by using the cache in a clever way, or by dependency-tracking techniques.
Component Caching
- There are some potential advantages to searching a single tree rather than an AND/OR tree:
  - With an AND/OR tree, one has to make a commitment to which component to solve first. The wrong decision when doing Bayesian inference, or optimization with branch and bound, can be expensive.
  - In the single tree, the components are solved in an interleaved manner.
  - This also provides more flexibility with respect to variable ordering.
Bayesian Inference via Backtracking Search
- These ideas were used to build a fairly successful Bayes Net reasoner [Bacchus et al. 2003].
- Better performance, however, would require exploiting more of the structure internal to the local functions.
Exploiting Micro Structure
- C1(A,Y,Z) = TRUE iff (A=0 → Y=1) and (A=1 → Y=0 & Z=1).
  Then C1(A=0,Y,Z) is in fact not a function of Z; that is, C1(A=0,Y,Z) ≡ C1(A=0,Y).
- C2(X,Y,Z) = TRUE iff X+Y+Z ≥ 3.
  Then C2(X=3,Y,Z) is already satisfied.
Exploiting Micro Structure
- In both cases, if we could detect this during search we could potentially:
  - Generate more components, e.g., if we could reduce C1(A=0,Y,Z) to C1(A=0,Y), perhaps Y and Z would end up in different components.
  - Generate more cache hits, e.g., if the residual formula differs from a cached formula only because it contains C2(X=3,Y,Z), recognizing that this constraint is already satisfied would allow us to ignore it and generate the cache hit.
Exploiting Micro Structure
- It is interesting to note that if we encode to CNF we do get to exploit more of the micro structure (structure internal to the constraint): clauses with a true literal are satisfied and can be removed from the residual formula.
- Bayes Net reasoners using CNF encodings have displayed very good performance [Chavira & Darwiche 2005].
Exploiting Micro Structure
- Unfortunately, as pointed out before, encoding in CNF can result in an impractical blowup in the size of the problem representation.
- Practical techniques for exploiting the micro structure remain a promising area for further research.
  - There are some promising results by Kitching on detecting when a symmetric version of the current component has already been solved [Kitching & Bacchus 2007], but more work remains to be done.
Observations
- Component caching solvers are the most effective way of exactly computing the number of solutions of a SAT formula.
- They allow the solution of certain types of Bayesian inference problems not solvable by other methods.
- They have shown promise in solving decomposable optimization problems [Dechter & Marinescu 2005; de Givry et al. 2006; Kitching & Bacchus 2007].
- To date all of these works have used AND/OR search, so exploiting the advantages of plain backtracking search remains work to be done [Kitching, in progress].
- Better exploiting micro structure also remains work to be done.
Conclusions
- Caching is a technique that has great potential for making a material difference in the effectiveness of backtracking search.
- The range of practical mechanisms for exploiting caching remains a very fertile area for future research.
- Research in this direction might well change present day "accepted practice" in constraint solving.
References
[Marques-Silva and Sakallah, 1996] J. P. Marques-Silva and K. A. Sakallah. GRASP—a new search algorithm for satisfiability. In ICCAD, pages 220-227, 1996.
[Zhang et al., 2001] L. Zhang, C. F. Madigan, M. H. Moskewicz, and S. Malik. Efficient conflict driven learning in a Boolean satisfiability solver. In ICCAD, pages 279-285, 2001.
[Cook & Reckhow 1977] S. A. Cook and R. A. Reckhow. The relative efficiency of propositional proof systems. J. Symb. Logic, 44:36-50, 1977.
[Buresh-Oppenheim, Pitassi 2003] J. Buresh-Oppenheim and T. Pitassi. The complexity of resolution refinements. In Proceedings of the 18th IEEE Symposium on Logic in Computer Science (LICS), pages 138-147, June 2003.
[Beame, Kautz, and Sabharwal 2003] P. Beame, H. Kautz, and A. Sabharwal. Towards understanding and harnessing the potential of clause learning. J. Artif. Intell. Res. (JAIR), 22:319-351, 2004.
[Dechter 1990] R. Dechter. Enhancement schemes for constraint processing: backjumping, learning, and cutset decomposition. Artif. Intell., 41(3):273-312, 1990.
[T. Schiex & G. Verfaillie 1993] T. Schiex and G. Verfaillie. Nogood recording for static and dynamic CSP. In Proceedings of the 5th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'93), pages 48-55, Boston, MA, November 1993.
References
[Frost & Dechter 1994] D. Frost and R. Dechter. Dead-end driven learning. In AAAI 1994, pages 294-300.
[Jussien & Barichard 2000] N. Jussien and V. Barichard. The PaLM system: explanation-based constraint programming. In Proceedings of TRICS: Techniques foR Implementing Constraint programming Systems, a post-conference workshop of CP 2000, pages 118-133, 2000.
[Walsh 2000] T. Walsh. SAT v CSP. In Proceedings of CP-2000, pages 441-456, Springer-Verlag LNCS 1894, 2000.
[Katsirelos 2007] G. Katsirelos. NoGood Processing in CSPs. PhD thesis, Department of Computer Science, University of Toronto.
[Bacchus 2007] F. Bacchus. GAC via unit propagation. In Proceedings of CP-2007, pages 133-147, 2007.
[Katsirelos & Walsh 2007] G. Katsirelos and T. Walsh. A compression algorithm for large arity extensional constraints. In Proceedings of CP-2007, LNCS 4741, 2007.
[Quimper & Walsh 2007] C. Quimper and T. Walsh. Decomposing global grammar constraints. In Proceedings of CP-2007, LNCS 4741, pages 590-604, 2007.
[Dechter 1999] R. Dechter. Bucket elimination: a unifying framework for reasoning. Artificial Intelligence, October 1999.
References
[de Givry et al. 2006] S. de Givry, T. Schiex, and G. Verfaillie. Exploiting tree decomposition and soft local consistency in weighted CSP. In Proc. of AAAI'2006, Boston, MA, USA.