Towards Efficient Sampling: Exploiting Random Walk Strategy

Wei Wei, Jordan Erenrich, and Bart Selman
1
Motivations

Recent years have seen tremendous improvements
in SAT solving: from formulas with up to 300 variables
(1992) to formulas with one million variables.
 Various techniques answer the question
“does a satisfying assignment exist for a formula?”
 But there are harder questions to be answered:
“how many satisfying assignments does a formula
have?” or, closely related, “can we sample from the
satisfying assignments of a formula?”
2
Complexity
SAT is NP-complete. 2-SAT is solvable
in linear time.
 Counting assignments (even for 2-CNF) is
#P-complete, and is NP-hard to
approximate (Valiant, 1979).
 Approximate counting and sampling are
equivalent if the problem is “downward
self-reducible”.

3
Challenge

Can we extend SAT techniques to solve
harder counting/sampling problems?

Such an extension would lead us to a wide
range of new applications.
SAT testing
logic inference
counting/sampling
probabilistic reasoning
4
Standard Methods for Sampling: MCMC
Based on setting up a Markov chain
with a predefined stationary distribution.
 Draw samples from the stationary
distribution by running the Markov chain
for sufficiently long.
 Problem: for interesting problems, the
Markov chain takes exponential time to
converge to its stationary distribution.

5
Simulated Annealing

Simulated Annealing uses the Boltzmann
distribution as the stationary distribution.
 At low temperature, the distribution
concentrates around minimum energy states.
 In terms of the satisfiability problem, each
satisfying assignment (with 0 cost) gets the
same probability.
 Again, reaching such a stationary distribution
takes exponential time for interesting
problems (shown in a later slide).
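For reference, the Boltzmann distribution mentioned on this slide can be written as below. The notation is introduced here and is not from the slides: σ is a truth assignment, E(σ) is its cost (assumed to be the number of unsatisfied clauses, matching "satisfying assignment (with 0 cost)"), T is the temperature, and Z_T is the normalizing constant.

```latex
\pi_T(\sigma) = \frac{e^{-E(\sigma)/T}}{Z_T},
\qquad
Z_T = \sum_{\sigma'} e^{-E(\sigma')/T}
```

Every satisfying assignment has E(σ) = 0 and therefore the same probability 1/Z_T. As T decreases, the weight of every assignment with E(σ) > 0 shrinks toward zero.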
6
Standard Methods for Counting
Current solution counting procedures
extend DPLL methods with component
analysis.
 Two counting procedures are available:
relsat (Bayardo and Pehoushek, 2000) and
cachet (Sang, Beame, and Kautz, 2004).
Both count the exact number of solutions.

7

Question: Can state-of-the-art local
search procedures be used for SAT
sampling/counting (as alternatives to
standard Markov chain Monte Carlo
and DPLL methods)?
Yes! Shown in this talk.
8
Our approach – biased random walk
Biased random walk = greedy bias +
pure random walk. Example: WalkSat
(Selman et al, 1994), effective on SAT.
 Can we use it to sample from solution
space?

– Does WalkSat reach all solutions?
– How uniform is the sampling?
9
WalkSat
[Figure: WalkSat; annotations: "visited 500,000 times", "visited 60 times"; axis label: Hamming distance]
10
Probability Ranges in Different Domains
Instance    Runs       Hits Rarest  Hits Common  Common-to-Rare Ratio
Random      50 × 10^6  53           9 × 10^5     1.7 × 10^4
Logistics   1 × 10^6   84           4 × 10^3     50
Verif.      1 × 10^6   45           318          7
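The common-to-rare ratio in this table is the hit count of the most frequently sampled solution divided by that of the rarest one. A minimal sketch of that arithmetic follows, reading the table entries as 9 × 10^5, 4 × 10^3 and so on. The function name `common_to_rare_ratio` is introduced here for illustration.

```python
def common_to_rare_ratio(hits_common, hits_rarest):
    """Ratio between the hit counts of the most and least sampled solutions."""
    return hits_common / hits_rarest

# (hits of rarest solution, hits of most common solution), as read from the table
rows = {
    "Random": (53, 9e5),
    "Logistics": (84, 4e3),
    "Verif.": (45, 318),
}

for name, (rarest, common) in rows.items():
    print(f"{name}: {common_to_rare_ratio(common, rarest):.3g}")
```

A ratio of 1 would mean perfectly uniform sampling over the solutions that were hit.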
11
Improving the Uniformity of Sampling
WalkSat (nonergodic; quickly reaches sinks)
+ SA (ergodic; slow convergence)
= SampleSat (ergodic; does not satisfy DBC, the detailed balance condition)
SampleSat:
– With probability p, the algorithm makes a
biased random walk move
– With probability 1-p, the algorithm makes a
SA (simulated annealing) move
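The two-move mixture described above can be sketched as follows. This is an illustrative simplification, not the authors' implementation. It assumes clauses are lists of nonzero integers (DIMACS-style literals). The greedy step minimizes the total number of unsatisfied clauses rather than WalkSat's break count. The parameters `noise`, `temperature`, `steps`, and `seed`, and all function names, are hypothetical.

```python
import math
import random

def unsat_clauses(clauses, assign):
    """Return the clauses not satisfied by assign (dict: variable -> bool)."""
    return [c for c in clauses if not any(assign[abs(l)] == (l > 0) for l in c)]

def walk_move(clauses, assign, rng, noise=0.5):
    """Biased random walk move: repair a randomly chosen unsatisfied clause."""
    unsat = unsat_clauses(clauses, assign)
    if not unsat:
        return  # all clauses satisfied, nothing to repair
    clause = rng.choice(unsat)
    if rng.random() < noise:
        var = abs(rng.choice(clause))  # pure random walk step
    else:
        def cost_after(v):
            # number of unsatisfied clauses if v were flipped
            assign[v] = not assign[v]
            cost = len(unsat_clauses(clauses, assign))
            assign[v] = not assign[v]
            return cost
        var = min((abs(l) for l in clause), key=cost_after)  # greedy step
    assign[var] = not assign[var]

def sa_move(clauses, assign, rng, temperature=0.1):
    """Fixed-temperature Metropolis move on a uniformly chosen variable."""
    var = rng.choice(list(assign))
    before = len(unsat_clauses(clauses, assign))
    assign[var] = not assign[var]
    delta = len(unsat_clauses(clauses, assign)) - before
    if delta > 0 and rng.random() >= math.exp(-delta / temperature):
        assign[var] = not assign[var]  # reject the uphill move

def sample_sat(clauses, n_vars, p=0.5, steps=10_000, seed=0):
    """Run the mixed chain; return the final assignment if it is a solution."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    for _ in range(steps):
        if rng.random() < p:
            walk_move(clauses, assign, rng)   # with probability p
        else:
            sa_move(clauses, assign, rng)     # with probability 1-p
    return assign if not unsat_clauses(clauses, assign) else None
```

For example, `sample_sat([[1, 2], [-1, 3]], n_vars=3)` runs the chain on (x1 ∨ x2) ∧ (¬x1 ∨ x3) and returns either a satisfying assignment or `None`.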
12
Comparison Between WalkSat and SampleSat
[Figure: panels "WalkSat" and "SampleSat"; annotations: 10 and 10^4]
13
SampleSat
[Figure: SampleSat; axis label: Hamming Distance]
14
Instance    Runs       Hits Rarest  Hits Common  Common-to-Rare Ratio (WalkSat)  Ratio (SampleSat)
Random      50 × 10^6  53           9 × 10^5     1.7 × 10^4                      10
Logistics   1 × 10^6   84           4 × 10^3     50                              17
Verif.      1 × 10^6   45           318          7                               4
15
Analysis
c1  c2  c3  …  cn  a  b
F   F   F   …  F   F  F
F   F   F   …  F   F  T
16
Property of F*
Proposition 1: SA with fixed temperature
takes exponential time to find a solution
of F*.
 This shows that even for some simple
formulas in 2-CNF, SA cannot reach a
solution in polynomial time.

17
Analysis, cont.
c1  c2  c3  …  cn  a
T   T   T   …  T   T
F   F   F   …  F   T
F   F   F   …  F   F
Proposition 2: pure RW reaches this solution with exponentially small probability.
18
SampleSat

The SampleSat search can be divided into 2 stages. Before SampleSat
reaches its first solution, it behaves like
WalkSat.
instance      WalkSat     SampleSat    SA
random        382         677          24667
logistics     5.7 × 10^4  15.5 × 10^5  > 10^9
verification  36          65           10821
19
SampleSat, cont.

After reaching a solution, the random walk
component is turned off because all clauses
are satisfied. SampleSat behaves like SA.
 Proposition 3: SA at zero temperature
samples all solutions within a cluster
uniformly.
 This 2-stage model explains why SampleSat
samples more uniformly than random walk
algorithms alone.
20
Verification on Larger Formulas: ApproxCount

Small formulas -> figures, solution
frequencies. How to verify on large formulas?
ApproxCount.
 ApproxCount approximates the number of
solutions of Boolean formulas, based on
the SampleSat algorithm.
 Besides using it to justify the accuracy of our
sampling approach, ApproxCount is
interesting in its own right.
21
Algorithm

The algorithm works as follows (Jerrum, Valiant, and Vazirani, 1986):
1. Pick a variable X in the current formula.
2. Draw K samples from the solution space.
3. Set variable X to its most sampled value t; the multiplier for X is K/#(X=t). Note that 1 ≤ multiplier ≤ 2.
4. Repeat steps 1-3 until all variables are set.
5. The number of solutions of the original formula is the product of all multipliers.
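The steps above can be sketched in a few lines. This is a sketch under stated assumptions, not the ApproxCount implementation. The sampler here is a brute-force uniform sampler standing in for SampleSat, which is viable only for tiny formulas. Variables are picked in index order. The formula is assumed satisfiable. All function and parameter names are hypothetical.

```python
import itertools
import random

def satisfies(clauses, assign):
    """True if every clause has at least one literal made true by assign."""
    return all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)

def brute_force_sampler(clauses, n_vars, fixed, k, rng):
    """Stand-in for SampleSat: enumerate the solutions consistent with `fixed`
    and draw k of them uniformly, with replacement."""
    free = [v for v in range(1, n_vars + 1) if v not in fixed]
    solutions = []
    for values in itertools.product([False, True], repeat=len(free)):
        assign = {**fixed, **dict(zip(free, values))}
        if satisfies(clauses, assign):
            solutions.append(assign)
    return [rng.choice(solutions) for _ in range(k)]

def approx_count(clauses, n_vars, k=2000, seed=0, sampler=brute_force_sampler):
    """Estimate the number of solutions as a product of per-variable multipliers."""
    rng = random.Random(seed)
    fixed = {}
    estimate = 1.0
    for var in range(1, n_vars + 1):                        # step 1: pick X
        samples = sampler(clauses, n_vars, fixed, k, rng)   # step 2: K samples
        n_true = sum(s[var] for s in samples)
        t = n_true >= k - n_true                            # most sampled value
        count_t = n_true if t else k - n_true
        estimate *= k / count_t                             # multiplier in [1, 2]
        fixed[var] = t                                      # step 3: set X = t
    return estimate                                         # step 5: product

print(approx_count([[1, 2], [-1, 3]], n_vars=3))
```

The example formula (x1 ∨ x2) ∧ (¬x1 ∨ x3) has exactly 4 solutions, so the printed estimate should be close to 4. Because each multiplier is at most 2, sampling noise can only push this particular estimate below the exact count.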
22
Accumulation of Errors
#variables  Sample error  Overall error
200         10%           1.9 × 10^8
200         1%            7.3
400         10%           3.6 × 10^16
400         1%            53.5
800         10%           1.3 × 10^33
800         1%            2865
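One reading of this table is that a relative error ε in each per-variable multiplier compounds over n variables as (1 + ε)^n. That formula is an assumption, not stated on the slide, though it reproduces the entries for 400 and 800 variables. A small sketch, with `overall_error` as a hypothetical helper name:

```python
def overall_error(sample_error, n_variables):
    """Worst-case factor by which the product of n multipliers can be off
    when each multiplier carries the given relative error."""
    return (1.0 + sample_error) ** n_variables

for n in (200, 400, 800):
    for eps in (0.10, 0.01):
        print(f"n={n:4d}  eps={eps:.0%}  overall={overall_error(eps, n):.3g}")
```

The growth is exponential in the number of variables, which is why a small per-sample error matters so much for large formulas.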
23
Within the Capacity of Exact Counters

We compare the results of ApproxCount with those of the exact
counters.
instances         #variables  Exact count  ApproxCount  Average Error
prob004-log-a     1790        2.6 × 10^16  1.4 × 10^16  0.03%
wff.3.200.810     200         3.6 × 10^12  3.0 × 10^12  0.09%
dp02s02.shuffled  319         1.5 × 10^25  1.2 × 10^25  0.07%
24
And beyond …

We developed a family of formulas
whose solutions are hard to count
– The formulas are based on SAT encodings
of the following combinatorial problem
– Given n different items, choose from them
a list (order matters) of m items (m ≤ n).
Let P(n,m) be the number of different lists
that can be constructed. P(n,m) = n!/(n-m)!
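The closed form P(n,m) = n!/(n-m)! gives the exact solution counts against which estimates on these formulas can be checked. A minimal sketch, with `ordered_selections` as a name introduced here:

```python
import math

def ordered_selections(n, m):
    """P(n, m) = n! / (n - m)!: number of ordered lists of m items out of n."""
    if not 0 <= m <= n:
        raise ValueError("requires 0 <= m <= n")
    return math.factorial(n) // math.factorial(n - m)

print(ordered_selections(20, 10))  # 670442572800, about 7 × 10^11
```

On Python 3.8 and later, `math.perm(n, m)` computes the same quantity.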
25
Hard Instances

The encoding of P(20,10) has only 200 variables, but
neither cachet nor relsat was able to count it in 5 days
in our experiments.
On the other hand, ApproxCount is able to finish in 2
hours, and estimates the solutions of even larger
instances.
instance  #variables  #solutions  ApproxCount  Average Error
P(30,20)  600         7 × 10^25   7 × 10^24    0.4%
P(20,10)  200         7 × 10^11   2 × 10^11    0.6%
26
Summary
Small formulas -> complete analysis of
the search space
 Larger formulas -> compare
ApproxCount results with results of
exact counting procedures
 Harder formulas -> handcrafted formulas,
compared with analytic results

27
Conclusion and Future Work
This work shows a good opportunity to extend
SAT solvers into algorithms
for sampling and counting tasks.
 Next step: use our methods in
probabilistic reasoning and
Bayesian inference domains.

28