Towards Efficient Sampling: Exploiting Random Walk Strategy 1

Wei Wei, Jordan Erenrich, and Bart Selman

Motivations
• Recent years have seen tremendous improvements in SAT solving: from formulas with up to 300 variables (1992) to formulas with one million variables.
• Various techniques answer the question "does a satisfying assignment exist for a formula?"
• But there are harder questions to answer: "how many satisfying assignments does a formula have?" or, closely related, "can we sample from the satisfying assignments of a formula?"

Complexity
• SAT is NP-complete; 2-SAT is solvable in linear time.
• Counting assignments (even for 2-CNF) is #P-complete, and is NP-hard to approximate (Valiant, 1979).
• Approximate counting and sampling are equivalent if the problem is "downward self-reducible".

Challenge
• Can we extend SAT techniques to solve harder counting/sampling problems?
• Such an extension would lead us to a wide range of new applications:
  – SAT testing -> logic inference
  – counting/sampling -> probabilistic reasoning

Standard Methods for Sampling: MCMC
• Markov chain Monte Carlo (MCMC) is based on setting up a Markov chain with a predefined stationary distribution.
• Samples are drawn from the stationary distribution by running the Markov chain for sufficiently long.
• Problem: for interesting problems, the Markov chain takes exponential time to converge to its stationary distribution.

Simulated Annealing
• Simulated annealing (SA) uses the Boltzmann distribution as its stationary distribution.
• At low temperature, the distribution concentrates around minimum-energy states.
• For the satisfiability problem, every satisfying assignment (cost 0) gets the same probability.
• Again, reaching this stationary distribution takes exponential time for interesting problems (shown in a later slide).

Standard Methods for Counting
• Current solution counting procedures extend DPLL methods with component analysis.
• Two counting procedures are available:
  – relsat (Bayardo and Pehoushek, 2000) and cachet (Sang, Beame, and Kautz, 2004).
  – Both count the exact number of solutions.

Question
• Can state-of-the-art local search procedures be used for SAT sampling/counting (as alternatives to standard Markov chain Monte Carlo and DPLL methods)?
• Yes! Shown in this talk.

Our Approach: Biased Random Walk
• Biased random walk = greedy bias + pure random walk.
• Example: WalkSat (Selman et al., 1994), effective on SAT.
• Can we use it to sample from the solution space?
  – Does WalkSat reach all solutions?
  – How uniform is the sampling?

WalkSat
[Figure: WalkSat solution visit frequencies plotted against Hamming distance; annotations mark one solution visited 500,000 times and another visited 60 times.]

Probability Ranges in Different Domains
Instance    Runs        Hits (Rarest)   Hits (Common)   Common-to-Rare Ratio
Random      50 × 10^6   53              9 × 10^5        1.7 × 10^4
Logistics   1 × 10^6    84              4 × 10^3        50
Verif.      1 × 10^6    45              318             7

Improving the Uniformity of Sampling
• WalkSat (nonergodic; quickly reaches sinks) + SA (ergodic; slow convergence) = SampleSat (ergodic; does not satisfy the detailed balance condition, DBC).
• SampleSat:
  – With probability p, the algorithm makes a biased random walk move.
  – With probability 1 - p, the algorithm makes an SA (simulated annealing) move.

Comparison Between WalkSat and SampleSat
[Figure: solution visit frequencies, WalkSat panel and SampleSat panel.]

SampleSat
[Figure: SampleSat solution visit frequencies plotted against Hamming distance.]

Instance    Runs        Hits (Rarest)   Hits (Common)   Common-to-Rare Ratio (WalkSat)   Common-to-Rare Ratio (SampleSat)
Random      50 × 10^6   53              9 × 10^5        1.7 × 10^4                       10
Logistics   1 × 10^6    84              4 × 10^3        50                               17
Verif.      1 × 10^6    45              318             7                                4

Analysis
c1  c2  c3  …  cn  a  b
F   F   F   …  F   F  F
F   F   F   …  F   F  T

Property of F*
• Proposition 1: SA with a fixed temperature takes exponential time to find a solution of F*.
• This shows that even for some simple 2-CNF formulas, SA cannot reach a solution in polynomial time.

Analysis, cont.
c1  c2  c3  …  cn  a
T   T   T   …  T   T
F   F   F   …  F   T
F   F   F   …  F   F
• Proposition 2: pure random walk (RW) reaches this solution with exponentially small probability.

SampleSat
• In the SampleSat algorithm, we can divide the search into 2 stages.
• Before SampleSat reaches its first solution, it behaves like WalkSat.
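The mix of moves described above (a biased random walk move with probability p, an SA move otherwise, with the walk switched off once every clause is satisfied) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the names `samplesat`, `walksat_move`, `sa_move` and the parameters `p`, `noise`, `temperature`, `max_flips` are hypothetical; clauses use DIMACS-style signed integers; the greedy step minimizes the total number of unsatisfied clauses, a simplification swapped in for WalkSat's break-count heuristic; and reading off "the last satisfying assignment visited" as the sample is our assumption.

```python
import math
import random

# Hypothetical SampleSat sketch; names, defaults, and the greedy rule are assumptions.
# Clauses are lists of non-zero ints (DIMACS style): 3 is x3, -3 is NOT x3.
# Assignments are lists indexed 1..n (index 0 unused).

def satisfied(clause, assign):
    return any(assign[abs(lit)] == (lit > 0) for lit in clause)

def cost(clauses, assign):
    """Number of unsatisfied clauses: the energy that SA minimizes."""
    return sum(1 for c in clauses if not satisfied(c, assign))

def walksat_move(clauses, assign, noise, rng):
    """One biased random-walk move; returns False when no clause is unsatisfied."""
    unsat = [c for c in clauses if not satisfied(c, assign)]
    if not unsat:
        return False
    clause = rng.choice(unsat)
    if rng.random() < noise:
        var = abs(rng.choice(clause))  # pure random-walk step
    else:
        def cost_if_flipped(v):  # simplified greedy step (not WalkSat's break count)
            assign[v] = not assign[v]
            c = cost(clauses, assign)
            assign[v] = not assign[v]
            return c
        var = min((abs(lit) for lit in clause), key=cost_if_flipped)
    assign[var] = not assign[var]
    return True

def sa_move(clauses, assign, temperature, rng):
    """One fixed-temperature Metropolis move on a uniformly chosen variable."""
    var = rng.randrange(1, len(assign))
    before = cost(clauses, assign)
    assign[var] = not assign[var]
    delta = cost(clauses, assign) - before
    if delta > 0 and not (temperature > 0 and
                          rng.random() < math.exp(-delta / temperature)):
        assign[var] = not assign[var]  # rejected: undo the flip

def samplesat(clauses, n_vars, p=0.5, noise=0.5, temperature=0.1,
              max_flips=1000, rng=None):
    """Return the last satisfying assignment visited (a tuple), or None."""
    rng = rng or random.Random()
    assign = [False] + [rng.random() < 0.5 for _ in range(n_vars)]
    last_solution = None
    for _ in range(max_flips):
        if cost(clauses, assign) == 0:
            last_solution = tuple(assign[1:])
        # With probability p try a WalkSat move; at a solution it is "turned
        # off", so the step falls through to an SA move instead.
        if not (rng.random() < p and walksat_move(clauses, assign, noise, rng)):
            sa_move(clauses, assign, temperature, rng)
    if cost(clauses, assign) == 0:
        last_solution = tuple(assign[1:])
    return last_solution
```

Tallying the tuples returned by repeated runs on a tiny formula such as `[[1, 2]]` (x1 OR x2) is a quick way to inspect how evenly its three solutions are hit, in the spirit of the frequency tables above. At a solution, zero-cost flips are always accepted while cost-increasing flips are rare at low temperature, which is the SA-like second stage.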
Instance       WalkSat      SampleSat     SA
random         382          677           24667
logistics      5.7 × 10^4   15.5 × 10^5   > 10^9
verification   36           65            10821

SampleSat, cont.
• After reaching a solution, the random walk component is turned off because all clauses are satisfied; SampleSat then behaves like SA.
• Proposition 3: SA at zero temperature samples all solutions within a cluster uniformly.
• This 2-stage model explains why SampleSat samples more uniformly than random walk algorithms alone.

Verification on Larger Formulas: ApproxCount
• Small formulas -> figures of solution frequencies. How do we verify on large formulas? ApproxCount.
• ApproxCount approximates the number of solutions of Boolean formulas, based on the SampleSat algorithm.
• Besides justifying the accuracy of our sampling approach, ApproxCount is interesting in its own right.

Algorithm
• The algorithm works as follows (Jerrum, Valiant, and Vazirani, 1986):
  1. Pick a variable X in the current formula.
  2. Draw K samples from the solution space of the current formula.
  3. Set variable X to its most sampled value t; the multiplier for X is K/#(X=t). Note 1 ≤ multiplier ≤ 2.
  4. Repeat steps 1-3 until all variables are set.
  5. The number of solutions of the original formula is estimated as the product of all multipliers.

Accumulation of Errors
#variables   Overall error (10% sample error)   Overall error (1% sample error)
200          1.9 × 10^8                          7.3
400          3.6 × 10^16                         53.5
800          1.3 × 10^33                         2865

Within the Capacity of Exact Counters
• We compare the results of ApproxCount with those of the exact counters.
Instance           #variables   Exact count    ApproxCount    Average Error
prob004-log-a      1790         2.6 × 10^16    1.4 × 10^16    0.03%
wff.3.200.810      200          3.6 × 10^12    3.0 × 10^12    0.09%
dp02s02.shuffled   319          1.5 × 10^25    1.2 × 10^25    0.07%

And Beyond …
• We developed a family of formulas whose solutions are hard to count.
  – The formulas are based on SAT encodings of the following combinatorial problem.
  – Given n different items, choose from them a list (order matters) of m items (m ≤ n).
  – Let P(n,m) be the number of different lists that can be constructed: P(n,m) = n!/(n-m)!.

Hard Instances
• The encoding of P(20,10) has only 200 variables, but neither cachet nor relsat was able to count its solutions within 5 days in our experiments.
• On the other hand, ApproxCount finishes in 2 hours, and estimates the solution counts of even larger instances.
Instance   #variables   #solutions   ApproxCount   Average Error
P(30,20)   600          7 × 10^25    7 × 10^24     0.4%
P(20,10)   200          7 × 10^11    2 × 10^11     0.6%

Summary
• Small formulas -> complete analysis of the search space.
• Larger formulas -> compare ApproxCount results with the results of exact counting procedures.
• Harder formulas -> handcrafted formulas, compared with analytic results.

Conclusion and Future Work
• Our results show a good opportunity to extend SAT solvers into algorithms for sampling and counting tasks.
• Next step: use our methods in probabilistic reasoning and Bayesian inference domains.
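The ApproxCount steps listed under Algorithm, together with a P(n,m) test formula, can be sketched in a few lines. This is a minimal sketch under loud assumptions: to stay small and self-contained it swaps in a brute-force exact uniform sampler for SampleSat, so it only works on tiny formulas; variables are picked in index order; and `approx_count`, `all_solutions`, and `permutation_cnf` are hypothetical names. The P(n,m) encoding (one variable per item/position pair, so n·m variables, consistent with the 200 and 600 variables in the Hard Instances table) is our guess, not the authors' published encoding.

```python
import itertools
import random
from collections import Counter

# Clauses are lists of non-zero ints (DIMACS style): 3 is x3, -3 is NOT x3.

def all_solutions(clauses, n_vars):
    """Enumerate every satisfying assignment (tiny formulas only)."""
    sols = []
    for bits in itertools.product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            sols.append(bits)
    return sols

def approx_count(clauses, n_vars, k=1000, rng=None):
    """Estimate #solutions as the product of the multipliers K/#(X=t).

    Stand-in sampler (assumption, replaces SampleSat): uniform draws from
    the enumerated solutions consistent with the variables fixed so far.
    """
    rng = rng or random.Random()
    remaining = all_solutions(clauses, n_vars)
    if not remaining:
        return 0  # unsatisfiable formula
    estimate = 1.0
    for var in range(n_vars):  # step 1: pick variables in index order (assumption)
        samples = [rng.choice(remaining) for _ in range(k)]  # step 2: K samples
        t, hits = Counter(s[var] for s in samples).most_common(1)[0]  # step 3
        estimate *= k / hits  # multiplier, between 1 and 2
        remaining = [s for s in remaining if s[var] == t]  # fix X = t
    return estimate  # step 5: product of all multipliers

def permutation_cnf(n, m):
    """Hypothetical encoding of P(n, m): x(i, j) means item i sits at position j."""
    def var(i, j):
        return i * m + j + 1
    clauses = []
    for j in range(m):
        clauses.append([var(i, j) for i in range(n)])  # position j holds some item
        for i1, i2 in itertools.combinations(range(n), 2):
            clauses.append([-var(i1, j), -var(i2, j)])  # ...and at most one item
    for i in range(n):
        for j1, j2 in itertools.combinations(range(m), 2):
            clauses.append([-var(i, j1), -var(i, j2)])  # item i used at most once
    return clauses, n * m

clauses, n_vars = permutation_cnf(4, 2)
print(len(all_solutions(clauses, n_vars)))  # 12 = 4!/2!
print(approx_count(clauses, n_vars, k=2000, rng=random.Random(0)))
```

Because each multiplier is estimated from only K samples, its error compounds across all n variables: a 1% error per step grows to 1.01^200 ≈ 7.3 over 200 variables, the figure in the Accumulation of Errors table.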
