An Empirical Study of Optimal Noise and Runtime Distributions in Local Search
Lukas Kroc, Ashish Sabharwal, Bart Selman
Cornell University, USA
SAT 2010 Conference, Edinburgh, July 2010
Presented by: Holger H. Hoos

Local Search Methods for SAT
• A lot is known about Stochastic Local Search (SLS) methods [e.g. Hoos-Stutzle ’04], especially their behavior on random 3-SAT
  • Along with systematic search, the main SAT solution paradigm
• Walksat: one of the first widely successful local search solvers
  • Biased random walk
  • Combines greedy moves (downhill) with stochastic moves (possibly uphill), controlled by a “noise” parameter [0% .. 100%]
• Yet, new surprising findings are still being discovered
• Part of this work was motivated by the following observation: empirical evidence that Walksat's running time on large, random 3-SAT instances is quite predictable and scales linearly with the number of variables for a specific setting of the noise parameter [Seitz-Alava-Orponen 2005]

Our Motivation
Our work looks at Walksat again, on large, random 3-SAT formulas, and seeks answers to two questions:
A. Can we further characterize the “optimal noise” and the linear scaling behavior of Walksat?
  • Key parameter: the clause-to-variable ratio, α
B. How do runtime distributions of Walksat behave at sub-optimal noise?
  • Are they concentrated around the mean or do they have “heavy tails” similar to complete search methods?
  • Heavy tails: very long runs are more likely than we might expect
  • Heavy tails not reported in local search so far
Note: Walksat is still faster than current adaptive, dynamic noise solvers on these formulas, so its behavior at optimal static noise is of much interest.

Summary of Results
Walksat on large, random 3-SAT formulas:
A. Further characterization of the “optimal noise” and linear scaling:
  • A detailed analysis, showing a piece-wise linear fit for optimal noise as a function of α, with transitions at interesting points (extending the previous observation that ~57% is optimal for α=4.2)
  • Simple inverse polynomial dependence of runtime on α
B. Runtime distributions of Walksat at sub-optimal noise:
  • Exponential decay in the high noise regime
  • Heavy tails in the low noise regime
  • First quantitative observation of heavy tails in local search [earlier insights: Hoos-Stutzle 2000]
  • Preliminary Markov Chain model

A. Further Study of Optimal Noise and Linear Scaling

Optimal Noise Setting vs. α
Question: How does the optimal noise setting vary with α and N?
Experiment:
• For α in [1.5...4.2], generate random 3-SAT formulas with N in [100K..400K]
• For each, find the noise setting where Walksat is the fastest (binary search)
• Average these optimal noise settings and plot against α

Optimal Noise Setting vs. α
[Figure: average optimal noise setting plotted against α; data with 1 standard deviation bars. Annotations: “Generalized Unit Clause heuristic works till here”; “Greedy Walksat (GSAT) works till here”]
• Optimal noise depends significantly on α (e.g., ~46% at α=3.9; ~57% at α=4.2)
• Very good piece-wise linear fit
• Transitions at interesting places:
  • α≈3: up to which the generalized unit clause (GUC) rule works almost surely [Frieze-Suen 1996]
  • α≈3.9: up to which greedy Walksat (GSAT) works (also where the “clustering structure” of the solution space is believed to change drastically: from one giant cluster to exponentially many small ones [Mezard-Mora-Zecchina 2005])

Linear Scaling at Optimal Noise
Experiment:
• For α in [1.5...4.2], generate random 3-SAT formulas with N in [100K..400K]
• Measure Walksat's runtime with optimal noise (#flips until a solution is found)
• Plot #flips/N against α (one point per run, no averaging)
Results:
• Inverse polynomial fit of #flips/N as a function of α
  • Suggesting linear scaling for α < 4.235 [fig explained in paper]
• Points with varying N fall on each other after rescaling by N, showing linearity wrt N

B. Runtime Distribution of Local Search Methods

Standard vs. Heavy Tailed Distributions
• Standard distributions (finite mean & variance): exponential or faster decay, e.g., Normal distribution
• Heavy-tailed distributions: power-law decay, e.g., Pareto-Levy distribution

Heavy Tailed Distributions
• Signature: the tail of the distribution is a line in a log-log plot
• Observed in systematic search solvers
  • Mechanism well understood in terms of “bad” variable assignments that are hard to recover from [Gomes, Kautz and Selman ’99, ’00]
  • Motivated key techniques such as search restarts and algorithm portfolios
• Not previously observed in studies on local search methods

Runtime Distributions of Walksat
Experiment:
• Generate a random 3-SAT formula with N=100K at α=4.2
  • Large formula, free of small-size effects
  • Very hard to solve
  • Still less constrained than formulas at the phase transition (α≈4.26)
• Run 100K (!) runs of Walksat with noise settings around the optimal
• Plot the runtime distribution: probability of failure to find a solution as a function of #flips

Runtime Distributions of Walksat
[Setting: large, random 3-SAT formula with α=4.2]
Summary of Results:
• There is a qualitative difference between noise higher than optimal (>56.7%) and lower than optimal (<56.7%)
  • High noise regime: tail of P[failure] has an exponential distribution
  • Low noise regime: tail of P[failure] has a power-law distribution
• Intuition captured by a (preliminary) Markov Chain model
  • High noise means “guessing the solution”
  • Low noise (too greedy) leads the search into “local traps”
  • Optimal noise is where the two effects balance

Heavy-Tails in Low Noise Regimes
[Figure: log-log scale; a straight line indicates power-law decay. Annotations: “Last 5% of tail (5K points)”; “Linear slope = 0.38”]
• 100K data points plotted per curve; actual data points, no fitting
• For clarity, not all data points are marked with o, x, +, etc.

Heavy-Tails in Low Noise Regimes
Same data as previous plot, but with all 100K data points (per curve) marked with o, x, +, etc., and full y-axis. As before, actual data points, no fitting.
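
The runtime-distribution plots show P[failure] against #flips on log-log axes, where a straight tail signals power-law decay. The following is a minimal sketch of how such a curve and its tail slope could be computed from a list of per-run runtimes. It is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, and the two samples are synthetic (a Pareto sample with an arbitrarily chosen exponent of 0.5 and a shifted exponential sample) rather than Walksat data.

```python
import math
import random


def survival_curve(runtimes):
    """Return (t, P[runtime > t]) pairs; the largest runtime is dropped since P = 0 there."""
    xs = sorted(runtimes)
    n = len(xs)
    # After the (i+1)-th smallest runtime, n - (i+1) runs are still unfinished.
    # Ties are kept as separate points in this sketch.
    return [(x, (n - i - 1) / n) for i, x in enumerate(xs[:-1])]


def tail_slope(curve, tail_fraction=0.05):
    """Least-squares slope of log P versus log t over the last `tail_fraction` of points."""
    k = max(2, int(len(curve) * tail_fraction))
    pts = [(math.log(t), math.log(p)) for t, p in curve[-k:]]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx


# Synthetic stand-ins for measured runtimes (assumptions, not data from the study).
rng = random.Random(0)
n_runs = 20000
heavy = [rng.paretovariate(0.5) for _ in range(n_runs)]       # power-law tail
light = [1.0 + rng.expovariate(1.0) for _ in range(n_runs)]   # exponential tail

heavy_slope = tail_slope(survival_curve(heavy))
light_slope = tail_slope(survival_curve(light))
print(f"power-law sample:   log-log tail slope = {heavy_slope:.2f}")
print(f"exponential sample: log-log tail slope = {light_slope:.2f}")
```

For the power-law sample the fitted slope stays close to the negated Pareto exponent, while the exponential sample gives a much steeper slope that keeps steepening further into the tail, which is why it does not appear as a straight line on log-log axes.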

Qualitative Contrast: High vs. Low Noise Regimes
[Figure: log-log scale; a straight line indicates power-law decay]
• High noise: not straight lines, hence not heavy tailed. In fact, a log-linear plot reveals a clear exponential tail
• Low noise: straight line, hence heavy tailed. Extremely long runs are much more likely than one might expect!

Understanding Variation with Noise Level and Power-Law Decay: Preliminary Insights

Different “Search” at High, Low, Opt Noise
Experiment:
• Run Walksat at different noise levels on a formula with 100K vars, 420K clauses
• Plot how the number of unsatisfied clauses evolves as the search progresses (0 on y-axis = solution)
• High noise: search “stuck” at a relatively high value
• Optimal noise: a gradual descent until a solution is found
• Low noise: #unsat clauses decreases fast but gets “stuck” at a relatively low value

Markov Chain Model Capturing Power-Law Decay (preliminary)
[details omitted; refer to paper. Similar to work of Hoos ’02]
Key features:
• States represent (roughly) the number of unsatisfied clauses; left-most state = all solutions
• Ladder structures capture falling into a “trap”; the farther the search keeps falling, the harder it gets to recover (recovery time = hitting time of a biased 1-dimensional Markov Chain)
In the horizontal part of the chain:
• High noise: avoids traps but is attracted towards the top-middle node; exponential time to convergence, very concentrated around the mean
• Low noise: leftward drift but a good chance of falling into a trap; exponential time to convergence but power-law decay

Summary
A. Further study of optimal noise for Walksat:
• Optimal noise depends on the clause-to-variable ratio, α, in piece-wise linear fashion with transitions at interesting points
• The linearity constant (#flips/N) allows for a simple inverse polynomial fit in α
B. Runtime distributions in local search:
• Drastic change in behavior below and above optimal noise
• Exponential decay for higher-than-optimal noise
• Power-law decay (heavy tails) for lower-than-optimal noise
Future directions:
• A better understanding of when heavy tails appear and when they don’t
• Improved model capturing heavy tails in local search
• Ways of utilizing these insights to improve local search solvers (similar to restarts and algorithm portfolios for complete search)
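
For readers less familiar with the solver, the move selection that the noise parameter controls (greedy moves versus stochastic moves) can be sketched as follows. This is a small, unoptimized illustration of a commonly described Walksat formulation (break counts with a free move when nothing breaks), which is an assumption here: the slides do not specify the exact variant or implementation used in the study. The helper names, the tiny instance size, and the 50% noise value are all hypothetical choices for the example.

```python
import random


def is_satisfied(clause, assign):
    # Literal v > 0 is true when variable v is True; v < 0 when variable |v| is False.
    return any(assign[abs(lit)] == (lit > 0) for lit in clause)


def break_count(var, clauses, assign):
    """Number of currently satisfied clauses that flipping `var` would falsify."""
    broken = 0
    for clause in clauses:
        if not any(abs(lit) == var for lit in clause):
            continue
        if is_satisfied(clause, assign):
            assign[var] = not assign[var]      # tentative flip
            if not is_satisfied(clause, assign):
                broken += 1
            assign[var] = not assign[var]      # undo
    return broken


def walksat(clauses, n_vars, noise, max_flips, rng):
    """Return (assignment, flips) on success, or (None, flips) once max_flips is reached."""
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    flips = 0
    while True:
        unsat = [c for c in clauses if not is_satisfied(c, assign)]
        if not unsat:
            return assign, flips
        if flips == max_flips:
            return None, flips
        clause = rng.choice(unsat)
        variables = [abs(lit) for lit in clause]
        breaks = [break_count(v, clauses, assign) for v in variables]
        if min(breaks) > 0 and rng.random() < noise:
            var = rng.choice(variables)                  # stochastic move, possibly uphill
        else:
            var = variables[breaks.index(min(breaks))]   # greedy move
        assign[var] = not assign[var]
        flips += 1


def random_3sat(n_vars, alpha, rng):
    """Random 3-SAT formula with round(alpha * n_vars) clauses over distinct variables."""
    clauses = []
    for _ in range(round(alpha * n_vars)):
        chosen = rng.sample(range(1, n_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in chosen))
    return clauses


rng = random.Random(0)
formula = random_3sat(50, 3.0, rng)   # tiny instance, far smaller than those in the study
model, flips = walksat(formula, 50, noise=0.5, max_flips=5000, rng=rng)
print("solved" if model is not None else "gave up", "after", flips, "flips")
```

Recomputing the unsatisfied clauses and break counts from scratch on every flip keeps the sketch short; a practical solver would maintain them incrementally, which is what makes instances with hundreds of thousands of variables feasible.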