An Empirical Study of Optimal Noise and
Runtime Distributions in Local Search
Lukas Kroc, Ashish Sabharwal, Bart Selman
Cornell University, USA
SAT 2010 Conference
Edinburgh, July 2010
Presented by:
Holger H. Hoos
Local Search Methods for SAT

A lot is known about Stochastic Local Search (SLS) methods
[e.g. Hoos-Stutzle ’04], especially their behavior on random 3-SAT



Along with systematic search, the main SAT solution paradigm
Walksat was one of the first widely successful local search solvers
 Biased random walk
 Combines greedy moves (downhill) with stochastic moves (possibly
uphill) controlled by a “noise” parameter [0% .. 100%]
(a minimal sketch of the flip loop follows below)
Yet, new surprising findings are still being discovered
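Below is a minimal, illustrative sketch of the flip loop just described, assuming a formula given as a list of DIMACS-style clauses. It omits Walksat's refinements (e.g., the zero-break "freebie" rule) and is not the implementation used in the paper.

```python
import random

def walksat(clauses, n_vars, noise=0.5, max_flips=10**6, seed=None):
    """Minimal Walksat sketch: biased random walk over truth assignments.

    clauses: list of clauses, each a tuple of non-zero ints (DIMACS-style literals).
    noise:   probability of a random (possibly uphill) flip, in [0, 1].
    Returns (assignment, flips) on success, (None, max_flips) on failure.
    """
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}

    def satisfied(clause):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    def breaks(var):
        # Clauses that are satisfied now but would become unsatisfied if `var` flips.
        assign[var] = not assign[var]
        broken = sum(1 for c in clauses if var in map(abs, c) and not satisfied(c))
        assign[var] = not assign[var]
        return broken

    for flip in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign, flip                  # solution found
        clause = rng.choice(unsat)               # focus on one violated clause
        variables = [abs(lit) for lit in clause]
        if rng.random() < noise:
            var = rng.choice(variables)          # stochastic (possibly uphill) move
        else:
            var = min(variables, key=breaks)     # greedy move: fewest broken clauses
        assign[var] = not assign[var]
    return None, max_flips
```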

Part of this work was motivated by the following observation:
Empirical evidence that Walksat's running time on large, random
3-SAT instances is quite predictable, and scales linearly
with the number of variables for a specific setting of the noise parameter
[Seitz-Alava-Orponen 2005]
Our Motivation
Our work looks at Walksat again, on large, random, 3-SAT formulas, and
seeks answers to two questions:
A. Can we further characterize the “optimal noise” and the linear scaling
behavior of Walksat?
• Key parameter: the clause-to-variable ratio, α
B. How do runtime distributions of Walksat behave at sub-optimal noise?
• Are they concentrated around the mean or do they have “heavy
tails” similar to complete search methods?
• Heavy tails ⇒ very long runs are more likely than we might expect
• Heavy tails have not been reported for local search so far
Note: Walksat is still faster than current adaptive, dynamic-noise solvers on these
formulas, so studying its behavior at the optimal static noise is of much interest
Summary of Results
Walksat on large, random, 3-SAT formulas:
A. Further characterization of the “optimal noise” and linear scaling:
 A detailed analysis, showing a piece-wise linear fit for optimal noise
as a function of α, with transitions at interesting points
(extending the previous observation that ~57% is optimal for α=4.2)
Simple inverse polynomial dependence of runtime on α

B. Runtime distributions of Walksat at sub-optimal noise:
 Exponential decay in the high noise regime
 Heavy tails in the low noise regime
First quantitative observation of heavy tails in local search
[earlier insights: Hoos-Stutzle 2000]

Preliminary Markov Chain model
A. Further Study of Optimal Noise
and Linear Scaling
Optimal Noise Setting vs. α
Question:
How does the optimal noise setting vary with α and N?
Experiment:
 For α in [1.5...4.2], generate random 3-SAT formulas with N in [100K..400K]
 For each, find the noise setting where Walksat is the fastest (binary search; see the sketch below)
 Average these optimal noise settings and plot against α
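Below is one illustrative way to locate the fastest noise setting, assuming the median runtime is unimodal in the noise parameter and reusing the `walksat` sketch from the earlier slide; a ternary search stands in here for whatever binary-search procedure the authors actually used, and `median_flips` is a hypothetical harness, not theirs.

```python
import statistics

def median_flips(clauses, n_vars, noise, runs=25, max_flips=10**7):
    """Median flips-to-solution over independent Walksat runs;
    failed runs are scored at the cutoff (a simplifying assumption)."""
    samples = []
    for seed in range(runs):
        assignment, flips = walksat(clauses, n_vars, noise, max_flips, seed=seed)
        samples.append(flips if assignment is not None else max_flips)
    return statistics.median(samples)

def best_noise(clauses, n_vars, lo=0.0, hi=1.0, tol=0.01):
    """Ternary search for the noise setting minimizing median flips,
    assuming the runtime-vs-noise curve is unimodal."""
    while hi - lo > tol:
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if median_flips(clauses, n_vars, m1) < median_flips(clauses, n_vars, m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2
```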
Optimal Noise Setting vs. α
Data with 1 standard deviation bars
 Optimal noise depends significantly on α
(e.g., ~46% at α=3.9; ~57% at α=4.2)
 Very good piece-wise linear fit
 Transitions at interesting places:
• α≈3: up to which the generalized unit clause (GUC) rule works
almost surely [Frieze-Suen 1996]
• α≈3.9: up to which greedy Walksat (GSAT) works (also where the “clustering
structure” of the solution space is believed to change drastically: from one
giant cluster to exponentially many small ones [Mezard-Mora-Zecchina 2005])
Linear Scaling at Optimal Noise

Experiment:
 For α in [1.5...4.2], generate random 3-SAT formulas with N in [100K..400K]
 Measure Walksat's runtime with optimal noise (#flips till solution found)
 Plot #flips/N against α (one point per run, no averaging)

Results: Inverse polynomial fit of #flips/N as a function of α
 Suggesting linear scaling for α < 4.235
[Figure explained in paper: points with varying N fall on each other after
rescaling by N, showing linearity w.r.t. N]
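A hedged illustration of how such a fit could be performed. The functional form below (a divergence near α_c ≈ 4.235) is one assumed reading of "inverse polynomial", not necessarily the exact form used in the paper, and the data is a synthetic stand-in for the measured #flips/N values.

```python
import numpy as np
from scipy.optimize import curve_fit

def inverse_poly(alpha, a, b, alpha_c):
    # Assumed form: #flips/N diverges as alpha approaches alpha_c.
    return a / (alpha_c - alpha) ** b

# Synthetic stand-in data: replace with measured #flips/N from the runs above.
rng = np.random.default_rng(0)
alphas = np.linspace(1.5, 4.2, 20)
flips_per_var = inverse_poly(alphas, 50.0, 1.0, 4.235) * rng.lognormal(0.0, 0.05, alphas.size)

params, _ = curve_fit(inverse_poly, alphas, flips_per_var,
                      p0=[50.0, 1.0, 4.3],
                      bounds=([0.0, 0.0, 4.21], [np.inf, np.inf, 10.0]))
a_fit, b_fit, alpha_c_fit = params
print(f"fit: flips/N ≈ {a_fit:.1f} / ({alpha_c_fit:.3f} - alpha)^{b_fit:.2f}")
```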
B. Runtime Distribution of
Local Search Methods
Standard vs. Heavy Tailed Distributions

Standard distributions:
• Exponential or faster decay
• e.g., Normal distribution
• Finite mean & variance

Heavy-tailed distributions:
• Power-law decay
• e.g., Pareto-Levy distribution
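In symbols (a standard formalization, not taken from the slides; β denotes the tail exponent here, to avoid clashing with the clause-to-variable ratio α):

```latex
% Standard (light) tail: exponential or faster decay
\Pr[X > x] \;\le\; C\, e^{-\lambda x}, \qquad \lambda > 0

% Heavy (power-law) tail
\Pr[X > x] \;\sim\; C\, x^{-\beta}, \qquad \beta > 0

% Hence, on a log-log plot the heavy tail is a straight line of slope -\beta:
\log \Pr[X > x] \;\approx\; \log C \;-\; \beta \log x
```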
Heavy Tailed Distributions
Heavy-tailed distributions:
• Power-law decay
• e.g., Pareto-Levy distribution

 Signature: the tail of the distribution is a straight line in a log-log plot

Observed in systematic search solvers
 Mechanism well-understood in terms of “bad” variable assignments that
are hard to recover from [Gomes, Kautz and Selman ‘99, ’00]
 Motivated key techniques such as search restarts, algorithm portfolios

Not previously observed in studies on local search methods
Runtime Distributions of Walksat
Experiment:
 Generate a random 3-SAT formula with N=100K at α=4.2
 Large formulas, free of small size effects
 Very hard to solve
 Still less constrained than formulas at the phase transition (α ≈ 4.26)
 Run 100K (!) runs of Walksat with noise settings around the optimal
 Plot the runtime distribution: probability of failure to find a solution
as a function of #flips
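A sketch (an assumed plotting harness, not the authors' scripts) of how the plotted quantity, the empirical probability of not yet having found a solution after a given number of flips, can be computed from recorded run lengths and drawn on log-log axes; the data below is a small synthetic stand-in.

```python
import numpy as np
import matplotlib.pyplot as plt

def survival_curve(run_lengths):
    """Empirical P[failure after t flips]: fraction of runs needing more than t flips."""
    t = np.sort(np.asarray(run_lengths, dtype=float))
    p_fail = 1.0 - np.arange(1, t.size + 1) / t.size
    return t, p_fail

# Synthetic stand-in: replace with flips-to-solution recorded over the 100K runs
# at each noise setting around the optimum.
rng = np.random.default_rng(0)
run_lengths_by_noise = {
    0.65: rng.exponential(1e5, size=10_000),             # light-tailed stand-in
    0.45: (rng.pareto(0.4, size=10_000) + 1.0) * 1e4,    # heavy-tailed stand-in
}

for noise, lengths in sorted(run_lengths_by_noise.items()):
    t, p = survival_curve(lengths)
    plt.loglog(t, p, label=f"noise = {noise:.0%}")  # straight tail => power-law decay
plt.xlabel("#flips")
plt.ylabel("P[failure]")
plt.legend()
plt.show()
```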
Runtime Distributions of Walksat
[Setting: Large, random, 3-SAT formula with α=4.2]
Summary of Results:
 There is a qualitative difference between noise higher than optimal (>56.7%)
and lower than optimal (<56.7%)
 High noise regime: tail of P[failure] has an exponential distribution
 Low noise regime : tail of P[failure] has a power-law distribution
 Intuition captured by a (preliminary) Markov Chain model
 High noise means “guessing the solution”
 Low noise (too greedy) leads the search into “local traps”
 Optimal noise is where the two effects balance
Heavy-Tails in Low Noise Regimes
LOG-LOG scale ⇒ straight line = power-law decay
[Figure: 100K actual data points plotted per curve, no fitting;
not all data points marked with o, x, +, etc., for clarity.
Last 5% of tail (5K points): linear slope = 0.38]
Heavy-Tails in Low Noise Regimes
Same data as previous plot, but with all 100K data points (per curve) marked
with o, x, +, etc., and full y-axis. As before, actual data points, no fitting.
Qualitative Contrast: High vs. Low Noise Regimes
LOG-LOG scale ⇒ straight line = power-law decay
High noise: not straight lines ⇒ not heavy-tailed.
In fact, a log-linear plot reveals a clear exponential tail.
Low noise: straight line ⇒ heavy-tailed;
extremely long runs are much more likely than one might expect!
Understanding Variation with Noise Level
and Power-Law Decay: Preliminary Insights
Different “Search” at High, Low, Opt Noise
Experiment:
 Run Walksat at different noise levels on a formula with 100K vars, 420K clauses
 Plot how the number of unsatisfied clauses evolves as the search progresses
(0 on y-axis = solution)
 High noise: search “stuck” at a relatively high value
 Optimal noise: a gradual descent until a solution is found
 Low noise: #unsat clauses decreases fast but gets “stuck” at a relatively low value
Markov Chain Model Capturing
Power-Law Decay
(preliminary)
[details omitted; refer to paper.
Similar to work of Hoos ’02]
Key features:

States represent (roughly) the
number of unsatisfied clauses;
left-most state = all solutions

Ladder structures capture falling
into a “trap”; the farther it keeps
falling, the harder it gets to recover
(recovery time = hitting time of a biased 1-dimensional Markov Chain)
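Below is a toy illustration (not the paper's actual model) of the trap ingredient alone: the escape time of a biased one-dimensional walk, which grows rapidly with the depth of the trap when the walk tends to fall deeper, matching the "harder to recover the farther it falls" intuition. The parameters and helper are illustrative assumptions.

```python
import random

def escape_time(start, ladder_len, p_deeper, rng):
    """Steps for a biased walk on states 0..ladder_len to first reach state 0,
    starting at `start`; p_deeper is the chance of falling one rung deeper
    (the walk reflects at the bottom of the trap)."""
    pos, steps = start, 0
    while pos > 0:
        if pos < ladder_len and rng.random() < p_deeper:
            pos += 1            # fall deeper into the trap
        else:
            pos -= 1            # climb one rung toward the exit
        steps += 1
    return steps

rng = random.Random(0)
# With p_deeper > 0.5 the mean escape time grows roughly exponentially with the
# trap depth -- the effect the ladder structures of the chain are meant to capture.
for depth in (4, 8, 12):
    samples = [escape_time(depth, ladder_len=depth, p_deeper=0.6, rng=rng)
               for _ in range(2000)]
    print(depth, sum(samples) / len(samples))
```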
Markov Chain Model Capturing
Power-Law Decay
(preliminary)
[details omitted; refer to paper.
Similar to work of Hoos ’02]
In the horizontal part of the chain:

 High noise: avoids traps, but is attracted towards the top-middle node;
exponential time to convergence, very concentrated around the mean
 Low noise: leftward drift, but a good chance of falling into a trap;
exponential time to convergence but with power-law decay
Summary
A. Further study of optimal noise for Walksat
 depends on the clause-to-variable ratio, α, in a piece-wise linear fashion,
with transitions at interesting points
 allows for a simple inverse polynomial fit for the linearity constant

B. Runtime distributions in local search
 drastic change in behavior below and above optimal noise
 exponential decay for higher-than-optimal noise
 power-law decay (heavy tails) for lower-than-optimal noise
Future directions:
 A better understanding of when heavy tails appear and when they don’t
 Improved model capturing heavy tails in local search
 Ways of utilizing these insights to improve local search solvers
(similar to restarts and algorithm portfolios for complete search)