Formal Models of Heavy-Tailed Behavior in Combinatorial Search

Formal Models of Heavy-Tailed Behavior
in Combinatorial Search
Hubie Chen, Carla P. Gomes, and Bart Selman
{hubes,gomes,selman}@cs.cornell.edu
Department of Computer Science
Cornell University
CP - 2001
Background
Randomized backtrack search methods
demonstrate high variability of run time
(relative to fixed instance):
Heavy-tailed behavior
(Gomes et al., CP ‘97, JAR ‘00)
New insights into the design of search
algorithms → restart strategies
Randomization and restart strategies are now an integral
part of state-of-the-art SAT Solvers
(Chaff, GRASP, RELSAT, SATZ-Rand)
Goals
Research on heavy-tails in search thus far
largely based on empirical studies.
Our goals:
• Formal analysis of tree search models: show
under what conditions heavy-tailed distributions
can and cannot arise.
• Understand when restart strategies are/are not
effective.
Intuition
How does heavy-tailed behavior arise?
• The procedure is characterized by a large variability,
which leads to highly different trees from run to run.
• Wrong branching decisions may lead the search
procedure to explore exponentially large subtrees of the
search space containing no solutions.
• A lucky sequence of good branching decisions may lead
the search to find a solution after exploring only a small
subtree.
Intuition Pump: Restarts
When are restarts effective?
Suppose a search procedure requires
(on inputs of size n):
• Time p(n) (for a polynomial p) with probability ½
• Time 2^n with probability ½
No restarts:
expected time exponential: equal to ½ * (p(n) + 2^n)
Restart with time interval p(n):
expected time drops to polynomial: equal to 2*p(n)
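The arithmetic above can be checked with a short sketch. It assumes that each run is independent, that a run cut off at p(n) costs exactly p(n), and it uses n = 20 with p(n) = n^2 as a purely illustrative polynomial; the function names are hypothetical.

```python
from fractions import Fraction


def expected_time_no_restart(fast: int, slow: int) -> Fraction:
    """Each outcome has probability 1/2, so the mean is the average."""
    return Fraction(fast + slow, 2)


def expected_time_with_restart(fast: int) -> Fraction:
    """Restart after `fast` steps: every attempt costs `fast` and
    succeeds with probability 1/2, so 2 attempts are needed on average."""
    success_probability = Fraction(1, 2)
    expected_attempts = 1 / success_probability
    return expected_attempts * fast


n = 20
p_n = n ** 2          # illustrative polynomial p(n), an assumption
slow_time = 2 ** n    # exponential branch

no_restart = expected_time_no_restart(p_n, slow_time)
with_restart = expected_time_with_restart(p_n)
print(no_restart, with_restart)
```

For these values the mean without restarts is dominated by the 2^n term, while the restart version stays at 2*p(n).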
Outline of Talk
• Empirical evidence of Heavy-Tailed behavior
• Tree Search Models
• Balanced Tree Search Model
• Imbalanced Tree Search Model
• Bounded Heavy-Tailed Behavior: finite
distributions
Empirical Evidence
of Heavy-Tailed Behavior
Quasigroups or Latin Squares:
An Abstraction for Real World Applications
Quasigroup or
Latin Square
(Order 4)
A quasigroup is an n-by-n matrix such
that each row and column is a
permutation of the same n colors
32% preassignment
Gomes and Selman 96
Randomized Backtrack Search
Easy instance – 15% preassigned cells
Time per run: 7, 11, 30, (*), (*)
(*) no solution found - reached cutoff: 2000
Gomes et al. 97
Erratic Behavior of Search Cost
Quasigroup Completion Problem
[Figure: sample mean of search cost against number of runs. Annotations: 3500! (sample mean), Median = 1!]
Heavy-Tailed Distributions
Heavy-Tailed Distributions
• Infinite variance, infinite mean
• Introduced by Pareto in the 1920’s: “probabilistic curiosity.”
• Mandelbrot established the use of heavy-tailed distributions to
model real-world fractal phenomena.
• Examples: stock-market, earthquakes, weather, web traffic...
Decay of Distributions
Standard distribution (finite mean & variance)
Exponential decay, e.g. Normal:
$\Pr[X > x] \sim C e^{-x^2}$, for some $C > 0$
Heavy-tailed
Power law decay, e.g. Pareto-Levy:
$\Pr[X > x] \sim C x^{-\alpha}$, $x > 0$
Visualization of Heavy Tailed
Behavior
Log-log plot of tail of distribution
should be approximately linear.
Slope gives value of $\alpha$.
$\alpha \le 1$: infinite mean and infinite variance
$1 < \alpha < 2$: infinite variance
[Figure: log-log plot of unsolved fraction, 1-F(x), against number of backtracks. Annotations: $\alpha = 0.153$, $\alpha = 0.319$, $\alpha = 0.466$; 18% unsolved; 0.002% unsolved; $\alpha \le 1$ => infinite mean.]
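As a rough illustration of reading the exponent off a log-log plot, the sketch below fits a least-squares line to log(1-F(x)) against log(x). It is evaluated on an exact Pareto-type tail x^(-alpha) with C = 1, not on the empirical data from the talk; the function name and the sample points are assumptions made for the example.

```python
import math


def loglog_slope(xs, survival):
    """Least-squares slope of log(survival) against log(x)."""
    log_x = [math.log(x) for x in xs]
    log_s = [math.log(s) for s in survival]
    mean_x = sum(log_x) / len(log_x)
    mean_s = sum(log_s) / len(log_s)
    covariance = sum((a - mean_x) * (b - mean_s) for a, b in zip(log_x, log_s))
    spread = sum((a - mean_x) ** 2 for a in log_x)
    return covariance / spread


alpha = 0.466                                  # one of the values quoted on the slide
xs = [10.0 ** k for k in range(1, 6)]          # illustrative backtrack counts
pareto_tail = [x ** (-alpha) for x in xs]      # exact power-law tail, C = 1
estimated_alpha = -loglog_slope(xs, pareto_tail)
print(estimated_alpha)
```

On real run-time data the tail is only approximately linear, so the fit would normally be restricted to the largest observations.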
Exploiting Heavy-Tailed Behavior
Heavy-tailed behavior has been
observed in several domains:
QCP, Graph Coloring, Planning,
Scheduling, Circuit synthesis,
Decoding, etc.
Consequence for algorithm
design:
Use restarts or parallel
/ interleaved runs to
exploit the extreme
variance in performance.
[Figure: unsolved fraction, 1-F(x), against number of backtracks (log). Annotations: 70% unsolved; 0.001% unsolved; 250 (62 restarts).]
Restarts provably eliminate
heavy-tailed behavior (Gomes et al. 2000)
Tree Search Models:
Balanced Tree Model
Balanced Tree Model, Described
Trees
• All leaves occur at the same depth
• Branching factor 2
• Exactly one “satisfying” leaf
Search algorithm
• Chronological backtrack search model
• Random child selection with no propagation mechanisms
Balanced Tree Model: Analysis
Let $T(n)$ denote the runtime: number of
leaf nodes visited (including “satisfying”
leaf), on tree of depth n.
Let $X_i$ denote choice at (unique) node
above satisfying leaf at depth i:
1 = bad choice, 0 = good choice
Then,
$T(n) = X_1 2^{n-1} + \cdots + X_i 2^{n-i} + \cdots + X_n 2^0 + 1$
There is exactly one choice of zero-one
assignments to the variables for each
possible value of T(n); any such
assignment has probability $(1/2)^n$.
T(n) has a uniform distribution:
$P[T(n) = i] = (1/2)^n$, $i = 1, \ldots, 2^n$
[Figure: example trees with T=4 and T=64.]
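The one-to-one correspondence between zero-one assignments and run times can be checked by brute force for small n. This is a minimal sketch: `runtime` implements T(n) as the sum of X_i * 2^(n-i) plus 1, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product


def runtime(choices):
    """T(n) for a tuple (X_1, ..., X_n) of bad (1) / good (0) choices."""
    n = len(choices)
    return sum(x * 2 ** (n - i) for i, x in enumerate(choices, start=1)) + 1


def runtime_distribution(n):
    """Count how many of the 2^n equally likely assignments give each T."""
    return Counter(runtime(c) for c in product((0, 1), repeat=n))


dist = runtime_distribution(4)
print(sorted(dist.items()))
```

Each value from 1 to 2^n appears exactly once, so every run time has probability (1/2)^n.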
Balanced Tree Model:
Distribution
$E[T(n)] = \frac{1 + 2^n}{2}$
$V[T(n)] = \frac{2^{2n} - 1}{12}$
• The expected run time and variance scale
exponentially in the height of the search tree
(number of variables);
• The run time distribution is uniform in shape, not heavy-tailed.
(see paper for formal proofs)
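For reference, the mean and variance of a uniform distribution on 1, ..., N with N = 2^n follow from the standard sums of integers and of squares. This is a sketch of the routine calculation, not the formal proof from the paper:

```latex
\begin{align*}
E[T(n)]   &= \frac{1}{N}\sum_{i=1}^{N} i = \frac{N+1}{2} = \frac{2^n + 1}{2},\\
E[T(n)^2] &= \frac{1}{N}\sum_{i=1}^{N} i^2 = \frac{(N+1)(2N+1)}{6},\\
V[T(n)]   &= \frac{(N+1)(2N+1)}{6} - \frac{(N+1)^2}{4}
           = \frac{(N+1)(N-1)}{12} = \frac{2^{2n} - 1}{12}.
\end{align*}
```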
Balanced Tree Model: Restarts
Restart strategies are not effective for this model:
no restart strategy achieves expected polynomial time.
Define a restart strategy to be a sequence of times
$t_1(n), t_2(n), t_3(n), \ldots$
It is applied to a search procedure by running the procedure
for time $t_1(n)$, restarting and running for time $t_2(n)$, etc.,
until a solution is found.
Luby et al. (IPL ‘93) show that optimal performance
(minimum expectation) is obtained by a purely uniform
restart strategy:
$t_1(n) = t_2(n) = t_3(n) = \ldots$
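To see why a uniform restart strategy cannot help in the balanced model, the sketch below computes the exact expected total time for a fixed cutoff t. It assumes the run time of each attempt is uniform on 1, ..., 2^n, that attempts are independent, and that a failed attempt costs exactly t leaves; the function name is hypothetical.

```python
from fractions import Fraction


def expected_time_uniform_restart(n: int, cutoff: int) -> Fraction:
    """Expected leaves visited when every run is cut off after `cutoff` leaves."""
    total_leaves = 2 ** n
    if not 1 <= cutoff <= total_leaves:
        raise ValueError("cutoff must lie between 1 and 2**n")
    success = Fraction(cutoff, total_leaves)       # P[T <= cutoff]
    expected_failed_runs = (1 - success) / success
    final_run = Fraction(cutoff + 1, 2)            # E[T | T <= cutoff]
    return expected_failed_runs * cutoff + final_run


n = 10
costs = {t: expected_time_uniform_restart(n, t) for t in (1, 32, 2 ** n)}
print(costs)
```

Under these assumptions the expression simplifies to 2^n - (t - 1)/2, which is smallest when the cutoff equals 2^n, that is, when no restart ever happens.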
Balanced Tree Model
What sort of improvements can be made to an
algorithm so that its behavior is not like backtrack search in
the balanced tree model?
• Very clever search heuristics that lead quickly to the
solution node, but that is hard in general
• Combination of pruning, propagation, dynamic
variable ordering: prune subtrees that do not
contain the solution, allowing for runs that are
short.
Resulting trees may vary dramatically from run to
run.
Tree Search Models:
Imbalanced Tree Model
Imbalanced Tree Model
b=2
Algorithm requires time b^i
with probability (1-p)p^i
Intuition: lower p corresponds to
“smarter” search
Let T denote the runtime of the algorithm:
the number of leaf nodes visited up to and including the successful
node.
P[T bi ] (1 p) pi
(i  0)
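A minimal sampler for this model, assuming the geometric reading of the distribution: the search makes a bad decision with probability p at each level, and i consecutive bad decisions cost b^i. The function name and the seed are illustrative assumptions.

```python
import random


def sample_runtime(p: float, b: int = 2, rng=random) -> int:
    """Draw T with P[T = b**i] = (1 - p) * p**i."""
    bad_decisions = 0
    while rng.random() < p:      # continue with probability p
        bad_decisions += 1
    return b ** bad_decisions


rng = random.Random(0)           # fixed seed so the run is repeatable
samples = [sample_runtime(0.5, b=2, rng=rng) for _ in range(1000)]
print(min(samples), max(samples))
```

Most draws are small, but occasional long streaks of bad decisions produce run times that are orders of magnitude larger.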
Imbalanced Tree Model:
Three Regimes of Behavior
p 1
b2
(see paper for formal proofs)
Regime 1:
finite expected time, finite variance
1  p 1
b
b2
Regime 2:
finite expected time, infinite variance
p 1
b
Regime 3:
infinite expected time, infinite variance
log p
Tail: P[T  L ] p2 L
b C L
p 1
when
we have   2
2
b
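The regime boundaries follow from when the series for E[T] (terms (pb)^i) and E[T^2] (terms (pb^2)^i) converge. The sketch below classifies a given p and computes the tail exponent alpha = -log_b p; the function names are hypothetical and the thresholds assume the unbounded model.

```python
import math
from fractions import Fraction


def regime(p, b: int = 2) -> int:
    """1: finite mean and variance; 2: finite mean only; 3: neither."""
    if p < Fraction(1, b * b):   # sum of (p*b^2)^i converges
        return 1
    if p < Fraction(1, b):       # sum of (p*b)^i converges
        return 2
    return 3


def tail_exponent(p: float, b: int = 2) -> float:
    """alpha in the power-law tail L**(-alpha), i.e. -log_b(p)."""
    return -math.log(p) / math.log(b)


for p in (Fraction(1, 8), Fraction(1, 3), Fraction(1, 2)):
    print(p, regime(p), tail_exponent(float(p)))
```

Smaller p gives a larger alpha and therefore a lighter tail, matching the intuition that lower p corresponds to a smarter search.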
Bounded Imbalanced Tree
Model
Bounded Imbalanced Tree
Model
Unbounded model
Single infinite distribution.
$P[T = b^i] = (1-p)\,p^i$, $i \ge 0$
Bounded model
Infinite number of distributions, one for each n.
Arises from truncating successively larger finite segments
of unbounded distribution.
Given that:
$\sum_{i=0}^{n} (1-p)\,p^i = 1 - p^{n+1}$
We define:
$P[T = b^i] = \frac{(1-p)\,p^i}{1 - p^{n+1}} = C_n p^i$
with $i = 0, 1, \ldots, n$ and $C_n = \frac{1-p}{1 - p^{n+1}}$
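The normalization can be verified with exact rational arithmetic. This is a small sketch; `bounded_pmf` is a hypothetical helper, and p = 1/3, n = 10 are arbitrary illustrative values.

```python
from fractions import Fraction


def bounded_pmf(p: Fraction, n: int) -> list:
    """Probabilities C_n * p**i for i = 0..n, where T = b**i."""
    c_n = (1 - p) / (1 - p ** (n + 1))   # normalizing constant C_n
    return [c_n * p ** i for i in range(n + 1)]


pmf = bounded_pmf(Fraction(1, 3), 10)
print(sum(pmf))
```

Because the truncated geometric sum equals 1 - p^(n+1), dividing by it makes the n + 1 probabilities sum to exactly 1.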
Bounded Imbalanced Tree Model:
Three Regimes of Behavior
p 1
b2
(see paper for formal proofs)
Regime 1:
polynomial expected time, polynomial variance
1  p 1
b
b2
Regime 2:
polynomial expected time, exponential variance
p 1
b
Regime 3:
exponential expected time, exponential variance
Restart strategy - Expected polynomial time
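The effect of restarts in the bounded model can be illustrated numerically. The sketch below assumes independent runs, a fixed cutoff of b^k leaves, and that a failed run costs the full cutoff; p = 3/4 with b = 2 is an illustrative Regime 3 choice and all function names are hypothetical.

```python
from fractions import Fraction


def bounded_pmf(p, n):
    """Probabilities C_n * p**i for run times b**i, i = 0..n."""
    c_n = (1 - p) / (1 - p ** (n + 1))
    return [c_n * p ** i for i in range(n + 1)]


def expected_time(p, n, b=2):
    """Mean run time without restarts."""
    return sum(q * b ** i for i, q in enumerate(bounded_pmf(p, n)))


def expected_time_with_restart(p, n, k, b=2):
    """Mean total time when every run is cut off after b**k leaves."""
    head = bounded_pmf(p, n)[: k + 1]
    success = sum(head)                                   # P[T <= b**k]
    mean_success = sum(q * b ** i for i, q in enumerate(head)) / success
    return (1 - success) / success * b ** k + mean_success


p, n = Fraction(3, 4), 30
print(float(expected_time(p, n)), float(expected_time_with_restart(p, n, 0)))
```

With cutoff b^0 = 1 the expected total time is 1/C_n, which stays below 1/(1-p) for every n, while the mean without restarts grows like (pb)^n.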
Bounded Heavy-Tailed Behavior
Balanced, Unbounded, and
Imbalanced Trees
Conclusions
Conclusions
Heavy-tailed behavior yields insight into backtrack search methods,
providing an explanation for the effectiveness of restart strategies.
Tree Search Models: can be analyzed rigorously.
• Balanced Tree Search Model
Uniform distribution (not heavy-tailed);
restarts are not effective
• Imbalanced Tree Search Model (Bounded/Unbounded)
Heavy-tailed; restarts are effective
Consequence for algorithm design: aim for strategies which have
highly asymmetric distributions.
Demos, papers, etc.
www.cs.cornell.edu/hubes
www.cs.cornell.edu/gomes
Check also:
www.cis.cornell.edu/iisi