Distributions of Randomized Backtrack Search

• Key Properties:
• I Erratic behavior of mean
• II Distributions have “heavy tails”.
Erratic Behavior of Search Cost
Quasigroup Completion Problem
[Figure: sample mean of the search cost vs. number of runs. Annotations: sample mean up to 3500, median = 1.]
[Figure: proportion of cases solved vs. number of backtracks. Annotations: 75% <= 30 backtracks, 5% > 100000 backtracks.]
Heavy-Tailed
Distributions
• … infinite variance … infinite mean
• Introduced by Pareto in the 1920s --- a “probabilistic curiosity.”
• Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena.
• Examples: stock market, earthquakes, weather, ...
Decay of Distributions
• Standard --- Exponential Decay
  e.g. Normal: $\Pr[X > x] \sim C e^{-x^2}$, for some $C > 0$, $x > 1$
• Heavy-Tailed --- Power Law Decay
  e.g. Pareto-Levy: $\Pr[X > x] \sim C x^{-\alpha}$, $x > 0$
[Figure: power law decay (heavy-tailed) compared with exponential decay (standard distribution, finite mean & variance).]
Normal, Cauchy, and Levy
[Figure: the Normal, Cauchy, and Levy distributions. Cauchy: power law decay; Levy: power law decay; Normal: exponential decay.]
Tail Probabilities
(Standard Normal, Cauchy, Levy)
Pr[X > c]:
c    Normal        Cauchy    Levy
0    0.5           0.5       1
1    0.1587        0.25      0.6827
2    0.0228        0.1476    0.5205
3    0.001347      0.1024    0.4363
4    0.00003167    0.078     0.3829
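The tail probabilities above can be recomputed from the closed-form distribution functions. The sketch below assumes the standard forms of all three distributions (location 0, scale 1); the function names are illustrative and not from the slides. For the Normal at c = 3 the closed form gives about 0.00135, slightly different from the 0.001347 listed.

```python
import math


def normal_tail(c):
    """Pr[X > c] for a standard Normal variable."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))


def cauchy_tail(c):
    """Pr[X > c] for a standard Cauchy variable."""
    return 0.5 - math.atan(c) / math.pi


def levy_tail(c):
    """Pr[X > c] for a standard Levy variable (support is x > 0)."""
    if c <= 0:
        return 1.0
    return math.erf(1.0 / math.sqrt(2.0 * c))


# Print one row per value of c, in the same column order as the table.
for c in range(5):
    print(c, normal_tail(c), cauchy_tail(c), levy_tail(c))
```

The Normal column shrinks by orders of magnitude per step in c, while the Cauchy and Levy columns shrink slowly, which is the exponential vs. power law contrast.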
Example of Heavy Tailed Model
(Random Walk)
• Random Walk:
• Start at position 0
• Toss a fair coin:
– with each head take a step up (+1)
– with each tail take a step down (-1)
X --- number of steps the random walk takes to return to position 0.
[Figure: the record of 10,000 tosses of an ideal coin (Feller), showing zero crossings and long periods without a zero crossing.]
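A small simulation makes the random walk model concrete. This is a minimal sketch, not the demo used in the talk: the step cap `max_steps`, the trial count, and the function names are assumptions chosen for illustration, and walks that do not return within the cap are reported as `None`.

```python
import random


def first_return_time(rng, max_steps=10_000):
    """Number of coin tosses until the walk first returns to 0.

    Returns None if the walk has not returned within max_steps
    (an illustrative cap; the true return time is unbounded).
    """
    position = 0
    for step in range(1, max_steps + 1):
        position += 1 if rng.random() < 0.5 else -1
        if position == 0:
            return step
    return None


def sample_return_times(trials=2000, seed=0, max_steps=10_000):
    """Draw independent first-return times from a seeded generator."""
    rng = random.Random(seed)
    return [first_return_time(rng, max_steps) for _ in range(trials)]


times = sample_return_times()
quick = sum(1 for t in times if t == 2) / len(times)
slow = sum(1 for t in times if t is None or t > 100) / len(times)
print("fraction returning in 2 steps:", quick)
print("fraction needing more than 100 steps:", slow)
```

About half of the walks return after two tosses, yet a noticeable fraction takes more than 100 steps and some exceed the cap entirely, which is the long-period behavior the figure illustrates.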
Heavy-tails vs. Non-Heavy-Tails
[Figure: 1-F(x) (unsolved fraction) vs. X, the number of steps the walk takes to return to zero (log scale). Curves: Random Walk (median = 2; 0.1% > 200000), Normal(2, 1000000), and Normal(2, 1).]
How to Check for “Heavy Tails”?
• Log-Log plot of tail of distribution should be approximately linear.
• Slope gives value of $\alpha$ (the line has slope $-\alpha$):
– $\alpha < 1$: infinite mean and infinite variance
– $1 < \alpha < 2$: infinite variance
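The log-log check can be sketched numerically: fit a straight line to the logarithm of the empirical tail against the logarithm of the value, then negate the slope. The sketch below is an assumption-laden illustration, not the procedure used for the QCP data: it draws synthetic Pareto samples with `random.paretovariate`, fits ordinary least squares over all sample points, and the name `tail_index_estimate` is hypothetical.

```python
import math
import random


def tail_index_estimate(samples):
    """Estimate alpha as minus the least-squares slope of
    log(empirical tail) against log(x). Samples must be positive."""
    xs = sorted(samples)
    n = len(xs)
    log_x = [math.log(x) for x in xs]
    # Empirical Pr[X >= x_i] for the i-th smallest sample (never zero).
    log_tail = [math.log((n - i) / n) for i in range(n)]
    mean_x = sum(log_x) / n
    mean_t = sum(log_tail) / n
    cov = sum((a - mean_x) * (b - mean_t) for a, b in zip(log_x, log_tail))
    var = sum((a - mean_x) ** 2 for a in log_x)
    return -cov / var


rng = random.Random(0)
true_alpha = 0.5  # illustrative value below 1, i.e. infinite mean
data = [rng.paretovariate(true_alpha) for _ in range(20000)]
print("estimated alpha:", tail_index_estimate(data))
```

On real run-time data only the upper tail is expected to be linear, so in practice the fit would be restricted to large values rather than the whole sample.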
Heavy-Tailed Behavior in QCP Domain
[Figure: log-log plot of 1-F(x) (unsolved fraction) vs. number of backtracks, with curves of estimated $\alpha = 0.153$, $\alpha = 0.319$, and $\alpha = 0.466$. Annotations: 18% unsolved, 0.002% unsolved.]
$\alpha < 1$ => Infinite mean
Formal Models of Heavy-Tailed Behavior in
Combinatorial Search
Chen, Gomes, Selman 2001
Motivation
• Research on heavy-tails has been largely
based on empirical studies of run time
distribution.
• Goal: to provide a formal characterization of
tree search models and show under what
conditions heavy-tailed distributions can
arise.
• Intuition: Heavy-tailed behavior arises:
Balanced vs. Imbalanced
Tree Model
• Balanced Tree Model:
• chronological backtrack search model;
• fixed variable ordering;
• random child selection with no propagation
mechanisms;
(show demo)
$E[T(n)] = \dfrac{2^n + 1}{2}$
$V[T(n)] = \dfrac{2^{2n} - 1}{12}$
The run time distribution of chronological backtrack search on
a complete balanced tree is uniform (therefore not heavy-tailed).
Both the expected run time and variance scale exponentially.
Balanced Tree Model
– The expected run time and variance scale exponentially in the height of the search tree (number of variables);
– The run time distribution is uniform (not heavy-tailed).
– Backtrack search on the balanced tree model has no restart strategy with expected polynomial time.
Chen, Gomes & Selman 01
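The balanced tree model can be simulated directly. The sketch below is an illustration under stated assumptions, not the authors' demo: it uses a complete binary tree of depth n with a single solution leaf, encodes leaves as integers, and counts T as the number of leaves visited up to and including the solution. The function names are hypothetical.

```python
import random


def leaves_visited(depth, solution, rng):
    """Chronological backtracking with random child selection on a
    complete binary tree; returns the number of leaves visited up to
    and including the solution leaf (an integer in [0, 2**depth))."""
    visited = 0

    def dfs(level, prefix):
        nonlocal visited
        if level == depth:
            visited += 1
            return prefix == solution
        children = [0, 1]
        rng.shuffle(children)  # random child selection, no propagation
        for bit in children:
            if dfs(level + 1, (prefix << 1) | bit):
                return True
        return False

    dfs(0, 0)
    return visited


def expected_leaves(depth):
    """E[T(n)] = (2^n + 1) / 2."""
    return (2 ** depth + 1) / 2


def variance_leaves(depth):
    """V[T(n)] = (2^(2n) - 1) / 12."""
    return (2 ** (2 * depth) - 1) / 12


rng = random.Random(0)
n = 6
runs = [leaves_visited(n, rng.randrange(2 ** n), rng) for _ in range(2000)]
print("sample mean:", sum(runs) / len(runs), "formula:", expected_leaves(n))
```

Every run stays within 1 to 2^n leaves and no value is favored, so there are no rare, extremely long runs for a restart strategy to exploit.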
• How can we improve on the balanced search tree model?
• A very clever search heuristic that leads quickly to the solution node, but that is hard in general;
• A combination of pruning, propagation, and dynamic variable ordering that prunes subtrees that do not contain the solution, allowing for runs that are short.
• ---> Resulting trees may vary dramatically from run to run.
Formal Model Yielding
Heavy-Tailed Behavior
• T --- the number of leaf nodes visited up to and including the successful node; b --- branching factor
$P[T = b^i] = (1 - p)\,p^i, \quad i \ge 0$
(show demo)
b = 2
• Expected Run Time: $p > 1/b \Rightarrow E[T] \to \infty$ (infinite expected time)
• Variance: $p > 1/b^2 \Rightarrow V[T] \to \infty$ (infinite variance)
• Tail: $p > 1/b^2 \Rightarrow P[T > L] > p^2 L^{\log_b p} = C L^{-\alpha}$, with $\alpha = -\log_b p < 2$
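The imbalanced model is easy to sample, since T = b^i with probability (1 - p) p^i. The sketch below is illustrative only. The closed form E[T] = (1 - p) / (1 - pb) for pb < 1 is not on the slide; it follows from summing the geometric series of (1 - p)(pb)^i. The function names and the parameter values are assumptions.

```python
import math
import random


def sample_leaves(p, b, rng):
    """Draw T with P[T = b^i] = (1 - p) * p^i, i >= 0."""
    i = 0
    while rng.random() < p:
        i += 1
    return b ** i


def expected_leaves(p, b):
    """E[T]; the geometric series diverges when p * b >= 1."""
    if p * b < 1:
        return (1 - p) / (1 - p * b)
    return math.inf


def tail_probability(p, i):
    """P[T >= b^i] = p^i, the exact tail at the points b^i."""
    return p ** i


rng = random.Random(0)
p, b = 0.5, 2  # illustrative: p = 1/b, so the expected run time is infinite
samples = [sample_leaves(p, b, rng) for _ in range(20000)]
for i in (1, 3, 5):
    empirical = sum(1 for t in samples if t >= b ** i) / len(samples)
    print(b ** i, empirical, tail_probability(p, i))
```

With these values the tail halves each time the threshold doubles, a power law with exponent log_b p = -1, so the sample mean keeps growing as more runs are added.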
Bounded Heavy-Tailed Behavior
(show demo)
No Heavy-Tailed Behavior for Proving Optimality
Small-World Vs. Heavy-Tailed
Behavior
• Does a Small-World topology (Watts &
Strogatz) induce heavy-tail behavior?
The constraint graph of a quasigroup
exhibits a small-world topology
(Walsh 99)
Exploiting Heavy-Tailed
Behavior
• Heavy Tailed behavior has been observed in
several domains: QCP, Graph Coloring,
Planning, Scheduling, Circuit synthesis,
Decoding, etc.
• Consequence for algorithm design:
• Use restarts or parallel / interleaved runs to exploit the extreme variance in performance.
Restarts provably eliminate heavy-tailed behavior.
(Gomes et al. 97, Hoos 99, Horvitz 99, Huberman, Lukose and Hogg 97, Karp et al. 96, Luby et al. 93, Rish et al. 97, Walsh 99)
Super-linear Speedups
[Figure: five runs marked X at 10 seconds each, followed by a run marked solved.]
Sequential: 50 +1 = 51 seconds
Parallel: 10 machines --- 1 second
51 x speedup
Interleaved (1 machine): 10 x 1 = 10 seconds
5 x speedup
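The arithmetic behind these speedups can be written out. The sketch below reads the numbers off the slide (five failed runs of 10 seconds, a successful run of 1 second, 10 machines or 10 interleaved runs); the variable names are illustrative.

```python
failed_runs = 5       # runs that fail before the successful one
fail_seconds = 10     # time spent on each failed run
success_seconds = 1   # time the successful run needs
runs_in_flight = 10   # machines in parallel, or runs interleaved on one machine

# Sequential: every failed run is paid for in full before success.
sequential = failed_runs * fail_seconds + success_seconds

# Parallel: all runs start together, so the successful one finishes first.
parallel = success_seconds

# Interleaved on one machine: each of the runs advances by the same amount
# until the successful one finishes.
interleaved = runs_in_flight * success_seconds

print("sequential:", sequential, "seconds")
print("parallel speedup:", sequential / parallel)
print("interleaved speedup:", sequential / interleaved)
```

The parallel speedup exceeds the number of machines because the sequential run pays for the long failed runs, which is exactly what a heavy tail makes likely.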
Restarts
[Figure: 1-F(x) (unsolved fraction) vs. number of backtracks (log). No restarts: 70% unsolved. Restart every 4 backtracks: 0.001% unsolved at 250 backtracks (62 restarts).]
Example of Rapid Restart Speedup
(planning)
[Figure: number of backtracks (log) vs. cutoff (log). Annotations: ~10 restarts, ~100 restarts; the values 2000 (backtracks) and 20 (cutoff) are marked on the axes.]
Sketch of proof of elimination of
heavy tails
$X$ = number of backtracks to solve the problem
• Let’s truncate the search procedure after $m$ backtracks.
• Probability of solving problem with truncated version:
$p_m = \Pr[X \le m]$
• Run the truncated procedure and restart it repeatedly.
$Y$ = total number of backtracks with restarts
Number of restarts $= \lfloor Y/m \rfloor \sim \text{Geometric}(p_m)$
$\bar{F}(y) = \Pr[Y > y] \le (1 - p_m)^{\lfloor y/m \rfloor} \le c_1 e^{-c_2 y}$
Y - does not have Heavy Tails
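The proof sketch can be checked by simulation. The code below is a minimal illustration under loud assumptions: the cost X of a single run is modeled as a shifted, rounded-down Pareto variable (a hypothetical stand-in for a heavy-tailed search cost, not data from the talk), each restart is an independent draw, and a failed run is charged exactly m backtracks. All function names are illustrative.

```python
import random


def sample_backtracks(rng, alpha=0.5):
    """Hypothetical heavy-tailed cost X >= 0 of one run without restarts."""
    return int(rng.paretovariate(alpha)) - 1


def success_probability(m, alpha=0.5):
    """p_m = Pr[X <= m] for the cost model above."""
    return 1.0 - (m + 2) ** (-alpha)


def run_with_restarts(rng, m, alpha=0.5):
    """Total backtracks Y when every run is cut off after m backtracks."""
    total = 0
    while True:
        x = sample_backtracks(rng, alpha)
        if x <= m:
            return total + x
        total += m  # failed run: pay the cutoff and restart


def restart_tail_bound(y, m, p_m):
    """Upper bound (1 - p_m)^floor(y / m) on Pr[Y > y]."""
    return (1.0 - p_m) ** (y // m)


rng = random.Random(0)
m = 4
totals = [run_with_restarts(rng, m) for _ in range(5000)]
p_m = success_probability(m)
for y in (8, 16, 32):
    empirical = sum(1 for v in totals if v > y) / len(totals)
    print(y, empirical, restart_tail_bound(y, m, p_m))
```

In this model the fraction of unsolved restarted runs drops geometrically with y, while a single run without restarts still exceeds 1000 backtracks a few percent of the time.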