Formal Models of Heavy-Tailed Behavior in Combinatorial Search
Hubie Chen, Carla P. Gomes, and Bart Selman
{hubes,gomes,selman}@cs.cornell.edu
Department of Computer Science, Cornell University
CP - 2001

Background
• Randomized backtrack search methods show high variability in run time on a fixed instance: heavy-tailed behavior (Gomes et al., CP '97, JAR '00).
• This led to new insights into the design of search algorithms: restart strategies.
• Randomization and restart strategies are now an integral part of state-of-the-art SAT solvers (Chaff, GRASP, RELSAT, SATZ-Rand).

Goals
Research on heavy tails in search has so far been based largely on empirical studies. Our goals:
• Formal analysis of tree search models: show under what conditions heavy-tailed distributions can and cannot arise.
• Understand when restart strategies are and are not effective.

Intuition
How does heavy-tailed behavior arise?
• The procedure is characterized by large variability, which leads to highly different trees from run to run.
• Wrong branching decisions may lead the search procedure to explore exponentially large subtrees of the search space containing no solutions.
• A lucky sequence of good branching decisions may lead the search to find a solution after exploring only a small subtree.

Intuition Pump: Restarts
When are restarts effective?
Suppose a search procedure requires (on inputs of size n):
• Time p(n) (for a polynomial p) with probability ½
• Time 2^n with probability ½
No restarts: expected time is exponential, equal to ½ * (p(n) + 2^n).
Restart with time interval p(n): expected time drops to polynomial, equal to 2*p(n).

Outline of Talk
• Empirical evidence of Heavy-Tailed behavior
• Tree Search Models
  • Balanced Tree Search Model
  • Imbalanced Tree Search Model
• Bounded Heavy-Tailed Behavior: finite distributions

Empirical Evidence of Heavy-Tailed Behavior

Quasigroups or Latin Squares: An Abstraction for Real World Applications
A quasigroup is an n-by-n matrix such that each row and column is a permutation of the same n colors.
[Figure: quasigroup or Latin square (order 4); instance with 32% preassignment] (Gomes and Selman 96)

Randomized Backtrack Search
Easy instance: 15% preassigned cells (Gomes et al. 97)
[Figure: run times of repeated runs on the same instance: 7, 11, 30, (*), (*); (*) = no solution found, reached cutoff of 2000]

Erratic Behavior of Search Cost: Quasigroup Completion Problem
[Figure: sample mean of search cost versus number of runs; annotations: sample mean 3500, median = 1]

Heavy-Tailed Distributions
• Infinite variance, infinite mean
• Introduced by Pareto in the 1920's as a "probabilistic curiosity."
• Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena.
• Examples: stock market, earthquakes, weather, web traffic...

Decay of Distributions
• Standard distribution (finite mean and variance), exponential decay, e.g. Normal: $\Pr[X > x] \sim C e^{-x^2}$, for some $C > 0$
• Heavy-tailed distribution, power law decay, e.g. Pareto-Levy: $\Pr[X > x] \sim C x^{-\alpha}$, $x > 0$

Visualization of Heavy Tailed Behavior
• $\alpha \le 1$: infinite mean and infinite variance
• $1 < \alpha < 2$: infinite variance
[Figure: y-axis: unsolved fraction 1-F(x) (log); curve label 0.153]
Slope gives value of $\alpha$.
Log-log plot of tail of distribution should be approximately linear.
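The log-log criterion above can be illustrated with a small numerical sketch. It is not taken from the talk: the function names, the choice $\alpha = 0.5$, and the sample points are assumptions made for illustration only. It compares the slope of log(1-F(x)) against log(x) for a pure power-law tail and for an exponential tail.

```python
import math

def pareto_tail(x, alpha=0.5):
    """Survival function 1 - F(x) of a pure power-law tail x^(-alpha), x >= 1.
    alpha = 0.5 is an illustrative value, not one from the slides."""
    return x ** (-alpha)

def exponential_tail(x):
    """Survival function 1 - F(x) of a standard exponential distribution."""
    return math.exp(-x)

def loglog_slopes(tail, xs):
    """Slopes of log(1 - F(x)) versus log(x) between consecutive sample points."""
    pts = [(math.log(x), math.log(tail(x))) for x in xs]
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]

# Sample points 2, 4, ..., 256 (kept small so exp(-x) does not underflow).
xs = [2.0 ** k for k in range(1, 9)]

pareto_slopes = loglog_slopes(pareto_tail, xs)    # constant slope: -alpha
exp_slopes = loglog_slopes(exponential_tail, xs)  # slope keeps getting steeper

print("power-law tail slopes:  ", [round(s, 3) for s in pareto_slopes])
print("exponential tail slopes:", [round(s, 3) for s in exp_slopes])
```

For the power-law tail every slope equals $-\alpha$, which is why the slope of the linear part of the plot can be read as an estimate of $\alpha$; for the exponential tail the curve bends downward instead of staying linear.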
[Figure: x-axis: number of backtracks (log); further curve labels 0.319 and 0.466; annotations: 18% unsolved, 0.002% unsolved, $\alpha < 1$ => infinite mean]

Exploiting Heavy-Tailed Behavior
Heavy-tailed behavior has been observed in several domains: QCP, graph coloring, planning, scheduling, circuit synthesis, decoding, etc.
Consequence for algorithm design: use restarts or parallel / interleaved runs to exploit the extreme variance in performance.
[Figure: log-log plot of unsolved fraction 1-F(x) versus number of backtracks; annotations: 70% unsolved, 0.001% unsolved, 250 (62 restarts)]
Restarts provably eliminate heavy-tailed behavior (Gomes et al. 2000).

Tree Search Models: Balanced Tree Model

Balanced Tree Model, Described
Trees
• All leaves occur at the same depth
• Branching factor 2
• Exactly one "satisfying" leaf
Search algorithm
• Chronological backtrack search model
• Random child selection with no propagation mechanisms

Balanced Tree Model: Analysis
Let $T(n)$ denote the runtime: the number of leaf nodes visited (including the "satisfying" leaf) on a tree of depth n.
Let $X_i$ denote the choice at the (unique) node above the satisfying leaf at depth i: 1 = bad choice, 0 = good choice.
Then
$T(n) = X_1 2^{n-1} + \cdots + X_i 2^{n-i} + \cdots + X_n 2^0 + 1$
[Figure: example search trees with T = 4 and T = 64]
There is exactly one zero-one assignment to the variables $X_i$ for each possible value of $T(n)$; any such assignment has probability $1/2^n$. Hence $T(n)$ has a uniform distribution:
$P[T(n) = i] = \frac{1}{2^n}, \quad i = 1, \ldots, 2^n$

Balanced Tree Model: Distribution
$E[T(n)] = \frac{2^n + 1}{2}$
$V[T(n)] = \frac{2^{2n} - 1}{12}$
• The expected run time and variance scale exponentially in the height of the search tree (number of variables).
• The run time distribution is uniform in shape, not heavy-tailed.
(see paper for formal proofs)

Balanced Tree Model: Restarts
Restart strategies are not effective for this model: there is no restart strategy with expected polynomial time.
Define a restart strategy to be a sequence of times $t_1(n), t_2(n), t_3(n), \ldots$
A restart strategy is applied to a search procedure by running the procedure for time $t_1(n)$, restarting and running for time $t_2(n)$, etc., until a solution is found.
Luby et al. (IPL '93) show that optimal performance (minimum expectation) is obtained by a purely uniform restart strategy: $t_1(n) = t_2(n) = t_3(n) = \cdots$

Balanced Tree Model
What sort of improvements can be made to an algorithm so that it does not behave like backtrack search in the balanced tree model?
• Very clever search heuristics that lead quickly to the solution node, but that is hard in general.
• A combination of pruning, propagation, and dynamic variable ordering: prune subtrees that do not contain the solution, allowing for runs that are short. The resulting trees may vary dramatically from run to run.

Tree Search Models: Imbalanced Tree Model

Imbalanced Tree Model
[Figure: imbalanced tree with b = 2]
The algorithm requires time b^i with probability (1-p)p^i.
Intuition: lower p corresponds to "smarter" search.
Let T denote the runtime of the algorithm: the number of leaf nodes visited up to and including the successful node.
$P[T = b^i] = (1-p) p^i, \quad i \ge 0$

Imbalanced Tree Model: Three Regimes of Behavior (see paper for formal proofs)
• Regime 1, $p < 1/b^2$: finite expected time, finite variance
• Regime 2, $1/b^2 \le p < 1/b$: finite expected time, infinite variance
• Regime 3, $p \ge 1/b$: infinite expected time, infinite variance
Tail: $P[T \ge L] \ge p^2 L^{\log_b p} = C L^{-\alpha}$; when $p > 1/b^2$ we have $\alpha < 2$.

Bounded Imbalanced Tree Model
Unbounded model: a single infinite distribution.
$P[T = b^i] = (1-p) p^i, \quad i \ge 0$
Bounded model: an infinite number of distributions, one for each n. It arises from truncating successively larger finite segments of the unbounded distribution.
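For reference, the thresholds $1/b$ and $1/b^2$ of the unbounded distribution being truncated here come from a geometric-series argument. The following is a short sketch of that calculation, not the formal proof given in the paper:

```latex
\begin{align*}
E[T]   &= \sum_{i \ge 0} b^{i} (1-p) p^{i} = (1-p) \sum_{i \ge 0} (pb)^{i}
        = \frac{1-p}{1-pb} \quad \text{if } pb < 1, \text{ and } \infty \text{ otherwise;} \\
E[T^2] &= \sum_{i \ge 0} b^{2i} (1-p) p^{i} = (1-p) \sum_{i \ge 0} (pb^2)^{i}
        = \frac{1-p}{1-pb^2} \quad \text{if } pb^2 < 1, \text{ and } \infty \text{ otherwise.}
\end{align*}
```

So the mean is finite exactly when $p < 1/b$ and the variance exactly when $p < 1/b^2$. Truncating at $i = n$ turns each infinite sum into a finite one, which is always finite but grows exponentially in n once the ratio ($pb$ or $pb^2$) exceeds 1.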
Given that:
$\sum_{i=0}^{n} (1-p) p^i = 1 - p^{n+1}$
We define:
$P[T = b^i] = \frac{(1-p) p^i}{1 - p^{n+1}} = C_n p^i$, with $i = 0, 1, \ldots, n$ and $C_n = \frac{1-p}{1-p^{n+1}}$

Bounded Imbalanced Tree Model: Three Regimes of Behavior (see paper for formal proofs)
• Regime 1, $p < 1/b^2$: polynomial expected time, polynomial variance
• Regime 2, $1/b^2 \le p < 1/b$: polynomial expected time, exponential variance
• Regime 3, $p \ge 1/b$: exponential expected time, exponential variance
Restart strategy: expected polynomial time

Bounded Heavy-Tailed Behavior

Balanced, Unbounded, and Imbalanced Trees

Conclusions
Heavy-tailed behavior yields insight into backtrack search methods, providing an explanation for the effectiveness of restart strategies.
Tree search models can be analyzed rigorously:
• Balanced Tree Search Model: uniform distribution (not heavy-tailed); restarts are not effective.
• Imbalanced Tree Search Model (Bounded/Unbounded): heavy-tailed; restarts are effective.
Consequence for algorithm design: aim for strategies which have highly asymmetric distributions.

Demos, papers, etc.
www.cs.cornell.edu/hubes
www.cs.cornell.edu/gomes
Check also: www.cis.cornell.edu/iisi
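The contrast drawn in the Conclusions can be checked numerically. The sketch below is not code from the authors: all function names and the parameter values (n = 10, n = 20, p = 0.75, and the stand-in polynomial p(n) = n^2) are assumptions for illustration. It uses the standard identity that a fixed-cutoff restart strategy with independent runs has expected total time E[min(T, c)] / P[T <= c].

```python
def expected_time(dist):
    """Expected run time without restarts; dist is a list of (time, probability)."""
    return sum(t * pr for t, pr in dist)

def expected_time_with_restarts(dist, cutoff):
    """Expected total time when each run is abandoned and restarted after `cutoff`.
    Equals E[min(T, cutoff)] / P[T <= cutoff] for independent runs."""
    q = sum(pr for t, pr in dist if t <= cutoff)
    if q == 0:
        return float("inf")  # no run can ever finish within the cutoff
    capped = sum(min(t, cutoff) * pr for t, pr in dist)
    return capped / q

def balanced(n):
    """Balanced tree model: T(n) uniform on 1..2^n."""
    size = 2 ** n
    return [(i, 1 / size) for i in range(1, size + 1)]

def bounded_imbalanced(n, p, b=2):
    """Bounded imbalanced model: P[T = b^i] = C_n * p^i for i = 0..n."""
    c_n = (1 - p) / (1 - p ** (n + 1))
    return [(b ** i, c_n * p ** i) for i in range(n + 1)]

# Intuition-pump example: time p(n) or 2^n, each with probability 1/2.
# p(n) = n^2 is a hypothetical polynomial chosen for illustration.
n = 30
p_n = n ** 2
pump = [(p_n, 0.5), (2 ** n, 0.5)]
pump_plain = expected_time(pump)
pump_restart = expected_time_with_restarts(pump, p_n)

# Balanced model: try every uniform cutoff; none beats the plain expectation.
bal = balanced(10)
bal_plain = expected_time(bal)
bal_best = min(expected_time_with_restarts(bal, c) for c in range(1, 2 ** 10 + 1))

# Bounded imbalanced model in regime 3 (p >= 1/b), restarting after one leaf.
imb = bounded_imbalanced(20, 0.75)
imb_plain = expected_time(imb)
imb_restart = expected_time_with_restarts(imb, 1)

print("pump:      ", pump_plain, "->", pump_restart)
print("balanced:  ", bal_plain, "->", bal_best)
print("imbalanced:", round(imb_plain, 1), "->", round(imb_restart, 2))
```

In the balanced model the best uniform cutoff only matches the no-restart expectation of (2^n + 1)/2, while in the bounded imbalanced model a cutoff of a single leaf brings the expectation down to about 1/(1-p), independent of n.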