Statistical Regimes Across Constrainedness Regions Carla P. Gomes, Cesar Fernandez

Statistical Regimes Across
Constrainedness Regions
Carla P. Gomes, Cesar Fernandez
Bart Selman, and Christian Bessiere
Cornell University
Universitat de Lleida
LIRMM-CNRS
CP 2004
Toronto
Motivation
Bring together recent results on:
• Typical Case Analysis
• Randomized Complete Search Methods
• Heavy-Tailed Phenomena
• Random CSP Models
Typical Case Analysis:
Beyond NP-Completeness
Phase Transition Phenomenon: discriminating “easy” vs. “hard” instances
[Figure: computational cost (mean) and % of solvable instances vs. constrainedness]
Hogg et al 96
Exceptionally Hard Instances
Seem to defy the “easy-hard” pattern:
– such instances occur in the under-constrained
area;
– they are considerably harder than other similar
instances and even harder than instances from
the critically constrained area.
Gent and Walsh 94
Hogg and Williams 94
Smith and Grant 97
Are Exceptionally Hard Instances
Truly Hard?
• Different algorithms encounter different
exceptionally hard instances.
• “Hardness” of exceptionally hard instances is
not necessarily hardness of the instances themselves,
but rather of the combination of the instance
with the details of the search method.
Gent and Walsh 94
Hogg and Williams 94
Selman and Kirkpatrick 96
Smith and Grant 97
Randomized Backtrack Search
What if we introduce a tiny element of
randomness into the search heuristic (e.g., by
breaking ties randomly) and run this (still
complete) randomized search procedure on the
same instance over and over again?
Study of runtime distributions of a
randomized backtrack search
on the same instance:
Way of isolating the variance caused
solely by the algorithm
Gomes et al CP 97
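The experiment above can be sketched in a few lines. This is only an illustration, not the setup used in the cited work: it assumes n-queens as a stand-in instance, simple chronological backtracking, and a random value order as the single source of randomness (the names `randomized_backtrack` and `runtime_sample` are hypothetical). The search remains complete, and the number of backtracks is recorded over repeated runs on the same instance.

```python
import random

def randomized_backtrack(n, rng):
    """Solve n-queens by chronological backtracking with random value order.

    Returns (solution, backtracks). The search stays complete: every value
    is eventually tried, only the order is random.
    """
    cols = []            # cols[r] = column of the queen placed in row r
    backtracks = 0

    def consistent(row, col):
        # No shared column and no shared diagonal with earlier rows.
        return all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(cols))

    def search(row):
        nonlocal backtracks
        if row == n:
            return True
        values = list(range(n))
        rng.shuffle(values)          # the only source of randomness
        for col in values:
            if consistent(row, col):
                cols.append(col)
                if search(row + 1):
                    return True
                cols.pop()
                backtracks += 1      # undo a failed assignment
        return False

    found = search(0)
    return (list(cols) if found else None), backtracks

def runtime_sample(n, runs, seed=0):
    """Run the randomized search `runs` times on the same instance."""
    rng = random.Random(seed)
    return [randomized_backtrack(n, rng)[1] for _ in range(runs)]

if __name__ == "__main__":
    costs = sorted(runtime_sample(n=10, runs=100))
    print("min", costs[0], "median", costs[len(costs) // 2], "max", costs[-1])
```

Since the instance is fixed, any spread between the minimum and maximum cost in the sample comes from the algorithm's random choices alone.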
Extreme Variance in Runtime
of Randomized Backtrack Search
Easy instance – 15% preassigned cells
Time: 7, 11, 30, >2000, >2000
Gomes et al 97
Heavy-tailed distributions
Exponential decay for
standard distributions, e.g. Normal, Lognormal,
exponential:
Normal
$\Pr[X > x] \sim C e^{-x^2}$, for some $C > 0$
Heavy-Tailed
Power Law Decay
e.g. Pareto-Levy:
$\Pr[X > x] \sim C x^{-\alpha}$, $x > 0$
(Frost et al 97; Gomes et al 97; Hoos 1999; Walsh 99)
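A small numerical sketch may help contrast the two decay rates. It assumes a standard Normal and a pure Pareto tail with an illustrative exponent of 0.5; the function names are hypothetical. On a log-log scale the power-law tail is a straight line whose slope is minus the exponent, while the Normal tail keeps getting steeper.

```python
import math

def normal_tail(x, mu=0.0, sigma=1.0):
    """Pr[X > x] for a Normal(mu, sigma^2) variable."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def pareto_tail(x, alpha, x_min=1.0):
    """Pr[X > x] = (x_min / x)^alpha for x >= x_min (a pure power law)."""
    return 1.0 if x < x_min else (x_min / x) ** alpha

def loglog_slope(tail, x1, x2):
    """Slope of log Pr[X > x] against log x between x1 and x2."""
    return (math.log(tail(x2)) - math.log(tail(x1))) / (math.log(x2) - math.log(x1))

if __name__ == "__main__":
    # Illustrative values only: compare how fast each tail shrinks.
    for x in (2, 5, 10, 20):
        print(x, normal_tail(x), pareto_tail(x, alpha=0.5))
```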
Visualization of Heavy-tailed Phenomenon
(Log-Log Plot of Tail of Distribution)
[Figure: 1-F(x), the unsolved fraction, against runtime (number of backtracks), log scale. Curves: heavy-tailed distribution (median = 2), Normal(2, 1000000), and Normal(2, 1). Marked points: 50% at the median, 2; 0.1% > 200000.]
Formal Results
Abstract Search Tree Models with
provably heavy-tailed behavior
(Chen, Gomes, Selman 2001)
Generalization and Assignment of
Semantics to the Abstract Search
Tree Models
(Williams, Gomes, Selman 2003)
Provably Polytime Restart Strategies
(Williams, Gomes, Selman 2003)
What about concrete CSP models?
(so far no good characterization of
runtime distributions of concrete CSP models)
Research Questions:
Concrete CSP Models
Complete Randomized Backtrack Search
1. Can we provide a characterization of heavy-tailed
behavior: when it occurs and when it does not occur?
2. Can we identify different tail regimes across
different constrainedness regions?
3. Can we get further insights into the tail regime by
analyzing the concrete search trees produced by the
backtrack search method?
Outline of the Rest of the Talk
• Random Binary CSP Models
• Encodings of CSP Models
• Randomized Backtrack Search Algorithms
• Search Trees
• Statistical Tail Regimes Across Constrainedness Regions
– Empirical Results
– Theoretical Model
• Conclusions
Binary Constraint Networks
• A finite binary constraint network
P = (X, D,C)
– a set of n variables X = {x1, x2, …, xn}
– a set of finite domains, one for each variable:
D = { D(x1), D(x2), …, D(xn)}
– a set C of binary constraints between pairs of variables;
a constraint Cij on the ordered pair of variables (xi, xj) is a
subset of the Cartesian product D(xi) × D(xj) that specifies the
allowed combinations of values for the variables xi and xj.
– Solution to the constraint network: an
instantiation of the variables such that all constraints are satisfied.
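One possible way to represent such a network in code is shown below. It is a sketch, assuming constraints are stored as sets of allowed tuples keyed by the ordered pair of variables. The three-variable network is a made-up example, and the brute-force enumeration is only sensible for tiny networks.

```python
from itertools import product

def is_solution(assignment, constraints):
    """True if every constraint allows the values given to its two variables.

    constraints maps an ordered pair (i, j) to the set of allowed (a, b) tuples.
    """
    return all((assignment[i], assignment[j]) in allowed
               for (i, j), allowed in constraints.items())

def solutions(domains, constraints):
    """Enumerate all solutions by brute force over the Cartesian product."""
    variables = sorted(domains)
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if is_solution(assignment, constraints):
            yield assignment

# Hypothetical three-variable network: x1 != x2 and x2 < x3 over {0, 1, 2}.
domains = {1: [0, 1, 2], 2: [0, 1, 2], 3: [0, 1, 2]}
constraints = {
    (1, 2): {(a, b) for a in range(3) for b in range(3) if a != b},
    (2, 3): {(a, b) for a in range(3) for b in range(3) if a < b},
}
```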
Random Binary CSP Models
Model B <N, D, c, t>
N – number of variables; D – size of the domains;
c – number of constrained pairs of variables;
p1 – proportion of binary constraints included in the network;
c = p1 N(N-1)/2;
t – tightness of constraints;
p2 – proportion of forbidden tuples; t = p2 D^2
Model E <N, D, p>
N – number of variables; D – size of the domains;
p – proportion of forbidden pairs (out of D^2 N(N-1)/2)
(Gent et al 1996)
N – from 15 to 50;
(Achlioptas et al 2000)
(Xu and Li 2000)
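A Model B instance could be generated roughly as follows. This is a sketch under assumptions: the function name `model_b` is hypothetical, c and t are rounded to the nearest integer, and constraints are stored as sets of forbidden tuples. Exactly c distinct pairs of variables are constrained, and each constraint forbids exactly t distinct value tuples.

```python
import random
from itertools import combinations, product

def model_b(n, d, p1, p2, seed=None):
    """Generate a random binary CSP following Model B <N, D, c, t>.

    Returns a dict mapping each constrained pair (i, j), i < j, to its set
    of forbidden value tuples. Rounding c and t to integers is an assumption.
    """
    rng = random.Random(seed)
    c = round(p1 * n * (n - 1) / 2)      # number of constrained pairs
    t = round(p2 * d * d)                # forbidden tuples per constraint
    pairs = rng.sample(list(combinations(range(n), 2)), c)
    tuples = list(product(range(d), repeat=2))
    return {pair: set(rng.sample(tuples, t)) for pair in pairs}

if __name__ == "__main__":
    instance = model_b(n=20, d=8, p1=0.2, p2=0.5, seed=1)
    print(len(instance), "constraints")
```

Sweeping p1 or p2 with N and D fixed moves the generated instances across the constrainedness regions.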
Encodings
• Direct CSP Binary Encoding
• Satisfiability Encoding (direct encoding)
Walsh 2000
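As a reminder of what the direct SAT encoding looks like: each CSP variable i and value a gets one Boolean variable meaning "i takes value a", with at-least-one and at-most-one clauses per CSP variable and one conflict clause per forbidden tuple. The sketch below assumes DIMACS-style signed integers for literals; the function name and data layout are hypothetical.

```python
from itertools import combinations

def direct_encoding(domains, forbidden):
    """Encode a binary CSP as CNF clauses (lists of signed integers).

    domains:   dict variable -> list of values
    forbidden: dict (i, j) -> iterable of forbidden (a, b) tuples
    Returns (clauses, lit) where lit[(i, a)] is the Boolean variable meaning
    "CSP variable i takes value a".
    """
    lit = {}
    for i in sorted(domains):
        for a in domains[i]:
            lit[(i, a)] = len(lit) + 1
    clauses = []
    for i in sorted(domains):
        # At least one value per CSP variable.
        clauses.append([lit[(i, a)] for a in domains[i]])
        # At most one value per CSP variable.
        for a, b in combinations(domains[i], 2):
            clauses.append([-lit[(i, a)], -lit[(i, b)]])
    # One conflict clause per forbidden tuple.
    for (i, j), tuples in forbidden.items():
        for a, b in tuples:
            clauses.append([-lit[(i, a)], -lit[(j, b)]])
    return clauses, lit
```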
Backtrack Search Algorithms
• Look-ahead performed:
– no look-ahead (simple backtracking BT);
– removal of values directly inconsistent with the last instantiation
performed (forward-checking FC);
– arc consistency and propagation (maintaining arc consistency, MAC).
• Different heuristics for variable selection (the next variable to instantiate):
– Random (random);
– variables pre-ordered by decreasing degree in the constraint graph (deg);
– smallest domain first, ties broken by decreasing degree (dom+deg)
• Different heuristics for variable value selection:
– Random
– Lexicographic
• For the SAT encodings we used a simplified Davis-Putnam-Logemann-Loveland (DPLL) procedure, with static and random variable/value selection.
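To make one of these combinations concrete, here is a minimal sketch of forward checking (FC) with the dom+deg variable ordering and values tried in listed order. It is an illustration under assumptions, not the implementation used in the experiments: constraints are given as sets of forbidden tuples, the degree is counted once over the whole constraint graph, and the function name is hypothetical.

```python
def forward_checking(domains, forbidden):
    """Backtrack search with forward checking and the dom+deg variable ordering.

    domains:   dict variable -> list of values
    forbidden: dict (i, j) -> set of forbidden (a, b) tuples
    Returns (assignment or None, number of nodes visited).
    """
    def conflict(i, a, j, b):
        return (a, b) in forbidden.get((i, j), ()) or (b, a) in forbidden.get((j, i), ())

    degree = {v: 0 for v in domains}
    for i, j in forbidden:
        degree[i] += 1
        degree[j] += 1
    nodes = 0

    def search(assignment, live):
        nonlocal nodes
        if len(assignment) == len(domains):
            return dict(assignment)
        # dom+deg: smallest live domain first, ties broken by larger degree.
        var = min((v for v in live if v not in assignment),
                  key=lambda v: (len(live[v]), -degree[v]))
        for value in live[var]:
            nodes += 1
            # Forward checking: drop future values inconsistent with var = value.
            pruned = {u: [b for b in live[u] if not conflict(var, value, u, b)]
                      for u in live if u not in assignment and u != var}
            if all(pruned.values()):          # no future domain wiped out
                new_live = {**live, **pruned, var: [value]}
                assignment[var] = value
                result = search(assignment, new_live)
                if result is not None:
                    return result
                del assignment[var]
        return None

    return search({}, {v: list(vals) for v, vals in domains.items()}), nodes
```

Replacing the pruning step with full arc-consistency propagation would give a MAC-style search, and removing it altogether gives plain BT.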
Inconsistent Subtrees
Bessiere et al 2004
Distributions
• Runtime distributions of the backtrack
search algorithms;
• Distribution of the depth of the
inconsistency trees found during the search;
All runs were performed without censorship.
Main Results
1 - Runtime distributions
2 – Inconsistent Sub-tree Depth
Distributions
Dramatically different statistical
regimes across the constrainedness
regions of CSP models;
Runtime distributions
Distribution of Depth of
Inconsistent Subtrees
Depth of Inconsistent Search Tree vs.
Runtime Distributions
Other Models and More Sophisticated
Consistency Techniques
[Figures: Model B with BT and with MAC; SAT encoding with DPLL]
Heavy-tailed and non-heavy-tailed regions.
As the “sophistication” of the algorithm increases, the heavy-tailed
region extends to the right, getting closer to the phase transition.
Theoretical Model
Depth of Inconsistent Search Tree vs.
Runtime Distributions
Theoretical Model
X – search cost (runtime);
ISTD – depth of an inconsistent sub-tree;
Pistd[ISTD = N] – probability of finding an inconsistent sub-tree of
depth N during search;
P[X > x | N] – probability of the search cost being larger than x,
given an inconsistent sub-tree of depth N
Depth of Inconsistent Search Tree vs.
Runtime Distributions:
Theoretical Model
See paper for proof
details
Regressions for B1, B2, k
[Figures: regression for B1 and B2; regression for k]
Validation:
Theoretical Model vs. Runtime Data
α = 0.27 using runtime data;
α = 0.26 using the model.
Summary of Results
1 As constrainedness increases, the runtime distribution changes
from a heavy-tailed to a non-heavy-tailed
regime.
This holds for both models (B and E), for CSP and SAT
encodings, and for the different backtrack
search strategies.
Summary of Results
2 Threshold from the heavy-tailed to the non-heavy-tailed regime
– Dependent on the particular search procedure;
– As the efficiency of the search method increases, the
heavy-tailed region extends further: the
heavy-tailed threshold gets closer to the phase
transition.
Summary of Results
3 Distribution of the depth of inconsistent search sub-trees
An exponentially distributed inconsistent sub-tree depth
(ISTD), combined with exponential growth of the search
space as the tree depth increases, implies heavy-tailed
runtime distributions.
As the ISTD distributions move away from the exponential
distribution, the runtime distributions become non-heavy-tailed.
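A simplified worked case shows why this combination produces a power law. Assume (as a simplification, not the exact model from the paper) that the depth N satisfies P[N > n] = e^{-B2 n} and that the cost of refuting a sub-tree of depth N is exactly X = k^N. Then P[X > x] = P[N > ln x / ln k] = x^{-B2 / ln k}, a Pareto tail with index B2 / ln k. Reading B2 as the decay rate of the depth distribution and k as the growth factor of the search space is an assumption made for this illustration. The sketch below simulates the simplified model and recovers the index with a Hill estimator.

```python
import math
import random

def simulate_costs(b2, k, runs, seed=0):
    """Draw search costs X = k**N with N exponentially distributed (rate b2)."""
    rng = random.Random(seed)
    return [k ** rng.expovariate(b2) for _ in range(runs)]

def hill_estimator(costs, tail_fraction=0.1):
    """Hill estimate of the tail index from the largest observations."""
    data = sorted(costs, reverse=True)
    m = max(2, int(len(data) * tail_fraction))
    threshold = data[m]              # the (m+1)-th largest observation
    return m / sum(math.log(x / threshold) for x in data[:m])

if __name__ == "__main__":
    b2, k = 0.5, 2.0                 # illustrative values only
    costs = simulate_costs(b2, k, runs=20000)
    print("predicted tail index:", b2 / math.log(k))
    print("estimated tail index:", hill_estimator(costs))
```

If the depth distribution decays faster than exponentially, the same substitution no longer yields a power law, which matches the non-heavy-tailed regime described above.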
Research Challenges
How to exploit these results in terms of the design of
more efficient search procedures?
– Randomization and restart strategies;
– Search heuristics;
– Look ahead and look back strategies;
Very exciting and
promising research area!
Demos and papers:
www.cs.cornell.edu/gomes/
http://fermat.eup.udl.es/~cesar/
www.cs.cornell.edu/selman/
http://www.lirmm.fr/~bessiere/