Learning to Search
Henry Kautz
University of Washington
joint work with
Dimitri Achlioptas, Carla Gomes, Eric Horvitz, Don
Patterson, Yongshao Ruan, Bart Selman
CORE – MSR, Cornell, UW
Speedup Learning

Machine learning historically considered:
- Learning to classify objects
- Learning to search or reason more efficiently (speedup learning)

Speedup learning disappeared in the mid-90's:
- Last workshop in 1993
- Last thesis in 1998
What happened?
- It failed.
- It succeeded.
- Everyone got busy doing something else.
It failed.

Explanation-based learning (EBL):
- Examine structure of proof trees
- Explain why choices were good or bad (wasteful)
- Generalize to create new control rules
At best, mild speedup (50%); could even degrade performance.
Underlying search engines were very weak.
Etzioni (1993): simple static analysis of next-state operators yielded performance as good as EBL.
It succeeded.

EBL without generalization:
- Memoization
- No-good learning
- SAT: clause learning, which integrates clausal resolution with DPLL
Huge win in practice!
- Clause-learning proofs can be exponentially smaller than the best DPLL (tree-shaped) proof
- Chaff (Malik et al. 2001): 1,000,000-variable VLSI verification problems
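The no-good learning idea can be illustrated with a toy sketch (the helper names are hypothetical, and real solvers such as Chaff derive the learned clause from an implication-graph analysis rather than from the raw decision set): when unit propagation under a set of decisions reaches a conflict, the negation of those decisions is recorded as a new clause, so the same dead end is never explored twice.

```python
def unit_propagate(clauses, assign):
    """Repeatedly apply the unit-clause rule; return 'conflict' if some
    clause is falsified, else the extended assignment (dict var -> bool).
    Literals are DIMACS-style ints: v means var v true, -v means false."""
    assign = dict(assign)
    changed = True
    while changed:
        changed = False
        for c in clauses:
            vals = [assign.get(abs(l)) == (l > 0) for l in c if abs(l) in assign]
            unassigned = [l for l in c if abs(l) not in assign]
            if any(vals):
                continue                  # clause already satisfied
            if not unassigned:
                return "conflict"         # clause falsified
            if len(unassigned) == 1:      # unit clause: literal is forced
                l = unassigned[0]
                assign[abs(l)] = l > 0
                changed = True
    return assign

def learn_nogood(decisions):
    """Record the negation of the conflicting decision set as a clause
    (a crude no-good; clause learning proper analyzes the conflict)."""
    return [-v if val else v for v, val in decisions.items()]
```

Adding the learned clause to the clause set lets unit propagation rule out the bad decision immediately on every later branch, which is the sense in which memoized no-goods reuse proof effort without EBL-style generalization.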
Everyone got busy.

The something else: reinforcement learning.
- Learn about the world while acting in the world
- Don't reason or classify, just make decisions
What isn't RL?
Another path

Predictive control of search:
- Learn a statistical model of the behavior of a problem solver on a problem distribution
- Use the model as part of a control strategy to improve the future performance of the solver

Synthesis of ideas from:
- Phase transition phenomena in problem distributions
- Decision-theoretic control of reasoning
- Bayesian modeling
Big Picture

[Diagram: Problem Instances feed a Solver; the Solver emits runtime, dynamic features, and static features to a Learning / Analysis component, which builds a Predictive Model; the model drives control / policy and resource allocation / reformulation decisions back to the Solver.]
Case Study 1: Beyond 4.25

[Diagram: Problem Instances feed a Solver; runtime and static features go to Learning / Analysis, which builds a Predictive Model.]
Phase transitions & problem hardness

Large and growing literature on random problem distributions:
- Peak in problem hardness associated with a critical value of some underlying parameter
- 3-SAT: clause/variable ratio = 4.25

Using a measured parameter to predict the hardness of a particular instance is problematic!
- The random distribution must be a good model of the actual domain of concern
- Recent progress on more realistic random distributions...
Quasigroup Completion Problem (QCP)

- NP-complete
- Has structure similar to that of real-world problems: tournament scheduling, classroom assignment, fiber optic routing, experiment design, ...
- Can generate hard guaranteed SAT instances (2000)
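Concretely, a QCP instance can be built by taking a completed Latin square and "punching holes" in it. The sketch below is only illustrative (the cyclic square and the shifted-permutation balanced puncher are simplifications, not the generators used in the study), but it shows the random vs. balanced distinction discussed later:

```python
import random

def latin_square(n):
    """Cyclic Latin square of order n (a valid quasigroup table)."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

def punch_random(square, holes, rng):
    """Random model: erase `holes` cells chosen uniformly at random."""
    n = len(square)
    cells = set(rng.sample([(i, j) for i in range(n) for j in range(n)], holes))
    return [[None if (i, j) in cells else square[i][j] for j in range(n)]
            for i in range(n)]

def punch_balanced(square, holes_per_row, rng):
    """Balanced model: exactly `holes_per_row` holes in every row AND every
    column, built from shifted copies of one random permutation (one simple
    way to get zero row/column variance in the hole counts)."""
    n = len(square)
    perm = rng.sample(range(n), n)
    cells = {(i, perm[(i + k) % n]) for i in range(n)
             for k in range(holes_per_row)}
    return [[None if (i, j) in cells else square[i][j] for j in range(n)]
            for i in range(n)]
```

The solver's task is then to refill the `None` cells so each row and column again contains every symbol exactly once.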
Complexity Graph: Phase Transition

[Figure: fraction of unsolvable cases vs. fraction of pre-assignment (20%, 42%, 50%). An underconstrained, almost-all-solvable area gives way at the critically constrained phase-transition region (around 42%) to an overconstrained, almost-all-unsolvable area.]
Easy-Hard-Easy pattern in local search

[Figure: Walksat computational cost vs. % holes for orders 30, 33, 36, peaking between the underconstrained and "over"-constrained areas.]
Are we ready to predict run times?

Problem: high variance

[Figure: run time on a log scale (1 to 10^9) vs. fraction of holes (0.2 to 0.5), showing high variance across instances at each point.]
Deep structural features

Hardness is also controlled by the structure of the constraints, not just the fraction of holes:
- Rectangular pattern: tractable
- Aligned pattern: tractable
- Balanced pattern: very hard
Random versus balanced

[Figure: example hole patterns, random vs. balanced.]
Random versus balanced

[Figure: Walksat cost vs. fraction of holes (0.2 to 0.5); balanced instances climb to roughly 7x10^7, random instances stay far lower.]
Random vs. balanced (log scale)

[Figure: the same comparison on a log scale (1 to 10^9); balanced instances are consistently orders of magnitude harder than random ones across the 0.2 to 0.5 range.]
Morphing balanced and random

[Figure: Mixed Model - Walksat. Time (seconds, 0 to 100) vs. percent random holes (0% to 100%).]
Considering variance in hole pattern

[Figure: Mixed Model - Walksat. Time (0 to 100 seconds) vs. variance in # holes per row (0 to 8).]
Time on log scale

[Figure: Mixed Model - Walksat. Time (seconds, log scale 1 to 100) vs. variance in # holes per row (0 to 8).]
Effect of balance on hardness

- Balanced patterns yield (on average) problems that are 2 orders of magnitude harder than random patterns
- Expected run time decreases exponentially with the variance s in # holes per row or column: E(T) = C e^(-ks)
- Same pattern (different constants) for DPLL!
- At the extreme of high variance (the aligned model) one can prove no hard problems exist
Intuitions

In unbalanced problems it is easier to identify the most critically constrained variables and set them correctly.
- Backbone variables
Are we done?

Unfortunately, not quite.
- While few unbalanced problems are hard, "easy" balanced problems are not uncommon
To do: find additional structural features that signify hardness:
- Introspection
- Machine learning (later this talk)
Ultimate goal: accurate, inexpensive prediction of hardness of real-world problems
Case study 2: AutoWalksat

[Diagram: Problem Instances feed a Solver; runtime and dynamic features go to Learning / Analysis, which builds a Predictive Model; the model drives control / policy decisions back to the Solver.]
Walksat

Choose a truth assignment randomly.
While the assignment evaluates to false:
    Choose an unsatisfied clause at random.
    If possible, flip an unconstrained variable in that clause;
    else, with probability P (noise), flip a variable in the clause at random;
    else, flip the variable in the clause that causes the smallest number of satisfied clauses to become unsatisfied.

Performance of Walksat is highly sensitive to the setting of P.
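The pseudocode above translates into a short program. This is a minimal, unoptimized sketch (real implementations cache break counts incrementally rather than rescanning all clauses per flip):

```python
import random

def walksat(clauses, n_vars, p=0.5, max_flips=100_000, rng=random):
    """Minimal WalkSAT sketch. `clauses` is a list of lists of DIMACS-style
    literals: v means variable v is true, -v means it is false."""
    # Random initial truth assignment; variables are indexed 1..n_vars.
    assign = [False] + [rng.random() < 0.5 for _ in range(n_vars)]

    def sat(lit):
        return assign[abs(lit)] == (lit > 0)

    def break_count(var):
        # Satisfied clauses that flipping `var` would make unsatisfied.
        return sum(1 for c in clauses
                   if [abs(l) for l in c if sat(l)] == [var])

    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign                 # found a model
        clause = rng.choice(unsat)
        breaks = {abs(l): break_count(abs(l)) for l in clause}
        free = [v for v, b in breaks.items() if b == 0]
        if free:                          # "unconstrained" variable exists
            var = rng.choice(free)
        elif rng.random() < p:            # noise step: random variable
            var = abs(rng.choice(clause))
        else:                             # greedy step: minimum break count
            var = min(breaks, key=breaks.get)
        assign[var] = not assign[var]
    return None                           # gave up
```

The returned list maps variable index to truth value (index 0 is unused), or `None` if no model was found within the flip budget.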
The Invariant Ratio

Shortest expected run time when P is set to minimize

    (mean of the objective function / std. deviation of the objective function) + 10%

[Figure: the invariant ratio, y-axis 0 to 7.]
McAllester, Selman and Kautz (1997)
Automatic Noise Setting

Probe for the optimal noise level:
- Bracketed search with parabolic interpolation
- No derivatives required
- Robust to stochastic variations
- Efficient
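A minimal sketch of the underlying numeric step, successive parabolic interpolation over a bracket. The function name `parabolic_min` and the clean quadratic in the test are illustrative; in AutoWalksat the objective would be a noisy estimate of the invariant ratio obtained by running Walksat at each probed noise level:

```python
def parabolic_min(f, a, b, c, tol=1e-3, max_iter=20):
    """Derivative-free bracketed minimization by successive parabolic
    interpolation. Assumes a < b < c with f(b) below f(a) and f(c)."""
    fa, fb, fc = f(a), f(b), f(c)
    for _ in range(max_iter):
        # Vertex of the parabola through (a,fa), (b,fb), (c,fc).
        num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
        den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
        if den == 0:
            break
        x = b - 0.5 * num / den
        if abs(x - b) < tol or not (a < x < c):
            break
        fx = f(x)
        # Shrink the bracket, keeping three points around the minimum.
        if x < b:
            if fx < fb:
                c, fc = b, fb
                b, fb = x, fx
            else:
                a, fa = x, fx
        else:
            if fx < fb:
                a, fa = b, fb
                b, fb = x, fx
            else:
                c, fc = x, fx
    return b
```

Because each probe only compares function values, the method tolerates the run-to-run stochastic variation of a randomized solver better than any derivative-based scheme would.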
Hard random 3-SAT

[Figure sequence: probes 1 through 10 of the noise-response curve on hard random 3-SAT, homing in on the optimal noise level.]
Summary: random, circuit test, graph coloring, planning

Other features still lurking:
- clockwise: add 10%; counter-clockwise: subtract 10%
- More complex function of the objective function?
- Mobility? (Schuurmans 2000)
Case Study 3: Restart Policies

[Diagram: Problem Instances feed a Solver; runtime, dynamic features, and static features go to Learning / Analysis, which builds a Predictive Model; the model drives control / policy and resource allocation / reformulation decisions back to the Solver.]
Background

Backtracking search methods often exhibit a remarkable variability in performance between:
- different heuristics
- the same heuristic on different instances
- different runs of randomized heuristics
Cost Distributions

Observation (Gomes 1997): distributions often have heavy tails:
- infinite variance
- mean increases without limit
- probability of long runs decays by a power law (Pareto-Levy), rather than exponentially (Normal)

[Figure: run-time distribution with a spike of very short runs and a heavy tail of very long runs.]
Randomized Restarts

Solution: randomize the systematic solver.
- Add noise to the heuristic branching (variable choice) function
- Cut off and restart the search after some number of steps
Provably eliminates heavy tails.
Very useful in practice:
- Adopted by state-of-the-art search engines for SAT, verification, scheduling, ...
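A toy simulation of why restarts help. The Pareto run-length model below is purely illustrative, not data from the talk: under a heavy-tailed run-time law the mean of raw runs is unbounded, while restarting at a fixed cutoff makes the expected total cost finite:

```python
import random

def run_length(rng):
    # Heavy-tailed toy solver: P(T > t) decays as a power law with
    # exponent 0.5, so the mean run time is infinite.
    return int(rng.paretovariate(0.5)) + 1

def solve_with_restarts(rng, cutoff):
    # Restart a fresh randomized run every `cutoff` steps; total cost is
    # the truncated failed runs plus the final successful run.
    total = 0
    while True:
        t = run_length(rng)
        if t <= cutoff:
            return total + t
        total += cutoff
```

Averaging `solve_with_restarts` over many trials gives a stable, finite mean, whereas the running average of raw `run_length` draws keeps growing with sample size, the signature of a heavy tail.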
Effect of restarts on expected solution time (log scale)

[Figure: log(backtracks) (up to 10^6) vs. log(cutoff) (1 to 10^6).]
How to determine restart policy

- Complete knowledge of the run-time distribution (only): a fixed cutoff policy is optimal (Luby 1993)
    argmin_t E(R_t), where E(R_t) = expected solution time restarting every t steps
- No knowledge of the distribution: within O(log t) of optimal using the series of cutoffs 1, 1, 2, 1, 1, 2, 4, ...
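Both policies are easy to make concrete. A small sketch (the empirical estimator is illustrative and assumes an uncensored sample of completed run times):

```python
def luby(i):
    # i-th term (1-indexed) of Luby's universal sequence 1,1,2,1,1,2,4,...
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)

def expected_fixed_cutoff(run_times, t):
    # Empirical estimate of E(R_t): expected total work when restarting
    # every t steps, computed from a sample of completed run times.
    succ = [r for r in run_times if r <= t]
    if not succ:
        return float('inf')
    p = len(succ) / len(run_times)       # chance a run finishes within t
    return t * (1 - p) / p + sum(succ) / len(succ)
```

With a heavy-tailed sample such as [1, 1, 1, 100], minimizing `expected_fixed_cutoff` over t picks the small cutoff t = 1, matching the intuition that aggressive restarts pay off when most runs are short.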
Open cases addressed by our research:
- Additional evidence about the progress of the solver
- Partial knowledge of the run-time distribution
Backtracking Problem Solvers

Randomized SAT solver:
- Satz-Rand, a randomized version of Satz (Li & Anbulagan 1997)
- DPLL with 1-step lookahead
- Randomization with noise parameter for increasing variable choices

Randomized CSP solver:
- Specialized CSP solver for QCP
- ILOG constraint programming library
- Variable choice: a variant of the Brelaz heuristic
Formulation of Learning Problem

Different formulations of the evidential problem:
- Consider a burst of evidence over an initial observation horizon
- Observation horizon + time expended so far
- General observation policies

[Figure: timeline of the observation horizon; runs are classified as short or long relative to the median run time (1000 choice points).]
Formulation of Learning Problem (cont.)

[Figure: observation horizon plus time expended so far, with decision points t1, t2, t3 relative to the median run time.]
Formulation of Dynamic Features

- No simple measurement was found sufficient for predicting the time of individual runs
- Approach:
    - Formulate a large set of base-level and derived features
        - Base features capture progress or lack thereof
        - Derived features capture dynamics: 1st and 2nd derivatives; min, max, and final values
    - Use a Bayesian modeling tool to select and combine relevant features
Dynamic Features

CSP: 18 basic features, summarized by 135 variables:
- # backtracks
- depth of search tree
- avg. domain size of unbound CSP variables
- variance in distribution of unbound CSP variables

Satz: 25 basic features, summarized by 127 variables:
- # unbound variables
- # variables set positively
- size of search tree
- effectiveness of unit propagation and lookahead
- total # of truth assignments ruled out
- degree of interaction between binary clauses
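The derived-feature construction described above can be sketched directly: each base-level trace sampled during the observation horizon is summarized by the min, max, and final values of the raw series and of its 1st and 2nd differences. The feature names below are illustrative, not the study's exact 127/135 summary variables:

```python
def derived_features(series):
    """Summarize one base-level feature trace (e.g. # backtracks sampled
    at intervals) into derived features capturing its dynamics."""
    def summarize(xs, tag):
        return {f"{tag}_min": min(xs), f"{tag}_max": max(xs),
                f"{tag}_final": xs[-1]}
    d1 = [b - a for a, b in zip(series, series[1:])]   # 1st derivative
    d2 = [b - a for a, b in zip(d1, d1[1:])]           # 2nd derivative
    feats = summarize(series, "raw")
    if d1:
        feats.update(summarize(d1, "d1"))
    if d2:
        feats.update(summarize(d2, "d2"))
    return feats
```

Applying this to each of the 18 (CSP) or 25 (Satz) base features yields the flat vector of summary variables that the Bayesian model selection then prunes.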
Different formulations of task

Single instance:
- Solve a specific instance as quickly as possible
- Learn model from one instance

Every instance:
- Solve an instance drawn from a distribution of instances
- Learn model from an ensemble of instances

Any instance:
- Solve some instance drawn from a distribution of instances; may give up and try another
- Learn model from an ensemble of instances
Sample Results: CSP-QWH-Single

- QWH order 34, 380 unassigned
- Observation horizon without time
- Training: solve 4000 times with random
- Test: solve 1000 times
- Learning: Bayesian network model
    - MS Research tool
    - Structure search with Bayesian information criterion (Chickering, et al.)
- Model evaluation: average 81% accurate at classifying run time vs. 50% with just background statistics (range of 78% - 98%)
Learned Decision Tree

[Figure: learned decision tree. Node tests include: min of 1st derivative of variance in number of uncolored cells across columns and rows; min number of uncolored cells averaged across columns; min depth of all search leaves of the search tree; change of sign of the change of avg. depth of node in search tree; max in variance in number of uncolored cells.]
Restart Policies

- The model can be used to create policies that are better than any policy that uses only the run-time distribution
- Example:
    Observe for 1,000 steps.
    If "run time > median" is predicted, restart immediately;
    else run until the median is reached or a solution is found;
    if no solution, restart.
- E(R_fixed) = 38,000 but E(R_predict) = 27,000
- Can sometimes beat fixed even if the observation horizon > optimal fixed cutoff!
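The example policy can be written out as a sketch over an abstract run-time model. All helpers are hypothetical: `draw_run` samples the unknown completion time of a fresh randomized run, and `predict_long` stands in for the learned model's classification after the observation horizon (a perfect oracle here; the real classifier is about 81% accurate):

```python
def predictive_restart(draw_run, predict_long, median, obs=1000):
    # Policy: observe for `obs` steps; if the model predicts a long run,
    # restart immediately; otherwise run on until the median is reached
    # or a solution is found, then restart if still unsolved.
    total = 0
    while True:
        t = draw_run()                  # true (hidden) completion time
        if t <= obs:
            return total + t            # solved during observation horizon
        if predict_long(t):
            total += obs                # predicted long: restart now
            continue
        if t <= median:
            return total + t            # predicted short, and it was
        total += median                 # ran to the median without success
```

On bimodal or heavy-tailed run-time distributions this aborts doomed long runs after only `obs` steps instead of paying the full fixed cutoff, which is the spirit of the E(R_fixed) = 38,000 vs. E(R_predict) = 27,000 comparison.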
Ongoing work

Optimal predictive policies:
- Dynamic features + run time + static features
- Partial information about the run-time distribution
    - E.g.: mixture of two or more subclasses of problems
Cheap approximations to optimal policies:
- Myopic Bayes
Conclusions

- Exciting new direction for improving the power of search and reasoning algorithms
- Many knobs to learn how to twist:
    - Noise level and restart policies are just a start
- Lots of opportunities for cross-disciplinary work:
    - Theory
    - Machine learning
    - Experimental AI and OR
    - Reasoning under uncertainty
    - Statistical physics