Learning to Search
Henry Kautz, University of Washington
Joint work with Dimitri Achlioptas, Carla Gomes, Eric Horvitz, Don Patterson, Yongshao Ruan, Bart Selman
CORE – MSR, Cornell, UW

Speedup Learning
- Machine learning historically considered two tasks:
  - learning to classify objects
  - learning to search or reason more efficiently ("speedup learning")
- Speedup learning disappeared in the mid-90's: last workshop in 1993, last thesis in 1998.
- What happened?
  1. It failed.
  2. It succeeded.
  3. Everyone got busy doing something else.

It failed.
- Explanation-based learning (EBL):
  - examine the structure of proof trees
  - explain why choices were good or bad (wasteful)
  - generalize to create new control rules
- At best a mild speedup (50%), and it could even degrade performance.
- The underlying search engines were very weak.
- Etzioni (1993): a simple static analysis of next-state operators yielded performance as good as EBL.

It succeeded.
- EBL without generalization:
  - memoization
  - no-good learning
  - SAT: clause learning
- Clause learning integrates clausal resolution with DPLL: a huge win in practice!
- Clause-learning proofs can be exponentially smaller than the best DPLL (tree-shaped) proof.
- Chaff (Malik et al. 2001): 1,000,000-variable VLSI verification problems.

Everyone got busy.
- The something else: reinforcement learning.
- Learn about the world while acting in the world.
- Don't reason or classify, just make decisions.
- What isn't RL?
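To make the clause-learning discussion concrete, here is a minimal Python sketch of the plain DPLL procedure that clause learning extends. This is a hedged illustration under my own encoding (clauses as lists of nonzero ints, DIMACS-style), not Chaff's implementation; a clause-learning solver would additionally record a conflict clause at each dead end instead of simply backtracking.

```python
def dpll(clauses, assignment=None):
    """Minimal DPLL satisfiability check.  Clauses are lists of nonzero ints
    (positive = variable true, negative = negated).  Returns a set of
    literals satisfying all clauses, or None if unsatisfiable.  A
    clause-learning solver (e.g. Chaff) would also derive and keep a
    conflict clause whenever the empty clause is reached."""
    assignment = assignment or set()
    # Simplify: drop satisfied clauses, prune falsified literals.
    simplified = []
    for c in clauses:
        if any(l in assignment for l in c):
            continue                   # clause already satisfied
        c = [l for l in c if -l not in assignment]
        if not c:
            return None                # empty clause: conflict
        simplified.append(c)
    if not simplified:
        return assignment              # all clauses satisfied
    # Unit propagation: a one-literal clause forces its literal.
    for c in simplified:
        if len(c) == 1:
            return dpll(simplified, assignment | {c[0]})
    # Branch on the first literal of the first clause, both ways.
    lit = simplified[0][0]
    return (dpll(simplified, assignment | {lit})
            or dpll(simplified, assignment | {-lit}))

print(dpll([[1, 2], [-1, 2], [-2, 3]]))  # a satisfying set of literals
```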
Another Path: Predictive Control of Search
- Learn a statistical model of the behavior of a problem solver on a problem distribution.
- Use the model as part of a control strategy to improve the future performance of the solver.
- A synthesis of ideas from:
  - phase transition phenomena in problem distributions
  - decision-theoretic control of reasoning
  - Bayesian modeling

Big Picture
[Diagram: problem instances are fed to a solver; static and dynamic features flow to a learning/analysis component that builds a predictive model; the model drives control/policy decisions, resource allocation, and reformulation, which feed back into the solver's runtime behavior.]

Case Study 1: Beyond 4.25
[Diagram: as above, restricted to the static features of the problem instances.]

Phase Transitions & Problem Hardness
- Large and growing literature on random problem distributions.
- A peak in problem hardness is associated with a critical value of some underlying parameter (for random 3-SAT, a clause/variable ratio of about 4.25).
- Using the measured parameter to predict the hardness of a particular instance is problematic: the random distribution must be a good model of the actual domain of concern.
- Recent progress on more realistic random distributions...

Quasigroup Completion Problem (QCP)
- NP-complete.
- Its structure is similar to that of real-world problems: tournament scheduling, classroom assignment, fiber-optic routing, experiment design, ...
- Can generate hard guaranteed-SAT instances (2000).

Complexity Graph
[Figure: fraction of unsolvable cases vs. fraction of pre-assignment, with marks at 20%, 42%, and 50%. The underconstrained area (almost all solvable) and the overconstrained area (almost all unsolvable) flank a critically constrained area around 42%, where the phase transition occurs.]

Easy-Hard-Easy Pattern in Local Search
[Figure: Walksat computational cost vs. % holes for orders 30, 33, and 36, showing low cost in the underconstrained and "over"constrained areas and a peak between them.]
- Are we ready to predict run times?
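The Quasigroup Completion Problem asks whether a partially filled Latin square can be completed. As a toy illustration only, here is a brute-force backtracking completer in Python; the cell encoding (0 = hole) and the function name are my own, and this is nothing like the specialized solvers the talk actually benchmarks.

```python
from itertools import product

def complete_latin_square(grid):
    """Try to complete a partial Latin square in place (0 = empty cell) by
    backtracking search; returns the completed grid or None.  A toy
    illustration of the Quasigroup Completion Problem (QCP)."""
    n = len(grid)
    for r, c in product(range(n), repeat=2):
        if grid[r][c] == 0:
            # Values already used in this row or column are forbidden.
            used = set(grid[r]) | {grid[i][c] for i in range(n)}
            for v in range(1, n + 1):
                if v not in used:
                    grid[r][c] = v
                    if complete_latin_square(grid) is not None:
                        return grid
                    grid[r][c] = 0            # undo and try the next value
            return None                       # dead end: no value fits
    return grid                               # no holes left: complete

# A 3x3 square with two pre-assigned cells ("holes" elsewhere): completable.
partial = [[1, 0, 0],
           [0, 2, 0],
           [0, 0, 0]]
print(complete_latin_square(partial) is not None)
```

By contrast, the order-2 square [[1, 0], [0, 2]] cannot be completed: the top-right cell would need a value outside {1, 2}.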
Problem: High Variance
[Figure: run time (log scale, spanning roughly 1 to 1e9) vs. fraction of holes from 0.2 to 0.5, showing enormous scatter at each point.]

Deep Structural Features
- Hardness is also controlled by the structure of the constraints, not just the fraction of holes:
  - rectangular pattern: tractable
  - aligned pattern: tractable
  - balanced pattern: very hard

Random versus Balanced
[Figures: run time vs. fraction of holes (0.2 to 0.5) for random vs. balanced hole patterns, on linear and log scales; the balanced curve lies far above the random one.]

Morphing Balanced and Random
[Figure: Walksat time (seconds) vs. percent random holes in a mixed model, from 0% to 100%.]

Considering Variance in Hole Pattern
[Figures: Walksat time vs. variance in # holes per row for the mixed model, on linear and log scales.]

Effect of Balance on Hardness
- Balanced patterns yield (on average) problems that are two orders of magnitude harder than random patterns.
- Expected run time decreases exponentially with the variance s in # holes per row or column: E(T) = C^(-ks).
- The same pattern holds (with different constants) for DPLL!
- At the extreme of high variance (the aligned model), one can prove that no hard problems exist.

Intuitions
- In unbalanced problems it is easier to identify the most critically constrained variables (the backbone variables) and set them correctly.

Are we done?
- Unfortunately, not quite.
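The balanced vs. random hole models, and the row-variance statistic the run-time law is stated in terms of, can be sketched in a few lines. This is my own toy construction of the two hole-punching schemes (the names `punch_holes`, `balanced`, etc. are assumptions, not the talk's generator), shown only to make the variance measure concrete.

```python
import random
from statistics import pvariance

def punch_holes(n, num_holes, balanced, rng):
    """Return per-row hole counts for an order-n square with num_holes
    unassigned cells, punched either exactly evenly across rows
    ('balanced' model) or uniformly at random ('random' model)."""
    if balanced:
        counts = [num_holes // n] * n          # equal holes per row...
        for i in range(num_holes % n):
            counts[i] += 1                     # ...spreading any remainder
    else:
        counts = [0] * n
        for _ in range(num_holes):
            counts[rng.randrange(n)] += 1      # each hole lands in a random row
    return counts

rng = random.Random(0)
n, holes = 30, 300
balanced_var = pvariance(punch_holes(n, holes, True, rng))
random_var = pvariance(punch_holes(n, holes, False, rng))
print(balanced_var, random_var)   # balanced variance is 0; random is larger
```

The balanced model drives the variance s to its minimum (0 when n divides the hole count), which by the E(T) relation above is exactly the hard regime.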
- While few unbalanced problems are hard, "easy" balanced problems are not uncommon.
- To do: find additional structural features that signify hardness, via
  - introspection
  - machine learning (later this talk)
- Ultimate goal: accurate, inexpensive prediction of the hardness of real-world problems.

Case Study 2: AutoWalksat
[Diagram: as in the big picture, with dynamic features of the solver driving the learned control policy.]

Walksat
- Choose a truth assignment randomly.
- While the assignment evaluates to false:
  - choose an unsatisfied clause at random;
  - if possible, flip an unconstrained variable in that clause;
  - else, with probability P (the noise), flip a variable in the clause at random;
  - else, flip the variable in the clause that causes the smallest number of satisfied clauses to become unsatisfied.
- The performance of Walksat is highly sensitive to the setting of P.

The Invariant Ratio
- Expected run time is shortest when P is set to minimize the mean of the objective function plus 10% of the standard deviation of the objective function.
[Figure: the ratio plotted over a range of noise levels, with values roughly 0 to 7.]
- McAllester, Selman and Kautz (1997).

Automatic Noise Setting
- Probe for the optimal noise level.
- Bracketed search with parabolic interpolation:
  - no derivatives required
  - robust to stochastic variations
  - efficient

Hard Random 3-SAT
[Figures: successive probes 1 through 10 on hard random 3-SAT, showing the bracketed search homing in on the optimal noise level.]
- Summary: works on random 3-SAT, circuit test, graph coloring, and planning instances.

Other Features Still Lurking
- Clockwise: add 10%; counter-clockwise: subtract 10%.
- A more complex function of the objective function?
- Mobility?
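The Walksat pseudocode above can be sketched directly in Python. This is a minimal, unoptimized rendering under an assumed clause encoding (lists of nonzero ints, DIMACS-style); a real implementation caches break counts incrementally instead of rescanning all clauses on every flip.

```python
import random

def walksat(clauses, n_vars, p=0.5, max_flips=100_000, rng=None):
    """Sketch of the Walksat procedure.  Clauses are lists of nonzero ints
    (positive = variable true, negative = negated); returns a dict mapping
    variable -> bool, or None if max_flips is exhausted.  p is the noise
    parameter P that the automatic tuner searches over."""
    rng = rng or random.Random()
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    sat = lambda lit: assign[abs(lit)] == (lit > 0)

    def breaks(v):
        # How many currently-satisfied clauses become unsatisfied if v flips.
        before = [any(sat(l) for l in c) for c in clauses]
        assign[v] = not assign[v]
        after = [any(sat(l) for l in c) for c in clauses]
        assign[v] = not assign[v]
        return sum(b and not a for b, a in zip(before, after))

    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign                      # current assignment satisfies all
        clause = rng.choice(unsat)
        cvars = [abs(l) for l in clause]
        free = [v for v in cvars if breaks(v) == 0]
        if free:
            v = rng.choice(free)               # "unconstrained" (freebie) flip
        elif rng.random() < p:
            v = rng.choice(cvars)              # noise step: random variable
        else:
            v = min(cvars, key=breaks)         # greedy step: fewest breaks
        assign[v] = not assign[v]
    return None
```

Tuning `p` per instance, as AutoWalksat does, is exactly the knob the invariant-ratio probing automates.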
(Schuurmans 2000)

Case Study 3: Restart Policies
[Diagram: the full control loop, with both static and dynamic features feeding the predictive model that sets the restart policy.]

Background
- Backtracking search methods often exhibit a remarkable variability in performance between:
  - different heuristics
  - the same heuristic on different instances
  - different runs of randomized heuristics

Cost Distributions
- Observation (Gomes 1997): run-time distributions often have heavy tails:
  - infinite variance
  - the mean increases without limit
  - the probability of long runs decays by a power law (Pareto-Levy) rather than exponentially (Normal)
[Figure: a heavy-tailed cost distribution, with substantial mass at both very short and very long runs.]

Randomized Restarts
- Solution: randomize the systematic solver.
  - Add noise to the heuristic branching (variable-choice) function.
  - Cut off and restart the search after some number of steps.
- Provably eliminates heavy tails.
- Very useful in practice: adopted by state-of-the-art search engines for SAT, verification, scheduling, ...

Effect of Restarts on Expected Solution Time
[Figure: log(backtracks) vs. log(cutoff), both axes spanning several orders of magnitude, showing an optimal intermediate cutoff.]

How to Determine the Restart Policy
- With complete knowledge of the run-time distribution (and nothing else), a fixed-cutoff policy is optimal (Luby 1993): restart every t* steps, where t* = argmin_t E(R_t) and E(R_t) is the expected solution time when restarting every t steps.
- With no knowledge of the distribution, the universal series of cutoffs 1, 1, 2, 1, 1, 2, 4, ... is within a factor O(log t) of optimal.
- Open cases addressed by our research:
  - additional evidence about the progress of the solver
  - partial knowledge of the run-time distribution

Backtracking Problem Solvers
- Randomized SAT solver: Satz-Rand, a randomized version of Satz (Li & Anbulagan 1997).
  - DPLL with 1-step lookahead
  - randomization, with a noise parameter, of the variable choices
- Randomized CSP solver: a specialized CSP solver for QCP.
  - ILOG constraint programming library
  - variable choice by a variant of the Brelaz heuristic

Formulation of the Learning Problem
- Different formulations of the evidential problem consider a burst of evidence over
an initial observation horizon; the observation horizon plus the time expended so far; or general observation policies.
[Figures: two versions of the observation-horizon diagram. Runs are classified as short or long relative to the median run time of 1000 choice points; the second version adds the time already expended at checkpoints t1, t2, t3.]

Formulation of Dynamic Features
- No simple measurement was found sufficient for predicting the run time of individual runs.
- Approach: formulate a large set of base-level and derived features.
  - Base features capture progress, or the lack thereof.
  - Derived features capture dynamics: 1st and 2nd derivatives; min, max, and final values.
- Use a Bayesian modeling tool to select and combine the relevant features.

Dynamic Features
- CSP: 18 basic features, summarized by 135 variables:
  - # backtracks
  - depth of the search tree
  - avg. domain size of unbound CSP variables
  - variance in the distribution of unbound CSP variables
- Satz: 25 basic features, summarized by 127 variables:
  - # unbound variables
  - # variables set positively
  - size of the search tree
  - effectiveness of unit propagation and lookahead
  - total # of truth assignments ruled out
  - degree of interaction between binary clauses

Different Formulations of the Task
- Single instance: solve a specific instance as quickly as possible; learn the model from that one instance.
- Every instance: solve an instance drawn from a distribution of instances; learn the model from an ensemble of instances.
- Any instance: solve some instance drawn from a distribution of instances, possibly giving up and trying another; learn the model from an ensemble of instances.

Sample Results: CSP-QWH-Single
- QWH order 34, 380 unassigned; observation horizon without time.
- Training: solve the instance 4000 times with the randomized solver. Test: solve it 1000 times.
- Learning: Bayesian network model (MS Research tool), with structure search using the Bayesian information criterion (Chickering, et al.
).

Model Evaluation
- On average 81% accurate at classifying run time, vs. 50% with just background statistics (range 78% to 98%).

Learned Decision Tree
- Min of the 1st derivative of the variance in the number of uncolored cells across columns and rows.
- Min number of uncolored cells averaged across columns.
- Min depth over all leaves of the search tree.
- Change of sign of the change of the avg. depth of nodes in the search tree.
- Max of the variance in the number of uncolored cells.

Restart Policies
- The model can be used to create policies that are better than any policy that uses only the run-time distribution.
- Example: observe for 1,000 steps. If "run time > median" is predicted, restart immediately; else run until the median is reached or a solution is found; if no solution, restart.
- E(R_fixed) = 38,000 but E(R_predict) = 27,000.
- The predictive policy can sometimes beat the fixed policy even when the observation horizon is longer than the optimal fixed cutoff!

Ongoing Work
- Optimal predictive policies: dynamic features + run time + static features.
- Partial information about the run-time distribution, e.g., a mixture of two or more subclasses of problems.
- Cheap approximations to the optimal policies: myopic Bayes.

Conclusions
- An exciting new direction for improving the power of search and reasoning algorithms.
- Many knobs to learn how to twist: noise levels and restart policies are just a start.
- Lots of opportunities for cross-disciplinary work: theory, machine learning, experimental AI and OR, reasoning under uncertainty, statistical physics.
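The predictive restart policy in the example can be sketched as a small simulation. Everything here is an assumption for illustration: `sample_runtime` stands in for the solver, and `predict_long` is an idealized oracle that peeks at the true run length, where the real policy would query the learned Bayesian model on dynamic features observed during the horizon.

```python
import random

def predictive_restart(sample_runtime, predict_long, horizon, median,
                       max_restarts=10_000):
    """Sketch of the predictive restart policy: observe a run for `horizon`
    steps, restart immediately if the model predicts "run time > median",
    otherwise run on until the median cutoff or a solution.  Returns the
    total steps spent until some run succeeds."""
    total = 0
    for _ in range(max_restarts):
        t = sample_runtime()               # true (hidden) length of this run
        if t <= horizon:
            return total + t               # solved inside the observation window
        if predict_long(t):
            total += horizon               # predicted long: restart right away
        elif t <= median:
            return total + t               # run on; solution before the cutoff
        else:
            total += median                # ran to the median cutoff; restart
    return total

# Toy heavy-tailed solver: half the runs are short, half are very long.
rng = random.Random(1)
sample = lambda: 100 if rng.random() < 0.5 else 50_000
oracle = lambda t: t > 1_000               # idealized perfect predictor
cost = predictive_restart(sample, oracle, horizon=200, median=1_000)
print(cost)
```

With a good predictor, long runs cost only the observation horizon instead of the full median cutoff, which is how the policy undercuts the best fixed-cutoff expected cost.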