The Necessity of Metabias in Metaheuristics. John.woodward@nottingham.edu.cn www.cs.nott.ac.uk/~jrw Abstract Bias is necessary for learning, and is a probability over a search space. This is usually introduced implicitly. Each time a metaheuristic is executed it will give a different solution. However, if executed repeated it will give the same solution on average. In other words, the bias is static (even if we include a self adaptive component to the search algorithm). One desirable property of metaheuristics is that they converge. This means that there is a non-zero probability of visiting each item in the search space. Search algorithms are intended to be reused on many instances of a problem. These instances can be consider to be drawn from a probability distribution. In other words, a search algorithm and problem class can both be viewed as probability distributions over the search space. If the bias of a search algorithm does not match the bias of a problem class, it will under perform, if however, they do match, it will perform well. Therefore we need some mechanism of altering the initial bias of the search algorithm to coincide with that of the problem class. This mechanism can be realized by a meta level which alters the bias of the base level. In other words, if a search algorithm is to be applied to many instances of a problem, then meta bias is necessary. This implies that convergence at the meta level means a search algorithm shift its bias to any probability distribution. Additionally, shifting bias is equivalent to automating the design of search algorithms. Outline • Need and application of meta heuristics. • Preliminaries (search space, problem instance, problem class, meta heuristic) • Generate-and-Test and Convergence • Bias and Probability • Many problem instances. • Bias and Problem Class • Summary and Conclusions The Need for Heuristics • Many computational problems of industrial interest are intractable to search. • The combinatorial explosion associated with many problems means the search spaces grows too rapidly for practical purposes. • For example, with the travelling salesman problem, the size of the search space grows as O(n!) where n is the number of cities. • Therefore we need “non-exact” heuristics. Examples of MetaHeuristics • • • • • • Hill Climbing Simulated Annealing Genetic Algorithms Ant colony algorithms Swam particle optimization …the list continues to grow… Applications of Metaheuristis • Function Optimization • Function Regression • Combinatorial Problems – Bin Packing – Knapsack – Travelling Salesman • Program Induction • Reinforcement Learning • … Preliminaries • A search space is a finite set of objects. • A cost function assigns a value to each object in the search space. • A problem instance is a search space and cost function. • A problem class is a probability distribution over a set of problem instances with the same search space. • A metaheuristic samples the search space in order to find a “good quality” solution. Generate-and-Test/Search • A candidate solution is generated and tested. • This is repeated until some generate termination condition is met (usually a fixed number of iterations). test • Generate-and-test is also called “search”. • In other words we are sampling the search space. Property of Convergence Convergence: given enough time they will eventually reach (sample/hit) the global optima. There is a non-zero probability of visiting all points in the search space. This criteria is usually easy to meet. “premature convergence” and “getting stuck in a local optima” are major issues for many metaheuristics. Bias and Probability • In the literature, “the bias of a metaheuristic”. • Bias is the probability distribution over the search space (simplified assumption). • A metaheuristic is a random variable. • Let us call this “base bias”. • Sources of bias: Any design decision! E.g. – Cooling schedule, crossover operator, parameters. • It is unlikely the designer makes correct choices or that the choices have much affect (i.e. tuning one parameter maybe more effective than tuning a different parameter). One problem instance to the next one MetaheuristicA Problem instance1 metaheuristicA Problem instance2 Metaheuritic A operates in the same way regardless of the underlying problem instance (1 then 2 then 1). There in no mechanism to alter the bias from one instance to the next (except case based reasoning). Mechanisms do exist which alter the bias during the application of the metaheuristic on a single problem instance, but rarely across problem instances. We argue this is essential. The metaheuristic may learn on one problem instance, but is not learning across problem instances, which is essential if it is applied to many instances. One problem instance to the next 2 Meta bias metaheuristicA metaheuristicA metaheuristicA Problem instance1 Problem instance2 Problem instance1 Meta bias can affect how a metaheuristic operates on different instances. As a special case, if we return to problem instance 1 (or something similar), now at least have a mechanism to allow improved performance. Bias and Problem Class We want, over a number of instances, for the algorithm to learn which are the promising areas of the search space. Distribution of quality solutions over a problem class 0.6 0.5 Probability If no quality solutions exist in a certain part of the search space, there is no need in sampling it, and therefore we do not require the property of convergence. 0.4 0.3 0.2 0.1 0 1 2 3 4 5 Items in search space 0.6 Probability Convergence at the meta level means for the probability of sampling to change from an arbitrary initial distribution for the best for the class. This cannot be achieved by parameter tuning alone. 0.5 0.4 0.3 0.2 0.1 0 1 2 3 4 5 Distribution of sampling By a metaheuristic Summarizing our contributions 1. If we apply our metaheuristics to many problem instances, then meta bias is necessary so the base bias of the metaheuristic converges towards the global optima. Altering bias at the meta level is equivalent to automatically designing metaheuristics. 2. At the meta level, convergence means that the bias can shift to any base bias (i.e. any probability distribution). 3. Problem classes defined as probability distributions are an essential part of the machine learning methodology. A problem class defines a niche in which a suitable metaheuristic can fit. Therefore algorithms should be tested on problem instances which are drawn from this distribution and NOT randomly selected benchmark instances. REFERENCES 1 [1] T.M. Mitchell, The Need for Biases in Learning Generalizations, Rutgers Computer Science Department Technical Report CBM-TR-117, May, 1980. Reprinted in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, 1990. [2] S. Thrun and L. Pratt, Learning To Learn, S. Thrun and L. Pratt, ed., Kluwer Academic Publishers, 1998, 354 pages. [3] E. K. Burke, J. Woodward, M. Hyde, G. Kendall, Automatic heuristic generation with genetic programming: Evolving a Jack of all trades or a master of one. Genetic and Evolutionary Computation Conference, GECCO 2007. [4] C. Schumacher, M. D. Vose, and L. D. Whitley. The no free lunch and problem description length. In proceedings of the Genetic and Evolutionary Computation Conference, 565-570, California, USA, 7-11 July 2001. Morgan Kaufmann. [5] T. M. Mitchell. Machine Learning. McGraw-Hill 1997. [6] Riccardo Poli, William B. Langdon and Nicholas Freitag McPhee, A Field Guide to Genetic Programming, Lulu.com, freely available under Creative Commons Licence from www.gp-field-guide.org.uk, March 2008. REFERENCES 2 [7] William B. Langdon: Scaling of Program Fitness Spaces. Evolutionary Computation 7(4): 399-428 (1999) [8] Woodward J. Computable and Incomputable Search Algorithms and Functions. IEEE International Conference on Intelligent Computing and Intelligent Systems (IEEE ICIS 2009) November 20-22,2009 Shanghai, China. [9] Woodward, J., Evans A., Dempster, P. 2008, A Syntactic Justification of Occam’s Razor. October 31 to November 2, 2008 Midwest, A New Kind of Science Conference Indiana University Bloomington, Indiana [10] Marcus Hutter, ”Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability” Springer,2004, http://www.hutter1.net/ai/uaibook.htm. [11] Hartley Rogers, Theory of Recursive Functions and Effective Computability, The MIT Press (April 22, 1987) [12] Edmund K. Burke and Graham Kendall (editors), Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques, Springer 2005. [13] Stefan Droste, Thomas Jansen, Ingo Wegener: Perhaps Not a Free Lunch But At Least a Free Appetizer, 13-17 July 1999: 833-839 Proceedings of the Genetic and Evolutionary Computation Conference, Orlando, Florida, USA, Morgan Kaufmann, The End • • • • Thank you I would be glad to take any questions. John.woodward@nottingham.edu.cn www.cs.nott.ac.uk/~jrw/