Adaptive Simulated Annealing
Final Project Report
ECE 556: Design Automation of Digital Systems
Professor Yu Hen Hu
Greg Knowles
12/16/04

Introduction (Problem statement, Motivation)

Simulated Annealing (SA) is a heuristic for solving hard combinatorial optimization problems. It dates back to 1953, when Metropolis and his co-authors proposed a Monte Carlo method modelled after the thermal equilibrium behavior of the metal annealing process. Kirkpatrick, Gelatt, and Vecchi later developed the technique in "Optimization by Simulated Annealing" (IBM Research Report RC 9355, 1982, and Science, 1983), and the Metropolis heuristic became what we know today as Simulated Annealing. The technique has proven its capability to minimize complicated n-dimensional cost functions very efficiently.

Simulated Annealing is not a replacement for a heuristic designed specifically for the problem at hand. Each problem is different, and a method tuned to minimize a specific function, such as Newton minimization for a parabola, will almost always work better. SA is a randomized heuristic with rules to maximize the quality of its guesses and with methods of escaping local minima "traps" so it can continue exploring the rest of the solution set. SA is also well suited to real-time continuous optimization problems: real-time data can be fed into the system and analyzed to find the adjustment parameters that keep the system stable.

SA operates on two basic principles: all moves that improve the solution are always accepted, and moves that increase the solution cost are accepted with low probability according to the Metropolis criterion. The criterion is based on the Boltzmann factor exp(-(Ei - Ej)/kT), which is compared to a random number R between 0.0 and 1.0:

    If R <= exp(-(Ei - Ej)/kT), the new solution is accepted.
    If R >  exp(-(Ei - Ej)/kT), the new solution is rejected.

This probability of accepting worse solutions is known as "hill climbing", and it allows the search to escape local minima in hopes of finding the global minimum.

Simulated Annealing has a number of very important parameters that change with every problem. When these parameters are carefully chosen, SA can produce excellent solutions with great efficiency. However, the parameters are hard to guess if you do not know much about the system to begin with. This paper introduces some techniques employed in adaptive software that dynamically changes the critical parameters based on the data samples as they are collected, in an attempt to reach a global minimum as quickly as possible.

Method, algorithm

There are five key parameters to focus on:

    Ti (initial temperature)
    Tf (final temperature)
    Alpha (cooling rate)
    Iterations (solutions generated at each temperature)
    Stop conditions ("freeze" detection)

Both the initial and final temperatures have closed-form solutions, but those solutions still require additional information: they depend on selecting an acceptance probability for a given increase in cost. How should one go about determining the ideal probability with which to accept a given worse solution?

    kTi = -dEi / ln(Pi)
    kTf = -dEf / ln(Pf)

Intuition tells us that we want to accept bad moves frequently at the beginning and accept them less often at the end. We still lack prerequisite knowledge of the range of our system, and the only way to find that out is by testing a few solutions. Once we get a feel for how big is big and how little is little, we will know what probabilities to choose; a small sketch of this estimate is given below.
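As a concrete illustration, the initial temperature can be estimated by sampling a few random moves and plugging the average uphill cost change into the closed-form relation above (with Boltzmann's constant absorbed into the temperature). The sketch below is only illustrative: cost(), perturb(), and the parameter names are assumed placeholders, not the routines in the attached code.

    #include <math.h>

    /* Sketch: estimate Ti from kTi = -dE / ln(Pi).  The average uphill
     * cost change dE is measured by sampling a few trial moves; cost()
     * and perturb() are stand-ins for the problem-specific routines. */
    double estimate_initial_temp(double (*cost)(const double *),
                                 void (*perturb)(double *),
                                 double *state, int samples, double p_accept)
    {
        double sum_uphill = 0.0;
        int n_uphill = 0;

        for (int i = 0; i < samples; i++) {
            double before = cost(state);
            perturb(state);                    /* take a random trial move */
            double delta = cost(state) - before;
            if (delta > 0.0) {                 /* only uphill moves matter */
                sum_uphill += delta;
                n_uphill++;
            }
        }
        if (n_uphill == 0)
            return 1.0;                        /* fallback: saw no uphill moves */

        double avg_dE = sum_uphill / n_uphill;
        return -avg_dE / log(p_accept);        /* e.g. p_accept = 0.8 at startup */
    }

The final temperature Tf can be estimated the same way with a much smaller acceptance probability (for example 0.01), which matches the intuition that bad moves should almost never be accepted near the end of the run.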
The next two parameters are intimately related. Alpha is the rate at which the system cools, and the iteration counter keeps track of how many solutions to generate at each temperature. On closer review, one will notice that these two parameters do essentially the same thing in different ways. A steep temperature slope sends you down to zero quickly, so you want to take many samples at each temperature plateau. With a mild temperature slope, it is not necessary to take as many measurements at each step. The advantage of the latter scenario is a much more contoured graph that is better suited for curve fitting; rapid temperature drops produce block-like charts with no smooth lines.

The number of iterations taken at high temperatures is less important than taking several iterations near the breaking point of the solution set. The breaking point is the sweet spot of every function, where 40-60% of proposed moves are accepted. This is the period when the best-found solution improves the fastest. Once the solution set has been adequately sampled, the temperature should continue to decrease until the system becomes frozen. Once essentially no moves are being accepted, it is up to the stopping criteria to decide when to cut the cord. Stopping criteria are very important because the majority of the iterative calculations are performed at low temperatures. A lot of computation time is wasted if the algorithm is allowed to run all the way to zero. Then again, the system could be hiding a very isolated solution with far better cost that would only be detected by an extremely thorough search of that local minimum.

It should be obvious by now that there is no step-by-step best way to perform these calculations and estimates. Professor Lester Ingber of Caltech has spent over twenty years researching ASA, has published several papers on the topic, and has released source code to optimize almost any problem that can be modelled with mathematical constraints. Martial arts, for instance, is one of his focus areas; Big Blue will really have to step up to the challenge this time around.

Program structure and highlights

The attached code is based on optimizing a two-dimensional cost function. The perturb function is simply a random point generator that is more likely to produce points near the current solution than far away from it. The ideal perturb function is capable of reaching every solution in the set from every other point. That criterion does not hold here, since only one dimension can be changed at a time; however, any given point can be reached in exactly two moves from any other point with the encoded perturb function.

This implementation of ASA generates a completely random initial solution. Research has shown that, given a sufficiently high initial temperature, the outcome is uncorrelated with the initial solution. The algorithm then searches for the ideal temperature based on the acceptance rate, and it will even increase the temperature if the starting value was too low. Once the ideal temperature has been reached, the only decision left is how low the temperature should go before stopping; this algorithm picks a stopping point proportional to the optimal temperature it found. A sketch of the overall loop is given below.

The code also highlights Comma Separated Value (CSV) output. The output file can be read by any modern spreadsheet package, which allows for quick graphical interpretation of the results as well as a report on how well the adaptive components are functioning.
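To make the structure concrete, the following is a minimal sketch of an annealing loop for a two-dimensional cost function with CSV logging, in the spirit of the description above. The toy cost function, the perturbation step, the fixed alpha, and every constant in it are assumptions for illustration; this is not the attached program.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Sketch of a 2-D annealing loop with CSV logging.  The toy cost
     * function and all constants are illustrative, not the attached code. */
    static double cost2d(double x, double y) { return x * x + y * y; }
    static double rand01(void) { return rand() / (double)RAND_MAX; }

    int main(void)
    {
        srand((unsigned)time(NULL));
        double x = rand01() * 200.0 - 100.0;   /* completely random start */
        double y = rand01() * 200.0 - 100.0;
        double c = cost2d(x, y);
        double T = 1000.0, alpha = 0.95;       /* fixed cooling for simplicity */
        FILE *csv = fopen("anneal.csv", "w");
        if (!csv) return 1;

        fprintf(csv, "temp,accept_ratio,avg_cost\n");
        while (T > 1e-3) {                     /* stop when effectively frozen */
            int accepted = 0, trials = 200;
            double cost_sum = 0.0;
            for (int i = 0; i < trials; i++) {
                /* perturb: change one randomly chosen dimension by a
                 * small, temperature-scaled step */
                double nx = x, ny = y, step = (rand01() - 0.5) * 0.01 * T;
                if (rand() & 1) nx += step; else ny += step;
                double nc = cost2d(nx, ny);
                /* Metropolis criterion: accept all improvements, accept
                 * uphill moves with probability exp(-(nc - c)/T) */
                if (nc <= c || rand01() < exp(-(nc - c) / T)) {
                    x = nx; y = ny; c = nc;
                    accepted++;
                }
                cost_sum += c;
            }
            fprintf(csv, "%g,%g,%g\n", T, accepted / (double)trials,
                    cost_sum / trials);
            T *= alpha;
        }
        fclose(csv);
        printf("final cost %g at (%g, %g)\n", c, x, y);
        return 0;
    }

The adaptive version replaces the fixed alpha with a value derived from the measured acceptance ratio, as described in the results that follow.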
Results: Performance

Since the test points are completely random in nature, it is not uncommon to get "unlucky", and the adaptive system performs with a different level of effectiveness on every successive run. This particular data run displays near-ideal performance based on the goals set forth in this paper.

[Figure: Temperature]
[Figure: Alpha (cooling rate)]

The two charts above show the temperature and the cooling-rate parameters. Notice that the temperature drops every time the cooling rate, alpha, migrates from zero. Also notice that the initial temperature was judged to be too low and was rapidly raised to an appropriate level. The temperature was never changed directly; it was only affected by modifying the cooling-rate parameter, alpha. The value of alpha was decided with trigger levels on the acceptance ratio: the temperature is raised until 60% of solutions are accepted, then slowly lowered while in the 60-40% range, and more noticeably decremented at acceptance rates lower than 40%. A sketch of this trigger-level logic appears at the end of this section.

[Figure: AcceptRatio (acceptance ratio)]

What may be more interesting to some is the quality of the solutions being generated and accepted. Again, at higher temperatures bad solutions are accepted more frequently, while at lower temperatures a local minimum has been located and each new solution selected will be a good one. The graph shown happens to start out with very good solutions; this is due to the luck of picking a good local minimum while the temperature was still low. Typically the graph would start at a very high value and work its way down, following the behavior observed in the remainder of the graph. The shape of this graph directly traces the path chosen through the solution set on this run. A few million runs like this could potentially build a complete map of the solution space, but that is often impractical in real-world applications involving many more than two dimensions of flexibility.

[Figure: AvgCost (average cost)]
[Figure: Iterations per temperature]

The iteration-count chart and the best-found-cost chart look like complete opposites of each other in this example. As the complexity and resolution of the solution space increase, both of these graphs will move inward, requiring longer to converge to a global minimum. The iteration count may need to be turned up higher at an earlier temperature if the solution is not stabilizing.

[Figure: Best Cost]
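The trigger-level adjustment of alpha described above can be sketched as follows. The 60% and 40% thresholds come from the discussion in this section, while the specific multiplier values are illustrative assumptions rather than the values used in the attached code.

    /* Sketch: choose the next cooling multiplier (T_next = alpha * T)
     * from the measured acceptance ratio.  Thresholds follow the
     * 60%/40% trigger levels above; the multipliers are assumptions. */
    double adapt_alpha(double accept_ratio, int warming_up)
    {
        if (warming_up && accept_ratio < 0.60)
            return 1.20;   /* still heating: raise T until ~60% accepted */
        if (accept_ratio >= 0.40)
            return 0.98;   /* 40-60% band: cool slowly near the breaking point */
        return 0.85;       /* below 40%: cool more aggressively toward freezing */
    }

In the main loop, warming_up would be cleared the first time the acceptance ratio reaches 60%, after which the run only cools.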
References

S. Nahar, S. Sahni, and E. Shragowitz, "Experiments with Simulated Annealing," Proceedings of the 22nd Design Automation Conference, 1985, pp. 748-752.

N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, "Equation of State Calculations by Fast Computing Machines," J. Chem. Phys. 21 (1953) 1087-1092.

S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, "Optimization by Simulated Annealing," Science 220, no. 4598 (1983) 671-680.

V. Cerny, "Thermodynamical Approach to the Traveling Salesman Problem: An Efficient Simulation Algorithm," J. Opt. Theory Appl. 45, no. 1 (1985) 41-51.

W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes, Cambridge University Press, New York, NY, 1986, pp. 326-334.