Monte Carlo Optimization (Simulated Annealing)
Mathematical Biology Lecture 6
James A. Glazier

Optimization
• The other major application of Monte Carlo methods is to find the optimal (or a nearly optimal) solution of an algorithmically hard problem.
• Given $f(\vec{x})$, we want to find the $\vec{x}_{\min}$ that minimizes $f$.
• Definition: If there is an $\epsilon > 0$ such that for all $\vec{x}$ with $0 < \|\vec{x} - \vec{x}_{\min}\| < \epsilon$, $f(\vec{x}) > f(\vec{x}_{\min})$, then $\vec{x}_{\min}$ is a Local Minimum of $f$.
• Definition: If for all $\vec{x} \ne \vec{x}_{\min}$, $f(\vec{x}) > f(\vec{x}_{\min})$, then $\vec{x}_{\min}$ is the Global Minimum of $f$.
• Definition: If $\vec{x}_{\min,1} \ne \vec{x}_{\min,2}$, $f(\vec{x}_{\min,1}) = f(\vec{x}_{\min,2})$, and $f(\vec{x}) > f(\vec{x}_{\min,1})$ for all $\vec{x} \ne \vec{x}_{\min,1}, \vec{x}_{\min,2}$, then $f$ has Multiple Degenerate Global Minima, $\vec{x}_{\min,1}$ and $\vec{x}_{\min,2}$.

Energy Surfaces
• The number and shape of the local minima of $f$ determine the texture of the 'Energy Surface,' also called the Energy Landscape, Penalty Landscape, or Optimization Landscape.
• Definition: The Basin of Attraction of a local minimum $\vec{x}_{\min}$ is the set of points from which strictly downhill motion leads to $\vec{x}_{\min}$: $B(\vec{x}_{\min}) = \{\vec{x} : \text{steepest descent from } \vec{x} \text{ converges to } \vec{x}_{\min}\}$.
• The Depth of the basin of attraction is: $\max_{\vec{x} \in B(\vec{x}_{\min})} f(\vec{x}) - f(\vec{x}_{\min})$.
• The Radius or Size of the basin of attraction is: $\max_{\vec{x} \in B(\vec{x}_{\min})} \|\vec{x} - \vec{x}_{\min}\|$.
• If there are no local minima except the global minimum, then optimization is easy and the energy surface is Smooth.

Energy Surfaces (Contd.)
• If there are multiple local minima with large basins of attraction, pick an $\vec{x}_0$ in each basin, find the corresponding $\vec{x}_{\min}$, and pick the best. This corresponds to enumerating all states if the set of minima is finite, e.g., in the Traveling Salesman Problem.
• If there are many local minima, or the minima have small basins of attraction, then the energy surface is Rough and optimization is difficult.
• In these cases we cannot guarantee finding the global minimum. However, we often only need a 'pretty good' solution.

Monte Carlo Optimization
• Deterministic methods, e.g. Newton–Raphson ($x_{i+1} = x_i - f'(x_i)/f''(x_i)$), only move towards better solutions and become trapped in basins of attraction. We need to move the wrong way sometimes to escape basins of attraction (also called Traps).
• Algorithm:
  – Choose a step size $\Delta > 0$.
  – Start at $\vec{x}_i$. Propose a move to $\vec{x}_t$, with $0 < \|\vec{x}_t - \vec{x}_i\| \le \Delta$.
  – If $f(\vec{x}_t) \le f(\vec{x}_i)$, let $\vec{x}_{i+1} = \vec{x}_t$.
  – If $f(\vec{x}_t) > f(\vec{x}_i)$, let $\vec{x}_{i+1} = \vec{x}_t$ with probability $g(f(\vec{x}_t) - f(\vec{x}_i))$, where $g(x)$ is a decreasing function of $x$ with $g(0) = 1$ and $g(x) \to 0$ as $x \to \infty$.

Monte Carlo Optimization—Issues
• Given infinite time, the pseudo-random walk $\{\vec{x}_i\}$ will explore all of phase space.
• However, you never know when you have reached the global minimum! So you do not know when to stop.
• The walk can also take a very long time to escape from deep local basins of attraction.
• The optimal choice of $g(x)$ and $\Delta$ depends on the particular $f(\vec{x})$.
• If $g(x) \approx 1$ for $x < x_0$, then the algorithm will not see minima with depths less than $x_0$.
• A standard choice is the Boltzmann distribution, $g(x) = e^{-x/T}$, where $T$ is the Fluctuation Temperature (see the code sketch below). The Boltzmann distribution has the right equilibrium thermodynamics, but is not an essential choice in this application.

Temperature and $\Delta$
• Bigger $T$ results in more frequent unfavorable moves.
• In general, the time spent in a basin of attraction is $\sim \exp(\text{depth of basin}/T)$.
• An algorithm with these kinetics is called an Activated Process.
• Bigger $T$ is good for moving rapidly between large, deep basins of attraction, but ignores subtle (less than $T$) changes in $f(\vec{x})$.
• Similarly, a large $\Delta$ moves faster, but can miss deep minima with small-diameter basins of attraction.
• A strategy for picking $T$ is called an "Annealing Schedule."
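Before turning to annealing schedules, here is a minimal, runnable Python sketch of the fixed-temperature Metropolis optimizer described above, using the Boltzmann acceptance function $g(\Delta f) = e^{-\Delta f/T}$. The one-dimensional test function and the values of delta, T, and n_steps are illustrative assumptions, not from the lecture.

```python
import math
import random

def metropolis_minimize(f, x0, delta=0.5, T=1.0, n_steps=10000, rng=random):
    """Fixed-temperature Metropolis minimization of a scalar function f.

    Proposes moves of size at most delta; downhill moves are always
    accepted, uphill moves with Boltzmann probability exp(-df/T).
    """
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for _ in range(n_steps):
        xt = x + rng.uniform(-delta, delta)   # trial move, |xt - x| <= delta
        ft = f(xt)
        df = ft - fx
        if df <= 0 or rng.random() < math.exp(-df / T):
            x, fx = xt, ft                    # accept the move
            if fx < best_f:
                best_x, best_f = x, fx        # remember best point seen
    return best_x, best_f

# Illustrative rough landscape: a parabola with superimposed wiggles,
# so the walk must hop between local basins to find the global minimum.
f = lambda x: x * x + 2.0 * math.sin(5.0 * x)
print(metropolis_minimize(f, x0=4.0))
```

Because the walk never knows when it has found the global minimum, the sketch simply keeps the best point seen; the stopping rule (a fixed step count) is itself a heuristic.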
Annealing Schedules
• Ideally, we want the time spent in all local-minimum basins to be small and the time spent in the global-minimum basin to be nearly infinite.
• A fixed value of $T$ works if the depth of the basin of attraction of the global minimum is much greater than the depth of the basin of attraction of every local minimum, and the radius of the global minimum's basin is comparable to the radius of the largest local-minimum basin.
• If so, pick $T$ between these two depths.
• If multiple local minima are almost degenerate with the global minimum, then the algorithm cannot distinguish them, but the answer is almost optimal.
• If the global minimum is deep but has a very small basin of attraction (a 'golf-course' energy landscape), then no method helps!

Annealing Schedules (Contd.)
• If the energy landscape is Hierarchical or Fractal, then start with a large $T$ and gradually reduce $T$.
• This selects first among the large, deep basins, then among successively smaller and shallower ones, until the walk freezes into one.
• This procedure is called "Simulated Annealing."
• There is no single optimal choice of the sequence of temperatures $T$.
• A generally good strategy (used in the TSP sketch at the end of this section):
  – Start with $T \sim \langle|\Delta f|\rangle / 2$ if you know the typical values of $\Delta f$ for a fixed step size $\Delta$, or $T \sim$ the typical $f$ if you do not.
  – Run until the typical $\Delta f \ll T$. Then set $T = T/2$. Repeat.
  – Repeat for many initial conditions.
  – Take the best solution.

Example—The Traveling Salesman Problem
• The simulated annealing method works for algorithmically hard (NP-complete) problems like the Traveling Salesman Problem.
• Put down $N$ points $\vec{x}_i$ in some space.
• Define an Itinerary: $I = (i_1, i_2, \ldots, i_N)$, where $i_j \in \{1, \ldots, N\}$ and $i_j \ne i_k$ for $j \ne k$.
• The Penalty Function, Energy, or Hamiltonian is the total path length for a given itinerary: $H(I) = \sum_{j=1}^{N} \|\vec{x}_{i_j} - \vec{x}_{i_{(j \bmod N)+1}}\|$.

Example—The TSP (Contd.)
• Pick any initial itinerary.
• At each Monte Carlo step, pick $j, k \in \{1, \ldots, N\}$, $j \ne k$.
• If the initial itinerary is $I_{\text{initial}} = (i_1, \ldots, i_j, \ldots, i_k, \ldots, i_N)$, then the trial itinerary is the permutation $I_{\text{trial}} = (i_1, \ldots, i_k, \ldots, i_j, \ldots, i_N)$.
• Then $\Delta H = H(I_{\text{trial}}) - H(I_{\text{initial}})$.
• Apply the Metropolis algorithm.
• A good initial choice of $T$ is roughly half the mean inter-point distance: $T = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \|\vec{x}_i - \vec{x}_j\|$.
• This algorithm works well, giving a permutation with $H$ within a percent or better of the global optimum in a reasonable amount of time.
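The following Python sketch puts the pieces together for the TSP: pair-swap trial moves, Metropolis acceptance, the initial $T$ from the formula above, and the halving annealing schedule from the previous section. The function names and the parameters sweeps_per_T and n_halvings are illustrative assumptions; a production version would also compute $\Delta H$ incrementally from the few edges a swap changes, rather than re-summing the whole tour.

```python
import math
import random

def tour_length(points, tour):
    """Total closed-path length H(I) for itinerary `tour` (a permutation)."""
    n = len(tour)
    return sum(math.dist(points[tour[j]], points[tour[(j + 1) % n]])
               for j in range(n))

def anneal_tsp(points, sweeps_per_T=2000, n_halvings=20, rng=random):
    n = len(points)
    tour = list(range(n))
    rng.shuffle(tour)                        # arbitrary initial itinerary
    # Initial T from the lecture's formula: (1/N^2) * sum of pair distances.
    T = sum(math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)) / (n * n)
    H = tour_length(points, tour)
    for _ in range(n_halvings):              # halving schedule: T -> T/2
        for _ in range(sweeps_per_T):
            j, k = rng.sample(range(n), 2)   # pick two cities to swap
            tour[j], tour[k] = tour[k], tour[j]
            H_trial = tour_length(points, tour)
            dH = H_trial - H
            if dH <= 0 or rng.random() < math.exp(-dH / T):
                H = H_trial                  # accept (Metropolis rule)
            else:
                tour[j], tour[k] = tour[k], tour[j]  # reject: undo swap
        T /= 2.0
    return tour, H

# Illustrative run on 30 random points in the unit square.
points = [(random.random(), random.random()) for _ in range(30)]
tour, H = anneal_tsp(points)
print(f"tour length: {H:.3f}")
```

As the lecture suggests, one would run this from many initial itineraries and keep the best result; by the final halvings, $T$ is far below the typical $\Delta H$ and the tour has effectively frozen.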