Monte Carlo Methods and Statistical Physics

Monte Carlo Optimization
(Simulated Annealing)
Mathematical Biology Lecture 6
James A. Glazier
Optimization
• The Other Major Application of Monte Carlo Methods is to Find
the Optimal (or a Nearly Optimal) Solution of an
Algorithmically Hard Problem.

• Given f x  want to find xmin that minimizes f.
• Definition: If x,   0  x  xmin    f x   f xmin ,

then xmin is a Local Minimum of f.
• Definition: If x  xmin , f x   f xmin ,

then xmin is the Global Minimum of f.
• Definition:


 




If  xmin,1  xmin,2  x  xmin , f x   f xmin  and f xmin,1   f xmin,2  ,


x
then f has Multiple Degenerate Global Minima,

,
x
min,1
min,2 .
Energy Surfaces
• The Number and Shape of Local Minima of
f Determine the Texture of the ‘Energy
Surface,’ also called the Energy
Landscape, Penalty Landscape or
Optimization Landscape.
• Definition: The Basin of Attraction of a
Local Minimum, x_min, is the set of points
joined to x_min by a strictly downhill path:
B(x_min) = { x | ∃ a path y(s) from x to x_min along which f decreases monotonically }.
• The Depth of the Basin of Attraction is:
max over x ∈ B(x_min) of ( f(x) − f(x_min) ).
• The Radius or Size of the Basin of
Attraction is:
max over x ∈ B(x_min) of | x − x_min |.
• If there are no local minima except the
global minimum, then optimization is easy
and the energy surface is Smooth.
Energy Surfaces (Contd.)
• If there are multiple local minima with
large basins of attraction, pick an x_0
in each basin, find the corresponding x_min,
and pick the best. This corresponds to
enumerating all states if {x_min} is a finite
set, e.g. the TSP.
• If there are many local minima, or the minima
have small basins of attraction, then the
energy surface is Rough and
Optimization is Difficult.
• In these cases we cannot find the global
minimum. However, often we only need a
‘pretty good’ solution.
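The basin-by-basin strategy above amounts to multistart local search. A minimal Python sketch, assuming a hypothetical 1-D energy f with one shallow and one deep minimum; the names (f, descend) and all parameters are illustrative, not from the lecture:

```python
import random

def f(x):
    # Illustrative rough 1-D energy: global minimum at x = -1 (f = 0),
    # a higher local minimum near x = 2.
    return (x + 1.0) ** 2 * (x - 2.0) ** 2 + 0.5 * (x + 1.0) ** 2

def descend(x, step=0.01, iters=5000):
    # Derivative-free downhill walk: accept only improving moves,
    # so it converges to the minimum of whichever basin x starts in.
    for _ in range(iters):
        trial = x + random.uniform(-step, step)
        if f(trial) < f(x):
            x = trial
    return x

random.seed(0)
starts = [random.uniform(-4.0, 4.0) for _ in range(20)]   # one x0 per trial
candidates = [descend(x0) for x0 in starts]               # local minimum per start
best = min(candidates, key=f)                             # pick the best basin
```

With several starting points per basin, at least one descent lands in the global basin and `best` converges to the global minimum.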
Monte Carlo Optimization
• Deterministic Methods, e.g. Newton-Raphson
( x_{i+1} = x_i − f′(x_i)/f″(x_i) ), Only Move Towards
Better Solutions and Get Trapped in Basins
of Attraction. Need to Move the
Wrong Way Sometimes to Escape
Basins of Attraction (also Called
Traps).
• Algorithm:
– Choose a δ > 0.
– Start at x_i. Propose a Move to x_t with 0 < |x_t − x_i| < δ.
– If f(x_t) < f(x_i), let x_{i+1} = x_t.
– If f(x_t) ≥ f(x_i), let x_{i+1} = x_t with probability g( f(x_t) − f(x_i) ),
where g(x) is a Decreasing Function of x, g(0) = 1 and g(∞) = 0.
Monte Carlo Optimization—Issues
• Given infinite time, the pseudo-random
walk {x_i} will explore all of Phase Space.
• However, you never know when you have
reached the global minimum! So you don’t
know when to Stop.
• Can also take a very long time to escape
from Deep Local Basins of Attraction.
• The Optimal Choice of g(x) and δ will depend
on the particular f(x).
• If g(x) ≈ 1 for x < x_0, then the algorithm will
not see minima with depths less than x_0.
• A Standard Choice is the Boltzmann
Distribution,
g(x) = e^(−x/T),
where T is the Fluctuation Temperature.
(The Boltzmann Distribution has the right
equilibrium thermodynamics, but is NOT
an essential choice in this application.)
Temperature and δ
• Bigger T results in more frequent
unfavorable moves.
• In general, the time spent in a Basin of
Attraction is ~ exp(Depth of Basin/ T).
• An algorithm with these Kinetics is
Called an Activated Process.
• Bigger T is Good for Moving Rapidly
Between Large and Deep Basins of
Attraction, but Ignores Subtle (less than
T) Changes in f(x).
• Similarly, a larger δ moves faster, but can
miss deep minima with small-diameter
basins of attraction.
• A strategy for picking T is called an
“Annealing Schedule.”
Annealing Schedules
• Ideally, want time in all local minimum
basins to be small and time in global
minimum basin to be nearly infinite.
• A fixed value of T Works if depth of the
basin of attraction of global
minimum>>depth of the basin of
attraction of all local minima and
radius of the basin of attraction of
global minimum~radius of the largest
basin of attraction among all local
minima.
• If so, pick T between these two
depths.
• If multiple local minima almost
degenerate with global minimum, then
can’t distinguish, but answer is almost
optimal.
• If there is a deep global minimum with a
very small basin of attraction (golf-course
energy), then no method helps!
Annealing Schedules (Contd.)
• If the Energy Landscape is Hierarchical or
Fractal, then start with large T and
gradually reduce T.
• Selects first among large, deep basins, then
successively smaller and shallower ones
until it freezes in one.
• Called “Simulated Annealing.”
• There is no optimal choice of Ts.
• Generally Good Strategy:
• Start with T ~ Δf/2, if you know typical values
of Δf for a fixed stepsize δ, or T ~ typical f, if
you do not.
• Run until typical Δf << T. Then set T = T/2.
Repeat.
• Repeat for many initial conditions.
• Take best solution.
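The halving schedule above can be sketched as follows. The test energy f, step size δ, and the number of halving levels are illustrative assumptions, and "run until typical Δf << T" is approximated by a fixed number of steps per temperature level:

```python
import math
import random

def anneal(f, x0, delta=0.3, levels=12, steps_per_level=2000, seed=None):
    # Simulated annealing with the halving schedule from the slides:
    # estimate a starting T ~ (typical |df|)/2 for this step size,
    # run at each T, then set T = T/2 and repeat.
    rng = random.Random(seed)
    x = best = x0
    samples = [abs(f(x0 + rng.uniform(-delta, delta)) - f(x0)) for _ in range(100)]
    T = max(sum(samples) / len(samples) / 2.0, 1e-9)
    for _ in range(levels):
        for _ in range(steps_per_level):
            x_t = x + rng.uniform(-delta, delta)
            df = f(x_t) - f(x)
            if df < 0 or rng.random() < math.exp(-df / T):
                x = x_t
            if f(x) < f(best):
                best = x            # track the best point seen
        T /= 2.0                    # freeze gradually
    return best

# Repeat for several initial conditions and take the best solution.
f = lambda x: (x * x - 1.0) ** 2 - 0.2 * x   # minima near -1 (shallow) and +1 (deep)
results = [anneal(f, x0, seed=k) for k, x0 in enumerate([-3.0, -1.0, 0.0, 2.0])]
best = min(results, key=f)
```

Runs that start with a small estimated T can freeze in a local basin, which is exactly why the slides recommend repeating over many initial conditions.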
Example—The Traveling Salesman
Problem
• The Simulated Annealing Method Works for
Algorithmically Hard (NP-Complete)
problems like the Traveling Salesman
Problem.
• Put down N points in some space: { x_i }.
• Define an Itinerary: I = ( i_1, i_2, …, i_N ), i_j ∈ {1, …, N}, and j ≠ k ⟹ i_j ≠ i_k.
• The Penalty Function or Energy or
Hamiltonian is the Total Path Length
for a Given Itinerary:
H(I) = Σ_{j=1}^{N} | x_{i_j} − x_{i_(j+1) mod N} |.
Example—The TSP (Contd.)
• Pick any Initial Itinerary.
• At each Monte Carlo Step, pick j, k ∈ {1, …, N}, j ≠ k.
• If the Initial Itinerary is I_Initial = ( i_1, …, i_j, …, i_k, …, i_N ),
then the Trial Itinerary is the Permutation
I_Trial = ( i_1, …, i_k, …, i_j, …, i_N ).
• Then ΔH = H(I_Trial) − H(I_Initial).
• Apply the Metropolis Algorithm.
• A Good Initial Choice of T is:
T = (1/N²) Σ_{i=1}^{N} Σ_{j=i+1}^{N} | x_i − x_j |.
• This Algorithm Works Well, Giving a Permutation with H
within a Percent or Better of the Global Optimum in a
Reasonable Amount of Time.
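The full recipe, with the swap move and the initial-T formula above, can be sketched in Python. The cooling factor, sweep counts, and the circle test case are illustrative assumptions, not from the lecture:

```python
import math
import random

def tour_length(pts, itin):
    # H(I): total closed-tour length, wrapping from the last city to the first.
    return sum(math.dist(pts[itin[j]], pts[itin[(j + 1) % len(itin)]])
               for j in range(len(itin)))

def tsp_anneal(pts, sweeps=300, seed=0):
    rng = random.Random(seed)
    N = len(pts)
    itin = list(range(N))                   # any initial itinerary
    # Initial T = (1/N^2) * sum over pairs of |x_i - x_j|, as on the slide.
    T = sum(math.dist(pts[i], pts[j])
            for i in range(N) for j in range(i + 1, N)) / N ** 2
    for _ in range(sweeps):
        for _ in range(N * N):
            j, k = rng.sample(range(N), 2)          # pick j != k
            trial = itin[:]
            trial[j], trial[k] = trial[k], trial[j] # swap two cities
            dH = tour_length(pts, trial) - tour_length(pts, itin)
            if dH < 0 or rng.random() < math.exp(-dH / T):
                itin = trial                        # Metropolis acceptance
        T *= 0.97                                   # anneal: cool gradually
    return itin

# Usage: 10 cities on a circle, where the shortest tour follows angular order.
pts = [(math.cos(2 * math.pi * i / 10), math.sin(2 * math.pi * i / 10))
       for i in range(10)]
tour = tsp_anneal(pts)
```

Swapping a single pair of cities is the simplest trial move; production codes often use segment reversal (2-opt) instead, which explores the itinerary space more efficiently.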