LIDS-P-1494
August 20, 1985

ANALYSIS OF SIMULATED ANNEALING FOR OPTIMIZATION

by

Saul B. Gelfand and Sanjoy K. Mitter

Department of Electrical Engineering and Computer Science
and
Laboratory for Information and Decision Systems
Massachusetts Institute of Technology
Cambridge, MA 02139

Abstract
Simulated annealing is a popular Monte Carlo algorithm for combinatorial optimization. The annealing algorithm simulates a nonstationary finite state Markov chain whose state space Ω is the domain of the cost function to be minimized. We analyze this chain focusing on those issues most important for optimization. In all of our results we consider an arbitrary partition {I,J} of Ω; important special cases are when I is the set of minimum cost states or a set of all states with sufficiently small cost. We give a lower bound on the probability that the chain visits I at some time n ≤ k, for k = 1,2,.... This bound may be useful even when the algorithm does not converge. We give conditions under which the chain converges to I in probability and obtain an estimate of the rate of convergence as well. We also give conditions under which the chain visits I infinitely often, visits I almost always, or does not converge to I, with probability 1.
Research supported by the U.S. Army Research Office under Grant DAAG29-84-K-0005 and by the Air Force Office of Scientific Research under Grant AFOSR 82-0135B.
1. Introduction

Simulated annealing, as proposed by Kirkpatrick [1], is a popular Monte-Carlo algorithm for combinatorial optimization. Simulated annealing is a variation on an algorithm introduced by Metropolis [2] for approximate computation of mean values of various statistical-mechanical quantities for a physical system in equilibrium at a given temperature. In simulated annealing the temperature of the system is slowly decreased to zero; if the temperature is decreased slowly enough the system should end up among the minimum energy states or at least among states of sufficiently low energy. Hence the annealing algorithm can be viewed as minimizing a cost function (energy) over a finite set (the system's states). Simulated annealing has been applied to several combinatorial optimization problems including the traveling salesman problem [1], computer design problems [1],[3], and image reconstruction problems [4] with apparently good results.
The annealing algorithm consists of simulating a nonstationary finite-state Markov chain which we shall call the annealing chain. We now describe the precise relationship between this chain and the optimization problem to be solved. Here and in the sequel we shall take ℝ to be the real numbers, ℕ the natural numbers, ℕ₀ = ℕ ∪ {0}, and we shall denote by |A| the cardinality of a finite set A. Let Ω be a finite set, say Ω = {1,...,|Ω|}, and let U_i ∈ ℝ for i ∈ Ω; we want to minimize U_i over i ∈ Ω. Ω shall be the state space for the annealing chain, and we shall refer to {U_i}_{i∈Ω} as the energy function and U_i as the energy of state i. Let T_k > 0 for k ∈ ℕ₀; we shall refer to {T_k}_{k∈ℕ₀} as the annealing schedule of temperatures. Let π^(k) (a row vector) be a Boltzmann distribution over the energies {U_i}_{i∈Ω} at temperature T_k, i.e.,

π_i^(k) = exp(−U_i/T_k) / Σ_{j∈Ω} exp(−U_j/T_k)

for all i ∈ Ω and k ∈ ℕ₀.
The annealing chain will be constructed such that at each time k the chain has a 1-step transition matrix P^(k,k+1) = [p_ij^(k,k+1)]_{i,j∈Ω} with π^(k) as its unique invariant distribution, i.e., at each time k, π^(k) is the unique solution of the vector equation π = π P^(k,k+1). The motivation for this is as follows. Let S* be the minimum energy states in Ω. Now if T_k → 0 as k → ∞ then

π_i^(k) → 1/|S*| if i ∈ S*,  π_i^(k) → 0 if i ∉ S*,  as k → ∞,

i.e., the invariant distributions converge to a uniform distribution over the minimum energy states. The hope is then that the chain itself converges to the minimum energy states.
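This limit can be checked directly: dividing the numerator and denominator of π_i^(k) by e^{-\underline{U}/T_k}, where \underline{U} = min_{j∈Ω} U_j, gives

```latex
\pi_i^{(k)} \;=\; \frac{e^{-(U_i-\underline{U})/T_k}}{\sum_{j\in\Omega} e^{-(U_j-\underline{U})/T_k}}
\;\xrightarrow[k\to\infty]{}\;
\begin{cases} 1/|S^*| & i \in S^*,\\[2pt] 0 & i \notin S^*,\end{cases}
```

since each exponential tends to 1 for j ∈ S* (where U_j = \underline{U}) and to 0 otherwise as T_k → 0.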
We now show how Metropolis constructs a transition matrix P^(k,k+1) with invariant vector π^(k) for k ∈ ℕ₀. Let Q = [q_ij]_{i,j∈Ω} be a symmetric and irreducible stochastic matrix, and let

p_ij^(k,k+1) = q_ij e^{−(U_j−U_i)/T_k}            if U_j > U_i,
p_ij^(k,k+1) = q_ij                               if U_j ≤ U_i, j ≠ i,
p_ii^(k,k+1) = 1 − Σ_{ℓ≠i} p_iℓ^(k,k+1)           if j = i,

for all i,j ∈ Ω and k ∈ ℕ₀. Then it is easily verified that π^(k) P^(k,k+1) = π^(k) for all k ∈ ℕ₀. In fact, P^(k,k+1) and π^(k) satisfy the reversibility condition

π_i^(k) p_ij^(k,k+1) = π_j^(k) p_ji^(k,k+1)

for all i,j ∈ Ω and k ∈ ℕ₀.
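The Metropolis construction above can be sketched in a few lines of code; the following is a minimal version (the matrix Q, energies U, and temperature T used below are arbitrary illustrative values, not the paper's examples), which also lets one verify the invariance and reversibility conditions numerically.

```python
import numpy as np

def metropolis_matrix(Q, U, T):
    """1-step transition matrix P^(k,k+1) at temperature T = T_k:
    p_ij = q_ij * e^{-(U_j - U_i)/T} if U_j > U_i, p_ij = q_ij otherwise
    (j != i); the diagonal absorbs the remaining mass so rows sum to 1."""
    U = np.asarray(U, dtype=float)
    n = len(U)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if j != i:
                P[i, j] = Q[i, j] * min(1.0, np.exp(-(U[j] - U[i]) / T))
        P[i, i] = 1.0 - P[i].sum()
    return P

def boltzmann(U, T):
    """Boltzmann distribution pi^(k) over the energies U at temperature T."""
    w = np.exp(-(np.asarray(U, dtype=float) - min(U)) / T)  # shift for stability
    return w / w.sum()
```

For symmetric Q one can then check π^(k) P^(k,k+1) = π^(k) and the reversibility condition π_i^(k) p_ij^(k,k+1) = π_j^(k) p_ji^(k,k+1) to machine precision.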
Let {x_k}_{k∈ℕ₀} be the annealing chain with 1-step transition matrices {P^(k,k+1)}_{k∈ℕ₀} and some initial distribution, constructed on a suitable probability space (Ω̄, A, P). Let

p_i^(k) = P{x_k = i}

for i ∈ Ω and k ∈ ℕ₀.
The annealing chain is simulated as follows. Suppose x_k = i ∈ Ω. Then generate a random variable y ∈ Ω with P{y = j} = q_ij for j ∈ Ω. Suppose y = j ∈ Ω. Then set

x_{k+1} = j   if U_j ≤ U_i,
x_{k+1} = j   with probability e^{−(U_j−U_i)/T_k}, if U_j > U_i,
x_{k+1} = i   else.
Hence we may think of the annealing algorithm as a "probabilistic descent" algorithm where the Q matrix represents some prior distribution of "directions", transitions to same or lower energy states are always allowed, and transitions to higher energy states are allowed with positive probability which tends to 0 as k → ∞ (when T_k → 0 as k → ∞).
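The simulation loop just described can be sketched as follows, under the logarithmic schedule T_k = T / log(k+k₀) analyzed later in the paper (the proposal matrix, energies, and constants below are illustrative assumptions, not values from the paper):

```python
import numpy as np

def anneal(Q, U, T=4.0, k0=2.0, steps=2000, seed=0):
    """Simulate the annealing chain: propose y from row x of Q, accept
    downhill moves always and uphill moves with prob e^{-(U_y-U_x)/T_k},
    where T_k = T / log(k + k0) is the annealing schedule."""
    rng = np.random.default_rng(seed)
    U = np.asarray(U, dtype=float)
    x = int(rng.integers(len(U)))      # arbitrary initial state
    path = [x]
    for k in range(steps):
        Tk = T / np.log(k + k0)
        y = int(rng.choice(len(U), p=Q[x]))
        dU = U[y] - U[x]
        if dU <= 0 or rng.random() < np.exp(-dU / Tk):
            x = y                      # accept the proposed "direction"
        path.append(x)
    return path
```

With a symmetric, irreducible Q the trajectory is the "probabilistic descent" described above: uphill acceptances become rarer as T_k decreases.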
Even though simulated annealing was proposed as a heuristic, its apparent success in dealing with hard combinatorial optimization problems makes it desirable to understand in a rigorous fashion why it works. The recent works of Geman [4], Gidas [5], and Mitra et al. [6] have approached this problem by showing the existence of an annealing schedule for which the annealing chain converges weakly to the same limit as the sequence of invariant distributions {π^(k)}_{k∈ℕ}, i.e., to a uniform distribution over S*. In each case a (different) constant c is given such that if T_k ≥ c / log k for large enough k and T_k → 0 as k → ∞ then

p_i^(k) → 1/|S*| if i ∈ S*,  p_i^(k) → 0 if i ∉ S*,  as k → ∞.

Furthermore, under an annealing schedule of the form T_k = T / log(k+k₀) where T ≥ c and k₀ > 1, Mitra et al. obtain an upper bound on |p_i^(k) − π_i^(k)| for k ∈ ℕ₀. The results of Geman, Gidas, and Mitra et al. are an extension of weak convergence results for stationary aperiodic irreducible chains [7] and certain nonstationary chains [8], and are useful in proving ergodic theorems (which Gidas does). However, if one is simply interested in finding any minimum energy state then weak convergence seems unnecessarily strong.
In a recent paper Hajek [9] investigates when the annealing chain converges in probability to S*. Hajek gives an expression for a constant d* such that under the annealing schedule T_k = T / log k for large enough k ∈ ℕ,

P{x_k ∈ S*} → 1 as k → ∞ iff T > d*.

Furthermore the condition that Q be symmetric is relaxed to what is called "weak reversibility".
In this paper, we analyze simulated annealing focusing on optimization issues. Here we are not so much interested in the statistics of individual states as in that of certain groups of states, such as the set S* of minimum energy states or more generally a set S of all states with sufficiently low energy. In all of our results we consider an arbitrary partition {I,J} of Ω, and examine the behavior of the annealing chain relative to this partition; we obtain results for I = S as a special case. We investigate both finite-time and asymptotic behavior as it depends on the Q matrix and the annealing schedule of temperatures.
In Section 2 we establish notation. In Section 3 we examine finite-time behavior. We observe that since we may keep track of the minimum energy state visited up to time k, it seems more appropriate to lower bound the probability of visiting S at some time n ≤ k, rather than the probability of visiting S at time k. Under an annealing schedule of the form T_k = T / log(k+k₀), where T > 0 and k₀ > 1, we obtain a lower bound on P{x_n ∈ I, some n ≤ k} for k ∈ ℕ₀. For large T this bound converges to 1 exponentially fast. For small T the bound converges to a positive value less than 1.
Hence the bound is potentially useful even for small T, when the algorithm may not converge. In Section 4 we examine asymptotic behavior. First, we show that under suitable conditions on Q there exists a constant U* such that if T_k ≥ U* / log k for large enough k ∈ ℕ, then the probability that x_k ∈ I infinitely often is 1. Second, we show that under suitable conditions on Q, if T > U* and T_k = T / log k for large enough k ∈ ℕ, then

P{x_k ∈ I} = 1 − O(k^{−τ/T}) as k → ∞,

where τ > 0 does not depend on T and depends on Q only through the set {(i,j) ∈ Ω × Ω : q_ij ≠ 0} of ordered pairs of allowed transitions. Third, we show that under suitable conditions on Q there exists a constant U** such that if T_k = T / log k with U* < T < U** for large enough k ∈ ℕ, then the probability that x_k ∈ I almost always is 1. Hence we obtain three results about the convergence of the annealing algorithm with increasingly stronger assumptions and conclusions. In Section 4 we also obtain a converse which gives conditions under which the annealing algorithm does not converge: we show that under suitable conditions on Q there exists a constant W such that if T_k ≤ (W − ε) / log k for some ε > 0 and all large enough k ∈ ℕ, then the probability that x_k ∈ I infinitely often is < 1. Finally, we briefly compare our results to Hajek's work and indicate some directions for further research. We remark that Sections 3 and 4 are essentially independent of each other.
2. Notation and Preliminaries

In this section we describe notation which is necessary to state our results, give a few examples of this notation, and discuss a technical condition which we shall often impose in the sequel.
Let U̲ = min_{i∈Ω} U_i and U̅ = max_{i∈Ω} U_i. Then define S = {i ∈ Ω : U_i ≤ U} for some U̲ ≤ U < U̅, and let S* = {i ∈ Ω : U_i = U̲}. Following standard notation, we shall define

P^(k,k+d) = P^(k,k+1) ⋯ P^(k+d−1,k+d),  p_ij^(k,k+d) = [P^(k,k+d)]_ij,

to be the d-step transition matrix starting at time k, for all k ∈ ℕ₀ and every d ∈ ℕ.

In defining the annealing chain {x_k}_{k∈ℕ₀} in Section 1 we assumed that the stochastic matrix Q was symmetric and irreducible. This assumption is unnecessarily strong for our purposes. If {I,J} is a partition of Ω and we want x_k to converge to I as k → ∞, then we need only require some kind of condition which guarantees transitions can be made from J to I, and possibly another condition which makes transitions from J to I more likely than transitions from I to J, depending on the mode of convergence. We will be more precise later in Section 4; for now let Q be an arbitrary stochastic matrix.

For i,j ∈ Ω we shall say that i can reach j if there exists a sequence of states i = i₀, i₁,..., i_k = j such that q_{i_n i_{n+1}} > 0 for n = 0,...,k−1; if in addition U_{i_n} ≤ U for n = 0,...,k, for some U ∈ ℝ, then we shall say that i can reach j at energy U. For each i,j ∈ Ω and every d ∈ ℕ let M_ij^(d) be the set of sequences of states i = i₀, i₁,..., i_d = j such that q_{i_n i_{n+1}} > 0 for all n = 0,...,d−1, and let A_ij^(d) be the set of sequences of states i = i₀, i₁,..., i_d = j such that p_{i_n i_{n+1}}^(k,k+1) > 0 for all n = 0,...,d−1 (we defined p^(k,k+1) assuming T_k > 0 for all k ∈ ℕ₀). We might think of the elements of M_ij^(d) as the sequences of allowed transitions of length d from i to j at infinite temperature, and the elements of A_ij^(d) as the sequences of allowed transitions of length d from i to j at positive temperature. Note that M_ij^(d) ⊂ A_ij^(d), and the elements of A_ij^(d) ∖ M_ij^(d) are precisely those sequences which have a self transition, say from s to s, with q_ss = 0 and q_st > 0 for some t ∈ Ω such that U_t > U_s.
Now for d ∈ ℕ, i,j ∈ Ω, and X = (i₀,...,i_d) ∈ A_ij^(d) let

U(X) = Σ_{n=0}^{d−1} max[0, U_{i_{n+1}} − U_{i_n}],

V(X) = max_{n=0,...,d−1} max[0, U_{i_{n+1}} − U_{i_n}],

W(X) = max_{n=0,...,d−1} max[0, U_{i_n} − U_{i_{n+1}}].
Also let

U_ij^(d) = min_{X∈A_ij^(d)} U(X) if A_ij^(d) ≠ ∅,  U_ij^(d) = ∞ if A_ij^(d) = ∅,

for all i,j ∈ Ω and d ∈ ℕ, and

U_ij = inf_{d∈ℕ} U_ij^(d)                                             (2.1)

for all i,j ∈ Ω. Similarly define V_ij^(d), V_ij and W_ij^(d), W_ij by replacing U by V and W, respectively, in the definitions of U_ij^(d), U_ij above. Finally, if one or both of the indices i,j ∈ Ω are replaced by subsets I,J ⊂ Ω in these definitions, then an additional minimization is to be performed over the elements of I,J, e.g., U_IJ^(d) = min_{i∈I,j∈J} U_ij^(d). Note that if we replace A_ij^(d) by M_ij^(d) in the definitions of U_ij^(d), V_ij^(d), and W_ij^(d), then the values of these quantities will in general be changed; however the values of U_ij, V_ij, and W_ij will be unchanged. We shall refer to U_xy (U_xy^(d)) as the transition energy (d-step transition energy) from x to y, for x,y ∈ Ω ∪ 2^Ω.
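Since a self transition contributes max[0,0] = 0 to U(X), the infimum in (2.1) is attained over paths through edges with q_ij > 0, so U_ij can be computed as a shortest path with the nonnegative edge weight max[0, U_j − U_i]. A sketch (the graph and energies below are hypothetical, chosen only to illustrate the computation):

```python
import heapq

def transition_energy(edges, U, i, j):
    """U_ij of (2.1): minimum over paths i -> j (edges with q > 0) of the
    summed uphill increments max(0, U[t] - U[s]); Dijkstra's algorithm."""
    dist = {i: 0.0}
    heap = [(0.0, i)]
    while heap:
        d, s = heapq.heappop(heap)
        if s == j:
            return d
        if d > dist.get(s, float("inf")):
            continue                      # stale heap entry
        for t in edges.get(s, ()):
            nd = d + max(0.0, U[t] - U[s])
            if nd < dist.get(t, float("inf")):
                dist[t] = nd
                heapq.heappush(heap, (nd, t))
    return float("inf")                   # j is not reachable from i
```

For example, on the landscape U = {1: 0, 2: 2, 3: 1} with edges 1–2–3 in both directions, U_13 = 2 (one uphill climb of 2) while U_31 = 1, illustrating the asymmetry of the transition energy.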
Example 2.1  Let Ω = {1,...,5}. In Figure 2.1 we show a state transition diagram for Ω, where transitions are governed by the Q matrix, i.e., an edge from i ∈ Ω to j ∈ Ω is shown iff q_ij ≠ 0, in which case the edge is labelled with the value of q_ij. To obtain the state transition diagram for the P^(k,k+1) matrix, k ∈ ℕ₀, simply add a self-transition loop to every state which can make a transition to a higher energy state (if one is not already present) and relabel the edges appropriately. The self-transitions which are allowed under P^(k,k+1) but not under Q are depicted by broken loops. Also observe that the ordinate axis gives the energy of the corresponding state. To illustrate the notation we have

A_15^(5) = {(1,1,2,3,4,5), (1,2,3,3,4,5), (1,2,3,4,5,5)},
M_15^(5) = {(1,1,2,3,4,5)},
U_15^(5) = U_15 = (U_2 − U_1) + (U_4 − U_3) = 4,
V_15^(5) = V_15 = U_4 − U_3 = 3,
W_15^(5) = W_15 = 2.
Let {I,J} be a partition of Ω. In Section 4 we will often impose the following condition: there exists d₀ ∈ ℕ such that the d-step transition energy from j to I equals the transition energy from j to I (U_jI^(d) = U_jI) for all d ≥ d₀, for all j ∈ J. This will allow us to get lower bounds on the quantity P{x_(k+1)d ∈ I | x_kd = j} for j ∈ J. It is easy to show that if I = S then this condition is satisfied. In fact, in this case there exists d₀ ≤ |J| such that U_jI^(d) = U_jI for all d ≥ d₀, for all j ∈ J.

Example 2.2  In Figure 2.2 we show a state transition diagram for Ω = {1,...,7} (see Example 2.1). Let I = S = {i ∈ Ω : U_i ≤ 2} = {1,2,3} and J = {4,...,7}. Then U_jI^(d) = U_jI for all d ≥ 3 and all j ∈ J, so the condition above is satisfied with d₀ = 3. Note that if we replace A_ij^(d) by M_ij^(d) in the definition of U_jI^(d), then there does not exist d ∈ ℕ such that U_jI^(d) = U_jI for all j ∈ J.
3. Finite-time Behavior

From the point of view of applications it is important to understand the finite-time behavior of the annealing algorithm. Certainly it is interesting to know whether the annealing algorithm converges according to various criteria, and this information may well give insight into finite-time behavior. However this information may also be misleading for the following reasons. First, the finite-time behavior of the annealing algorithm may be quite satisfactory even when the algorithm does not converge, which may well be the case for typical applications. Second, the finite-time behavior of the annealing algorithm may not be clearly related to the convergence rate when the algorithm does converge, as the following example indicates.
Example 3.1  It is a simple consequence of Proposition 4.1(ii) that if Q is symmetric and irreducible, T > 0, and T_k ≥ T / log k for large enough k ∈ ℕ, then there exist a, α > 0 such that

P{x_k ∈ S*} ≤ 1 − a k^{−α}

for all large enough k. Now let {y_k}_{k∈ℕ₀}, y_k ∈ Ω, be a stationary Markov chain with 1-step transition matrix P̄ and some initial distribution, constructed on (Ω̄, A, P), where P̄ is the matrix obtained from P^(k,k+1) by setting T_k = 0. Since S* is just the set of persistent states for this chain, it is well-known that there exist b > 0 and 0 < ρ < 1 such that

P{y_k ∈ S*} ≥ 1 − b ρ^k

for all k ∈ ℕ₀. Hence assuming that T is chosen such that P{x_k ∈ S*} → 1 as k → ∞, the rate at which P{x_k ∈ S*} → 1 is at best polynomial, while the rate at which P{y_k ∈ S*} → 1 is at worst exponential. Of course we would hope that the finite-time behavior of the annealing chain would be better than the finite-time behavior of the stationary chain, for an appropriate choice of T_k.

We now address the question of what is an appropriate criterion to assess the finite-time behavior of the annealing algorithm. For our purposes, we are simply interested in finding any state of sufficiently low energy, i.e., an element of S. Hence it seems reasonable to lower bound P{x_k ∈ S} for k ∈ ℕ₀. However, we observe that by just doubling the annealing algorithm's memory requirements we can keep track of one of the minimum energy states visited by the chain up to the current time. In this case we are really interested in having visited S at some time n ≤ k, as opposed to actually occupying S at time k. Hence it seems more appropriate to lower bound P{x_n ∈ S, some n ≤ k} for k ∈ ℕ.
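The memory-doubling observation amounts to carrying one extra state, the best visited so far; a minimal sketch (the `step` function here is a hypothetical stand-in for one move of the annealing chain):

```python
def run_with_memory(step, x0, U, steps):
    """Run a chain x_{k+1} = step(x_k, k), tracking the lowest-energy
    state visited up to the current time (one extra state of memory)."""
    x = best = x0
    for k in range(steps):
        x = step(x, k)
        if U[x] < U[best]:
            best = x
    return x, best
```

The finite-time quantity of interest is then whether `best` lies in S, i.e., the event {x_n ∈ S, some n ≤ k}, rather than whether the current state x_k does.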
We start with a proposition which gives a lower bound on the d-step transition probability p_ij^(k,k+d) in terms of the transition energies U(X) of sequences X ∈ A_ij^(d).

Proposition 3.1  Let d ∈ ℕ, T > 0, k₀ > 1, and T_k = T / log(k+k₀) for k ∈ ℕ₀. Then for every i,j ∈ Ω

p_ij^(k,k+d) ≥ Σ_{X∈A_ij^(d)} r(X) (k+k₀+d−1)^{−U(X)/T},  k ∈ ℕ₀,      (3.1)

where r(X) > 0 is given in (3.2).
Proof  Let

r_ij^(k) = p_ij^(k,k+1) e^{max[0, U_j−U_i]/T_k}

for all i,j ∈ Ω and k ∈ ℕ₀ (so that r_ij^(k) = q_ij for j ≠ i and r_ii^(k) = p_ii^(k,k+1)). Also for every i,j ∈ Ω and X = (i₀,...,i_d) ∈ A_ij^(d) let

Δ_n(X) = max[0, U_{i_{n+1}} − U_{i_n}],  n = 0,...,d−1,

r_k(X) = ∏_{n=0}^{d−1} r_{i_n i_{n+1}}^(k+n) > 0,  k ∈ ℕ₀,

and

r(X) = ∏_{n=0}^{d−1} r_{i_n i_{n+1}}^(0) > 0.                          (3.2)

Since T_k is strictly decreasing, r_ii^(k) = p_ii^(k,k+1) is nondecreasing in k, so that r_k(X) ≥ r(X) for all k ∈ ℕ₀. Hence

p_ij^(k,k+d) ≥ Σ_{(i₀,...,i_d)∈A_ij^(d)} ∏_{n=0}^{d−1} p_{i_n i_{n+1}}^(k+n,k+n+1)
            = Σ_{X∈A_ij^(d)} r_k(X) exp[−Σ_{n=0}^{d−1} Δ_n(X)/T_{k+n}]
            ≥ Σ_{X∈A_ij^(d)} r(X) exp[−(U(X)/T) log(k+k₀+d−1)]
            = Σ_{X∈A_ij^(d)} r(X) (k+k₀+d−1)^{−U(X)/T}

for all k ∈ ℕ₀, since Σ_{n=0}^{d−1} Δ_n(X) = U(X) and T_{k+n} ≥ T / log(k+k₀+d−1) for n = 0,...,d−1.
Remarks on Proposition 3.1  (1) In Figure 2.1 we have

r((1,1,2,3,4,5)) = p_11^(0) q_12 q_23 q_34 q_45,
r((1,2,3,3,4,5)) = q_12 q_23 p_33^(0) q_34 q_45,
r((1,2,3,4,5,5)) = q_12 q_23 q_34 q_45 p_55^(0).

(2) Fix k ∈ ℕ₀. From (3.2) it is easy to see that r(X) is nondecreasing as T decreases or k₀ increases, which reflects the fact that self-transitions in the sequence X have larger probability at lower temperature. On the other hand, (k+k₀+d−1)^{−U(X)/T} ↓ 0 as T ↓ 0 or k₀ ↑ ∞ (if U(X) > 0), which reflects the fact that transitions to higher energy states in the sequence X have smaller probability at lower temperature. These two phenomena compete with each other in the lower bound (3.1).
The next theorem gives a lower bound on P{x_n ∈ S, some n ≤ k} by setting I = S.

Theorem 3.1  Let {I,J} be a partition of Ω. Also let d ∈ ℕ, T > 0, k₀ ≥ 1, and T_k = T / log(k+k₀) for k ∈ ℕ₀. Then

P{x_nd ∈ J, n = 0,...,k} ≤ exp[−(a/(d(1−α)))((kd+n₀)^{1−α} − n₀^{1−α})]   if T > U,
P{x_nd ∈ J, n = 0,...,k} ≤ exp[−(a/d) log((kd+n₀)/n₀)]                   if T = U,     (3.3)
P{x_nd ∈ J, n = 0,...,k} ≤ exp[−(a/(d(α−1)))(n₀^{1−α} − (kd+n₀)^{1−α})]  if T < U,

for all k ∈ ℕ₀, where U = max_{j∈J} U_jI^(d), α = U/T, n₀ = k₀+d−1, and a > 0 is given in (3.5).

Note  In the statement of Theorem 3.1 and in the proof to follow we suppress the dependence of the constants U and a on d. Later, we shall make this dependence explicit by writing U^(d) and a^(d).
Proof  From Proposition 3.1, for every i,j ∈ Ω

p_ij^(k,k+d) ≥ Σ_{X∈A_ij^(d)} r(X)(k+k₀+d−1)^{−U(X)/T},  k ∈ ℕ₀,

where r(X) > 0 is given in (3.2). Hence

P{x_nd ∈ J, n = 0,...,k} ≤ ∏_{n=0}^{k−1} max_{j∈J} P{x_(n+1)d ∈ J | x_nd = j}
                         = ∏_{n=0}^{k−1} [1 − min_{j∈J} Σ_{i∈I} p_ji^(nd,(n+1)d)]
                         ≤ ∏_{n=0}^{k−1} [1 − a (nd+n₀)^{−α}]                      (3.4)

where

a = min_{j∈J} Σ_{i∈I} Σ_{X∈A_ji^(d) : U(X)≤U} r(X) > 0                             (3.5)

(if U = ∞ let a be any positive real). Since 1+x ≤ e^x for all x ∈ ℝ, we have

∏_{n=0}^{k−1} [1 − a (nd+n₀)^{−α}] ≤ exp[−a Σ_{n=0}^{k−1} (nd+n₀)^{−α}]
                                   ≤ exp[−(a/d) ∫_{n₀}^{kd+n₀} x^{−α} dx]          (3.6)

for all k ∈ ℕ₀. Evaluating the integral in the three cases α < 1, α = 1, and α > 1 and combining (3.4) and (3.6) completes the proof.
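The integral comparison behind (3.6) — the sum Σ_{n=0}^{k−1} (nd+n₀)^{−α} dominates (1/d)∫_{n₀}^{kd+n₀} x^{−α} dx because x^{−α} is decreasing — can be checked numerically; the parameter values below are arbitrary:

```python
import math

def sum_side(k, d, n0, alpha):
    """sum_{n=0}^{k-1} (n*d + n0)^(-alpha), the sum bounded below in (3.6)."""
    return sum((n * d + n0) ** (-alpha) for n in range(k))

def integral_side(k, d, n0, alpha):
    """(1/d) * integral_{n0}^{k*d+n0} x^(-alpha) dx, the integral lower bound."""
    if alpha == 1.0:
        return math.log((k * d + n0) / n0) / d
    return ((k * d + n0) ** (1 - alpha) - n0 ** (1 - alpha)) / (d * (1 - alpha))
```

The same two closed forms are what appear, after multiplying by −a and exponentiating, in the three cases of (3.3).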
Remarks on Theorem 3.1  (1) Let I = S* = {5}, J = {1,2,3,4}, and d = 5 in Figure 2.1. Then U = U_15^(5) = U_15 = 4 and

a = min_{j∈{1,2,3,4}} Σ_{X∈A_j5^(5) : U(X)≤4} r(X).

Now it is not hard to see that the minimum is obtained by j = 1 or 2, and a may be evaluated using the values of r(X) computed in the first remark following Proposition 3.1.
(2) Note that

P{x_nd ∈ J, n ∈ ℕ₀} = lim_{k→∞} P{x_nd ∈ J, n = 0,...,k} ≤ exp[−a/(d(α−1)n₀^{α−1})] < 1

if T < U, so that the bound is potentially useful even when T < U.

(3) Fix k ∈ ℕ₀. It will be convenient to analyze the dependence of the upper bound (3.3) on T and k₀ in the form (see (3.6))

P{x_nd ∈ J, n = 0,...,k} ≤ exp[−(a/d) ∫_{n₀}^{kd+n₀} x^{−α} dx].              (3.7)

Since r(X) is nondecreasing as T decreases or k₀ increases, we have from (3.5) that a is nondecreasing as T decreases or k₀ increases, which reflects the fact that self-transitions in sequences of transitions from J to I have larger probability at lower temperature. On the other hand, ∫_{n₀}^{kd+n₀} x^{−α} dx ↓ 0 as T ↓ 0 or k₀ ↑ ∞ (if U > 0), which reflects the fact that transitions to higher energy states in sequences of transitions from J to I have smaller probability at lower temperature. Since these two phenomena compete with each other, one could consider minimizing the r.h.s. of (3.7) over T and k₀ to obtain the best bound.
(4) We can generalize Theorem 3.1 by replacing U = max_{j∈J} U_jI^(d) with an arbitrary U′ in (3.3) and (3.5) (if U′ < U then a = 0 and the upper bound (3.3) is useless). Since a and α are both nondecreasing with increasing U′, one could consider minimizing the r.h.s. of (3.7) over U′ as well as T and k₀ to obtain the best bound (see previous remark).

In order to apply Theorem 3.1 we must obtain suitable estimates for the constants U^(d) and a^(d). We are currently investigating this in the context of a particular problem.
4. Asymptotic Analysis

In the previous section we pointed out some of the difficulties associated with using the asymptotic behavior of the annealing algorithm to predict its finite-time behavior. Nonetheless, it is certainly interesting from a theoretical viewpoint to perform an asymptotic analysis, i.e., to find conditions under which the annealing algorithm does or does not converge according to various criteria, and when the algorithm converges to estimate the rate of convergence as well. In this section we address these questions, and then briefly compare our results to Hajek's work and indicate some directions for further research.

We first address the question of what are appropriate criteria to assess the asymptotic performance of the annealing algorithm. For our purposes, we are simply interested in finding any state of sufficiently low energy, i.e., an element of S. Hence we shall investigate conditions on the Q matrix and the annealing schedule of temperatures {T_k}_{k∈ℕ} under which one or more of the following is true:

(i)   P{x_k ∈ S i.o.} = 1,
(ii)  P{x_k ∈ S} → 1 as k → ∞,
(iii) P{x_k ∈ S a.a.} = 1.

Here "i.o." and "a.a." are abbreviations for "infinitely often" and "almost always", i.e.,

{x_k ∈ S i.o.} = lim sup_{k→∞} {x_k ∈ S} = ∩_{n=1}^∞ ∪_{k≥n} {x_k ∈ S}

and

{x_k ∈ S a.a.} = lim inf_{k→∞} {x_k ∈ S} = ∪_{n=1}^∞ ∩_{k≥n} {x_k ∈ S}.

Since (c.f. [7])

P{x_k ∈ S a.a.} ≤ lim inf_{k→∞} P{x_k ∈ S} ≤ lim sup_{k→∞} P{x_k ∈ S} ≤ P{x_k ∈ S i.o.},      (4.1)

it follows that (i), (ii), and (iii) are increasingly strong results and so we expect increasingly strong conditions under which each is true. We are also interested in obtaining the rate of convergence in (ii) as well as conditions under which (i), (ii), and (iii) do not hold.
We start by giving a proposition which establishes asymptotic upper and lower bounds on the d-step transition probability p_ij^(k,k+d) in terms of the transition energies U_ij^(d) and U_ij, for i,j ∈ Ω.

Proposition 4.1  Let d ∈ ℕ and T > 0. Then there exists a_ij > 0 for i,j ∈ Ω such that each of the following is true:

(i) if T_k ≤ T / log k for large enough k ∈ ℕ then

lim sup_{k→∞} k^{U_ij^(d)/T} p_ij^(k,k+d) ≤ a_ij for all i,j ∈ Ω;

(ii) if T_k ≥ T / log k for large enough k ∈ ℕ and T_k → 0 as k → ∞ then

lim inf_{k→∞} k^{U_ij^(d)/T} p_ij^(k,k+d) ≥ a_ij for all i,j ∈ Ω such that U_ij^(d) = U_ij;

(iii) if T_k = T / log k for large enough k ∈ ℕ then

p_ij^(k,k+d) ~ a_ij k^{−U_ij/T} as k → ∞, for all i,j ∈ Ω such that U_ij^(d) = U_ij.
Proof  We prove (i); the proof of (ii) is similar, and (iii) follows from (i) and (ii). So assume T_k ≤ T / log k for large enough k ∈ ℕ. Also, for every i,j ∈ Ω and X = (i₀,...,i_d) ∈ A_ij^(d) let

Δ_n(X) = max[0, U_{i_{n+1}} − U_{i_n}],  n = 0,...,d−1,

r_k(X) = ∏_{n=0}^{d−1} r_{i_n i_{n+1}}^(k+n) > 0,  k ∈ ℕ,

and

r(X) = lim_{k→∞} r_k(X) = sup_{k∈ℕ} r_k(X) > 0.

That the limit in the definition of r(X) exists and is equal to the supremum is a consequence of lim_{k→∞} p_ii^(k,k+1) = sup_{k∈ℕ} p_ii^(k,k+1) (since T_k → 0 as k → ∞). Hence for every i,j ∈ Ω

p_ij^(k,k+d) = Σ_{(i₀,...,i_d)∈A_ij^(d)} ∏_{n=0}^{d−1} p_{i_n i_{n+1}}^(k+n,k+n+1)
            = Σ_{X∈A_ij^(d)} r_k(X) exp[−Σ_{n=0}^{d−1} Δ_n(X)/T_{k+n}]
            ≤ Σ_{X∈A_ij^(d)} r(X) exp[−(U(X)/T) log k]   (k large enough)
            = Σ_{X∈A_ij^(d)} r(X) k^{−U(X)/T}
            ~ a_ij k^{−U_ij^(d)/T}  as k → ∞,

where

a_ij = Σ_{X∈A_ij^(d) : U(X)=U_ij^(d)} r(X) > 0

(if U_ij^(d) = ∞ let a_ij be any positive real).
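The scaling in Proposition 4.1 is exact for a two-state example: with energies U_1 = 0, U_2 = u, proposal q_12 = 1/2, and T_k = T / log k, the uphill 1-step probability is p_12^(k,k+1) = (1/2)e^{−u/T_k} = (1/2)k^{−u/T}, so k^{U_12/T} p_12^(k,k+1) equals the constant a_12 = 1/2 for every k. A numerical check (the values of u and T are illustrative):

```python
import math

def p12(k, u, T):
    """Uphill 1-step probability q12 * e^{-u/T_k} under T_k = T/log k (k >= 2)."""
    Tk = T / math.log(k)
    return 0.5 * math.exp(-u / Tk)
```

Multiplying by k^{u/T} recovers the constant 1/2 at every k, which is the content of Proposition 4.1(iii) for this chain.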
The following theorem gives conditions under which P{x_k ∈ S i.o.} = 1 by setting I = S.

Theorem 4.1  Let {I,J} be a partition of Ω and assume

(a) there exists d ∈ ℕ such that the d-step transition energy from j to I equals the transition energy from j to I (U_jI^(d) = U_jI for all j ∈ J);
(b) every j ∈ J can reach some i ∈ I (max_{j∈J} U_jI < ∞).

Also let U* = max_{j∈J} U_jI, T_k ≥ U* / log k for large enough k ∈ ℕ, and T_k → 0 as k → ∞. Then P{x_k ∈ I i.o.} = 1.

Proof  From Proposition 4.1(ii) there exists a > 0 such that

p_ji^(kd,(k+1)d) ≥ a (kd)^{−U_ji/U*}

for all i,j ∈ Ω such that U_ji^(d) = U_ji, and all large enough k ∈ ℕ. Hence for every large enough k ∈ ℕ

P{x_nd ∈ J, n ≥ k} ≤ ∏_{n=k}^∞ max_{j∈J} P{x_(n+1)d ∈ J | x_nd = j}
                   = ∏_{n=k}^∞ [1 − min_{j∈J} Σ_{i∈I} p_ji^(nd,(n+1)d)]
                   ≤ ∏_{n=k}^∞ [1 − a min_{j∈J} (nd)^{−U_jI/U*}]
                   ≤ ∏_{n=k}^∞ [1 − a (nd)^{−1}] = 0,

where we have used (a), and the last product vanishes since U_jI ≤ U* for all j ∈ J; the theorem follows.
Remarks on Theorem 4.1  (1) In Figure 2.1 let I = S* = {5} and J = {1,2,3,4}. Then conditions (a) and (b) are satisfied and U* = max_{j∈J} U_jI = U_15 = 4.

Our next theorem gives conditions under which P{x_k ∈ S} → 1 as k → ∞, and obtains an estimate of the rate of convergence as well, by setting I = S. We shall need the following lemma.

Lemma  Let c > 0, 0 ≤ a < 1, λ > a, and d ∈ ℕ. Then for every m₀ ∈ ℕ:

(i) ∏_{m=m₀}^{k} (1 − c(md)^{−a}) = O(e^{−b(kd)^{1−a}}) as k → ∞, where b = c/(d(1−a)) > 0;

(ii) Σ_{m=m₀}^{k} (md)^{−λ} ∏_{ℓ=m+1}^{k} (1 − c(ℓd)^{−a}) = O(k^{−(λ−a)}) as k → ∞.
Theorem 4.2  Let {I,J} be a partition of Ω and assume

(a) there exists d ∈ ℕ such that the d-step transition energy from j to I equals the transition energy from j to I (U_jI^(d) = U_jI for all j ∈ J);
(b) every j ∈ J can reach some i ∈ I (max_{j∈J} U_jI < ∞);
(c) the transition energy from I to J is greater than the transition energy from j to I, for all j ∈ J (min_{j∈J}[U_IJ − U_jI] > 0).

Also let U* = max_{j∈J} U_jI, T > U*, and T_k = T / log k for large enough k ∈ ℕ. Then

P{x_k ∈ I} → 1 as k → ∞.

Furthermore, if we assume

(d) there exists i ∈ I which can reach some j ∈ J (U_IJ < ∞),

then

P{x_k ∈ I} = 1 − O(k^{−τ/T}) as k → ∞,

where τ = min_{j∈J}[U_IJ − U_jI] (0 < τ < ∞ by (c) and (d)).
Proof  From Proposition 4.1 there exists a₁ > 0 such that

p_ij^(k,k+d) ≤ a₁ k^{−U_ij/T}                                         (4.2)

for all i,j ∈ Ω and large enough k ∈ ℕ. Also from Proposition 4.1 there exists a₂ > 0 such that

p_ij^(k,k+d) ≥ a₂ k^{−U_ij/T}                                         (4.3)

for all i,j ∈ Ω such that U_ij^(d) = U_ij, and large enough k ∈ ℕ. In the sequel (4.2) ((4.3)) will be used to upper (lower) bound the probability of transitions from I to J (from J to I).

Let J₁,...,J_{r₀} be a partition of J such that U_jI = U_{J_r I} for all j ∈ J_r, and U_{J_r I} < U_{J_s I} for all r < s. For example, in Figure 2.1 let I = {5} and J = {1,2,3,4}; then J₁ = {4}, J₂ = {2,3}, and J₃ = {1}. Also let K_r = J₁ ∪ ⋯ ∪ J_r (so that K_{r₀} = J), k_r = |K_r|, a_r = U_{J_r I}/T, and a = U*/T. Note that a_r ≤ a_{r₀} = a < 1 for r = 1,...,r₀. Finally let

p(j,m,n,r) = P{x_kd ∈ K_r, k = m+1,...,n | x_md = j}

and

σ(i,j,m,n,r) = P{x_kd ∈ K_r, k = m+1,...,n−1; x_nd = j | x_md = i}

for i,j ∈ Ω, m,n ∈ ℕ, and r = 1,...,r₀. Then for every k₀ ∈ ℕ we can write

P{x_kd ∈ J} = P₁^(k) + P₂^(k)                                         (4.4)

where

P₁^(k) = Σ_{j∈J} p_j^(k₀d) p(j,k₀,k,r₀)                               (4.5)

and

P₂^(k) = Σ_{m=k₀}^{k−1} Σ_{i∈I} p_i^(md) Σ_{j∈J} p_ij^(md,(m+1)d) p(j,m+1,k,r₀)

for all k = k₀, k₀+1,.... In words, P₁^(k) is the probability that x_nd ∈ J for all n = k₀,...,k, and P₂^(k) is the probability that x_md ∈ I for some m = k₀,...,k−1 and x_nd ∈ J for all n = m+1,...,k. We can further write

P₂^(k) = P₃^(k) + P₄^(k)                                              (4.6)

where

P₃^(k) = Σ_{m=k₀}^{k−1} Σ_{i∈I} p_i^(md) Σ_{r=1}^{r₀} Σ_{j∈J_r} p_ij^(md,(m+1)d) p(j,m+1,k,r)      (4.7)

and

P₄^(k) = Σ_{m=k₀}^{k−2} Σ_{i∈I} p_i^(md) Σ_{n=m+2}^{k} Σ_{r=2}^{r₀} Σ_{j∈J_r} σ(i,j,m,n,r−1) p(j,n,k,r)      (4.8)

for all k = k₀, k₀+1,.... In words, P₃^(k) (P₄^(k)) is the probability that x_md ∈ I for some m, the chain makes the transition from I to J at time m+1, and the state in J with the largest transition energy back to I, amongst the states in J that are visited from time m+1,...,k, is visited at time m+1 (at some time n ≥ m+2).
The motivation for the decomposition in (4.6) is as follows. Suppose we work directly with (4.4). Observe that the P₂^(k) term only keeps track of how the chain makes transitions from I to J but not how it stays in J. In this case we are forced to work with the "worst case" scenario where the chain makes minimum energy d-step transitions from I to J (with energy U_IJ) and maximum energy d-step transitions from J to I (with energy max_{j∈J} U_jI). In order to show that P₂^(k) → 0 as k → ∞ it seems clear that we would have to require U_IJ − max_{j∈J} U_jI > 0. On the other hand, in the P₃^(k) and P₄^(k) terms of (4.6) we not only keep track of how the chain makes transitions from I to J but also how it stays in J. In order to show that P₃^(k), P₄^(k) → 0 as k → ∞ (and consequently P₂^(k) → 0 as k → ∞) it is not hard to see that we need only require min_{j∈J}[U_IJ − U_jI] > 0, which is guaranteed by (c). We now proceed with the details.

We start by upper bounding P₁^(k).
Using (4.3), for every large enough k₀ ∈ ℕ we have

p(j₀,k₀,k,r) ≤ ∏_{ℓ=k₀}^{k−1} max_{j∈K_r} P{x_(ℓ+1)d ∈ K_r | x_ℓd = j}
             ≤ ∏_{ℓ=k₀}^{k−1} [1 − min_{j∈K_r} Σ_{i∈I} p_ji^(ℓd,(ℓ+1)d)]
             ≤ ∏_{ℓ=k₀}^{k−1} [1 − a₂ (ℓd)^{−a}]                       (4.9)

for all j₀ ∈ J, k = k₀, k₀+1,..., and r = 1,...,r₀, by (a). Combining (4.5) and (4.9) gives for every large enough k₀ ∈ ℕ

P₁^(k) ≤ ∏_{ℓ=k₀}^{k−1} [1 − a₂ (ℓd)^{−a}],  k = k₀, k₀+1,....         (4.10)

Since a₂ > 0 and a < 1, we can apply Lemma (i) to (4.10) for every large enough k₀ ∈ ℕ to get

P₁^(k) = O(e^{−b(kd)^{1−a}}) as k → ∞,                                 (4.11)

where b = a₂/(d(1−a)) > 0.
We continue by upper bounding P₃^(k) and P₄^(k). First, by almost the same reasoning that led to (4.9), for every large enough n ∈ ℕ we have

p(j,n,k,r) ≤ ∏_{ℓ=n}^{k−1} [1 − a₂ (ℓd)^{−a_r}]                        (4.12)

for all j ∈ J_r, k = n, n+1,..., and r = 1,...,r₀. Next, suppose that x_md = i, x_kd ∈ K_r for k = m+1,...,n−1, and x_nd = j′. Then clearly there exist κ (1 ≤ κ ≤ min[n−m−1, k_r]), intermediate times m < i₁ < ⋯ < i_{κ−1} < n−1, and distinct intermediate states j₁,...,j_κ ∈ K_r such that

x_md = i, x_(m+1)d = j₁, x_(i_ℓ+1)d = j_{ℓ+1} for ℓ = 1,...,κ−1, x_(n−1)d = j_κ, x_nd = j′.      (4.13)

Let A(i,j,m,n,r;κ,i₁,...,i_{κ−1},j₁,...,j_κ) be the event defined by (4.13). Then we have shown that

σ(i,j,m,n,r) ≤ Σ_κ Σ_{i₁,...,i_{κ−1}} Σ_{j₁,...,j_κ} P{A(i,j,m,n,r;κ,i₁,...,i_{κ−1},j₁,...,j_κ)}
             ≤ c₀ (n−m−2)^{k_r−1} max P{A(i,j,m,n,r;κ,i₁,...,i_{κ−1},j₁,...,j_κ)}

for all i,j ∈ Ω, m ∈ ℕ, n = m+2, m+3,..., and r = 1,...,r₀, where c₀ is an unimportant constant and the maximum is over κ, i₁,...,i_{κ−1}, j₁,...,j_κ. Now using (4.2) and (4.12) it is not hard to show that for large enough m ∈ ℕ

P{A(i,j,m,n,r;κ,i₁,...,i_{κ−1},j₁,...,j_κ)} ≤ a₁^{κ+1} (md)^{−U_ij/T} ∏_{ℓ=m+κ}^{n−2} [1 − a₂ (ℓd)^{−a_r}],

and consequently

σ(i,j,m,n,r) ≤ c₁ (n−m−2)^{k_r−1} (md)^{−U_ij/T} ∏_{ℓ=m+k_r}^{n−2} [1 − a₂ (ℓd)^{−a_r}]      (4.14)

for all i,j ∈ Ω, large enough m ∈ ℕ, n = m+2, m+3,..., and r = 1,...,r₀, where c₁ is an unimportant constant.
Combining (4.7), (4.8), (4.12), and (4.14) gives for every large enough k₀ ∈ ℕ

P₃^(k) + P₄^(k) ≤ c₂ Σ_{r=1}^{r₀} Σ_{m=k₀}^{k−1} (md)^{−λ} ∏_{ℓ=m+1}^{k−1} [1 − a₂ (ℓd)^{−a_r}]
  + c₃ Σ_{r=2}^{r₀} Σ_{m=k₀}^{k−2} Σ_{n=m+2}^{k} (n−m−2)^{k_{r−1}−1} (md)^{−λ} ∏_{ℓ=m+k_{r−1}}^{n−2} [1 − a₂ (ℓd)^{−a_{r−1}}] ∏_{ℓ=n}^{k−1} [1 − a₂ (ℓd)^{−a_r}]      (4.15)

where λ = U_IJ/T and c₂, c₃ are unimportant constants. Since a₂ > 0, a_r ≤ a < 1, and (λ − a_r)T = U_IJ − U_{J_r I} > 0 for each r (by (c)), we can apply Lemma (ii) to each term in (4.15) for every large enough k₀ ∈ ℕ to get

P₃^(k) + P₄^(k) = O(Σ_{r=1}^{r₀} k^{−(λ−a_r)}) = O(k^{−τ/T}) as k → ∞,      (4.16)

where the last equality follows from

τ = min_{j∈J}[U_IJ − U_jI] = min_{r=1,...,r₀} (λ − a_r)T.
Finally, combining (4.4), (4.6), (4.11), and (4.16) gives

    P{x_kd ∈ J} = O(e^{-b(kd)^{1-a}}) + O(k^{-Γ/T})   as k → ∞.     (4.17)

Similarly we can show that (4.17) holds with P{x_kd ∈ J} replaced by P{x_{kd+k'} ∈ J}, for all k' = 0, ..., d-1 (and with b, Γ replaced by suitable constants if (d) is true). Hence

    P{x_k ∈ J} = O(e^{-bk^{1-a}}) + O(k^{-Γ/T})   as k → ∞,     (4.18)

and the Theorem follows.
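As an aside, the relative size of the two error terms in (4.18) is easy to examine numerically. The sketch below uses purely illustrative values for a, b, and Γ/T (they are not constants derived in the paper); it shows that the polynomial term k^{-Γ/T} eventually dominates the bound, since e^{-bk^{1-a}} decays faster than any power of k.

```python
import math

# Compare the two decay terms in the bound (4.18):
# an exponential-of-a-power term exp(-b * k**(1 - a)) and a
# polynomial term k**(-gamma_over_T).  All constants here are
# hypothetical illustrations, not values derived in the paper.
a = 0.5             # 0 < a < 1
b = 1.0             # b > 0
gamma_over_T = 1.0  # Gamma / T

def exp_term(k):
    return math.exp(-b * k ** (1 - a))

def poly_term(k):
    return k ** (-gamma_over_T)

# For large k the polynomial term dominates, since
# exp(-b * k**(1-a)) decays faster than any power of k.
for k in (10, 100, 10000):
    print(k, exp_term(k), poly_term(k))
```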
Remarks on Theorem 4.2

(1) In Figure 2.1 let I = S* = {5}, J = {1,2,3,4}. Then U = U_15 = 4 and Γ = U_51 - U_15 = U_1 - U_5 = 1.

(2) Condition (a) was discussed in Section 2 and is satisfied for I = S*.

(3) Condition (c) is satisfied for I = S* and is symmetric, since

    min_{i∈I, j∈J} [U_ij - U_ji] = min_{i∈I, j∈J} [U_j - U_i] > 0.
(4) When condition (d) is not satisfied, (4.18) shows that

    P{x_k ∈ I} = 1 - O(e^{-bk^{1-a}})   as k → ∞,

where a = U/T and b > 0. What we have actually shown is that

    P{x_n ∈ I, some n ≤ k} = 1 - O(e^{-bk^{1-a}})   as k → ∞,

and this is valid when only (a), (b), T > U, and T_k ≥ T / log k for large enough k ∈ ℕ are assumed. Theorem 4.1 can be deduced from this by taking T = U. It is possible to lower bound b in terms of the a_ij's from Proposition 4.1, but we shall not do so here.
(5) We can get a somewhat better estimate of the rate of convergence as follows. Let ℐ be the collection of subsets of Q such that I_0 ∈ ℐ iff the partition {I_0, J_0} (J_0 = Q \ I_0) satisfies conditions (a), (b), (c), and (d). Assume that ℐ ≠ ∅ and choose I* ∈ ℐ such that

    Γ(I*)/U(I*) = max_{I_0 ∈ ℐ} Γ(I_0)/U(I_0).

Let Γ* = Γ(I*), T > T* = U(I*), and T_k = T / log k for large enough k ∈ ℕ. Then

    P{x_k ∈ I*} = 1 - O(k^{-Γ*/T}).
The corollary to the next theorem gives conditions under which P{x_k ∈ S* a.a.} = 1, by setting I = S*.

Theorem 4.3

Let {I,J} be a partition of Q and assume that the transition energy from I to J is positive (U_IJ > 0). Also let U* = U_IJ, ε > 0, and

    T_k ≤ (U* - ε) / log k

for large enough k ∈ ℕ. Then

    P{x_k ∈ I a.a.} = P{x_k ∈ I i.o.}.
Proof

Let T = U* - ε. Then from Proposition 4.1(i) there exists a > 0 such that

    p_ij^(k,k+1) ≤ a / k^{U_ij/T}   for all i,j ∈ Q, k ∈ ℕ.

Hence

    P{x_k ∈ I, x_{k+1} ∈ J} ≤ max_{i∈I} P{x_{k+1} ∈ J | x_k = i} = max_{i∈I} Σ_{j∈J} p_ij^(k,k+1)
                            ≤ max_{i∈I} Σ_{j∈J} a / k^{U_ij/T} ≤ |J| a / k^{U*/T},   k ∈ ℕ,

and since U*/T > 1,

    Σ_{k=1}^∞ P{x_k ∈ I, x_{k+1} ∈ J} < ∞.

Applying the "first" Borel-Cantelli Lemma (c.f. [7]) we have P{x_k ∈ I, x_{k+1} ∈ J i.o.} = 0, and the theorem follows.
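The summability step in this proof is elementary but easy to check numerically. The sketch below bounds the partial sums of Σ a·k^{-U*/T} by the integral test, confirming convergence when U*/T > 1. The constant U* = 4 is borrowed from the Figure 2.1 example, while a and ε are arbitrary illustrative choices.

```python
# Check the summability step in the proof of Theorem 4.3: with
# T = U* - eps we have U*/T > 1, so sum_k a * k**(-U*/T) converges.
# U_star = 4 is borrowed from the Figure 2.1 example; a and eps are
# arbitrary illustrative choices.
a = 1.0
U_star = 4.0
eps = 1.0
T = U_star - eps
s = U_star / T          # s = 4/3 > 1

partial = sum(a * k ** (-s) for k in range(1, 100001))

# Integral test: every partial sum stays below a * (1 + 1/(s - 1)).
bound = a * (1 + 1 / (s - 1))
print(partial, bound)
```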
Corollary 4.1

Let {I,J} be a partition of Q and assume that

(a) there exists d ∈ ℕ such that the d-step transition energy from j to I equals the transition energy from j to I, for all j ∈ J (U_jI^(d) = U_jI for all j ∈ J),

(b) every j ∈ J can reach some i ∈ I (max_{j∈J} U_jI < ∞),

(c) the transition energy from I to J is greater than the transition energy from j to I, for all j ∈ J (U_IJ - max_{j∈J} U_jI > 0).

Also let U = max_{j∈J} U_jI, U* = U_IJ, U < T < U*, and T_k = T / log k for large enough k ∈ ℕ. Then

    P{x_k ∈ I a.a.} = 1.

Proof

Combine Theorems 4.1 and 4.3.
Remarks on Corollary 4.1

(1) In Figure 2.1 let I = S* = {5}, J = {1,2,3,4}. Then U = U_15 = 4 and U* = U_54 = 4.

(2) Condition (c) of Corollary 4.1, unlike condition (c) of Theorem 4.2, is not generally symmetric. Note that

    U_IJ = min_{i∈I, j∈J, q_ij > 0} max[0, U_j - U_i].

The corollary to the next theorem gives conditions under which P{x_k ∈ S* i.o.} < 1, by setting I = S*. By (4.1), these are conditions under which the algorithm does not converge according to any of our criteria.
Theorem 4.4

Let {I,J} be a partition of Q and assume that

(a) the transition energy from J to I is positive (U_JI > 0),

(b) every i ∈ I can reach some j ∈ J (max_{i∈I} U_iJ < ∞).

Also let ε > 0 and T_k ≤ (U_JI - ε) / log k for large enough k ∈ ℕ. Then

    P{x_k ∈ I i.o.} < 1.
Proof

Let T = U_JI - ε. From Proposition 4.1(i) there exists a > 0 such that

    p_ij^(k,k+1) ≤ a / k^{U_ij/T}   for all i,j ∈ Q, k ∈ ℕ.

Hence for every large enough k ∈ ℕ

    P{x_n ∈ J, n ≥ k} ≥ P{x_k ∈ J} Π_{n=k}^∞ min_{j∈J} P{x_{n+1} ∈ J | x_n = j}
                     = P{x_k ∈ J} Π_{n=k}^∞ [1 - max_{j∈J} Σ_{i∈I} p_ji^(n,n+1)]
                     ≥ P{x_k ∈ J} Π_{n=k}^∞ [1 - |I| a / n^{U_JI/T}].

Since U_JI/T > 1, the infinite product converges (to a positive value), and by (b) P{x_k ∈ J} > 0 for some large enough k ∈ ℕ. Hence P{x_n ∈ J, n ≥ k} > 0 for some large enough k ∈ ℕ, and the theorem follows.

Corollary 4.2
Let {I,J} be a partition of Q and assume that

(a) the transition energy from some j ∈ J to I is positive (max_{j∈J} W_jI > 0).

Let W = max_{j∈J} W_jI, J* = {j ∈ J : W_jI = W}, I* = Q \ J*, and assume that

(b) the transition energy from J* to I* is positive (U_{J*I*} > 0).

Finally let ε > 0 and T_k ≤ (W - ε) / log k for large enough k ∈ ℕ. Then

    P{x_k ∈ I i.o.} < 1.

Proof

Observe that W ≤ U_{J*I*} and apply Theorem 4.4 to the partition {I*, J*}.

In Figure 2.1 let I = S* = {5}, J = {1,2,3,4}. Then W = W_15 = W_25 = W_35 = 2 and J* = {1,2,3}.
We next state a theorem of Hajek's which gives necessary and sufficient conditions for P{x_k ∈ S*} → 1 as k → ∞.

Theorem (Hajek)

Assume that

(a) j can be reached from i, for all i,j ∈ Q (Q is irreducible),

(b) j can be reached from i at energy U iff i can be reached from j at energy U, for all i,j ∈ Q and U ∈ ℝ (U_i + W_ij = U_j + W_ji for all i,j ∈ Q).

Also let d* = max_{j∉S*} V_j < ∞, T > 0, and T_k = T / log k for large enough k ∈ ℕ. Then

    P{x_k ∈ S*} → 1   as k → ∞   iff   T ≥ d*.

Proof

See [9].
Remarks on Hajek's Theorem

(1) In Figure 2.1 we have d* = V_1 = 3.

(2) In Hajek's paper conditions (a) and (b) are called "strong irreducibility" and "weak reversibility", respectively.

(3) Condition (b) is symmetric. Obviously W ≤ d* ≤ U, and the equalities hold only in fairly trivial cases. Hence, under conditions (a) and (b), Hajek's Theorem is stronger than our Theorem 4.2 and Corollary 4.2 with I = S*. However, the conditions under which our results are obtained are different, and in general weaker than Hajek's, with the exception that condition (c) of Theorem 4.2 can be true when condition (b) of Hajek's Theorem is false and conversely. Also we obtain an estimate of the rate at which P{x_k ∈ S*} → 1 as k → ∞.

We close this section by indicating how we can analyze various modifications of the annealing algorithm by our methods. Such modifications might include (i) allowing the Q matrix to depend on time, (ii) measuring the energy differences U_j - U_i with random error, (iii) allowing the temperature T_k to depend on the current state x_k. The important point to observe in modifications such as these is that our results depend only on the Markov property of the annealing chain {x_k}_{k∈ℕ_0} and the asymptotic behavior of its d-step transition matrices {P^(k,k+d)}_{k∈ℕ} as k → ∞, for fixed d ∈ ℕ.
In particular, our results are based on {p_ij^(k,k+d)} satisfying one or both of the inequalities

    lim inf_{k→∞} k^{U_ij/T} p_ij^(k,k+d) > 0,     (4.19)

    lim sup_{k→∞} k^{U_ij/T} p_ij^(k,k+d) < ∞,     (4.20)

for appropriate i,j ∈ Q. Hence our results are valid for any Markov chain which satisfies (4.19) and/or (4.20) for appropriate i,j ∈ Q. Of course in general the U_ij's are not given by (2.1), and can in fact be any non-negative real numbers (or ∞), with the exception that in Theorem 4.2 we require U_ij ≤ U_iℓ + U_ℓj for certain i,j,ℓ ∈ Q. We are currently examining the modifications of the annealing algorithm mentioned above and are also attempting to extend our results to more general (countably infinite and uncountable) state spaces.
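To make the object of study concrete, here is a minimal sketch of the annealing chain itself: a Metropolis-type chain with the cooling schedule T_k = T / log k used throughout this section. The five-state energy landscape and the uniform proposal matrix are invented for illustration; they are not Example 2.1 from the paper.

```python
import math
import random

# Minimal sketch of the annealing chain: a Metropolis-type chain with
# cooling schedule T_k = T / log k.  The 5-state energy landscape and
# the uniform proposal matrix are hypothetical illustrations.
U = [2.0, 1.5, 3.0, 0.5, 0.0]   # hypothetical state energies
T0 = 4.0

def temperature(k):
    # T_k = T / log k (held constant until the schedule starts at k = 2)
    return T0 / math.log(k) if k >= 2 else T0

def step(state, k, rng):
    proposal = rng.randrange(len(U))          # uniform proposal matrix
    dU = U[proposal] - U[state]
    # Accept downhill moves always, uphill moves with prob e^{-dU/T_k}.
    if dU <= 0 or rng.random() < math.exp(-dU / temperature(k)):
        return proposal
    return state

def anneal(n_steps, seed=0):
    rng = random.Random(seed)
    state = rng.randrange(len(U))
    for k in range(1, n_steps + 1):
        state = step(state, k, rng)
    return state

print(anneal(5000))
```

Each of the modifications mentioned above fits this template: (i) changes the proposal step, (ii) perturbs dU, and (iii) makes `temperature` depend on `state`.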
5. Conclusion

We have analyzed the simulated annealing algorithm focusing on those issues most important for optimization. Here we are interested in finding good but not necessarily optimal solutions. We distinguished between the finite-time and asymptotic behavior of the annealing algorithm. In our finite-time analysis we gave a lower bound on the probability that the annealing chain visits a set of low energy states at some time ≤ k, for k = 1,2,.... This bound may be useful even when the algorithm does not converge, and as such is probably our most important result for applications. We are currently engaged in trying to apply this bound to a specific problem. In our asymptotic analysis we obtained conditions under which the annealing algorithm converges to a set of low energy states according to various criteria. Hajek has recently given necessary and sufficient conditions that the annealing chain converge in probability to the minimum energy states. We gave an estimate of the rate of convergence. Our methods apply to various modifications of the annealing algorithm. We hope to explore some of these modifications and to extend our results to more general state spaces.
6. Appendix

Proof of Lemma (i)

Without loss of generality we assume k_0 = 1. Then using the inequality 1 + x ≤ e^x for all x ∈ ℝ we have

    Π_{ℓ=1}^{k-1} [1 - a/ℓ^a] ≤ exp[-a Σ_{ℓ=1}^{k-1} 1/ℓ^a] ≤ exp[-a ∫_1^k dx/x^a] = e^b e^{-bk^{1-a}},   k ∈ ℕ,     (A.1)

where b = a/(1-a).

Proof of Lemma (ii)

Without loss of generality we assume k_0 = m_0 = 1. Then using (A.1) and the inequality (x+1)^y ≤ x^y + y for all x ≥ 1 and 0 ≤ y ≤ 1 we have

    Π_{ℓ=m+1}^{k} [1 - a/ℓ^a] ≤ e^{b(m+1)^{1-a}} e^{-bk^{1-a}} ≤ e^b e^{bm^{1-a}} e^{-bk^{1-a}},   k = m+1, m+2, ...,  m ∈ ℕ.

Let

    f_n(k,ℓ) = Σ_{m=ℓ}^{k} (k+1-m)^n e^{bm^{1-a}} / m^Λ,   ℓ = 1, ..., k,  k ∈ ℕ,  n ∈ ℕ_0.

Then we can write

    Σ_{m=1}^{k} [(k+1-m)^n / m^Λ] Π_{ℓ=m+1}^{k} [1 - a/ℓ^a] ≤ e^b e^{-bk^{1-a}} f_n(k,1).

We shall show that for every n ∈ ℕ_0 there exist a_n, b_n ∈ ℝ such that

    f_n(k,ℓ) ≤ a_n (k+1-ℓ)^n e^{bk^{1-a}} k^{a-Λ} + b_n,   ℓ = 1, ..., k,  k ∈ ℕ,     (A.2)

and consequently

    Σ_{m=1}^{k} [(k+1-m)^n / m^Λ] Π_{ℓ=m+1}^{k} [1 - a/ℓ^a] = O(k^{n+a-Λ})   as k → ∞,

as required.

Proof of (A.2) is by induction on n ∈ ℕ_0. First consider n = 0. Let g(x) = e^{bx^{1-a}} / x^Λ, x ≥ 1. Since g'(x) > 0 for large enough x, it follows that

    f_0(k,ℓ) = Σ_{m=ℓ}^{k} g(m) ≤ ∫_1^k g(x) dx + g(1) + g(k) = ∫_1^k e^{bx^{1-a}} / x^Λ dx + e^b + e^{bk^{1-a}} / k^Λ.

Let δ = (Λ-a)/(1-a) > 0 and let [δ] be the largest integer ≤ δ. Expanding e^{bx^{1-a}} in a Taylor series and integrating term by term then shows that (A.2) holds for n = 0 with a_0 = 1 + (1/a)([δ]+1)/([δ]+1-δ) and b_0 = e^b.

Next assume (A.2) is valid for n ∈ ℕ_0 and consider n+1. Summing by parts (c.f. [10]) we have

    f_{n+1}(k,ℓ) = (k+1-ℓ) f_n(k,ℓ) - Σ_{m=ℓ+1}^{k} f_n(k,m)
                ≤ (k+1-ℓ) [a_n (k+1-ℓ)^n e^{bk^{1-a}} k^{a-Λ} + b_n]
                ≤ a_{n+1} (k+1-ℓ)^{n+1} e^{bk^{1-a}} k^{a-Λ} + b_{n+1},   ℓ = 1, ..., k,  k ∈ ℕ,

if a_{n+1} and b_{n+1} are chosen large enough (a_{n+1} = a_n(a_n+1) and b_{n+1} = b_n(a_n+2) suffice). By induction (A.2) is valid for all n ∈ ℕ_0.
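The bound (A.1) is easy to sanity-check numerically. The sketch below evaluates both sides for the illustrative choice a = 0.5 (so b = a/(1-a) = 1); the assertion mirrors the inequality proved above.

```python
import math

# Sanity-check the bound (A.1):
#   prod_{l=1}^{k-1} (1 - a / l**a)  <=  e**b * exp(-b * k**(1 - a)),
# where b = a / (1 - a).  a = 0.5 (so b = 1) is an illustrative choice.
a = 0.5
b = a / (1 - a)

def lhs(k):
    p = 1.0
    for l in range(1, k):
        p *= 1 - a / l ** a
    return p

def rhs(k):
    return math.exp(b) * math.exp(-b * k ** (1 - a))

for k in (2, 10, 100, 1000):
    assert lhs(k) <= rhs(k)
print("(A.1) holds at the sampled values")
```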
7. References

[1] Kirkpatrick, S., C.D. Gelatt, and M.P. Vecchi: "Optimization by Simulated Annealing," Science 220 (1983) 671-680

[2] Metropolis, N., A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller: "Equations of state calculations by fast computing machines," J. Chem. Phys. 21 (1953) 1087-1092

[3] White, S.: "Concepts of Scale in Simulated Annealing," Preprint, IBM Thomas J. Watson Res. Cent. (1984)

[4] Geman, S. and D. Geman: "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 6 (1984) 721-741

[5] Gidas, B.: "Nonstationary Markov Chains and the Convergence of the Annealing Algorithm," Preprint, Div. of Applied Math, Brown University (1983)

[6] Mitra, D., F. Romeo, and A. Sangiovanni-Vincentelli: "Convergence and Finite-time Behavior of Simulated Annealing," Memo M85/23, Elec. Res. Lab, College of Eng., U.C. Berkeley (1985)

[7] Billingsley, P.: Probability and Measure, John Wiley and Sons, Inc. (1979)

[8] Seneta, E.: Non-negative Matrices and Markov Chains, Springer-Verlag (1981)

[9] Hajek, B.: "Cooling Schedules for Optimal Annealing," Preprint, Dept. Elec. Eng. and Coordinated Science Lab, U. Illinois at Champaign-Urbana (1985)

[10] Rudin, W.: Principles of Mathematical Analysis, McGraw-Hill, Inc. (1976)
FIGURE 2.1  TRANSITION DIAGRAM FOR EXAMPLE 2.1
[figure not reproduced: energy-vs-state transition diagram]

FIGURE 2.2  TRANSITION DIAGRAM FOR EXAMPLE 2.2
[figure not reproduced: energy-vs-state transition diagram]