Experience with a Parallel Simulated Annealing Algorithm
for the Traveling Salesman Problem
by
Mua Dinh Lam Tran
B.S.E.E., Boston University (1987)
SUBMITTED TO THE DEPARTMENT OF AERONAUTICS AND
ASTRONAUTICS IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
July, 1989
© Mua Dinh Lam Tran, 1989
Signature of the Author
Department of Aeronautics and Astronautics
July 11, 1989
Certified by
Certified by
Accepted by
Dr. Richard E. Harper
Thesis Supervisor, C. S. Draper Laboratory, Inc.
Professor Wallace E. Vander Velde
Thesis Supervisor, Professor of Aeronautics and Astronautics
Professor Harold Y. Wachman
Chairman, Departmental Graduate Committee
Experience with a Parallel Simulated Annealing Algorithm
for the Traveling Salesman Problem
By
Mua Dinh Lam Tran
Submitted to the Department of Aeronautics and Astronautics
on July 12, 1989
in partial fulfillment of the requirements for
the degree of Master of Science
Abstract
A Synchronous Parallel Simulated Annealing Algorithm was designed for the
Traveling Salesman Problem. The speedup of the candidate parallel algorithm was
analyzed. General bounds on speedup of the candidate parallel algorithm were obtained.
On the average, a processor handles O(N) messages in a network, and the communication
overhead is O(N) or O(NpNs) units of time, where Np is the number of processors, N is
the number of cities in the TSP, and Ns is the number of cities in a subtour. As N or Ns
approaches infinity, the speedup is O(Np) and is independent of communication overhead.
For large Np, the speedup is O(Ns). If communication overhead is neglected, the speedup
is αNp + β, where 0 << α ≤ 1 and 0 ≤ β << 1. Through a computational study, the
behavior of the candidate parallel algorithm was investigated for two annealing schedules:
Tk+1 = cTk and Tk = d/logk, where Tk is the temperature at the kth iteration, and c and d are
constants under investigation. In a comparison between the Citywise Exchange and Edgewise
Exchange neighborhood structures, Citywise Exchange provided a better solution.
Dedication
A Diploma at Nether Providence High School (1983)
A Bachelor of Science at Boston University (1987)
A Master of Science at Massachusetts Institute of Technology (1990)
And this thesis (1989)
have been pursued
on behalf of and are dedicated to
two truly beloved human beings, who have
always dreamed of an education but never had
an opportunity to see inside a first grade classroom.
my parents:
Mr. Cda aTrks
and
Mrs. A('ui !i
Lam
"Mla,
you are our oldest son and the oldest grandson. You are your
father's right arm and the prospective 'Head'of ourfamily.
You are the sole measure of our true values,
standards and honors. You are
everything we have, and
we love you.'
On 29th September 1978 at 9:48 PM, just before the author stepped out
of his front door to begin his "Odysseus" journey.
"Công cha như núi Thái Sơn;
Nghĩa mẹ như nước trong nguồn chảy ra.
Một lòng thờ mẹ kính cha;
Cho tròn chữ hiếu mới là đạo con."
(A father's labor is as great as Mount Thai Son; a mother's devotion is like water flowing
from the source. Revere your mother and honor your father with all your heart; only by
fulfilling filial duty do you follow the way of a child.)
Acknowledgements
On the night of 29th September, 1978, navigating a small fishing boat away from
the shore of Vietnam in a threatening darkness of the Eastern sky, I silently said to myself,
"If I am to live through this journey, I shall never let my parents' dream die in vain."
Although I have imperfectly executed the last phase of this long project, today, I have
completed and fulfilled this quest.
How can I, as an Englishless and penniless refugee who set his feet on this great
nation of America less than a decade ago, be able to compose this page of
acknowledgements? I have been blessed and honored more than I actually deserve. The
journey I have travelled was made only to be experienced--not to be expressed. Neither
"thank you", "gratitude", nor any English word is an adequate word; I hope some day I
will have the opportunity to define it properly from the perspective of a refugee. For now,
TO my beloved family, Dad (Dr. Alan), Mom (Mrs. Erika), Brother Brian and Sister
Samantha Kors, who have cheered me up when I am down; warmed me when I am cold;
comforted me when I am struggling; pulled me to my feet when I am falling flat on my
face; and slowed me down when I am moving too fast. They are the Gothic foundation
upon which I have built, and the central axis of love and faith around which I have
evolved.
TO Grandpop Juston and Grandmom Marcia Wallace and Aunt Bibi, Uncle Phil, Cousins
Jeri and Howard Feintuch, for reminding me to use the $20 gifts to go out for "a good
dinner with a friend", for sending me a ton of different clothes to keep me warm during the
Bostonian winters, and for "having a lot of faith" in me.
TO my first high school teacher and tutor, Mrs. Pat Frantz, without whose dedicated
teaching, I would neither have come to know the Bill of Rights, have learned five years' worth of
elementary mathematics in six months, nor have been ranked 3rd in my graduating class of
1983.
TO G. De Aragon, for doing my grocery shopping and keeping our apartment neat and clean.
TO T. A. Le and Dr. N. V. Nguyen, for the persistent prodding, advice, and enthusiasm.
TO K. C. Luong, for the loyal friendship, for the immediate availability when I needed him
most, and for reminding me, "You are the star of your family; keep it shining."
TO the Staff Engineers and Scientists of Draper, especially the Fault-Tolerant Systems
Division, Dr. J. Lala, M. Dzwonczyk, P. Mukai, T. Sims, D. Fuhry, and R. Hain for
constantly prodding me during these past two years, M. Busa for "having a lot of faith" in
me, John P. Deyst for offering me some money to pay my rent, B. Mc Carragher for
giving me his Mac skills so freely, and S. Kim, J. Cervantes and J. Turkovich for
engaging in several fruitful technical discussions.
TO my former Education Director and present V.P. of Engineering: "MIT is not for
everyone. It is hard when you are in it, but it is not that hard when you are out of it. It is
always nice to be a member of the Institute, you know." Dr. David Burke advised wisely;
I understood it fully.
TO my direct Technical Supervisor, Dr. Richard E. Harper: "Relax!" Rick demanded
humanely; I listened impatiently. "MIT first!" Rick commanded authoritatively; I obeyed
soldierly. "I was there (MIT), Sport." Rick comforted teasingly; I was relieved naturally.
TO my Honorable Professor and Thesis Advisor, Prof. Wallace E. Vander Velde:
If any discrepancy is found in this thesis, it is the sole responsibility of the author, if any
result or beauty is perceived in this thesis, it is a mere reflection of the professional
guidance, keen technical insights, and boundless patience of my "humanly-down-to-earth"
Thesis Advisor, without whom the struggle for completion of this thesis could be like the
days and the nights of the ultimate fight for survival in the turbulent South China Sea.
This report was prepared at The Charles Stark Draper Laboratory, Incorporated
under an internal research and development contract.
Publication of this report does not constitute approval by the Draper Laboratory of
the findings or conclusions contained herein. It is published for the exchange and
stimulation of ideas.
I hereby assign my copyright of this thesis to The Charles Stark Draper Laboratory
Incorporated, Cambridge, Massachusetts.
Mua D. L. Tran
Permission is hereby granted by The Charles Stark Draper Laboratory,
Incorporated to the Massachusetts Institute of Technology to reproduce any or all of this
thesis.
Page 6 is missing from the original
thesis submitted to the Institute Archives.
TABLE OF CONTENTS

Title ................................................................. 1
Abstract .............................................................. 2
Acknowledgements ...................................................... 4
Table of Contents ..................................................... 7
Lists of Figures and Tables ........................................... 9
Nomenclature ......................................................... 13

1.0 INTRODUCTION ..................................................... 15
    1.1  Motivation .................................................. 15
    1.2  Problem Statement ........................................... 17
    1.3  TSP and Combinatorial Optimization .......................... 17
    1.4  Methodology ................................................. 21
         1.4.1  Simulated Annealing .................................. 24
         1.4.2  Parallelization ...................................... 25
    1.5  Objective and Thesis Outline ................................ 28

2.0 CLASSICAL SIMULATED ANNEALING ALGORITHM .......................... 30
    2.1  Introduction ................................................ 30
    2.2  Local Optimization .......................................... 32
    2.3  Statistical Mechanics--A Physical Analogy ................... 34
    2.4  Classical Simulated Annealing ............................... 39

3.0 QUANTITATIVE ANALYSIS OF THE SIMULATED ANNEALING ALGORITHM ....... 44
    3.1  Introduction ................................................ 44
    3.2  Mathematical Model .......................................... 47
         3.2.1  Asymptotic Convergence ............................... 48
         3.2.2  Annealing Schedules .................................. 52
    3.3  Analysis of the Cost Function ............................... 54

4.0 DESIGN OF THE PARALLEL SIMULATED ANNEALING ALGORITHM ............. 59
    4.1  Introduction ................................................ 59
    4.2  Algorithm Framework and Parallelization Methodology ......... 60
    4.3  Neighborhood Structures ..................................... 64
         4.3.1  Citywise Exchange .................................... 65
         4.3.2  Lin's 2-Opt or Edgewise Exchange ..................... 66
    4.4  Costs of TSP and Subtours ................................... 68
    4.5  Candidate Algorithms ........................................ 70
         4.5.1  Synchronous Parallel Simulated Annealing Algorithm (A) ... 70
         4.5.2  Asynchronous Parallel Simulated Annealing Algorithm (B) .. 73
    4.6  Implementation Issues of the Candidate Algorithms ........... 74

5.0 SPEEDUP ANALYSIS OF THE PARALLEL SIMULATED ANNEALING ALGORITHM ... 78
    5.1  Introduction ................................................ 78
    5.2  Speedup Analysis of Independent Subtours .................... 80
    5.3  Interprocessor Communication ................................ 85
         5.3.1  Message Communication ................................ 87
         5.3.2  Data Communication ................................... 88
    5.4  Speedup Analysis of Interprocessor Communication ............ 91
    5.5  Speedup Analysis of Step 3 of Algorithm A ................... 96
    5.6  General Bounds on Speedup of Algorithm A .................... 99

6.0 EMPIRICAL ANALYSIS OF THE PARALLEL SIMULATED ANNEALING ALGORITHM . 111
    6.1  Introduction ............................................... 111
    6.2  Analysis Methodology ....................................... 113
    6.3  Annealing Schedule Analysis ................................ 115
    6.4  Simulated Annealing Versus Local Optimization .............. 159
    6.5  Citywise Exchange Versus Edgewise Exchange ................. 166

7.0 SUMMARY AND CONCLUSIONS ......................................... 178

APPENDIX A: SIMULATION PROGRAM FOR A SYNCHRONOUS
            SIMULATED ANNEALING ALGORITHM ........................... 182

BIBLIOGRAPHY ........................................................ 223
LISTS OF FIGURES AND TABLES

Figure 2.1   Local Optimization Algorithm ......................................... 33
Figure 2.2   Plateau, Local Minima and Global Minimum for the Cost Function ....... 33
Figure 2.3   Boltzmann Distribution Curves for an Energy Function
             at Various Temperatures .............................................. 36
Figure 2.4   General Metropolis Algorithm ......................................... 37
Figure 2.5   Analogy Between Physical System and Combinatorial Optimization ....... 39
Figure 2.6   Simulated Annealing Algorithm ........................................ 41
Figure 4.1   (a) An Arbitrary TSP Tour. (b) TSP Tour is Divided
             into Four Subtours ................................................... 61
Figure 4.2   High-Level Description of the Parallel Scheme ........................ 63
Figure 4.3   General Algorithm of a Subtour ....................................... 64
Figure 4.4   (a) Algorithm of Subtour with Citywise Exchange ...................... 65
Figure 4.4   (b) Neighborhood Structure with Citywise Exchange .................... 66
Figure 4.5   Neighborhood Structure with Lin's 2-Opt Exchange or
             Edgewise Exchange .................................................... 67
Figure 4.6   Algorithm of Subtour with Lin's 2-Opt Exchange ....................... 67
Figure 5.1   A 2-Processors System with Interprocessor Communication .............. 86
Figure 5.2   A 4-Subtours System with Interprocessor Communication ................ 90
Figure 6.1   Temperature versus Time for T(k+1) = cT(k) at Different
             Values of c and T(0) = 20.0 ......................................... 117
Figure 6.2   Costs versus Time for T(k+1) = cT(k) at c = 0.94,
             T(0) = 20.0 and N = 50 .............................................. 118
Figure 6.3   Costs versus Time for T(k+1) = cT(k) at c = 0.95,
             T(0) = 20.0 and N = 50 .............................................. 119
Figure 6.4   Costs versus Time for T(k+1) = cT(k) at c = 0.96,
             T(0) = 20.0 and N = 50 .............................................. 120
Figure 6.5   Costs versus Time for T(k+1) = cT(k) at c = 0.97,
             T(0) = 20.0 and N = 50 .............................................. 121
Figure 6.6   Costs versus Time for T(k+1) = cT(k) at c = 0.98,
             T(0) = 20.0 and N = 50 .............................................. 122
Figure 6.7   Costs versus Time for T(k+1) = cT(k) at c = 0.99,
             T(0) = 20.0 and N = 50 .............................................. 123
Figure 6.8   Perturbed Costs versus Time for T(k+1) = cT(k)
             at Various Values of c, T(0) = 20.0 and N = 50 ...................... 124
Figure 6.9   Best Costs versus Time for T(k+1) = cT(k)
             at Various Values of c, T(0) = 20.0 and N = 50 ...................... 125
Figure 6.10  Map of the Best Tour at 1st Iteration for T(k+1) = cT(k),
             c = 0.94, T(0) = 20.0, N = 50 and Best Cost = 1376.23 ............... 126
Figure 6.11  Map of the Best Tour at 15th Iteration for T(k+1) = cT(k),
             c = 0.94, T(0) = 20.0, N = 50 and Best Cost = 742.42 ................ 127
Figure 6.12  Map of the Best Tour at 35th Iteration for T(k+1) = cT(k),
             c = 0.94, T(0) = 20.0, N = 50 and Best Cost = 529.52 ................ 128
Figure 6.13  Map of the Best Tour at 60th Iteration for T(k+1) = cT(k),
             c = 0.94, T(0) = 20.0, N = 50 and Best Cost = 342.27 ................ 129
Figure 6.14  Map of the Best Tour at 93rd Iteration for T(k+1) = cT(k),
             c = 0.94, T(0) = 20.0, N = 50 and Best Cost = 319.95 ................ 130
Figure 6.15  Map of the Best Tour at 122nd Iteration for T(k+1) = cT(k),
             c = 0.95, T(0) = 20.0, N = 50 and Best Cost = 299.06 ................ 131
Figure 6.16  Map of the Best Tour at 230th Iteration for T(k+1) = cT(k),
             c = 0.96, T(0) = 20.0, N = 50 and Best Cost = 292.27 ................ 132
Figure 6.17  Map of the Best Tour at 286th Iteration for T(k+1) = cT(k),
             c = 0.97, T(0) = 20.0, N = 50 and Best Cost = 293.79 ................ 133
Figure 6.18  Map of the Best Tour at 277th Iteration for T(k+1) = cT(k),
             c = 0.98, T(0) = 20.0, N = 50 and Best Cost = 295.45 ................ 134
Figure 6.19  Map of the Best Tour at 407th Iteration for T(k+1) = cT(k),
             c = 0.99, T(0) = 20.0, N = 50 and Best Cost = 296.85 ................ 135
Figure 6.20  Minimum Costs or Quality of Final Solutions versus c
             for T(k+1) = cT(k) at T(0) = 20.0 and N = 50 ........................ 136
Figure 6.21  Temperature versus Time for Tk = d/logk at Different Values of d .... 140
Figure 6.22  Costs versus Time for Tk = d/logk at d = 5 and N = 50 ............... 141
Figure 6.23  Costs versus Time for Tk = d/logk at d = 10 and N = 50 .............. 142
Figure 6.24  Costs versus Time for Tk = d/logk at d = 15 and N = 50 .............. 143
Figure 6.25  Costs versus Time for Tk = d/logk at d = 20 and N = 50 .............. 144
Figure 6.26  Perturbed Costs versus Time for Tk = d/logk at Various
             Values of d and N = 50 .............................................. 145
Figure 6.27  Best Costs versus Time for Tk = d/logk at Various Values
             of d and N = 50 ..................................................... 146
Figure 6.28  Map of the Best Tour at 1st Iteration for Tk = d/logk,
             d = 5, N = 50 and Best Cost = 1376.23 ............................... 147
Figure 6.29  Map of the Best Tour at 10th Iteration for Tk = d/logk,
             d = 5, N = 50 and Best Cost = 596.85 ................................ 148
Figure 6.30  Map of the Best Tour at 43rd Iteration for Tk = d/logk,
             d = 5, N = 50 and Best Cost = 317.23 ................................ 149
Figure 6.31  Map of the Best Tour at 67th Iteration for Tk = d/logk,
             d = 5, N = 50 and Best Cost = 311.63 ................................ 150
Figure 6.32  Map of the Best Tour at 203rd Iteration for Tk = d/logk,
             d = 5, N = 50 and Best Cost = 307.54 ................................ 151
Figure 6.33  Map of the Best Tour at 293rd Iteration for Tk = d/logk,
             d = 5, N = 50 and Best Cost = 285.54 ................................ 152
Figure 6.34  Map of the Best Tour at 300th Iteration for Tk = d/logk,
             d = 10, N = 50 and Best Cost = 365.65 ............................... 153
Figure 6.35  Map of the Best Tour at 300th Iteration for Tk = d/logk,
             d = 15, N = 50 and Best Cost = 514.874 .............................. 154
Figure 6.36  Map of the Best Tour at 300th Iteration for Tk = d/logk,
             d = 20, N = 50 and Best Cost = 630.85 ............................... 155
Figure 6.37  Minimum Costs or Quality of Final Solutions Versus d
             for Tk = d/logk ..................................................... 156
Figure 6.38  Perturbed Costs versus Time at N = 50 for T(k+1) = cT(k),
             T(0) = 20.0, c = 0.96 and Tk = d/logk at d = 5 ...................... 157
Figure 6.39  Best Costs versus Time at N = 50 for T(k+1) = cT(k),
             T(0) = 20.0, c = 0.96 and Tk = d/logk at d = 5 ...................... 158
Figure 6.40  Map of the Best Tour at 108th Iteration using Local Optimization
             for Tk = d/logk, d = 5, N = 50 and Best Cost = 327.93 ............... 160
Figure 6.41  Local Optimization Costs versus Time
             for Tk = d/logk at d = 5 and N = 50 ................................. 161
Page 12 is missing from the original
thesis submitted to the Institute Archives.
NOMENCLATURE
PP(Np)           Speedup of the sequential codes of Algorithm A.
C(σ)             Cost or objective function of the entire TSP tour.
C(σn)            Cost value of the TSP tour at the nth iteration, as a perturbed cost value.
C(σn)B           Best cost value of the TSP tour at the nth iteration, as an accepted cost value.
c, d, depth      Constants of the annealing schedules.
⟨C⟩, Var(C)      Expectation and variance of the cost function.
C̄, σ(T)          Average and standard deviation of the cost function.
I1               Number of serial iterations required for Algorithm A to converge to some desired cost.
INp              Number of parallel iterations required for Algorithm A to converge to some desired cost.
L                Markov-chain length.
Lij(s)           Edgewise Exchange or Lin's 2-Opt.
Ms               Total number of messages that processor i sends to other processors.
Mr               Total number of messages that processor i receives from other processors.
N, num_nodes     Number of cities in the TSP tour.
Ns, cardinality  Number of cities in a subtour, or cardinality of a subtour.
Np               Number of processors in a parallel system.
Or               Total communication overhead for exchanging (transmitting and receiving) Ns cities among P subtours.
Oa               Average communication overhead of the ith subtour executing on an Np-processor system.
ω(C)             Configuration density.
Q(C,T)           Equilibrium-configuration density.
P, G, A          Transition matrix, generation matrix, and acceptance matrix.
P or npe         Number of partitioned subtours in the TSP.
Pij, Gij, Aij    Transition probability, generation probability, and acceptance probability.
q, qi            Stationary distribution and components of the stationary distribution.
ℜ                Configuration space.
ℜopt             Set of globally minimal configurations.
S                Speedup, as a measure of the degree of parallelism of the parallel system performance.
Si               Speedup of Step i of Algorithm A per iteration.
s, s'            Subtour and perturbed subtour.
σ, σ'            Entire current TSP tour and entire perturbed TSP tour.
T, Tk            Annealing schedule and temperature at the kth iteration.
Tij(s)           Citywise Exchange.
T1               Execution time of Algorithm A from Steps A1 to A7 per iteration using 1 processor.
TNp              Execution time of Algorithm A from Steps A1 to A7 per iteration using Np processors.
Ti1              Execution time of Step i of Algorithm A per iteration using 1 processor.
TiNp             Execution time of Step i of Algorithm A per iteration using Np processors.
                 Execution time of transmission or receipt of a city between 2 subtours or processors.
CHAPTER 1
INTRODUCTION
1.1 Motivation
Vehicle routing problems involve finding a set of pickup and/or delivery routes
from one or several central depots to various demand points (e.g. customers), in order to
minimize some objective function (minimization of routing costs, or of the sum of fixed
and variable costs, or of the number of vehicles required, etc.). Vehicles may have capacity
and, possibly, maximum-route-time constraints. For example, the problem that arises
when there is a single domicile (depot), a single vehicle of unlimited capacity, unit
demands, only routing costs, and an objective function which minimizes total distance
traveled, is the famous Traveling Salesman Problem (TSP).
Of course, instead of
minimizing the distance, other notions such as time, cost, number of vehicles in fleet etc.
can equivalently be considered. With several vehicles of common capacity, a single depot,
known demands, and the same objective function as the TSP, we have a standard vehicle
routing problem.
The body of scholarly literature devoted exclusively to the TSP is impressively large.
One has simply to consult review papers such as Bodin et al. [Bod83] to be
convinced that the TSP is perhaps the most fundamental and prominent, and also the most
intensively investigated, of all unsolved classical combinatorial optimization problems.
Although one can easily state and clearly conceptualize the TSP, it is, in fact, the most
"difficult" and the first problem to be described in the book, Computers and Intractability
[Gar79]; it is also the most common conversational comparator ("Why, it's as hard as the
Traveling Salesman!").
The effort spent on this problem is partially a reflection of the fact that the TSP
encompasses and represents quite a diverse set of practical problems. A specific and
representative example of such practical problems is the application of the TSP in a mission
plan for a fully autonomous/semi-autonomous flight vehicle under development at the C.S.
Draper Laboratory ([Deut85] and [Adams86]). Furthermore, the TSP is an essential
component of most of the general vehicle routing problems such as retail distribution, mail
and newspaper delivery, municipal waste collection, fuel oil delivery, etc.; it also has
numerous, and sometimes surprising, other applications (see, for example, [Len75]).
Motivated by the TSP's wide applicability and computational intensiveness, the
primary goal of this thesis is to design a parallel method to solve the TSP concurrently.
The secondary goal is to analyze the performance of this parallel algorithm with respect to
a "typical" sequential algorithm. And, using a Fault Tolerant Parallel Processor (FTPP),
currently under development at the C.S. Draper Laboratory, as a testbed ([Harp85] and
[Harp87]), and the parallel algorithm being developed, the third goal of this thesis is to
conduct a computational study of the TSP via examination of the quality of its final
solutions.
Before further detailed discussions of the above objectives are outlined in Section
1.5, and methodology is introduced in Section 1.4, it is essential to clearly understand the
definition of the TSP in Section 1.2 and to briefly review its important role in combinatorial
optimization in Section 1.3.
1.2 Problem Statement
The Traveling Salesman Problem can be stated as follows: given a set of N cities
and the distances between any two cities, a salesman is required to visit each of the N cities
once and only once, starting from any city and returning to the original city of departure;
then, "what is the shortest route, or tour, that he must choose in order to minimize the total
distance he traveled?" Again, instead of minimizing the total distance, other notions such
as time, cost, number of vehicles in fleet etc. can equivalently be considered.
Mathematically, it can be formulated as follows: given a set of vertices V = {v1, ..., vN}
and the distance, dij, from vi to vj, what is the best ordered cyclic permutation, σ = (σ(1),
..., σ(N)) of V, that minimizes the cost function,

    C(σ) = \sum_{i=1}^{N-1} d_{σ(i),σ(i+1)} + d_{σ(N),σ(1)}                        (1.1)
It is essential to note that this best permutation is selected by comparison of all
possible cyclic permutations; there are (N-1)!/2 such permutations in the TSP with
a symmetric distance matrix.
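To make Equation (1.1) concrete, the short Python sketch below (an illustration with hypothetical city coordinates, not part of the thesis software) evaluates the cost of a tour given as a cyclic permutation of city indices.

import math

def tour_cost(perm, coords):
    # Cost C(sigma) of a cyclic tour: the sum of distances between consecutive
    # cities plus the closing edge from the last city back to the first.
    n = len(perm)
    total = 0.0
    for i in range(n):
        a = coords[perm[i]]
        b = coords[perm[(i + 1) % n]]   # wrap-around gives d(sigma(N), sigma(1))
        total += math.dist(a, b)
    return total

# Four hypothetical cities on a unit square; the tour 0-1-2-3-0 has cost 4.0.
print(tour_cost([0, 1, 2, 3], [(0, 0), (0, 1), (1, 1), (1, 0)]))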
1.3 TSP and Combinatorial Optimization
Combinatorial optimization is a subject consisting of a set of problems that are
central to many disciplines of science and engineering [Law76]. Research in these
areas has aimed at developing efficient techniques for finding minimum (or maximum)
values of a function of very many independent variables [Aho74]. This function, usually
called the cost function or objective function, represents a quantitative measure of the
"goodness" of some complex system. It is one of the main reasons that the last two
generations of combinatorial mathematicians and operations research analysts, including
computer scientists and engineers, have cumulatively devoted many man-years to
the study of combinatorial optimization.
Since the TSP is the most basic and the most representative of all combinatorial
optimization problems in general, and an essential component in most vehicle routing
problems in particular, it is striking how often methods originally developed for
solving the TSP have precipitated and generated new and important general
techniques in combinatorial optimization. Because it is not the purpose of this thesis to
investigate the historically important role of the TSP, an interested reader is encouraged to
examine a recently published monograph, called The Traveling Salesman Problem
[Law85]; the aim of this section is simply to give the reader a brief flavor and a good
appreciation of its historical importance and to highlight certain key events.
From even the earliest studies of discrete models, the Traveling Salesman Problem
has been a major stimulant to research on combinatorial optimization. Early studies of the
TSP pioneered the use of cutting-plane techniques in integer programming (Dantzig,
Fulkerson, and Johnson [Dant54]) and were responsible for several important ideas
associated with tree enumeration methods including coining the term "branch and bound"
([Lit63] and [Moh82]). They also introduced problem partitioning and decomposition
techniques in the context of dynamic programming [Held62] that later proved to be fruitful
in other applications of dynamic programming, and in assessing heuristic methods for
combinatorial optimization.
An isolated probabilistic study of the TSP in the plane
[Beard59] has become widely recognized as the seminal contribution to the probabilistic
evaluation of heuristic methods for combinatorial optimization problems.
Many contributions to combinatorial optimization throughout the 1950's and
1960's, for such problem classes as machine scheduling and production planning with
setup costs, crew scheduling, set covering, and facility location problems were extensions
and generalizations on these basic themes. Research focused on the design of optimization
algorithms, usually based upon dynamic programming recursions or somewhat tailored
versions of general-purpose integer programming methods, often for special cases in the
problem class. Studies of scheduling theory as summarized by Conway, Maxwell, and
Miller [Con67] and of uncapacitated inventory and production lot size planning by dynamic
programming are prototypes of this period, as are branch and bound methods for plant
location problems ([Wag58], [Dav69], and [Fed80]).
At the same time that these integer and dynamic programming methods were
evolving, combinatorial optimization was emerging and flourishing as a discipline in
applied mathematics, based, in large part, on the widespread practical and combinatorial
applications of network flow theory [Ford62] and its generalizations such as nonbipartite
matching and matroid optimization [Law76]. Indeed, it is hard to overstate the importance
of these landmark contributions in defining combinatorial optimization as we know it
today. In a survey, Klee summarizes much of the research devoted to these topics during
this period [Klee80].
Although researchers were designing and applying heuristic (or approximate)
algorithms during the 1950's and 1960's (for example, exchange heuristics for both the
Traveling Salesman Problem [Lin65] and facility location problem ([Lit63] and [Man64])),
optimization-based methods remained at the forefront of academic activity. The heuristic
algorithms developed at this time may have been progenitors of algorithms studied later,
but their analysis was often of such a rudimentary nature that heuristics did not capture the
imagination and full acceptance of the academic community of this era. Rather than
statistical assessment or error bound analysis, limited empirical verification of heuristics
ruled the 1960's.
Two developments in the 1970's, namely the emergence of computational
complexity theory and the evolution of enhanced capabilities in mathematical programming,
revitalized combinatorial optimization and precipitated a new focus in its research. The
familiar computational complexity theory ([Cook71], and [Karp72]) has shown that the
TSP and nearly every other "difficult" combinatorial problem, the so called NP-complete
(nondeterministic polynomial time complete) class of problems, are all computationally
equivalent; namely, each of these problems has eluded any algorithmic design guaranteed to
be more efficient than tree enumeration, and if one problem could be solved by an
algorithm that is polynomial in its problem size, then they all could. This revelation
suggested that algorithmic possibilities for optimization methods were limited, and
motivated renewed interest in designing and analyzing effective heuristics. Lin & Kernighan's
variable r-opt exchanges [Lin73] is an example of such heuristics. Again, the TSP was at
the forefront during this era. Worst case (i.e. performance guarantee) analysis [Chris76],
statistical analysis, and probabilistic analysis [Karp77] of various heuristics for the TSP
typified this period of research and were among the first steps in the evolution of interesting
analytic approaches for evaluating heuristic methods.
Indeed, the mere fact that
computational complexity theory embraced the "infamous" Traveling Salesman Problem
undoubtedly was instrumental in the theory's acceptance as a new paradigm in operations
research, computer science and engineering.
Computational complexity theory has become pervasive, so much so that Garey and
Johnson's comprehensive monograph discusses more than 300 combinatorial applications
(320 applications exactly), and the TSP is the first representative problem to be described in
their book [Gar79]. Lenstra and Rinnooy Kan's [Len81] summary of computational
complexity as applied specifically to vehicle routing and scheduling also shows that most of
these problems are NP-complete. Consequently, the Traveling Salesman Problem would
appear to be both a source of inspiration and a prime candidate for analysis by heuristic
methods.
Cumulatively, this fertile decade of 1970's research has yielded much improved
capabilities for applying optimization methods to combinatorial problems, capabilities that
tend to counterbalance the trend, stimulated by computational complexity theory, toward
heuristic methods. As a consequence, heuristic methods provide excellent opportunities for
current algorithmic developments for the Traveling Salesman Problem of the 1980's.
1.4 Methodology
Currently, the search for faster heuristic methods for combinatorial optimization
problems seems to follow two directions. On one hand, the search for faster computing
machinery such as the FTPP [Harp87] has recently received a substantial amount of
attention. On the other hand, there has been a considerable amount of effort devoted to the
development of better algorithms. The number of instances in which these methodologies
and algorithms thus developed to solve the TSP have been used successfully in practical
applications has been growing encouragingly over the past years. These algorithms can be
classified into two categories, namely exact and heuristic (or approximate) algorithms.
An example of such an exact algorithm for solving the TSP is the branch and bound
method [Moh82].
This method is a generalized scheme for limiting the necessary
enumeration of the entire configuration space of the TSP, thus improving on the
exhaustive search technique. It accomplishes this by arranging the configuration space as a
tree and attempting to find bounds by which entire branches of the configuration space may
be discarded from the search. Let us consider a configuration space of an N-city TSP,
which may be represented by a number of binary state variables corresponding to the
presence or absence of a tour edge (or branch of the tour); each edge directly connects city i
and city j. From this configuration space, we can derive that there are N(N-1)/2 such
possible edges, and of course "only" (N-1)!/2 combinations of the binary state variables to
map to the valid tours. It is important to note that, in this method, a tree is constructed such
that it branches in two directions at each node, depending on whether or not a particular
edge is considered part of the tour. As we descend through the tree, the distance of the
current incomplete tour grows as certain edges considered are included in the tour. If a
certain upper bound has already been established for the optimal tour length, then an entire
branch of the configuration space tree may be eliminated if the current incomplete tour
length already exceeds that bound. It is equally important to note that, as the algorithm
proceeds through the search tree, lower and upper bounds may be discovered as new
branches are traversed. Although this ability to prune the search tree is vital to the success
of the branch and bound algorithm, such expansion and pruning of the search tree can
continue endlessly.
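As a rough sketch of this pruning idea (my own simplified illustration in Python, not the algorithm of [Moh82]; it enumerates city by city rather than edge by edge), a partial tour is abandoned as soon as its length already exceeds the best complete tour found so far:

import math

def branch_and_bound_tsp(coords):
    # Exact TSP by depth-first enumeration with pruning: a partial tour is
    # cut off once its length already exceeds the best known complete tour.
    n = len(coords)
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    best = {"cost": float("inf"), "tour": None}

    def extend(tour, visited, length):
        if length >= best["cost"]:            # prune this entire branch
            return
        if len(tour) == n:                    # close the tour back to city 0
            total = length + dist[tour[-1]][0]
            if total < best["cost"]:
                best["cost"], best["tour"] = total, tour[:]
            return
        for city in range(1, n):
            if city not in visited:
                visited.add(city)
                extend(tour + [city], visited, length + dist[tour[-1]][city])
                visited.remove(city)

    extend([0], {0}, 0.0)
    return best["tour"], best["cost"]

print(branch_and_bound_tsp([(0, 0), (0, 2), (3, 2), (3, 0)]))

In the worst case the recursion still visits nearly every permutation, which is precisely the unbounded growth the paragraph above warns about.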
Though most of these algorithms have aimed for efficiency and computational
tractability, the TSP is an NP-complete problem. In other words, the TSP is unlikely to be
solvable exactly by any algorithm in any practical amount of computation time when the
problem size (the number of cities in the TSP), N, is large. Because of the exponentially dependent
nature of the computation on N, the computing time in employing an exhaustive search in
solving the TSP for the exact solution is practically infeasible. Such infeasibility can be
evidently demonstrated in a simple example. Let us consider a computer that can be
programmed to enumerate all the possible tours for a set of N cities, keeping track of
the shortest tour.
Suppose this computer enumerates and examines a tour in one
microsecond. At this rate the computer would solve a ten city problem in 0.18 seconds,
which is not too bad. In a fifteen city problem, the computational effort would require
over twelve hours. But, a twenty city problem would require nearly two thousand years!
[Cerv87]. It is not too difficult to render such an algorithm entirely impractical--only a
small increase in the problem size causes a nearly unbounded computation time.
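The arithmetic behind these estimates is easy to reproduce. The small sketch below assumes, as in the text, that one tour is examined per microsecond, and recovers the 0.18-second, twelve-hour, and roughly two-thousand-year figures.

import math

def enumeration_time_seconds(n, seconds_per_tour=1e-6):
    # Time to examine all (N-1)!/2 distinct tours of an N-city symmetric TSP.
    return (math.factorial(n - 1) // 2) * seconds_per_tour

for n in (10, 15, 20):
    t = enumeration_time_seconds(n)
    print(n, f"{t:.2f} s = {t / 3600:.1f} h = {t / 3.15e7:.0f} yr")
# 10 cities: ~0.18 s;  15 cities: ~12.1 h;  20 cities: ~1900 yr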
Because of the impracticality in employing exact algorithms in solving the TSP,
there exists fortunately another algorithmic category that constitutes quite a few practical
heuristic (or approximate) algorithms, known as iterative improvement algorithms, for
finding nearly optimal solutions for the TSP and other combinatorial optimization
problems.
Iterative improvement starts with a feasible tour and seeks to improve the tour via a
sequence of interchanges. In other words, it begins by selecting an initial state in the
configuration space and successively applying a set of rules for altering the
configuration so as to improve the optimality of the current solution. Given a random
starting solution, the algorithm descends on the surface of the objective function until it
terminates at a local minimum. The local minimum occurs because none of the allowed
transitions or moves in the configuration space yield states with a lower objective function value.
Thus, one application of this algorithm yields what may be a fairly optimal solution. By
repeating this procedure many times, the probability of finding more highly optimal states
is increased. The best-known algorithms of this type are the edge (branch) exchange
algorithms provided by Lin [Lin65] and Lin & Kernighan [Lin73], respectively. In the
general case, r edges in a feasible tour are exchanged for r edges not in that solution as
long as the result remains a tour whose length, distance, or cost is less than that of the
previous tour. Exchange algorithms are referred to as r-opt algorithms where r is the
number of edges exchanged at each iteration.
In an r-opt algorithm, all exchanges of r edges are tested until there is no feasible
exchange that improves the current solution. This solution is then said to be r-optimal
[Lin65]. In general, the larger the value of r, the more likely it is that the final solution is
optimal. Even for approximate algorithms, the number of operations necessary to test all r
exchanges unfortunately increases rapidly as the number of cities increases. As a result,
values of r = 2 and r = 3 are the ones most commonly used.
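To illustrate what a single 2-opt (edgewise) exchange amounts to in practice — this is a generic sketch, not the exact exchange procedure developed later in Chapter 4 — removing two edges of a tour and reconnecting it is equivalent to reversing the segment between the two cut points:

def two_opt_exchange(tour, i, j):
    # Remove edges (tour[i-1], tour[i]) and (tour[j], tour[j+1]) and reconnect
    # by reversing the segment tour[i..j]; the result is still a valid tour.
    assert 0 < i <= j < len(tour) - 1
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

print(two_opt_exchange([0, 1, 2, 3, 4, 5], 2, 4))   # [0, 1, 4, 3, 2, 5]

A full 2-opt pass simply keeps applying such exchanges whenever they shorten the tour and stops when no improving exchange remains; the tour is then 2-optimal.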
Lin & Kernighan's (variable r-opt) algorithm, which decides at each iteration how
many edges to exchange, has proven to be more powerful than Lin's (r-opt) algorithm. Lin
& Kernighan's algorithm requires considerably more effort to code than either the 2-opt or
3-opt approach. However, it produces solutions that are usually near-optimal. Since Lin
& Kernighan's algorithm decides dynamically at each iteration what the value of r (the
number of edges to exchange) should be, a series of tests is performed to determine
whether (r+1) edge exchanges should be considered. This process continues until stopping
conditions are satisfied.
Such heuristic algorithms, whose rate of growth of the computation time is a low
order polynomial in N, rather than exponential in N, have been observed to perform well.
Among these heuristic algorithms, a modified iterative improvement heuristic known as
Synchronous Parallel Simulated Annealing Algorithm is selected to investigate the
Traveling Salesman Problem.
1.4.1 Simulated Annealing
Annealing is the process of heating a solid and cooling it slowly so as to remove
strain and crystal imperfections.
The Simulated Annealing process consists of first
"melting" the system being optimized at a high effective temperature, then lowering the
temperature by slow stages until the system "freezes" and no further changes occur. At
each temperature, the simulation must proceed long enough for the system to reach a steady
state. The sequence of temperatures used in reaching steady-state equilibrium is
referred to as an annealing schedule. During this annealing process, the free energy of the
solid is minimized. The initial heating is necessary to avoid becoming trapped in a local
minimum. Virtually every function can be viewed as the free energy of some system and
thus studying and imitating how nature reaches a minimum during the annealing process
should yield optimization algorithms.
In 1982, Kirkpatrick, Gelatt & Vecchi [Kirk82,83] observed that there is an
analogy between combinatorial optimization problems such as the TSP and large physical
systems of the kind studied in statistical mechanics. Using the cost function in place of the
energy function and defining configurations by a set of energy states, it is possible, with
the Metropolis procedure that allows uphill transitions, to generate a population of
configurations of a given optimization problem at some effective temperature schedule.
This temperature schedule is simply a control parameter in the same units as the cost
function.
Just what simulation, i.e. imitation, here means mathematically, along with its
underlying relation with statistical physics, will be the subject of Chapter 2 and Chapter 3.
The resulting method called "Simulated Annealing", which is a heuristic combinatorial
optimization technique that modifies the iterative improvement method by allowing the
possibility of uphill moves in the configuration space, has become a remarkably powerful
tool in solving global optimization problems in general and the TSP in particular.
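A minimal sequential sketch of this idea (a generic illustration only, not the candidate parallel algorithm designed in Chapter 4) combines a geometric temperature schedule with the Metropolis acceptance rule: downhill moves are always accepted, and uphill moves are accepted with probability exp(-ΔC/T).

import math, random

def simulated_annealing(initial, cost, neighbor, t0=20.0, c=0.96,
                        moves_per_temp=100, t_min=1e-3):
    # Generic simulated annealing: cost() evaluates a configuration, neighbor()
    # proposes a perturbed copy, and uphill moves are accepted with the
    # Metropolis probability exp(-delta/T).
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_temp):
            candidate = neighbor(current)
            delta = cost(candidate) - current_cost
            if delta < 0 or random.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= c                      # annealing schedule T(k+1) = c * T(k)
    return best, best_cost

For the TSP, neighbor() could apply a random citywise or edgewise exchange to the tour, and cost() would be the tour length of Equation (1.1).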
1.4.2 Parallelization
As briefly mentioned in the section before last, one of the major developments in
computing in recent years has been the introduction of a variety of parallel computers and
the development of parallel algorithms which effectively utilize these computers'
capabilities. A parallel computer is, informally, any electronic machine that performs
two or more computations simultaneously or concurrently. Algorithms
that are designed to carry out many simultaneous or concurrent operations are called
"parallel algorithms". In contrast, traditional computers that are designed to execute one
instruction at a time are "sequential computers", and algorithms designed for such
computers are "sequential algorithms".
The development of parallel computers and parallel algorithms is motivated by a
number of objectives. First, sensor systems may be geographically separated which may
dictate a need for distributed computations at sensor sites. Gathering this distributed
information may be impractical due to limited communication link bandwidth or severe
contention for links between processors. The second goal is the desire to solve an
ever-increasing range of problems. The third objective for parallel computation is the
desire to solve problems more cheaply than with sequential computers. For instance, it is
well known that many optimization problems are very expensive to solve due to one (or
more) of the following reasons: the size of the problem, i.e. number of variables and
constraints, is large; the objective function or constraints are expensive to evaluate; many
iterations or function evaluations are required to solve the problem; a large number of
variations of the same problem must be solved. Indeed, often optimization problems are
not solved or are only "solved" approximately because the cost of solving them with
existing sequential computers and algorithms is prohibitive or does not fit within real-time
constraints. Finally, it may simply be desirable to implement parallelism or concurrency to
increase the computational speed of the algorithm, enabling the solution of problems that
were too time-consuming to be solved in a reasonable amount of time, or allowing the
solution of problems within the time constraints of real-time systems. Questions such as
"what is the speedup," i.e. how much faster the parallel algorithm is than the serial
algorithm, are addressed in Chapter 5. Because of the above objectives, a major research
effort in the design of parallel computers and algorithms has been under way.
Parallel algorithms have been broadly classified into two classes [Kung76]:
synchronous and asynchronous algorithms. A synchronous algorithm is one where a
global "clock" controls the sequence of operations to be performed by all processors. The
advantage of such an algorithm is that the flow of data in the processors is easy to control
and the analysis of the algorithm is somewhat simpler than the asynchronous counterpart.
However, the speed of the algorithm may be dependent upon the computation time of the
slowest processor. For example, it may often be the case that while the slowest processor
is computing, the faster processors are idle waiting for the next operation to be performed.
An asynchronous algorithm, on the other hand, performs computations independently at
each processor according to a local "clock". Each processor computes independently with
the knowledge it currently has. Because each processor is always performing useful
computation, the speed of the algorithm is less dependent upon the slowest processor.
There are several existing methodologies for restructuring a serial
algorithm into a parallel algorithm. One way is the use of the "branch and bound" method
as in [Moh82]. There, the parallelization of algorithms for the Traveling Salesman Problem was studied,
and the results on speedup and parallelism for synchronous and asynchronous parallel
algorithms were compared. Both approaches were tested on the Cm* multiprocessor
system. In general, it was shown that the asynchronous approach resulted in a higher
speedup than the synchronous counterpart. Furthermore, increased parallelism severely
affected the synchronous algorithm due to bottlenecks and idle processors.
The
asynchronous algorithm, however, behaved reasonably well with increased parallelism.
Another method for parallelization is the use of partition of the Markov chains of the
Simulated Annealing Algorithm [van87]. The basic idea underlying this methodology is to
assign a Markov chain to each available processor and let processors compute and generate
Markov chains simultaneously. Using this method, it is reported that the speedup of this
method is about 6 to 9, i.e. the parallel algorithms are about 6 to 9 times faster than the
sequential algorithms.
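A greatly simplified, hypothetical sketch of that idea — independent chains with separate random seeds, each assigned to its own processor, keeping the best result — is shown below on a toy one-dimensional cost function rather than the TSP.

from multiprocessing import Pool
import math, random

def anneal_one_chain(seed):
    # One independent annealing chain with its own random seed; the cost
    # function here is a toy multimodal function, not the TSP.
    rng = random.Random(seed)
    cost = lambda x: x * x + 3.0 * math.sin(5.0 * x)
    x, t = rng.uniform(-10.0, 10.0), 5.0
    best = cost(x)
    while t > 1e-3:
        y = x + rng.gauss(0.0, 1.0)
        delta = cost(y) - cost(x)
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = y
            best = min(best, cost(x))
        t *= 0.99
    return best

if __name__ == "__main__":
    with Pool(processes=4) as pool:              # one chain per processor
        print("best over 4 chains:", min(pool.map(anneal_one_chain, range(4))))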
Still another method is to partition the Traveling Salesman Problem into
subproblems. Then, each of these subproblems is assigned to a processor which is
responsible for all the computations of that particular subproblem. And, the results are
combined together when all subcomputations are complete. This methodology, ideally,
reduces the computation time by the number of processors used. However, due to
bottlenecks, time delays and processor saturation, the approach usually reduces the
computation time by an amount less than the ideal. This methodology was considered by
Tsitsiklis [Tsit84], Bertsekas [Bert85], and Schnabel [Sch84], and is examined in this thesis.
Within this framework of dividing a problem into subproblems and of categorizing
parallel algorithms into synchronous and asynchronous algorithms, as will be examined in
detail in Chapter 4, a Parallel Simulated Annealing Algorithm is designed by partitioning
the tour of the TSP into subtours. Each subtour is assigned to a processor which is
responsible for computing the current cost of its subtour, perturbing the current subtour
and computing the cost of this perturbed subtour, and performing the annealing process.
Then, the results of these subtours are combined and the annealing process is examined
globally. It is important to note that one must carefully partition the tour of the TSP into
subtours so as not to introduce too much overhead, which would defeat the
effectiveness of the parallelization; these overheads are addressed in Chapter 5.
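The sketch below illustrates only the partitioning step, in a deliberately simplified form (the actual scheme, including how subtours exchange cities, is developed in Chapter 4): the current tour is cut into contiguous subtours of nearly equal cardinality, one per processor.

def partition_tour(tour, num_processors):
    # Split a tour (list of city indices) into contiguous subtours of nearly
    # equal size, one subtour per processor.
    n = len(tour)
    base, extra = divmod(n, num_processors)
    subtours, start = [], 0
    for p in range(num_processors):
        size = base + (1 if p < extra else 0)
        subtours.append(tour[start:start + size])
        start += size
    return subtours

# A 10-city tour split across 4 processors gives subtours of sizes 3, 3, 2, 2.
print(partition_tour(list(range(10)), 4))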
1.5 Objective and Thesis Outline
It is hoped that the investigation in this thesis will yield some interesting and useful
results. These expected results are briefly outlined as follows:
(1). The performance of parallel software and hardware systems is measured by means of
an important metric called speedup. Speedup for the Synchronous Parallel Simulated
Annealing Algorithm is analytically analyzed in Chapter 5. In this way, questions such as
"How much faster this candidate parallel algorithm is over a serial algorithm?" can be
evaluated. Furthermore, the second metric known as message capacity is also analytically
studied in Chapter 5; so that question such as "How much message communication
overhead, i.e. congestion and competition, in a network-based parallel processor?" can be
studied.
(2). Since the performance of the Simulated Annealing Algorithm is measured by the
quality of solutions and running times, a computational study of the Synchronous Parallel
Simulated Annealing Algorithm for an instance of the TSP is performed. In this way,
questions such as "Is the Simulated Annealing Algorithm a good heuristic?" can be answered
by a comparison of the results of Local Optimization with the Annealing Algorithm.
(3). It is a well-known fact that neighborhood structures or perturbation functions
have major effects on the performance of the Simulated Annealing Algorithm. Two
specific neighborhood structures, namely Citywise Exchange and Edgewise Exchange, are
selected to investigate the performance of the algorithm. In this way, questions such as
"How strongly do neighborhood structures affect the overall performance of the Simulated
Annealing Algorithm?" can be addressed by analyzing and comparing the results of two
neighborhood structures.
(4). It is also well-known that the performance of the Simulated Annealing Algorithm is
dependent upon the annealing schedules. In order to answer "How will different annealing
schedules affect the behavior of the Simulated Annealing Algorithm?", two different
annealing schedules which are derived from the theoretical efforts on convergence of the
Simulated Annealing Algorithm are investigated experimentally for an instance of the TSP.
These annealing schedules are as follows: Tk+1 = cTk and Tk = d/logk, where c and d are
constants under investigation. In this way, the results of two annealing schedules can be
analyzed, and the effects of these annealing schedules on the overall performance of the
algorithm can be evaluated.
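For reference, both schedules are easy to write down explicitly; the short sketch below (the values of c, d and T(0) are simply representative of those examined in Chapter 6) tabulates a few temperatures from each.

import math

def geometric_schedule(t0, c, k):
    # T(k) = c**k * T(0), equivalent to the recursion T(k+1) = c * T(k).
    return t0 * c ** k

def logarithmic_schedule(d, k):
    # T(k) = d / log(k), defined for k >= 2.
    return d / math.log(k)

for k in (2, 10, 100, 300):
    print(k, round(geometric_schedule(20.0, 0.96, k), 3),
          round(logarithmic_schedule(5.0, k), 3))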
In Chapter 2, the underlying motivation and historical development of the Simulated
Annealing Algorithm are outlined and discussed.
In Chapter 3, the mathematical
foundations of the Simulated Annealing Algorithm are examined. In Chapter 4, the design
of a Parallel Simulated Annealing Algorithm for the TSP is formally developed and
presented. In Chapter 5, the speedup of the Parallel Simulated Annealing Algorithm is
analyzed. The performance analyses and comparisons of the candidate parallel algorithm in
the context of quality of solutions and running times for 2 different neighborhood
structures and 2 different annealing schedules are subjects of Chapter 6. Finally, in
Chapter 7, the main contributions of this thesis are summarized and highlighted, and future
research directions are also suggested.
CHAPTER 2
CLASSICAL SIMULATED ANNEALING ALGORITHM
2.1 Introduction
As briefly introduced in the previous chapter, Simulated Annealing ([Kirk82,83]
and independently [Cern85]) is one of the most powerful heuristic optimization techniques
for solving difficult combinatorial optimization problems which have been known to belong
to the class of NP-complete problems. This new approach was originally invented and
developed by physicists based on ideas from statistical mechanics and motivated by an
analogy to the behavior of physical systems in the presence of a heat bath. Because the
number of molecules in the physical system of interest is very large, experimental
measurements of the energy of every molecule in the system are practically impossible.
Physicists were thus forced to develop statistical methods to describe the probable internal
behavior of molecules.
In its original form, the Simulated Annealing Algorithm is based on the analogy
between the simulation of the annealing of solids and the problem of solving large
combinatorial optimization problems, where the configurations actually are states (in an
idealized model of a physical system), and the cost function is the amount of (magnetic)
energy in a state. For this reason, the algorithm became known as "Simulated Annealing".
With the Metropolis procedure, Simulated Annealing offers a mechanism for accepting
increases in the objective function in a controlled fashion. At each temperature setting, an
increase in the tour length is accepted with a certain probability while a decrease in the tour
length is always accepted. In this way, it is possible that accepting an increase will reveal
a new configuration that will avoid a local minimum or at least a bad local minimum. The
effect of the method is that one descends slowly. By controlling these probabilities,
through the temperatures, many random starting configurations are in essence simulated in
a controlled fashion. An analogy similar to this is well-known in statistical mechanics.
The non-physicist, however, can view it simply as an enhanced version of the
familiar technique of "iterative improvement," in which an initial configuration is repeatedly
improved by making small local alterations until no such alteration yields a better
configuration. Simulated Annealing randomizes this procedure in a way that allows
occasional "uphill moves," changes that worsen the configuration, in an attempt to
reduce the probability of getting stuck at a poor, locally optimal configuration. Since
the Simulated Annealing Algorithm is a generalization of "iterative improvement" and
because of its apparent ability to avoid poor local optima, it can readily be adapted in
solving new combinatorial optimization problems, thus, offering hope of obtaining
significantly better results.
Ever since Kirkpatrick et al. [Kirk82,83] introduced the concepts of annealing with
incorporation of the Metropolis procedure [Met53] into the field of combinatorial
optimization and applied it successfully to the "Ising spin glass" problem, much attention
has been devoted to the research of the theory and applications of Simulated Annealing.
Important fields as diverse as VLSI design ([Kirk82,83] and [Rom85]), and pattern
recognition [Gem84] have been applying Simulated Annealing with substantial success.
Computational results to date have been mixed. For further detailed examinations,
an interested reader is encouraged to refer to Kirkpatrick, Gelatt & Vecchi [Kirk82,83],
Golden & Skiscim [Gold86], and Kim [Kim86].
In order to fully appreciate the thrust underlying the Simulated Annealing
Algorithm as introduced in Section 2.4, it is important to understand Local Optimization,
which is briefly reviewed in Section 2.2, and the birth of the Simulated Annealing
Algorithm, which is fully discussed in Section 2.3.
2.2 Local Optimization
To gain a real appreciation of the Simulated Annealing Algorithm as will be
described in more detail in Section 2.3 and Section 2.4, one must first understand Local
Optimization. A combinatorial optimization problem can be specified by identifying a set of
configurations together with a "cost function" that assigns a numerical value to each
configuration. An optimal configuration is a configuration with the minimum possible cost
(there may be more than one such configuration). Given an arbitrary configuration to such
a problem, Local Optimization attempts to improve on that configuration by a series of
incremental, local changes. To define a Local Optimization algorithm, one first specifies a
method for perturbing configurations so as to obtain different ones. The set of configurations that can be obtained in one such step from a given configuration i is called the neighborhood of i. The algorithm then performs the simple loop shown in Figure 2.1
(with the specific methods for choosing i and j left as implementation details).
1. Get an initial configuration i.
2. While (there is an untested neighbor of i) do the following:
   2.1 Let j be an untested neighbor of i.
   2.2 If cost(j) < cost(i), set i = j.
3. Return i.

Figure 2.1: Local Optimization Algorithm.

Although i need not be a globally optimal configuration when the loop is finally exited, it will be locally optimal in that none of its neighbors has lower cost. The hope is that "locally optimal" will be good enough. Because the locally optimal configuration is not always sufficient, as can be seen from Figure 2.2, the Simulated Annealing Algorithm may
provide the means to find both good locally optimal configurations and possibly a globally
optimal configuration. Hence, it is the topic of discussion of the next section and the
following.
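For concreteness, the loop of Figure 2.1 can also be written out in a few lines of code. The sketch below is illustrative only and is not the thesis software; it assumes caller-supplied routines cost(i) and neighbors(i) that return, respectively, the cost of a configuration and the configurations reachable from it in one step:

    def local_optimization(initial, neighbors, cost):
        # Iterative improvement: accept a neighbor only if it strictly lowers the cost.
        i = initial
        improved = True
        while improved:
            improved = False
            for j in neighbors(i):          # untested neighbors of the current configuration
                if cost(j) < cost(i):
                    i = j                   # move downhill and restart from the new configuration
                    improved = True
                    break
        return i                            # locally optimal: no neighbor has lower cost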
Figure 2.2: Plateau, Local Minima and Global Minimum for the Cost Function (plot of the cost function C(i) against the configurations i).
2.3 Statistical Mechanics--A Physical Analogy
As will be seen in the next section, Simulated Annealing is the algorithmic
counterpart to a physical annealing process of statistical mechanics, using the well-known
Metropolis Algorithm as its inner loop. Statistical mechanics concerns itself with analyzing
aggregate properties of large numbers of atoms in liquids or solids. The behavior is
characterized by random fluctuations about a most probable behavior, namely the average behavior of the system at that temperature. An important question is: What happens to the molecules in the system at extremely low temperatures, i.e. near absolute zero? The low-temperature state may be referred to as the ground state or the lowest
energy state of the system. Since low-temperature states are very rare, experiments that
reveal the low-temperature state of a material are performed by a process referred to as
annealing. In condensed matter physics, annealing denotes a physical process in which a
solid material under study in a heat bath is first melted by increasing the temperature of the
heat bath to a maximum value at which all particles of the solid randomly arrange
themselves in the liquid phase; this melted material is then cooled slowly by gradually
lowering the temperature of the heat bath, with a long time spent at temperature near the
freezing point. It is important to note that the period of time at each temperature must be
sufficiently long to allow a thermal equilibrium to be achieved; otherwise, certain random
fluctuations will be frozen into the material and the true low-energy state or ground state
energy will not be reached. The process is like growing a crystal from a melt.
To simulate the evolution of the thermal equilibrium at any given temperature T,
Metropolis et al. [Met53] introduced a Monte Carlo method, a simple algorithm that can be
used both to generate sequences of internal configurations or states and to provide an
efficient simulation of collections of atoms in order to examine the behavior of gases in the
presence of an external heat bath at a fixed temperature (here the energies of the individual
gas molecules are presumed to jump randomly from level to level in line with the computed
probabilities). In each step of this algorithm, a randomly chosen atom is given a small
random displacement, and the resulting change, AE, in the energy of the system between
the current configuration and the perturbed configuration is computed. If AE < 0, the
displacement is accepted, and the configuration with the displaced atom is used as the
starting point of the next step. The case AE > 0 is treated probabilistically: the probability
that the configuration is accepted is P(AE) = exp(-AE/kBT). This acceptance rule of the
new configurations is known as the Metropolis criterion. Random numbers uniformly
distributed in the interval (0,1) are a convenient means of implementing the random part of
the algorithm. One such number is selected and compared with P(AE); if this random
number is less than P(AE), then the new configuration is retained for the next step;
otherwise, the original configuration is used to start the next step. By repeating the basic
step many times and using the above acceptance criterion, one simulates the thermal motion
of the atoms of a solid in thermal contact with a heat bath at each temperature T, thus
allowing the solid to reach thermal equilibrium. This choice of P(AE) has the consequence
that the system in a given state i with energy E(i) evolves into the Boltzmann distribution.
PT(i) = exp(-E(i)/kBT) / Z(T)     (2.1)

where
* Z(T) is a normalization factor, known as the partition function,
* T is the temperature,
* kB is the Boltzmann constant,
* i is a configuration of molecules in a system,
* E(i) is the energy of configuration i,
* exp(-E(i)/kBT) is known as the Boltzmann factor,
* and PT(i) is its probability.
Figure 2.3 illustrates the probability distribution curves for an energy function at
various temperatures.
Figure 2.3: Boltzmann Distribution Curves for an Energy Function at Various
Temperatures.
Note that, as the temperature decreases, the Boltzmann distribution concentrates on
states with the lowest energy, and finally when the temperature approaches zero, only the
minimum energy states have a non-zero probability of occurrence.
In statistical mechanics, this Monte Carlo method, which is the Metropolis
Algorithm, is a well-known method used to estimate averages or integrals by means of
random sampling techniques. The general structure of the Metropolis Algorithm is
summarized in Figure 2.4.
It is important to note that a decrease (downhill) in the change of energy is always
accepted while an increase (uphill) in the change of energy is accepted probabilistically.
After many iterations of the Metropolis Algorithm, it is expected that the configuration of
atoms would vary according to its stationary probability distribution.
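As an illustration only, one trial of the Metropolis acceptance rule described above can be sketched as follows. This is a minimal sketch, assuming caller-supplied energy and perturb functions (neither of which appears in the thesis):

    import math
    import random

    def metropolis_step(state, energy, perturb, T, kB=1.0):
        # One Metropolis trial at a fixed temperature T.
        candidate = perturb(state)                  # small random displacement
        dE = energy(candidate) - energy(state)      # AE between current and perturbed configuration
        if dE <= 0:
            return candidate                        # downhill: always accept
        if random.random() < math.exp(-dE / (kB * T)):
            return candidate                        # uphill: accept with probability P(AE)
        return state                                # otherwise keep the current configuration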
The type of acceptance probability used for uphill moves in the Metropolis
Algorithm may be used in the Simulated Annealing Algorithm. The AE of the Metropolis
Algorithm is replaced by the change in the value of the objective function and the quantity
kBT is replaced by the dimensionless version of the temperature, T. Given a sufficiently
low temperature, the distribution of configurations of the optimization problem will converge to a Boltzmann distribution that sufficiently favors lower objective function states (the optimal states). The probability of accepting any uphill moves approaches zero as the temperature approaches zero. As a result, approaching thermal equilibrium requires an unacceptably large number of steps in the algorithm.

1. Generate an initial random state i of the system.
2. Set the initial temperature T > 0.
3. While (not yet "frozen") do the following:
   3.1 While (not yet in "thermal equilibrium") do the following:
       3.1.1 Perturb atom from state i to state j.
       3.1.2 Compute: AE = Energy(j) - Energy(i).
       3.1.3 If AE ≤ 0      *decreased energy transition*
             Then set i = j.
       3.1.4 If AE > 0      *increased energy transition*
             Then set i = j with probability = exp(-AE/kBT).
   3.2 Set T = update(T).      *reduce the temperature*
4. Return i.      *return the best state*

Figure 2.4: General Metropolis Algorithm

The general approach of Simulated Annealing is to let the algorithm spend a sufficient amount of time at a higher temperature and then to lower the temperature slowly in small incremental steps. The process is then repeated
until a sufficiently low temperature has been obtained, i.e. T = 0. This is faster than simply
setting the temperature initially to a low value and waiting for configurations of substances
to reach thermal equilibrium. Annealing may be considered as the process of cooling
slowly enough so that phase transitions are allowed to occur at their corresponding critical
temperatures. Thus, to obtain pure crystalline systems, the cooling phase of the annealing
process must proceed slowly while the system freezes.
However, it is well known [Kirk82,83] that if the cooling is too rapid, i.e. if the
solid or crystal structure is not allowed to reach thermal equilibrium for each temperature
value, defects and widespread irregularities or non-equilibrium states can be 'frozen' or
locked into the solid, and metastable amorphous structures corresponding to glasses can
result rather than the low energy crystalline lattice structure. Furthermore, this process is
known in condensed matter physics as "rapid quenching"; the temperature of the heat bath
is lowered instantaneously, which results in a freezing of the particles in the solid into one
of the metastable amorphous structures. The resulting energy level would be much higher
than it would be in a perfectly structured crystal. This "rapid quenching" process can be
viewed as analogous to Local Optimization. When crystals are grown in practice, the
danger of bad "local optima" is avoided because the temperature is lowered in a much more
gradual way, by a process that Kirkpatrick calls "careful annealing". In this process, the
temperature descends slowly through a series of levels, each held long enough for the
crystal melt to reach "equilibrium" at that temperature. As long as the temperature is
nonzero, uphill moves remain possible. By keeping the temperature from getting too far
ahead of the current equilibrium energy level, we can hope to avoid local optima until we
are relatively close to the ground state.
The corresponding analogy we are seeking now presents itself. Each feasible
configuration of the combinatorial optimization problem or each feasible tour of the TSP
corresponds to a state of the system; the configuration space of the combinatorial
optimization problem or the permutation space of the TSP corresponds to the state space of
the system; the cost or objective function corresponds to the energy function; the objective
value associated with each feasible tour corresponds to the energy value associated with
each state of that system; the optimal configuration or tour associated with the optimal cost
value corresponds to the ground state associated with the lowest energy value of the state
of the physical system. The analogy is summarized in Figure 2.5.
Physical System        Optimization Problem        Traveling Salesman Problem
State                  Feasible Configuration      Feasible Tour
State Space            Configuration Space         Permutation Space
Ground State           Optimal Configuration       Optimal Tour
Energy Function        Cost Function               Cost Function
Energy                 Cost                        Cost
Rapid Quenching        Local Optimization          Local Optimization
Careful Annealing      Simulated Annealing         Simulated Annealing

Figure 2.5: Analogy Between Physical System and Combinatorial Optimization
2.4 Classical Simulated Annealing
As was discussed in Section 2.2 and illustrated by Figure 2.2, the difficulty with
Local Optimization is that it has no way to "back out" of the unattractive local optima
because it never moves to a new configuration unless the direction is "downhill," i.e. to a
better value of the cost function. Simulated Annealing is an approach that attempts to
avoid the entrapment in poor local optima by allowing an occasional "uphill" move. This is
done under the influence of a random number generator and an annealing schedule. The
attractiveness of using the Simulated Annealing approach for combinatorial optimization
problems is that transitions away from a local optimum are always possible when the
temperature is nonzero. As pointed out by Kirkpatrick et al., the temperature is merely a
control parameter; this parameter controls the probability of accepting a tour length
increase.
As such, it is expressed in the same units as the objective function. In
implementing the approach, any improvement procedure could be used.
As was seen, the Metropolis Algorithm can also be used to generate sequences of
configurations of a combinatorial optimization problem. In that case, the configurations
assume the role of the states of a solid while the cost function C and the control parameter
called the annealing schedule, T, take the roles of energy and the product of temperature
and Boltzmann's constant, respectively. The Simulated Annealing Algorithm can now be
viewed as a sequence of Metropolis Algorithms evaluated at each value of the decreasing
sequence of the annealing schedule, which is defined to be T = {t1, t2, ..., tn}, where t1 > t2 > ... > tn-1 > tn. It can thus be described as follows. Initially, the annealing schedule is
given a high value, and a sequence of configurations of the combinatorial optimization
problem is generated. As in the iterative improvement algorithm, a generation mechanism
is defined, so that, given a configuration i, another configuration j can be obtained by
choosing at random a configuration from the neighborhood of i. The latter corresponds to
the small perturbation in the Metropolis Algorithm. Let AC(i,j) = C(j) - C(i), then the
probability for configuration j to be the next configuration in the sequence is given by 1 if
AC(i,j) ≤ 0, and by exp(-AC(i,j)/T) if AC(i,j) > 0 (the Metropolis criterion). Thus, there is
a non-zero probability of continuing with a configuration with higher cost than the current
configuration.
This process is continued until equilibrium is reached, i.e. until the
probability distribution of the configuration approaches the Boltzmann distribution, now
given by
PT(configuration = i) = qi(T) = exp(-C(i)/T) / Q(T)     (2.2)

where Q(T) is a normalization constant depending on the annealing schedule T, which is equivalent to the partition function Z(T).
The probability distribution curves for the cost function are analogous to Figure 2.3, with E(i) replaced by C(i).
The annealing schedule T is then lowered in incremental steps, with the system
being allowed to approach equilibrium for each step. The algorithm is terminated for some
small value of T, at which virtually no further deteriorations or increases in cost are
accepted. The final 'frozen' configuration is then taken as the optimal configuration of the
problem under consideration. The main steps in the Simulated Annealing Algorithm are
outlined in Figure 2.6.
1. Generate an initial random configuration i.
2. Set the initial temperature T > 0.
3. While (not yet "frozen") do the following:
3.1. While ("inner loop iteration" not yet satisfied) do the following:
3.1.1. Select random neighbor j from configuration i.
3.1.2. Compute: AC(i,j) = Cost(j) - Cost(i);
3.1.3. If AC(i,j) ≤ 0      * downhill transition *
       Then set i = j.
3.1.4. If AC(i,j) > 0      * uphill transition *
       Then set i = j with probability = exp(-AC(i,j)/T).
3.2. Set T = update(T).      * reduce the temperature *
4. Return i.      * return the best configuration *
Figure 2.6: Simulated Annealing Algorithm
Thus, as with iterative improvement, we have again a generally applicable
approximation algorithm: once configurations, a cost function and a generation mechanism
(or, equivalently, a neighborhood structure) are defined, a combinatorial optimization
problem can be solved along the lines given by the description of the Simulated Annealing
Algorithm. The heart of this procedure is the loop at Step 3.1, and the importance of this
step will be further analyzed in a subsequent chapter when a Parallel Simulated Annealing
Algorithm is discussed. Note that the acceptance criterion is implemented by drawing
random numbers from a uniform distribution on (0,1) and comparing these with exp(-AC(i,j)/T). Note also that exp(-AC(i,j)/T) will be a number in the interval (0,1) when AC
and T are positive, and so can rightfully be interpreted as a probability. Note also how this
probability depends on AC and T. The probability that an uphill move of size AC will be
accepted diminishes as the temperature declines, and, for a fixed temperature T, small
uphill moves have higher probabilities of acceptance than larger ones. This particular
method of operation is motivated by a physical analogy of the physics of crystal growth
described in the last section.
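To make the procedure of Figure 2.6 concrete, a compact rendering in code is given below. It is a minimal sketch rather than the implementation of Appendix A; the cost function, the random_neighbor generation mechanism, the initial temperature T0, the cooling rule update(T), and the inner-loop length are all assumed to be supplied by the problem at hand:

    import math
    import random

    def simulated_annealing(initial, cost, random_neighbor, T0, update,
                            inner_iterations, T_min=1e-3):
        # Sketch of Figure 2.6: a Metropolis inner loop nested inside a cooling outer loop.
        i, T = initial, T0
        while T > T_min:                              # "frozen" stopping criterion
            for _ in range(inner_iterations):         # approach equilibrium at this temperature
                j = random_neighbor(i)
                dC = cost(j) - cost(i)
                if dC <= 0 or random.random() < math.exp(-dC / T):
                    i = j                             # downhill move, or accepted uphill move
            T = update(T)                             # reduce the temperature
        return i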
The main difference between the Simulated Annealing Algorithm and the Metropolis
Algorithm is that the Simulated Annealing Algorithm iterates with variable temperature
while the Metropolis Algorithm iterates with a constant temperature. As the temperature is
slowly decreased to zero, or annealed, the system approaches steady-state equilibrium. This implies that the cost function should converge to a global minimum. It is worth emphasizing that the cooling or annealing process should be done slowly; otherwise, the
system can get stuck at a local minimum.
Ever since Kirkpatrick recognized the physical analogy between statistical
mechanics and combinatorial optimization, the Simulated Annealing Algorithm has been
important in many disciplines. Not only has it been successfully applied in many important
fields of science and engineering but also it has been one of the major stimulants of
research in the academic and industrial communities. The force that makes the Simulated
Annealing Algorithm powerful is its inherent ability to avoid and/or to escape from being
entrapped at local minima, which are numerous even for a medium-size combinatorial optimization problem in general and for the TSP in particular.
In this chapter, the underlying motivation and historical development of the
Simulated Annealing Algorithm have been covered. To provide some useful results for the
subsequent chapters, a mathematical model and a quantitative analysis of the Simulated
Annealing Algorithm are studied in the next chapter.
CHAPTER 3
QUANTITATIVE ANALYSIS
3.1 Introduction
In Chapter 1, a brief description of the Simulated Annealing Algorithm was introduced. In
Chapter 2, the origin and the motivation of Simulated Annealing were examined in detail,
and the Algorithm was outlined. In this chapter, certain key mathematical concepts which
are the underlying foundation of Simulated Annealing will be investigated.
The Simulated Annealing Algorithm can be modelled mathematically by using
concepts of the theory of Markov chains. Since a detailed analysis of these Markov chains
is beyond the scope of this thesis, they are extensively discussed and proved by a number
of authors ([van87], [Mit85], [Gem84], and [Haj85]) that under certain conditions, the
algorithm converges asymptotically to an optimal solution. Thus, asymptotically, the
algorithm is an optimization algorithm. In practical applications, however, asymptoticity is
never attained and thus convergence to an optimal solution is no longer guaranteed.
Consequently, in practice, the algorithm is an approximate algorithm.
The performance analysis of an approximate algorithm concentrates on the
following two quantities:
*The quality of the final solution obtained by the algorithm, i.e. the difference in
cost value between the final solution and a globally minimal configuration;
*and, the running time required by the algorithm.
For the Simulated Annealing Algorithm, these quantities depend on the problem instance as
well as the annealing schedules.
Traditionally, three different types of performance analysis are distinguished,
namely worst-case analysis [Law85, Chapter 5], average-case analysis [Law85, Chapter 6], and empirical analysis [Law85, Chapter 7]. The worst-case analysis is concerned with
upper bounds on quality of the final solutions, i.e. how far from optimal the constructed
tour can be, while the average-case analysis is focused on the expected values of quality of
the final solutions and running times for a given probability distribution of the problem
instances.
Empirical analysis here means the analysis originating in or based on computational experience; in other words, it means solving many different instances of the TSP
with different annealing schedules and drawing conclusions from the results, with respect
to both quality of solutions and running time. In this way, the effects of the annealing
schedules on the algorithm can be analyzed. It is interesting to analyze these effects
because, even for a fixed instance, the computation time and the quality of the final solution
are random variables, due to the probabilistic nature of the algorithm. All three approaches
are attempts to provide the information that will help in answering the question 'How well
will the algorithm perform (how near to optimal will be the tours it constructs) on the
problem instances to which I intend to apply it?' Each approach has its advantages and its
drawbacks.
Worst-case analysis can provide guarantees that hold for individual instances and
does not involve the assumption of any probability distribution. The drawback here is that,
since the guarantee must hold for all instances, even ones that may be quite atypical, there
may be a considerable discrepancy in the behavior of an algorithm. Empirical analysis can
be most appropriate if the problem instances on which it is based are similar to the problem
of interest. It may be quite misleading if care is not taken in the choice of test problems, or
if the test problems chosen have very different characteristics from those at hand. Average-case (or average ensemble) analysis can tell us a lot, especially when we will be applying
the algorithm to many instances having similar characteristics. However, by its nature, this
type of analysis must make assumptions about the probability distribution on the class of
instances, and if these assumptions are not appropriate, then the results of the analysis may
not be germane to the instances at hand.
A final problem with both average-case analysis and worst-case analysis of
heuristics comes from the rigorous nature of both approaches. Analyzing a heuristic in
either way can be a very challenging mathematical task. Heuristics that yield nice
probabilistic bounds may be inappropriate for worst-case analysis, and heuristics that
behave well in the worst-case are often exceedingly difficult to analyze probabilistically. In
addition, many heuristics (including quite useful ones such as that of Lin & Kernighan
[Lin73]) do not seem to be susceptible to either type of analysis.
When studying the Simulated Annealing Algorithm, an additional probabilistic
aspect is added to the above classification. Besides the probability distribution over the set
of problem instances, there is also a probability distribution over the set of possible
solutions for a given problem. Thus, in an average-case analysis, the average can be
referred to as the average of the set of solutions of a given problem instance.
In this chapter, and in the computational study of Chapter 6, a combination of average-case analysis (also known as average-ensemble analysis) over the set of solutions of a given problem instance and empirical analysis, grouped together as a "semiempirical" average-case analysis,
will be investigated for two representative instances of the Traveling Salesman Problem.
Using these instances to present a "semiempirical" average-case analysis of the algorithm
by running it a number of times, it is possible to reproduce the observed behavior by using
standard techniques from statistical physics discussed in Chapter 2 and some assumptions
on the configuration density. Presently, a systematic investigation of the typical behavior
and the average-case performance analysis of the Simulated Annealing Algorithm remains
as an open research problem.
In Section 3.2, the core mathematical model of the Simulated Annealing Algorithm
based on the theory of Markov chains is presented and discussed. In this section, the
salient features of the annealing schedules which will be useful in the computational study
in Chapter 6 are also highlighted. And, the analysis of the cost function is presented in
Section 3.3.
3.2 Mathematical Model
A combinatorial optimization problem can be characterized by the configuration space ℜ, denoting the set of all possible configurations i, and a cost function C: ℜ → R, which assigns a real number C(i) to each configuration i. C is assumed to be defined such that the lower the value of C, the better the corresponding configuration with respect to the optimization criteria. This can be done without loss of generality. The objective is to find
an optimal configuration i* for which
C(i*) = Cmin = min{C(i) | i ∈ ℜ}     (3.1)

where Cmin denotes the minimum cost.
To apply the Simulated Annealing Algorithm, a mechanism known as the
neighborhood structure or the perturbation function, which will be defined precisely in
Chapter 4, is used to generate a new configuration, i.e. a neighbor of i, by a small
perturbation. The neighborhood of a configuration i is defined as the set of configurations that can be reached
from configuration i by a single perturbation. The Simulated Annealing Algorithm starts
off with a given initial configuration and continuously tries to transform a current
configuration into one of its neighbors by applying a perturbation mechanism and an
acceptance criterion. The acceptance criterion allows for deteriorations in the cost function,
thus enabling the algorithm to escape from local minima.
3.2.1 Asymptotic Convergence
As mentioned in the last section, the Simulated Annealing Algorithm can be
formulated as a sequence of Markov chains, each Markov chain being a sequence of trials
whose outcomes, X1, X2, ..., satisfy the following two properties:
(i) Each outcome belongs to a finite set of outcomes {1, 2, 3, ..., n} called the configuration space ℜ of the system; if the outcome on the kth trial is i, then the system is said to be in state i at time k or at the kth step.
(ii) The outcome of any trial depends at most upon the outcome of the immediately
preceding trial and not upon any other previous outcome, i.e. the outcome of a given trial
only depends on the outcome of the previous trial; with each pair of states or configurations
(i,j) there is given the probability Pij such that j occurs immediately after i occurs.
Such a stochastic process is called a (finite) Markov chain. The numbers Pij, called
the transition probabilities, can be arranged into the transition matrix P:

    P = | P11  ...  P1n |
        | ...        ... |
        | Pn1  ...  Pnn |
Thus, with each configuration i, there corresponds the ith row (Pi1, Pi2, ..., Pin) of
the transition matrix P; if the system is in configuration i, then this row vector represents
the probabilities of all the possible outcomes of the next trial and so it is a probability
vector, whose row sum is always equal to 1.
Note that the outcomes of the trials here are the configurations. For example, the
outcome of the given trial is the perturbed configuration j while the outcome of the previous
trial is the current configuration i. So, a Markov chain is described by means of a set of
conditional probabilities Pij(k-1,k) for each pair of outcomes (i,j); Pij(k-1,k) is the probability that the outcome of the kth trial is j, given that the outcome of the (k-1)th trial is i. Let ai(k) denote the probability of outcome i at the kth trial; then ai(k) is obtained by solving the recursive relation:

ai(k) = Σ_l al(k-1) Pli(k-1,k),   k = 1, 2, ...     (3.2)
where the sum is taken over all possible outcomes.
Let X(k) denote the outcome of the kth trial. Then,

Pij(k-1,k) = Pr{X(k) = j | X(k-1) = i}     (3.3)

and

ai(k) = Pr{X(k) = i}     (3.4)
If the conditional probabilities do not depend on k, the corresponding Markov chain
is called homogeneous, otherwise it is called inhomogeneous.
In the case of the Simulated Annealing Algorithm, the conditional probability Pij(k-1,k) denotes the probability that the kth transition is a transition from configuration i to configuration j. Thus, X(k) is the configuration obtained after k transitions. In this view, Pij(k-1,k) is the transition probability and the |ℜ| x |ℜ| matrix P(k-1,k) the transition
matrix.
The transition probabilities depend on the value of the annealing schedule T. Thus,
if T is kept constant, the corresponding Markov chain is homogeneous, and its transition
probability, i.e. the probability that a trial transforms configuration i into configuration j, is
defined as
Pij(T) = Gij(T) Aij(T)                              if i ≠ j
       = 1 - Σ_{k ∈ ℜ, k ≠ i} Gik(T) Aik(T)         if i = j     (3.5)

where
  Pij(T) denotes the transition probability,
  Gij(T) denotes the generation probability, i.e. the probability of generating configuration j from configuration i,
  Aij(T) denotes the acceptance probability, i.e. the probability of accepting configuration j given the configurations i and j,
  and T is the annealing schedule.
Each transition probability is defined as the product of the following two
conditional probabilities: the generation probability Gij(T) of generating configuration j
from configuration i, and the acceptance probability Aij(T) of accepting configuration j,
once it has been generated from i. The corresponding matrices G(T) and A(T) are called
the generation and acceptance matrices, respectively. As the result of the definition in
Equation 3.5, P(T) is a stochastic matrix, i.e. ∀i: Σj Pij(T) = 1.
Two formulations of the algorithm can be distinguished:
* a homogeneous algorithm: the algorithm is described by a sequence of homogeneous Markov chains. Each Markov chain is generated at a fixed value of T, and T is decreased in between subsequent Markov chains; and
* an inhomogeneous algorithm: the algorithm is described by a single inhomogeneous Markov chain. The value of T is decreased in between subsequent transitions.
It is not within the scope of this chapter to analyze these two different types of algorithms; they are discussed extensively in [van87].
The Simulated Annealing Algorithm obtains a global minimum if, after a large
number of transitions K, i.e. K → ∞, the following relation holds:
Pr{X(K) ∈ ℜopt} = 1,     (3.6)

where ℜopt is the set of globally minimal configurations.
Equation 3.6 can be proved under a number of conditions on the probabilities
Gij(T) and Aij(T); asymptotically, i.e. for infinitely long Markov chains and T → 0, the
algorithm finds an optimal configuration with probability equal to 1. The proof is based on
the existence of an equilibrium distribution [van87]. Let X(k) denote the outcome of the
kth trial of a Markov chain; then under the condition that the Markov chains are irreducible,
aperiodic, and recurrent, there exists a unique equilibrium distribution given by the |ℜ|-vector q(T). The component qi(T) denotes the probability that a configuration i will be found after an infinite number of trials and is given by the following expression:
qi(T) = lim_{k→∞} Pr{X(k) = i | T} = lim_{k→∞} ( [P^k(T)]^T a )_i     (3.7)
where a denotes the initial probability distribution of the configurations and P(T) the
transition matrix, whose entries are given by the Pij(T).
Under certain additional conditions on the probabilities Gij(T) and Aij(T), the algorithm converges as T → 0 to a uniform distribution on the set of optimal configurations, i.e.,
lim_{T→0} ( lim_{k→∞} Pr{X(k) = i | T} ) = lim_{T→0} qi(T) = πi     (3.8)

and

πi = 1 / |ℜopt|   if i ∈ ℜopt
   = 0            elsewhere     (3.9)
where ℜopt denotes the set of optimal configurations.
Here, we apply the standard form of the Simulated Annealing Algorithm, i.e., the
perturbation probability Gij(T) is chosen independent of T and uniformly over the
neighborhood of a given configuration i. The acceptance probability is chosen as
Aij(T) = exp(-ACij / T)   if ACij > 0
       = 1                 if ACij ≤ 0     (3.10)
where ACij = C(j) - C(i). For this choice the components of the equilibrium distribution
take the form
qi(T) = exp{[Cmin - C(i)] / T} / Σ_{j ∈ ℜ} exp{[Cmin - C(j)] / T}     (3.11)
The above results are extremely useful when the cost function is analyzed in Section 3.3.
3.2.2 Annealing Schedules
As mentioned previously, the performance of the Simulated Annealing Algorithm is
a function of the annealing schedules.
Hence, it is common that one resorts to an
implementation of the Simulated Annealing Algorithm in which a sequence of Markov
chains of finite length is generated at decreasing values of the annealing schedule.
Optimization is begun at a starting value of the temperature T0 and continues by repeatedly
generating Markov chains for decreasing values of T until T approaches 0. This procedure
is governed by the annealing schedule. Generally, the parameters used in studying the
performance of the Simulated Annealing Algorithm are (1) the length L of the individual
Markov chains; (2) the stopping criterion to terminate the algorithm; (3) the start value TO of
the annealing schedule; and (4) the decrement function of the annealing schedule.
The salient features of these parameters which are the subjects of investigation in
Chapter 6 are summarized here. For an extensive treatment of these parameters, the reader
is encouraged to examine reference [van87].
(1). Markov-chain length L: All Markov chains are chosen equally long. In
practice, the number of cities in the TSP tour or the number of runs of the algorithm is
taken to be equal to the length of the Markov chains. For the computational study of
Chapter 6, the Markov-chain length is taken to be the number of runs of the algorithm.
(2). Stopping criterion: Many criteria for terminating the Simulated Annealing Algorithm presently exist. To reduce the level of complexity of the software implementation and of the analysis of computational results, in our study of Chapter 6 the algorithm is terminated at a certain maximum number of iterations arbitrarily set by the user.
(3). Starting value T0: The purpose of the starting temperature value is to begin the thermal system at a high temperature, as discussed in Chapter 2. There are many variations for the starting value of the annealing schedules; reported starting values range from as high as 100 to as low as 20. For the purpose of our computational study in Chapter 6, this starting value is initially set appropriately for the particular annealing schedule under consideration.
(4). Annealing schedule T: As mentioned in Section 1.5 and various notes
throughout Chapter 2 and this chapter, the performance of the Simulated Annealing
Algorithm is a function of the annealing schedule. Because of this dependence, the
following two well-known annealing schedules which prove to provide good solutions to
the TSP are extensively investigated in Chapter 6 by varying the parameters c and d, i.e. 0.9 < c ≤ 0.99 and 5 ≤ d < 30:

Tk+1 = c·Tk;      k = 0, 1, 2, ..., max_iterations     (3.12)

and

Tk = d / log k;   k = 2, 3, 4, ..., max_iterations     (3.13)
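As a small illustration only (not the Appendix A code), the two decrement rules of Equations 3.12 and 3.13 translate directly into the following Python generators; the natural logarithm is assumed for Equation 3.13, and the parameter ranges are those quoted above:

    import math

    def geometric_schedule(T0, c, max_iterations):
        # Eq. 3.12: Tk+1 = c*Tk, typically with 0.9 < c <= 0.99.
        T = T0
        for _ in range(max_iterations):
            yield T
            T = c * T

    def logarithmic_schedule(d, max_iterations):
        # Eq. 3.13: Tk = d / log(k) for k = 2, 3, ..., typically with 5 <= d < 30.
        for k in range(2, max_iterations + 1):
            yield d / math.log(k)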
Note that as a consequence of the asymptotic convergence of the Simulated
Annealing Algorithm, it is intuitively clear that the slower the "cooling" is carried out, the
larger the probability that the final configuration is close to an optimal configuration. Thus,
the deviation of the final configuration from an optimal configuration can be made as small
as desired by investing more computational effort. The literature has not elaborated on the
probabilistic dependence on the parameters of the annealing schedule. In this chapter and
Chapter 6, semiempirical results on this topic are presented. A more theoretical treatment is still considered an open research topic.
3.3 Analysis of the Cost Function
In this section, some quantitative aspects of the Simulated Annealing Algorithm are
discussed. The discussion is based on an extensive set of numerical data obtained by
applying the algorithm to a specific instance of the Traveling Salesman Problem [Law85].
The description of the problem instance is given in Chapter 6 when empirical results are
obtained. The behavior of the Simulated Annealing Algorithm is analyzed. In this section,
an analytical approach to derive the expectation and the variance of the cost function in
terms of the annealing schedule is analyzed. The discussion is based on an average-case
performance analysis.
To model the behavior of the Simulated Annealing Algorithm, an analytical approach to calculate the expectation <C>T and the variance σT² of the cost function is discussed. Let X denote the outcome of a given trial; <C>T and σT² can be defined as
<C>T = Σ_{i ∈ ℜ} Pr{X = i | T} C(i)     (3.14)

and

σT² = Σ_{i ∈ ℜ} Pr{X = i | T} [C(i) - <C>T]²     (3.15)
In equilibrium, we obtain, using Equations 3.7 and 3.11,
<C>T = Σ_{i ∈ ℜ} qi(T) C(i) = Σ_{i ∈ ℜ} exp{[Cmin - C(i)] / T} C(i) / Σ_{j ∈ ℜ} exp{[Cmin - C(j)] / T}     (3.16)

and

σT² = Σ_{i ∈ ℜ} qi(T) [C(i) - <C>T]² = Σ_{i ∈ ℜ} exp{[Cmin - C(i)] / T} [C(i) - <C>T]² / Σ_{j ∈ ℜ} exp{[Cmin - C(j)] / T}     (3.17)
Next, the configuration density ω(C) is defined as

ω(C) dC = |{i ∈ ℜ | C ≤ C(i) < C + dC}| / |ℜ|     (3.18)
Then, in the case of the Simulated Annealing Algorithm employing the acceptance probability of Equation 3.10, the equilibrium-configuration density Ω(C,T) at a given value of T is given by

Ω(C,T) dC = ω(C) exp[(Cmin - C) / T] dC / ∫ ω(C') exp[(Cmin - C') / T] dC'     (3.19)
Clearly, Ω(C,T) is the equivalent of the stationary distribution q(T) given by Equation 3.11. As indicated by the term "equilibrium," Ω(C,T) is the configuration density in equilibrium when applying the Simulated Annealing Algorithm. Thus, one obtains
<C>T = ∫ C' Ω(C',T) dC'     (3.20)

and

σT² = ∫ [C' - <C>T]² Ω(C',T) dC'     (3.21)
Given an analytical expression for the configuration density ω(C), it is possible to evaluate the integrals of Equations 3.19--3.21. To estimate ω(C) for a given combinatorial optimization problem is in most cases very hard. Indeed, ω(C) may vary drastically for different specific problem instances, especially for C values close to Cmin.
The average cost C̄(T) and the standard deviation σ(T) of the cost as a function of the annealing schedule T when applying the Simulated Annealing Algorithm to an instance of the TSP are given by the following expressions,

C̄(T) = L⁻¹ Σ_{i=1}^{L} Ci(T)     (3.22)

and

σ(T) = [ L⁻¹ Σ_{i=1}^{L} [Ci(T) - C̄(T)]² ]^{1/2}     (3.23)
where the average is taken over the values of the cost function Ci(T), for i = 1, ..., L, of
the Markov chains generated at a given value of the annealing schedule T. From the above
relations, the behavior of the Simulated Annealing Algorithm is observed for many
different problem instances and is reported by a number of authors (for example, [Kirk83]
and [van87]). Furthermore, some characteristic features of the expectation <C>T and the variance σT² of the cost function can be deduced. For large values of T, the average and the standard deviation of the cost are about constant and equal to C̄(∞) and σ(∞). This
behavior is directly explained from Equations 3.16 and 3.17, or Equations 3.18 -- 3.21,
namely
<C>∞ = lim_{T→∞} <C>T = |ℜ|⁻¹ Σ_{i ∈ ℜ} C(i)     (3.24)

and

σ∞² = lim_{T→∞} σT² = |ℜ|⁻¹ Σ_{i ∈ ℜ} [C(i) - <C>∞]²     (3.25)
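In the computational study, the sample statistics of Equations 3.22 and 3.23 are what is actually computed from the L cost values recorded at a given temperature. A minimal illustrative sketch (not part of the thesis software) is:

    import math

    def cost_statistics(costs):
        # Eqs. 3.22 and 3.23: sample mean and standard deviation of the cost values
        # Ci(T), i = 1..L, observed at a fixed value of the annealing schedule T.
        L = len(costs)
        mean = sum(costs) / L
        variance = sum((c - mean) ** 2 for c in costs) / L
        return mean, math.sqrt(variance)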
The results presented in this section and Section 3.2.2 are extremely useful when
the computational study is investigated in Chapter 6. And, note that more detailed
estimates of the average-case performance of the Simulated Annealing Algorithm can only
be deduced from a rigorous performance analysis which takes into account the detailed
structure of the optimization problem at hand. Presently, such a theoretical average-case
performance analysis remains an open research problem.
The average-case performance of the Simulated Annealing Algorithm is discussed
by analyzing the expectation and the variance of the cost function as a function of annealing
schedule for a certain instance of the Traveling Salesman Problem. The results can be
summarized as follows:
*the performance of the Simulated Annealing Algorithm depends strongly on the
chosen annealing schedule; this is especially true for the quality of the solution obtained by
the algorithm;
*with a properly chosen annealing schedule, near-optimal solutions may be
obtained.
* computation times can be extremely extensive for some problems. For example,
in solving a TSP instance of 100 cities, a few hundred hours of CPU time on a VAX-11/780 have been reported [van87].
In this chapter, certain key mathematical concepts which are the underlying
foundation of Simulated Annealing were examined. In the next chapter, a full description of
the design of the Parallel Simulated Annealing Algorithm is presented.
CHAPTER 4
PARALLEL SIMULATED ANNEALING ALGORITHM
4.1 Introduction
In Chapter 2, the underlying historical development of the Simulated Annealing
Algorithm was discussed. In Chapter 3, a mathematical model and a quantitative analysis
of the Simulated Annealing Algorithm were examined. In this chapter, a Parallel Simulated
Annealing Algorithm is designed, an algorithm which provides the basis for speedup
analysis in the next chapter and for a computational study in the following chapter.
Program partitioning or parallelization and interprocessor communication are two
popular terms in parallel processing. Intuitively, these two terms are self-explanatory:
Parallelization refers to the process of breaking a program or a problem down to smaller
components, but this can be done by using several different approaches and for different
objectives. Questions like how one can partition a program, what are the boundaries, what
are the tradeoffs and the precise goals of parallelization remain largely unanswered. The
term interprocessor communication is also self-explanatory. But similar questions about
the precise meaning of interprocessor communication and its impact on program execution
have no unique answers. Also, there is no available methodology for quantitatively
characterizing these terms. The problems of parallelization of the Simulated Annealing
Algorithm for the TSP in this chapter and intercommunication between processors
discussed in the next chapter are defined by modelling them, quantifying them, and
identifying their variables.
To make these ideas more concrete, Section 4.2 establishes a framework for the
TSP, explains in detail the partition process, and gives a high-level description of the
overall algorithm. In Section 4.3, two different neighborhood structures or perturbation
functions for the TSP are presented. The relationship between cost of the TSP tour and
cost of the subtour is analyzed in Section 4.4.
The candidate synchronous and
asynchronous parallel algorithms are formally outlined in Section 4.5. And, Section 4.6
discusses key implementation issues during different phases of the candidate Parallel
Simulated Annealing Algorithm.
4.2 Algorithm Framework and Parallelization Methodology
In order to formulate the TSP concretely and precisely, let us establish some
common ground by introducing some notation.
From graph theory, a graph G is defined as

G = (V, E)

where V = {v0, ..., vN-1} is the set of vertices and E = {(vi, vj): there is an arc from vi to vj} is the set of edges. A tour is defined as an ordered sequence σ = (σ(0), ..., σ(N-1), σ(0)) (or equivalently, as a function σ: {0, ..., N-1, 0} → V) satisfying σ(i) ≠ σ(j) for i ≠ j. σ is a permutation sequence, through which the tour is traversed, and σ(i) is the ith vertex in a tour. As illustrated in Figure 4.1(a), this arbitrary TSP tour has the following permutation or ordered sequence σ = (σ(0), σ(1), σ(2), σ(3), σ(4), σ(5), σ(6), σ(7), σ(8), σ(9), σ(10), σ(11), σ(12), σ(13), σ(14), σ(0)) = (v0, v4, v9, v13, v3, v7, v11, v1, v2, v15, v6, v5, v12, v8, v10, v14, v0). Thus, the framework for the TSP is established.
As discussed in Section 1.4.2, there are several approaches to partitioning the TSP
tour into subtours. The approach here is to partition the entire TSP tour into a number of
equal subtours by taking the total number of cities (or vertices) desired in the TSP tour
divided by the number of cities desired in the subtour. To illustrate this concept, let us
consider an arbitrary TSP tour of Figure 4.1(a); let us further divide this particular tour
into four subtours as shown in Figure 4.1(b). Hence, the first subtour consists of a
permutation
sequence (black cities) s1 = (s(0), s(1), s(2), s(3)) = (v0, v4, v9, v13); the second subtour has a permutation sequence s2 = (s(0), s(1), s(2), s(3)) = (v3, v7, v11, v1); the third subtour consists of a permutation sequence s3 = (s(0), s(1), s(2), s(3)) = (v2, v15, v6, v5); and the fourth subtour has a permutation sequence s4 = (s(0), s(1), s(2), s(3)) = (v12, v8, v10, v14).

Figure 4.1: (a) An Arbitrary TSP Tour. (b) The TSP Tour Divided into Four Subtours.
Note that σ and σ' are respectively denoted as the entire TSP tour and the entire perturbed TSP tour, while s and s' are respectively denoted as the subtour and the perturbed subtour. It is important to notice one key feature before proceeding. The important fact is that the number of cities in the TSP tour should be a multiple of the number of cities in the subtours. Thus, one can generally partition the total TSP tour into any arbitrary subtour
sizes and numbers of subtours. To illustrate this, let N denote the number of cities in the TSP tour; let P denote the number of subtours; let Np denote the number of processing elements (PEs) or processors; and let Ns denote the number of cities in each subtour. Then, as long as N = P·Ns holds, one can partition the TSP tour into P subtours of size Ns. For example, let the number of cities in the TSP tour be N = 50, and suppose we process this tour using 5 processing elements; then, there are 5 (= P) subtours, each of which consists of 10 (= Ns) cities.
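As a small illustration of this partitioning rule (a sketch with hypothetical helper names, not the Appendix A code), assuming the tour is held as a list of city indices:

    def partition_tour(tour, Ns):
        # Split a TSP tour of N cities into P = N // Ns subtours of Ns cities each.
        N = len(tour)
        if N % Ns != 0:
            raise ValueError("N must be a multiple of Ns (N = P * Ns)")
        return [tour[k:k + Ns] for k in range(0, N, Ns)]

    # Example: N = 50 cities and Ns = 10 cities per subtour give P = 5 subtours.
    subtours = partition_tour(list(range(50)), 10)
    assert len(subtours) == 5 and all(len(s) == 10 for s in subtours)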
In order to amplify the understanding of the communication issues discussed in the
next chapter, the candidate algorithms provided in the later section and the software
provided in Appendix A, it is important to be familiar with the overall conceptual
framework of the parallel processing scheme or software system model depicted in Figure
4.2.
In the next chapter, the speedup of the software system model is analyzed generally for the cases where the number of subtours P is less than, equal to, or greater than the number of available processors Np. In this chapter, for simplicity of analysis and discussion, the case in which the number of subtours P is equal to the number of processors Np is considered. Thus, this software system model is assumed to be mapped exactly onto the FTPP [Harp87], which is a loosely coupled Multiple Instruction stream Multiple Data stream (MIMD) architecture whose Processing Elements (PEs) communicate via I/O devices. These PEs are interconnected by means of an interconnection network or a bus; each of these PEs consists of a general-purpose CPU, i.e. an MC68020, RAM, ROM, timers, I/O, and interprocessor communication ports, and is capable of executing its own program independently and operating on different copies of data. Mapping of the software system
model onto the FTPP means that each subtour si of Figure 4.2 is assigned exactly to one
PE.
Figure 4.2: High-Level Description of the Parallel Scheme (a central coordinator connected to processing elements PE0, PE1, PE2, and PE3).
As can be seen from Figure 4.2, the central coordinator assumes several principal
tasks. First, it partitions the entire TSP tour into subtours. Second, for each partitioned
subtour, it creates a processing element datum structure. Third, at each particular
temperature value, it assigns a process, which consists of a processing element datum
structure and a subtour, to an available PE. Finally, it reconstructs the new TSP tour,
computes the cost of the new TSP tour and performs the global annealing process after all
processes have completed their computations. Note that, in this chapter, the terms PE and
processor are used interchangeably.
It is important to notice a few key features. First, communications are channelled
not only between the central coordinator and the PEs (or subtours), but also between the
PEs (or subtours) themselves. These interprocessor communications will be discussed in
detail in the next chapter. Second, Simulated Annealing is performed both locally and
globally. And, finally, the parallelization methodology in this thesis is generally applicable;
in other words, if the number of subtours P were set to 1, then the Parallel Simulated
Annealing Algorithm is reduced to a Serial Simulated Annealing Algorithm.
4.3 Neighborhood Structures
In this section, the two neighborhood structures, which are experimentally studied
in Chapter 6, are presented.
For easy reference, a particular subtour si which is assigned to a particular
processing element PEi in Figure 4.2 can be extracted from Figure 2.6 as shown in Figure
4.3.
3.1. While ("inner loop iteration" not yet satisfied) do the following:
3.1.1. Select random neighbor s' of configuration s.
3.1.2. Compute: AC(s',s) = Cost(s') - Cost(s).
* downhill transition *
3.1.3. If AC(s',s) • 0
Then set s = s'.
*uphill transition*
3.1.4. If AC(s',s) > 0
Then set s = s' with probability = exp(-AC/T).
L
Figure 4.3: General Algorithm of a Subtour
The general algorithm of a subtour in Figure 4.3 is performed at a particular
temperature.
The important statement which influences the computational results
substantially is the statement 3.1.1. Therefore, it is germane to pay close attention to the
variations of this statement as we proceed. The following neighborhood structures for
the statement 3.1.1. are subjects of computational study in Chapter 6.
4.3.1 Citywise Exchange
For a given subtour s, a perturbed subtour s' is obtained by interchanging the positions of two distinct randomly chosen cities. Such a modification will be denoted by Tij(s). The operation Tij takes a subtour s and produces a new subtour s' = Tij(s) which satisfies s'(j) = s(i), s'(i) = s(j), and s'(k) = s(k) for every k ≠ i,j. The algorithmic steps for the subtour are provided in Figure 4.4(a) while the neighborhood structure is outlined in Step 3.1.1 and is graphically illustrated in Figure 4.4(b). It is important to notice that by this construction only the positions of the ith and jth cities are interchanged.

3.1. For i = 0 to (Ns-1), do the following:      * Ns : number of cities in subtour *
   3.1.1. Neighborhood structure:
      (1). Generate a random city j, 0 ≤ j ≤ (Ns-1), j ≠ i.
      (2). Construct a trial permutation from the current permutation as follows:
           Find: i' = min(i,j) and j' = max(i,j).
           Set: s'(k) = s(k), k = 0, 1, ..., i'-1.
                s'(i') = s(j'); s'(j') = s(i'); s'(k) = s(k), k = i'+1, ..., j'-1.
                s'(k) = s(k), k = j'+1, j'+2, ..., (Ns-1).
   3.1.2. Compute: AC(s',s) = Cost(s') - Cost(s).
   3.1.3. If AC(s',s) ≤ 0      * downhill transition *
          Then set s = s'.
   3.1.4. If AC(s',s) > 0      * uphill transition *
          Then set s = s' with probability = exp(-AC/T).

Figure 4.4 (a): Algorithm of Subtour with Citywise Exchange.

Figure 4.4 (b): Neighborhood Structure with Citywise Exchange (the cities at positions i and j of subtour s are swapped in the perturbed subtour s'; all other cities keep their positions).
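As an illustration only (a sketch, not the Appendix A implementation), the Tij(s) operation of Figure 4.4(a) can be written as:

    import random

    def citywise_exchange(s, i=None, j=None):
        # Tij(s): swap the cities at two distinct positions i and j of subtour s.
        Ns = len(s)
        if i is None or j is None:
            i, j = random.sample(range(Ns), 2)     # two distinct random positions
        s_prime = list(s)
        s_prime[i], s_prime[j] = s_prime[j], s_prime[i]
        return s_prime                             # s'(i) = s(j), s'(j) = s(i)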
4.3.2 Lin's 2-Opt Algorithm or Edgewise Exchange
Another neighborhood structure under consideration in Chapter 6 is the Lin's 2-Opt
Algorithm or Edgewise Exchange. In their well-known paper [Lin73], Lin & Kernighan
proposed a series of heuristics of increasing complexity that give approximate solutions to
the TSP. The simplest one, known as Lin's 2-Opt Algorithm and denoted Lij(s), is the
following: starting from a given tour, exchange two edges (or branches) by replacing two
edges in a given subtour with two edges not in the tour, provided that the resulting
connection is also a subtour. Whenever this exchange yields a shorter or better subtour, it
is repeated until no more improvements are possible. To illustrate this concept, let s = (v1, v2, ..., vNs) be the current permutation sequence. There are edges connecting v1 and v2, v2 and v3, ..., vNs-1 and vNs in the current permutation. In the two-edge perturbation strategy, two edges are chosen and broken from the current permutation. These are replaced by the two unique edges required to rejoin the permutation (and create a new one). For example, if the edges (vi, vi+1) and (vj, vj+1) are broken for some i and j such that i < j < Ns, the new edges are (vi, vj) and (vi+1, vj+1). The net result is that the permutation segment between vi+1 and vj is reversed. The perturbed permutation is s' = (v1, v2, ..., vi, vj, vj-1, ..., vi+1, vj+1, ..., vNs) as depicted in Figure 4.5. To be precise, the algorithm of the subtour with Lin's 2-Opt Exchange is provided in Figure 4.6.

Figure 4.5: Neighborhood Structure with Lin's 2-Opt Exchange or Edgewise Exchange (edges (vi, vi+1) and (vj, vj+1) of subtour s are replaced by (vi, vj) and (vi+1, vj+1) in the perturbed subtour s').

3.1. For i = 0 to (Ns-1), do the following:      * Ns : number of cities in subtour *
   3.1.1. Neighborhood structure:
      (1). Generate a random city j, 0 ≤ j ≤ (Ns-1), j ≠ i.
      (2). Construct a trial permutation from the current permutation as follows:
           Find: i' = min(i,j) and j' = max(i,j).
           Set: s'(k) = s(k), k = 0, 1, ..., i'-1.
                s'(i'+k) = s(j'-k), k = 0, 1, 2, ..., j'-i'.
                s'(k) = s(k), k = j'+1, j'+2, ..., (Ns-1).
   3.1.2. Compute: AC(s',s) = Cost(s') - Cost(s).
   3.1.3. If AC(s',s) ≤ 0      * downhill transition *
          Then set s = s'.
   3.1.4. If AC(s',s) > 0      * uphill transition *
          Then set s = s' with probability = exp(-AC/T).

Figure 4.6: Algorithm of Subtour with Lin's 2-Opt Exchange.
It is important to point out here the difference between the permutation processes of
the Tij(s) and Lij(s) Exchanges. The two cities of the subtour are interchanged in the Tij(s)
permutation process, hence the term Citywise Exchange, whereas the two edges are
switched in the Lij(s) process, thus the term Edgewise Exchange. Figure 4.4(b) and
Figure 4.5 illustrate this difference. For example, city i is interchanged with city j in Figure
4.4(b) whereas edges (vi, vi+1) and (vj, vj+1) are interchanged with edges (vi, vj) and (vi+1, vj+1) in Figure 4.5.
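For comparison with the citywise sketch above, an illustrative rendering of the Lij(s) move, i.e. the segment reversal prescribed by the formula in Figure 4.6 (again a sketch, not the thesis software), is:

    import random

    def edgewise_exchange(s, i=None, j=None):
        # Lij(s): Lin's 2-Opt move; reverse the segment between the two chosen positions.
        Ns = len(s)
        if i is None or j is None:
            i, j = random.sample(range(Ns), 2)
        i_p, j_p = min(i, j), max(i, j)            # i' = min(i,j), j' = max(i,j)
        # s'(i'+k) = s(j'-k) for k = 0..(j'-i'); cities outside the segment are unchanged.
        return list(s[:i_p]) + list(s[i_p:j_p + 1])[::-1] + list(s[j_p + 1:])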
4.4 Cost of TSP Tour and Subtour
The Steps 3.1.2--3.1.4. of the Algorithm of Subtour were outlined in Figure 4.3.
Let us examine in more detail what the functions of these steps in computing the subtour
costs are and how they are related to the entire TSP tour cost.
Let us attach weights or costs to the edges of the entire TSP tour. Let duv be the cost of an edge (u,v). Thus, dσ(i),σ(i+1) is the cost associated with an edge of a tour. The cost of a tour σ, denoted as C(σ), is given by

C(σ) = Σ_{i=1}^{N-1} dσ(i),σ(i+1) + dσ(N),σ(1)     (4.10)
which is exactly the same as Equation 1.1. The cost of the subtour with Tij(s) exchange is
C(Tij(s)) = C(s') = Σ_{e' ∈ Es'} de'     (4.11)
where Es' is the set of edges resulting from Tij(s). For the Simulated Annealing Algorithm,
one is interested in the change in cost from C(s) to C(s'). The change in cost, AC(s',s), is
AC(s',s) = C(s') - C(s) = Σ_{e' ∈ Es'} de' - Σ_{e ∈ Es} de     (4.12)
where Es is the set of edges in the subtour s and Es' is the set of edges in the perturbed
subtour s'. Equation 4.13 gives an expression for the cost of the entire TSP tour with P
simultaneous subtours.
C(σ') = C(σ) + Σ_{k=0}^{P-1} AC(s'k, sk)     (4.13)
Equation 4.13 can be interpreted as follows. The perturbed cost of the TSP
tour is equal to the cost of the current tour plus the sum of the changes in cost of the
individual subtours. This equation is incorporated into the candidate algorithm in the next
section.
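As a concrete illustration of Equations 4.10-4.13, the following C sketch computes the tour cost, the change in subtour cost, and the perturbed tour cost. It is only a sketch under stated assumptions: the precomputed distance table dist[][], the MAX_N limit, and the treatment of a subtour as an open path are not taken from the thesis software in Appendix A.

#define MAX_N 100

double dist[MAX_N][MAX_N];     /* d_uv: precomputed edge costs */

/* Equation 4.10: cost of a closed tour sigma(0..n-1), including the closing edge. */
double tour_cost(const int sigma[], int n)
{
    double c = 0.0;
    int i;
    for (i = 0; i < n - 1; i++)
        c += dist[sigma[i]][sigma[i + 1]];
    return c + dist[sigma[n - 1]][sigma[0]];
}

/* Equation 4.12: change in cost between a subtour s and its perturbation s';
 * here a subtour is treated as an open path, so only consecutive edges are summed. */
double subtour_delta(const int s[], const int s_new[], int ns)
{
    double c_old = 0.0, c_new = 0.0;
    int i;
    for (i = 0; i < ns - 1; i++) {
        c_old += dist[s[i]][s[i + 1]];
        c_new += dist[s_new[i]][s_new[i + 1]];
    }
    return c_new - c_old;
}

/* Equation 4.13: perturbed cost of the whole TSP tour from the P subtour changes. */
double perturbed_tour_cost(double c_sigma, const double delta[], int P)
{
    int k;
    for (k = 0; k < P; k++)
        c_sigma += delta[k];
    return c_sigma;
}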
4.5 Candidate Algorithms
Two algorithms are presented below, which are based on the ideas outlined in
[Kim86], and both of which are iterative and utilize the central coordinator. Algorithm A is
a Synchronous Parallel Simulated Annealing Algorithm while Algorithm B is an
Asynchronous Parallel Simulated Annealing Algorithm. Though an analysis of Algorithm
B is not performed in Chapter 6, it would be a very interesting subject for a computational
study in future research efforts.
Let σⁿ denote the tour obtained by the nth iteration and let C(σⁿ) be its
associated cost. Also, let C(σⁿ)B be the best, or minimum, cost found up to iteration n.
4.5.1 Algorithm A: Synchronous Parallel Simulated Annealing Algorithm.
Initialize: At iteration n = 1, generate an initial random tour σ¹, compute its initial cost
C(σ¹), and set the initial best cost C(σ¹)B = C(σ¹).
1. The central coordinator partitions the entire TSP tour into P subtours and selects the
annealing temperature Tⁿ at the nth iteration.
2. The subtour, along with the processing element data structure at the nth iteration, and
Tⁿ are delivered to the kth available processor.
3. (A). At nth iteration, the kth processor calculates the change in cost of the subtours and
performs the local annealing process according to the following relations:
    ΔC(s'k^n, sk^n) = C(s'k^n) - C(sk^n)                                            (4.14)

    sk^n = { s'k^n   if r < exp{-ΔC(s'k^n, sk^n)/Tⁿ} or ΔC(s'k^n, sk^n) < 0
           { sk^n    otherwise                                                      (4.15)

and

    C(sk^n) = { C(s'k^n)   if r < exp{-ΔC(s'k^n, sk^n)/Tⁿ} or ΔC(s'k^n, sk^n) < 0
              { C(sk^n)    otherwise                                                (4.16)

where r is a uniformly distributed random variable over (0,1).
Note that s'k^n = Tikjk(sⁿ) is the Tij(s) or Lij(s) interchange at the nth iteration presented in
Section 4.3.
(B). The k th processor participates in Citywise Exchanges with other processors and
performs the local annealing process as in Step 3A.
(C). And, the kth processor performs the local annealing process as in Step 3A to
optimize its subtour again.
NOTE: The kth processor repeats for cardinality times (or equivalently the number of cities
in the subtour) in Steps 3A, 3B, and 3C, respectively, at each iteration, i.e. Step 3.1. of
Figure 4.4(a).
4. The k th processor keeps the central coordinator informed of the status of acceptance of
the interchange. Whenever the interchange is accepted, the kth processor delivers a change
in cost and the perturbed subtour s', which has been accepted, to the central coordinator.
5. The central coordinator reconstructs the new tour σ^(n+1) and computes its new cost
according to the following relations:

    σ^(n+1) = ∪_{k=0}^{P-1} sk^n                                                    (4.17)

and

    C(σ^(n+1)) = C(σⁿ) + Σ_{k=0}^{P-1} ΔC(s'k^n, sk^n)                              (4.18)
6. The central coordinator computes the change in cost, and performs the global annealing
process according to the following relations:
    ΔC = C(σ^(n+1)) - C(σⁿ)B                                                        (4.19)

    C(σ^(n+1))B = { C(σ^(n+1))   if r < exp{-ΔC/Tⁿ} or ΔC < 0
                  { C(σⁿ)B       otherwise                                          (4.20)

and

    σ^(n+1)B = { σ^(n+1)   if r < exp{-ΔC/Tⁿ} or ΔC < 0
               { σⁿB       otherwise                                                (4.21)
where r is a uniformly distributed random variable over (0,1).
7. Check for a stopping criterion in terms of a maximum number of iterations, i.e. "frozen".
8. If the condition of Step 7 is not satisfied, then go to Step 1.
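The local annealing of Step 3 (Equations 4.15-4.16) and the global annealing of Step 6 (Equations 4.20-4.21) apply the same Metropolis-style acceptance rule. A minimal C sketch of that rule is given below; the use of drand48() for the uniform random variable r and the function name accept_move are assumptions, not the thesis code of Appendix A.

#include <math.h>
#include <stdlib.h>

/* Acceptance rule used in Steps 3 and 6: accept a downhill move always,
 * and an uphill move with probability exp(-dC/T).                        */
int accept_move(double delta_c, double temperature)
{
    if (delta_c <= 0.0)
        return 1;                                     /* downhill: always accept        */
    return drand48() < exp(-delta_c / temperature);   /* uphill: accept w.p. exp(-dC/T) */
}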
4.5.2 Algorithm B: Asynchronous Parallel Simulated Annealing Algorithm.
Though the design of Algorithm B has not been carefully laid out in detail, its main
steps are outlined as follows:
Initialize: At iteration n = 1, generate an initial random tour σ¹, compute its initial cost
C(σ¹), and set the initial best cost C(σ¹)B = C(σ¹).
1. A central coordinator partitions the entire TSP tour into P subtours and selects the
annealing temperature Tn at the nth iteration.
2. The subtour s and Tⁿ are delivered to the kth processor, making the kth processor busy.
3. The same step as in Step A3.
4. The same step in Step A4 and the kth processor becomes free.
5. The central coordinator reconstructs the tour and updates the cost of the tour by
    s ⇐ s' = Tikjk(s)
    C(σ) ⇐ C(σ) + ΔC(s',s).
6. The same as in Step A6.
7. The same as in Step A7.
8. The same as in Step A8.
4.6 Implementation Issues of the Candidate Algorithms
Algorithms A and B outline the major phases of the iterative process. The central
coordinator's tasks, Steps Al, A5 and A6, should be implemented as efficiently as
possible. Otherwise, processors may become idle with no jobs to perform. Thus, one of
the most important design goals for the central coordinator tasks is to reduce the overhead
introduced by the parallelization. At each iteration, which is one time step, the central
coordinator has basically four tasks: (1) selection of the temperature at each iteration; (2)
generation of a starting city, partition of the entire TSP tour into subtours, creation of a
processing element data structure for each subtour, and assignment to each processor of a
subtour and a temperature value; (3) reconstruction of the new tour from the "better"
subtours and computation of the cost of the new tour; and (4) performance of the global
annealing. This section discusses these tasks in detail.
First of all, in Step A1, the central coordinator selects the annealing temperature
at a given iteration. Two different annealing schedules, namely Equations 3.12 and 3.13
(or Equations 4.22 and 4.23, respectively), are considered for Algorithm A; they were
extensively discussed in Section 3.2.2. The first annealing schedule, Equation 4.22, has
been shown to provide the Simulated Annealing Algorithm with good solutions when computing
the probability of accepting an uphill movement in cost [Mit85]. With this annealing
schedule, the temperature is initially started at a reasonably high value and
then is held constant until the cost function has reached equilibrium, or the steady-state
probabilities. Then, the temperature is lowered slowly according to Equation 4.22.
However, it is impossible to determine exactly when the cost function has reached
equilibrium. Instead of having the temperature lowered at equilibrium, one can lower the
temperature after some fixed number of iterations or after the number of accepted
interchanges exceeds some pre-determined threshold. In Chapter 6, a computational study
on the first alternative, i.e. lowering temperature after some fixed number of iterations, is
performed. Figure 4.7 plots temperature versus time for various values of c.
    T(k+1) = c·Tk,   or equivalently,   Tk = c^k·T0,
    for 0.9 ≤ c ≤ 0.99 and k = 0, 1, 2, ..., max_iterations                         (4.22)

Figure 4.7: Temperature versus time for Tk = c^k·T0 for different values of c at T0 = 20.0.
The second annealing schedule, Equation 4.23, provides convergence in probability
to the set of optimum tours. However, the theory assumes that the algorithm is run for
infinite time steps. Practically, the algorithm cannot be run indefinitely; hence, only near
optimal solutions can be expected.
    Tk = d / log k,   for d ≥ L and k = 2, 3, ..., max_iterations                   (4.23)
Figure 4.8 plots temperature versus time for various values of d. In Chapter 6, the
results of a computational study on these annealing schedules are presented.
Figure 4.8: Temperature versus time for Tk = d/log k for different values of d.
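Both annealing schedules are simple to generate; the C sketch below assumes T0 = 20.0 as in Figure 4.7 and starts the logarithmic schedule at k = 2, as required by Equation 4.23. The function names are illustrative only and are not taken from Appendix A.

#include <math.h>

/* Equation 4.22: geometric schedule T_k = c^k * T0, typically 0.9 <= c <= 0.99. */
double geometric_schedule(double t0, double c, int k)
{
    return t0 * pow(c, (double)k);
}

/* Equation 4.23: logarithmic schedule T_k = d / log(k), valid for k >= 2. */
double logarithmic_schedule(double d, int k)
{
    return d / log((double)k);
}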
Secondly, the central coordinator randomly generates a starting city. Then, from
this initial city, it partitions the tour σ into an equal number of subtours s, each of which is
associated with a data structure. Along with the computed temperature value, the central
coordinator assigns each available processor or PE a data structure.
Finally, the central coordinator performs Steps A5 and A6. It reconstructs the
new tour σ' from the processor updates. It then computes the new cost of the tour and
performs the global annealing process. If the central coordinator accepts the new tour, then
it updates its data structure and uses this accepted tour as the current tour; otherwise, the
central coordinator keeps its old data structure and uses it for the next iteration. Whether
the new tour is accepted or rejected, the algorithm is repeated until the stopping condition,
i.e. the maximum number of iterations, is satisfied.
In summary, the central coordinator starts an iteration by selecting the
temperature value, randomly generating a starting city, and partitioning the current TSP
tour into subtours. It then delivers the temperature and the partitioned subtours to the Np
processing elements in the system. Each of these processors is responsible for perturbing
its subtour, computing the cost of the perturbed subtour, and performing the local
annealing process as in Step A3. When each of these processors has finished the tasks
on its subtour, the central coordinator reconstructs the new tour from these "better"
subtours, computes its new cost, and performs the global annealing process. Then, the
central coordinator repeats the iteration until some "maximum iterations" stopping criterion
has been satisfied.
In this chapter, the overall framework of the Parallel Simulated
Annealing Algorithm and the parallelization methodology were presented. Two different
neighborhood structures were discussed. The candidate algorithm and its implementation
issues were outlined and discussed. In the next chapter, the speedup of the Parallel
Simulated Annealing Algorithm (Algorithm A) in Section 4.5.1 is analyzed.
CHAPTER 5
SPEEDUP ANALYSIS OF THE PARALLEL SIMULATED ANNEALING ALGORITHM
5.1 Introduction
In Chapters 2 and 3, the historical development and a mathematical model of the
Classical Simulated Annealing Algorithm were covered. In the last chapter, a Parallel
Simulated Annealing Algorithm was designed.
In this chapter, a measure of the
performance of the Parallel Simulated Annealing Algorithm (Algorithm A) is analyzed.
Generally, there are several objectives in parallel algorithm performance analysis.
First, one would like to know the execution time of the algorithm. The real execution time
depends on the details of the system on which the algorithm is run. If models do not
include all the details of a particular system, the execution time computed with respect to a
given model is only an approximate measure of the actual time. If the model includes
parameters describing the system, then the impact of the system on algorithm performance
can be studied. In the parallel system environment, communication factors can also have a
significant impact on algorithm performance. An algorithm analysis based on a model that
includes this aspect is generally more accurate and useful than one that does not. One
objective of this chapter is to show how communication can be incorporated in the analysis
of parallel algorithms and in measures of algorithm performance. Another purpose of this
chapter is to discover methods for improving algorithm performance by removing
inefficiencies caused by communication overheads.
Intuitively, the degree of parallelism can be defined as some function of the number
of processors at any given moment. Ideally, in a system with Np processors, it is desirable
that the degree of parallelism always be Np or as close to Np as possible. This degree of
parallelism known as speedup is a measure of parallel algorithm performance. Though
various definitions of speedup do exist [Poly86], [Crane86], and [Eag89], the speedup of
the Algorithm A in Section 4.5.1 and the software system model in Figure 4.2 is in general
defined as follows,
    S = (I1·T1) / (INp·TNp)                                                         (5.1)

where
    T1 is the execution time of Algorithm A from Steps A1 to A7 per
    iteration using 1 processor;
    I1 is the number of serial iterations required for Algorithm A to
    converge to some desired cost;
    TNp is the execution time of Algorithm A from Steps A1 to A7 per
    iteration using Np processors;
    and INp is the number of parallel iterations required for Algorithm
    A to converge to some desired cost.
Although they may be different in practice, I1 and INp are assumed to be the same
throughout the analysis of this chapter. Thus, the speedup definition of Equation 5.1 can
be reduced and modified to be Equation 5.2 so that the analysis and discussion which
follow can be done per iteration of Algorithm A.
    S = T1 / TNp                                                                    (5.2)

where
    T1 is the execution time of Algorithm A per iteration using 1
    processor;
    TNp is the execution time of Algorithm A per iteration using P
    subtours and Np processors.
From Section 4.5.1, it can be seen that Algorithm A consists principally of 7
steps. The parallelization designed in this thesis is concentrated in Step 3; all other steps are
essentially sequential.
In Section 5.2, the speedup of independent subtours (Step 3A or Step 3C) is
analyzed. Interprocessor communication is discussed in Section 5.3. Speedup analysis of
interprocessor communication (Step 3B) is investigated in Section 5.4. In Section 5.5, the
speedup of Step 3 of Algorithm A is examined, and general bounds on speedup of
Algorithm A are given in Section 5.6.
5.2 Speedup Analysis of Independent Subtours
In this section, the speedup of independent subtours (Step 3A or Step 3C) is
analyzed. Let dij denote the execution time of a distance calculation between city i and city
j, and let ε denote the execution time to perform an s(i) = s'(i) or s'(i) = s(i) operation for
some integer i. For each iteration in Step 3A or Step 3C of Algorithm A in Section 4.5.1,
the following 4 steps are performed:
1. Generate a perturbed subtour s'.
2. Compute the current subtour cost C(s).
3. Compute the perturbed subtour cost C(s').
4. Perform the local annealing.
Since Steps 1 and 4 each take the same amount of time, (Nε/P), per iteration (copying the
N/P cities of the subtour) and Steps 2 and 3 each take the same amount of time, (N/P - 1)dij,
to compute, the execution time of Step 3A for one iteration is (2(N/P - 1)dij + 2(Nε/P)). Thus,
the total execution time tA to compute Step 3A of Algorithm A for (N/P) iterations is in general
(N/P)(2(N/P - 1)dij + 2(Nε/P)), or

    tA = 2(N/P)²(dij + ε) - 2(N/P)dij                                               (5.3)
Assuming that

    (N/P)(dij + ε) >> dij,                                                          (5.4)
the total execution time to compute an independent subtour (Step 3A or Step 3C of
Algorithm A) is
    tA ≈ 2(N/P)²(dij + ε) = (N/P)²γ,   where γ = 2(dij + ε).                        (5.5)
The execution time of Step 3A or Step 3C using P subtours and 1 processor is
    T1^3A = T1^3C = P·tA                                                            (5.6)
Before we consider the total execution time of Step 3A or Step 3C using P subtours
and Np processors, let us examine how processors are being allocated to ready subtours
and how they are being idle when there is no available subtour.
There are generally two cases of processor allocation and two cases of idle
processors [Poly86], [Crane86], and [Eag89]: an unlimited number of processors, where
the number of processors is greater than or equal to the number of subtours, i.e. Np ≥ P;
and a limited number of processors, where the number of processors is less than the
number of subtours, i.e. Np < P. Let Np be the number of processors; let P be the number
of subtours; let I(Np) be the average processor idle time; and let tA be defined as in
Equation 5.5. For the unlimited number of processors (Np ≥ P), we can allocate as many
processors to the subtours as we please. The P ready subtours are always kept busy
by P allocated processors, and the execution time is just tA. In this case of processor
allocation, there are (Np - P) idle processors, and the average processor idle time I(Np)
is just ((Np - P)/Np)tA. Unlike the case of an unlimited number of processors, the
case of limited processor allocation (Np < P) is assumed to work as follows. Let
sj denote the jth subtour, where j = 1, ..., P, and let PEi denote the ith processor
for any i, where i = (j + Np)%Np and % denotes the modulus operator. Note that the
modulus operator returns only the remainder of the division of the two numbers, e.g. 10%3 = 1.
So, each processor PEi is assigned to the jth subtour in Figure 4.2. In other words, s1 is
assigned to PE1, s2 to PE2, ..., sNp to PENp, s(Np+1) to PE1, s(Np+2) to PE2, ..., until
all subtours have been executed. In this case of processor allocation, Np processors must
keep P subtours busy; the last (P + Np)%Np subtours are kept busy by the first
(P + Np)%Np processors, and there are (Np - (P + Np)%Np) processors idle. Thus, the average
processor idle time I(Np) is just ((Np - (P+Np)%Np)tA/Np). For the worst idle processors
case, where there are at most (Np - 1) idle processors, the average processor idle time I(Np)
is just ((Np - 1)tA/Np) for all Np.
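The wrap-around assignment of subtours to processors and the average idle time I(Np) discussed above can be written directly in C. The sketch below uses 0-based indices, whereas the text numbers subtours from 1; the function names are assumptions.

/* Assign subtour j (0-based) to a processor, wrapping around when Np < P. */
int processor_for_subtour(int j, int num_processors)
{
    return j % num_processors;
}

/* Average idle time I(Np) per iteration, following the discussion above:
 *   Np >= P: ((Np - P)/Np) * tA
 *   Np <  P: ((Np - (P + Np) % Np)/Np) * tA                               */
double average_idle_time(int Np, int P, double tA)
{
    if (Np >= P)
        return ((double)(Np - P) / Np) * tA;
    return ((double)(Np - (P + Np) % Np) / Np) * tA;
}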
From the above discussion, the total execution time of Step 3A or Step 3C using P
subtours and Np processors when taking idle processors time I(Np) into consideration is in
general,
    T_Np^3A = T_Np^3C = { (P/Np)·tA + ((Np - (P+Np)%Np)/Np)·tA   for Np < P
                        { tA + ((Np - P)/Np)·tA                  for Np ≥ P         (5.7)
or equivalently,
    T_Np^3A = T_Np^3C = { ((P + Np - (P+Np)%Np)/Np)·tA   for Np < P
                        { ((2Np - P)/Np)·tA              for Np ≥ P                 (5.8)
and the corresponding speedup for an arbitrary number of idle processors is
    S^3A = S^3C = { P·Np / (P + Np - (P+Np)%Np)   for Np < P
                  { P·Np / (2Np - P)              for Np ≥ P                        (5.9)
And if there are at most (Np - 1) idle processors, the speedup in Equation 5.9 for Step 3A
or Step 3C becomes,
    S^3A = S^3C = { P·Np / (P + Np - 1)   for Np < P
                  { P·Np / (2Np - 1)      for Np ≥ P                                (5.10)

Let P be defined as follows,
    P = { k·Np   for Np < P, ∀ k > 1
        { Np/m   for Np ≥ P, ∀ m ≥ 1                                                (5.11)
and substituting P of Equation 5.11 into Equation 5.9, the following relation results,
    S^3A = { (k/(k+1))·Np   for Np < P or P = k·Np, ∀ k > 1
           { Np/(2m - 1)    for Np ≥ P or P = Np/m, ∀ m ≥ 1                         (5.12)
Similarly, for the worst idle processors case if there are at most (Np - 1) idle processors,
Equation 5.10 becomes,
    S^3A = { k·Np² / ((k+1)·Np - 1)   for Np < P or P = k·Np, ∀ k > 1
           { Np² / ((2Np - 1)·m)      for Np ≥ P or P = Np/m, ∀ m ≥ 1               (5.13)
Let us make a few observations here.
Observation 5.2.1: If P >> Np, or k is large in Equation 5.12, the speedup of Step 3A or
Step 3C cannot exceed Np and asymptotically approaches Np. Thus, we can
see that there is little benefit in partitioning the tour into a number of subtours P much beyond the
number of available processors Np in the system.
Observation 5.2.2: If Np >> P, or m is large, the speedup of Step 3A or Step 3C degrades
and may even fall below 1. In fact, from Equation 5.9, in the
limit as Np approaches infinity, S^3A approaches P/2; if Algorithm A is processed
sequentially, i.e. P = 1, the speedup is only 0.5, which is worse than executing a
sequential program on a 1-processor system. This result proves to be
very useful when the sequential steps of Algorithm A are considered in a later
section. As in Observation 5.2.1, there is very little or no benefit in allocating a number
of processors Np much beyond the number of partitioned subtours P.
Observation 5.2.3: For the special case of m = 1, or Np = P, in Equation 5.12, the speedup
S^3A = Np results, as one would expect for Np independent subtours executing on an
Np-processor system. As we can see, the maximum speedup occurs when m = 1, or Np = P.
Observation 5.2.4: For the worst idle processors case, where there are (Np - 1) idle
processors, the speedup should be at least Equation 5.10 or Equation 5.13. Note that it is
always true that the speedup of the worst idle processors case is equal to or less than the
speedup of an arbitrary number of idle processors, i.e. Equation 5.10 is equal to or less
than Equation 5.9 or Equation 5.13 is equal to or less than Equation 5.12.
Thus far, we have discussed only Step 3A and Step 3C of Algorithm A and have said
little about Step 3B, because in the former steps the subtours are processed or executed
independently, whereas in the latter step the subtours are "communicating" with one
another. Step 3B is therefore the subject of the next two sections.
5.3 Interprocessor Communication
In the last section, the speedup of independent subtours was analyzed. In this
section, interprocessor communication is examined, which provides some useful results for
the speedup analysis of the next section. Interprocessor communication is the process of
information or data exchange between two or more processors. Two types of
interprocessor communication are distinguished: message communication, where
processors exchange message information, and data communication, during which one
processor receives data that it needs from the other processors. Both types of
interprocessor communication are significant because both are reflected as overheads in the
system network and the total execution time of a program, respectively.
In Algorithm A, interprocessor communication is essentially Step 3B, where
subtours or processors exchange cities with one another. Consider the 2-processor system
with interprocessor communication in Figure 5.1. Assume that PEi wants to
exchange a city with PEj. It first reserves a city in its own subtour, say city A. It then requests
an available city from PEj, say city B, and computes its new cost including city B. If PEi
accepts city B as an exchange, it sends a message to PEj to compute the cost of
its subtour including city A; otherwise, PEi sends a message to PEj, telling PEj that it does
not accept city B as an exchange. Suppose that PEi accepts city B but PEj does not accept
city A; then PEi cannot take city B away from PEj. Thus, if either PEi or PEj does not
accept the exchange, then neither can PEi take city B from PEj nor can PEj take city A from
PEi. Hence, both processors PEi and PEj need to come to a mutual agreement when
exchanging the cities in their corresponding subtours. In an Np-processor system,
processors communicate analogously.

Figure 5.1: A 2-Processors System with Interprocessor Communication.
In the next subsection, message communication [Horo81] is analyzed; in the
following subsection, data communication [Poly86] is examined.
5.3.1 Message Communication
Message communication is a key overhead factor that influences the parallel system
performance. There is a methodology for measuring this message communication, called
message capacity. The rationale for this message capacity is to capture the phenomena
which lead to congestion and competition in a network-based parallel processor. Using
this metric, one can examine how well the networks of parallel machines support
multiple message transfers between arbitrary processors. This "message
capacity" is based on an examination of message density along each of the interconnection
paths between processors in the machine. It gives a figure of merit for the message
capacity of the network. This measure is an extension of the ideas in Horowitz and Zorat
[Horo81]. They use the number of communication paths which each processor in a
network is responsible for as a measure of the communication overhead in the system.
A message is defined for our application as a transmission or reception of a city
between processor i and processor j. Let Ms be the total number of messages that
processor i sends to other processors, i.e. one to every other processor; let Mr be the total
number of messages that processor i receives from other processors, i.e. one from every
other processor; let Np denote the number of available processors in a parallel system; let P
denote the number of subtours in the TSP; let N denote the number of cities in the TSP
tour; and let Ns be the number of messages which processor i sends to or receives from every
other processor, where Ns = (N/P). Assuming that a given TSP tour is partitioned into Np
subtours, i.e. Np = P, and these subtours are executed concurrently on an Np-processor
system, then there are
. Ms = (Np - 1)Ns messages sent by processor i to every other processor, and
. Mr = (Np - 1)Ns messages received by processor i from every other processor;
Each processor is responsible for transmitting or receiving Ns cities or
messages in a subtour. Thus, processor i exchanges (sends and receives)

    Ms + Mr = 2(Np - 1)·Ns messages                                                 (5.13)
with other processors. But, there are Np such processors. Thus, the total number of
messages exchanged in an interconnection network of an Np-processor system at a
particular iteration is

    2Np(Np - 1)·Ns = 2N(Np - 1) messages                                            (5.14)

and, on the average, a processor handles on the bus, at each iteration,

    2N(Np - 1)/Np = O(N) messages.                                                  (5.15)
To illustrate, let us consider the case of a 2-Processors System in Figure 5.1. It can
be easily seen that there are a total of 16 message transmissions in the system, and on the
average, each processor handles 8 messages, i.e. 4 transmissions and 4 receptions.
5.3.2 Data Communication
In the last subsection, we have seen how a processor, on the average, handles
messages at a particular iteration. In this section, we will examine the execution time due to
the communication overhead factor. Let us assume that we are dealing with all cases of
processor allocation, so that the number of subtours P is not necessarily equal to the Np
available processors.
Of course, different subtours will be executed by different
processors during the parallel execution of Algorithm A, especially Step 3. Let us again
consider the problem of interprocessor communication for the case of a 2-processors
system and 2 subtours in Figure 5.1.
Let us review how interprocessor communication takes place between two subtours
si and sj. Let subtour si be assigned to processor i, and subtour sj be assigned to processor
j. Again, interprocessor communication between processor i and processor j takes place if
processor i and processor j mutually agree to exchange a city in their corresponding
subtours. Therefore, interprocessor communication involves explicit transmission or
reception of cities between processors, and of course, the number of cities that processor i
sends to processor j is equal to the number of cities that processor j receives from processor
i.
Let us define the communication unit τ to be the execution time it takes to exchange
(to transmit and to receive) a city between two subtours or processors, i.e. the execution
time of 2 messages. Let Ns be the cardinality of the subtour, or the number of cities in the
subtour; let w be the communication weight between processor i and processor j. Then,
the time spent on communication during the concurrent execution of Algorithm A with 2
subtours on 2 processors is w, which is equal to τNs or τ(N/P), where N is the number of
cities in the TSP and P is the number of subtours. Thus, for 2 subtours i and j executing
on a 2-processor system, the execution time of subtour i by processor i when exchanging
Ns cities (transmitting to and receiving from) with processor j is (ti + τ(N/P)).
For P subtours executing on a 1-processor system, which is a simulated
environment, the execution time due to communication overhead of the ith subtour when
exchanging Ns cities with other (P - 1) subtours is
    (P - 1)·w units of time                                                         (5.16)
Thus, the total execution time due to communication overhead for exchanging Ns cities
among P subtours when using 1 processor is
    Or = P(P - 1)·w = τN(P - 1) units of time                                       (5.17)
and on the average, the execution time due to communication overhead for exchanging Ns
cities among P subtours when executing P subtours on an Np-processors system is
    Oa = τN(P - 1) / Np units of time                                               (5.18)
To illustrate, let us consider the case of a 4-subtour system with interprocessor
communication as shown in Figure 5.2. The total execution time for exchanging Ns
cities among 4 subtours when using only 1 processor is 12w or 3τN, and the average
execution time for exchanging Ns cities among 4 subtours when using Np processors
is 12w/Np or 3τN/Np for any Np; i.e. if Np = 4, then each processor, on the average,
takes 3w or 0.75τN units of time to communicate with the other processors. Note that
Equation 5.17 is the special case of Equation 5.18. Note also that when using 1 processor,
we generally set P = 1 so that the communication overheads in Equations 5.16, 5.17, and
5.18 are equal to zero; thus, no interprocessor communication exists when executing a
program sequentially. For the purpose of calculating the average communication overhead,
P ≠ 1 is assumed.

Figure 5.2: 4-Subtours System with Interprocessor Communication.
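Equations 5.15 and 5.18 give, respectively, the per-processor message load and the average data-communication overhead; both can be evaluated with a few lines of C. The sketch below is illustrative only, and the parameter names simply transliterate the symbols N, Np, P, and τ used above.

/* Equation 5.15: average number of messages a processor handles per iteration,
 * 2N(Np - 1)/Np, which is O(N).                                              */
double avg_messages_per_processor(int N, int Np)
{
    return 2.0 * N * (Np - 1) / Np;
}

/* Equation 5.18: average data-communication overhead per iteration,
 * Oa = tau * N * (P - 1) / Np units of time.                                 */
double avg_comm_overhead(double tau, int N, int P, int Np)
{
    return tau * N * (P - 1) / (double)Np;
}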
5.4 Speedup Analysis of Interprocessor Communication
In the last section, communication overheads were analyzed. Let us incorporate the
result of Equation 5.18 into our analysis of the speedup of Step 3B of Algorithm A. As can be
be seen from Figure 5.1 of a 2-Processors System with Interprocessor Communication, the
execution of Step 3B follows the same steps as the execution of Step 3A or Step 3C except
for the communication factor, namely Citywise Exchange between 2 subtours or
processors. Thus, the execution time of Step 3B of Algorithm A on an Np-processors
system is just the execution time of Step 3A and the average communication overhead as
shown in Equation 5.19,
    tB = tA + Oa                                                                    (5.19)

or equivalently,

    tB = tA + τN(P - 1)/Np                                                          (5.20)
Let us consider a speedup of Step 3B for an arbitrary number of idle processors.
For P subtours, the sequential execution time of Step 3B is
    T1^3B = P·tA                                                                    (5.21)
and the execution time of P subtours on an Np-processors system in general takes the
following form,
    T_Np^3B = { ((P + Np - (P+Np)%Np)/Np)·tA + Oa   for Np < P
              { ((2Np - P)/Np)·tA + Oa              for Np ≥ P                      (5.22)

or equivalently,

    T_Np^3B = { ((P + Np - (P+Np)%Np)·tA + τN(P - 1)) / Np   for Np < P
              { ((2Np - P)·tA + τN(P - 1)) / Np              for Np ≥ P             (5.23)
Note that if Np = P, so that there is no idle processor, Equation 5.23 reduces to Equation
5.20. The speedup of Step 3B for an arbitrary number of idle processors is of the
following form,
    S^3B = { P·Np·tA / ((P + Np - (P+Np)%Np)·tA + τN(P - 1))   for Np < P
           { P·Np·tA / ((2Np - P)·tA + τN(P - 1))              for Np ≥ P           (5.24)
And, in the worst idle processors case, where there are (Np - 1) idle processors, the
speedup is
    S^3B = { P·Np·tA / ((P + Np - 1)·tA + τN(P - 1))   for Np < P
           { P·Np·tA / ((2Np - 1)·tA + τN(P - 1))      for Np ≥ P                   (5.25)
Substituting Equation 5.5 for tA into Equation 5.24, the following equation results,
    S^3B = { γP·Np·N / (γ(P + Np - (P+Np)%Np)·N + τP²(P - 1))   for Np < P
           { γP·Np·N / (γ(2Np - P)·N + τP²(P - 1))              for Np ≥ P          (5.26)
Let us define N as follows,
    N = Ns·P = { k·Ns·Np   for Np < P, ∀ k > 1
               { Ns·Np/m   for Np ≥ P, ∀ m ≥ 1                                      (5.27)
and substituting it and Equation 5.11 into Equation 5.26, we have the following speedup
equation,
    S^3B = { γk·Ns·Np / (γ(k + 1)·Ns + τk(k·Np - 1))    for P = k·Np, ∀ k > 1
           { γm·Ns·Np / (γm(2m - 1)·Ns + τ(Np - m))     for P = Np/m, ∀ m ≥ 1       (5.28)
Similarly, let us consider the worst idle processors case, where there are (Np - 1)
idle processors; with tA substituted, Equation 5.25 becomes,
    S^3B = { γP·Np·N / (γ(P + Np - 1)·N + τP²(P - 1))   for Np < P
           { γP·Np·N / (γ(2Np - 1)·N + τP²(P - 1))      for Np ≥ P                  (5.29)
and substituting Equations 5.11 and 5.27 into Equation 5.29, we have the following
relation,
    S^3B = { γk·Ns·Np² / (γ((k + 1)·Np - 1)·Ns + τk·Np(k·Np - 1))   for P = k·Np, ∀ k > 1
           { γm·Ns·Np² / (γm²(2Np - 1)·Ns + τNp(Np - m))            for P = Np/m, ∀ m ≥ 1   (5.30)
A few key points can be observed from the above equations.
Observation 5.4.1: In the limit as N goes to infinity, Equation 5.26 asymptotically
approaches Equation 5.31, which is the same as Equation 5.9 for P independent
subtours executing on an Np-processor system. Note also that Equation 5.31 is
independent of the τ term, the communication overhead factor.
    lim_{N→∞} S^3B = { P·Np / (P + Np - (P+Np)%Np)   for Np < P
                     { P·Np / (2Np - P)              for Np ≥ P                     (5.31)
Observation 5.4.2: In the limit as the number of cities in a subtour, Ns, goes to infinity, the
speedup of Equation 5.28 approaches a linear function of the number of processors, as
shown in Equation 5.32. Note that if we substitute Equation 5.11 into Equation 5.26 and
take the limit as N goes to infinity, we obtain the same speedup result, i.e. Equation
5.32, which is equivalent to Equation 5.12, and Observations 5.2.1-5.2.3 also apply
here. So, as the number of cities N or Ns increases, the effect of the communication
overhead factor becomes less dominant, and as the number of cities approaches
infinity, the Step 3B speedup is independent of the communication overhead
factor. This independence will be examined in some detail in a later section.
    lim_{N→∞} S^3B = lim_{Ns→∞} S^3B = { (k/(k + 1))·Np   for P = k·Np, ∀ k > 1
                                       { Np/(2m - 1)      for P = Np/m, ∀ m ≥ 1     (5.32)
Observation 5.4.3: For (Np - 1) idle processors and in the limit as Ns approaches
infinity in Equation 5.30, the speedup of Step 3B approaches Equation 5.33, which is
equivalent to Equation 5.13. Note that if we substitute Equation 5.11 into Equation 5.29
and take the limit as N approaches infinity, we obtain the same result. Of course,
Observation 5.2.4 applies equally here as well.
    lim_{Ns→∞} S^3B = lim_{N→∞} S^3B = { k·Np² / ((k + 1)·Np - 1)   for P = k·Np, ∀ k > 1
                                       { Np² / ((2Np - 1)·m)        for P = Np/m, ∀ m ≥ 1   (5.33)
5.5 Speedup Analysis of Step 3 of Algorithm A
In the last section, we have analyzed the speedup of the interprocessor
communication, namely Step 3B. In this section, we examine the speedup of Step 3 of
Algorithm A. As usual, we begin with the execution time. The execution time of Step 3
consists mainly of the execution times of Step 3A, Step 3B, and Step 3C. Since Step 3A is
the same as Step 3C, the execution time of Step 3 then comprises of the execution time of
Step 3B and twice as much as the execution time of Step 3A. Thus, the execution time of
Step 3 of Algorithm A using 1 processor is
    T1^3 = 2·T1^3A + T1^3B = 3P·tA                                                  (5.34)
and, the execution time of Step 3 using an Np-processors system is
    T_Np^3 = 2·T_Np^3A + T_Np^3B                                                    (5.35)
or equivalently,
    T_Np^3 = { (3(P + Np - (P+Np)%Np)·tA + τN(P - 1)) / Np   for Np < P
             { (3(2Np - P)·tA + τN(P - 1)) / Np              for Np ≥ P             (5.36)
and, the speedup of Step 3 for an arbitrary number of idle processors is of the following
form,
    S^3 = { 3P·Np·tA / (3(P + Np - (P+Np)%Np)·tA + τN(P - 1))   for Np < P
          { 3P·Np·tA / (3(2Np - P)·tA + τN(P - 1))              for Np ≥ P          (5.37)
In the worst idle processors case, where there are (Np - 1) idle processors, the speedup of
Step 3 becomes,
    S^3 = { 3P·Np·tA / (3(P + Np - 1)·tA + τN(P - 1))   for Np < P
          { 3P·Np·tA / (3(2Np - 1)·tA + τN(P - 1))      for Np ≥ P                  (5.38)
After substituting Equation 5.5 for tA into Equation 5.37 and simplifying, the following
speedup result of Step 3 is obtained,
    S^3 = { 3γP·Np·N / (3γ(P + Np - (P+Np)%Np)·N + τP²(P - 1))   for Np < P
          { 3γP·Np·N / (3γ(2Np - P)·N + τP²(P - 1))              for Np ≥ P         (5.39)
If we substitute Equations 5.11 and 5.27 into Equation 5.39, Equation 5.40 results,
    S^3 = { 3γk·Ns·Np / (3γ(k + 1)·Ns + τk(k·Np - 1))    for P = k·Np, ∀ k > 1
          { 3γm·Ns·Np / (3γm(2m - 1)·Ns + τ(Np - m))     for P = Np/m, ∀ m ≥ 1      (5.40)

Let us make a few observations about the above equations.
Observation 5.5.1: In the limit as the number of cities N in the TSP approaches infinity
in Equation 5.39, the following speedup of Step 3 is obtained. Note that Equation 5.41 is
exactly equivalent to Equations 5.9 and 5.31. Of course, Observation 5.4.1 can be made
here as well.
    lim_{N→∞} S^3 = { P·Np / (P + Np - (P+Np)%Np)   for Np < P
                    { P·Np / (2Np - P)              for Np ≥ P                      (5.41)
Observation 5.5.2: In the limit as the number of cities Ns in a subtour approaches
infinity in Equation 5.40, the speedup Equation 5.42 holds. Note that Equation 5.42 is
also exactly equivalent to Equations 5.12 and 5.32. Similarly, Observation 5.4.2 is also
applicable here.
    lim_{N→∞} S^3 = lim_{Ns→∞} S^3 = { (k/(k + 1))·Np   for P = k·Np, ∀ k > 1
                                     { Np/(2m - 1)      for P = Np/m, ∀ m ≥ 1       (5.42)
Observation 5.5.3: From Equation 5.40, if m = 1, or Np = P, Equation 5.43 is obtained.
Note that the communication overhead factor, the τ term in the denominator, is not likely
to be dominant in the speedup equation, a point which will be discussed in some detail
when the speedup of the overall algorithm is analyzed in the next section.
    S^3 = 3γ·Ns·Np / (3γ·Ns + τ(Np - 1))                                            (5.43)
Observation 5.5.4: If there were no communication overhead (τ = 0), S^3 = Np, resulting in
ideal speedup, which is consistent with our intuition for Np independent subtours executing
on an Np-processor system.
Observation 5.5.5: From Equation 5.43, when the number of processors Np is large, the
speedup of Step 3 is approximately equal to Equation 5.44.
    S^3 ≈ 3γ·Ns / τ                                                                 (5.44)
Note that Equation 5.44 is essentially a function of the number of cities Ns in a subtour.
5.6 General Bounds on Speedup of Algorithm A
In the last section, we have analyzed the speedup of the parallelization step of
Algorithm A, namely Step 3. In this section, we will examine the speedup of the overall
algorithm.
As mentioned earlier, Algorithm A consists principally of 7 steps, all of which are
essentially sequential except Step 3. Let us consider the sequential steps. Let T1* denote
the execution time of Steps 1 through 7, except Step 3, using 1 processor. Then,
    T1* = T1^1 + T1^2 + T1^4 + T1^5 + T1^6 + T1^7                                   (5.45)
and let TNp* denote the execution time of Steps 1 through 7, except Step 3, when using Np
processors. Then,
    TNp* = T1* / β(Np)                                                              (5.46)
It is important to notice the assumption about β(Np) in Equation 5.46, rather than Np,
because Steps 1 through 7 except Step 3 are essentially serial code rather than parallel code.
Generally, the speedup is not defined when a sequential program module is executed
by a 1-processor system. In practice, we do not know exactly how the value of β(Np)
varies as a function of Np when executing the sequential code on an Np-processor system,
because we would have to take other overhead problems such as resource contention and
bottlenecks into consideration, and we do not know exactly how the scheduler of the system
will assign each piece of sequential code to a processor. We can speculate
that as the number of allocated processors increases, the number of idle processors
increases.
Let us examine what our mathematical model tells us about the value of β(Np).
Consider Equation 5.9. Processing a given program sequentially means that P = 1.
Thus, only the case of unlimited processor allocation (Np ≥ P) in Equation 5.9 is
applicable, and S^3A = Np/(2Np - 1) for all Np ≥ 1. As we can see, β(Np) is equivalent to
S^3A. If Np = 1, β(Np) = 1, and as Np approaches infinity, β(Np) asymptotically
approaches 0.5. Thus,

    β(Np) = Np / (2Np - 1)                                                          (5.47)

and

    0.5 < β(Np) ≤ 1.0
Let us consider a total of 7 steps of Algorithm A. The execution time of Algorithm
A when using 1 processor is the sum of the parallel step and the sequential steps, namely,
    T1 = T1^3 + T1* = 3P·tA + T1*                                                   (5.48)
and, the execution time of Algorithm A when using Np processors is of the following
form,
    TNp = T_Np^3 + TNp*                                                             (5.49)
or equivalently,
    TNp = { (3β(P + Np - (P+Np)%Np)·tA + τβN(P - 1) + T1*·Np) / (β·Np)   for Np < P
          { (3β(2Np - P)·tA + τβN(P - 1) + T1*·Np) / (β·Np)              for Np ≥ P     (5.50)
Thus, the speedup of Algorithm A for an arbitrary number of idle processors is
    S = { (3βP·Np·tA + βT1*·Np) / (3β(P + Np - (P+Np)%Np)·tA + τβN(P - 1) + T1*·Np)   for Np < P
        { (3βP·Np·tA + βT1*·Np) / (3β(2Np - P)·tA + τβN(P - 1) + T1*·Np)              for Np ≥ P   (5.51)
And, if there are at most (Np - 1) idle processors, the speedup is at least,
    S = { (3βP·Np·tA + βT1*·Np) / (3β(P + Np - 1)·tA + τβN(P - 1) + T1*·Np)   for Np < P
        { (3βP·Np·tA + βT1*·Np) / (3β(2Np - 1)·tA + τβN(P - 1) + T1*·Np)      for Np ≥ P   (5.52)
Substituting tA of Equation 5.5 into Equation 5.51, the speedup of Algorithm A for an
arbitrary number of idle processors is obtained,
    S = { (3βγP·Np·N² + βT1*·P²·Np) / (3βγ(P + Np - (P+Np)%Np)·N² + τβN·P²(P - 1) + T1*·P²·Np)   for Np < P
        { (3βγP·Np·N² + βT1*·P²·Np) / (3βγ(2Np - P)·N² + τβN·P²(P - 1) + T1*·P²·Np)              for Np ≥ P   (5.53)
and substituting Equation 5.11 into Equation 5.53, the following equation results,
    S = { (3βγk·Np·N² + βT1*·k²·Np²) / (3βγ(k + 1)·N² + τβk²·Np(k·Np - 1)·N + T1*·k²·Np²)   for P = k·Np, ∀ k > 1
        { (3βγm²·Np·N² + βT1*·m·Np²) / (3βγm²(2m - 1)·N² + τβ·Np(Np - m)·N + T1*·m·Np²)     for P = Np/m, ∀ m ≥ 1   (5.54)
Let us make a few observations from the above equations.
Observation 5.6.1: In the limit as N goes to infinity in Equation 5.53, Equation 5.55
results. Note that Equation 5.55 is equivalent to Equations 5.9, 5.31, and 5.41, to which
the earlier comments apply equally here.
    lim_{N→∞} S = { P·Np / (P + Np - (P+Np)%Np)   for Np < P
                  { P·Np / (2Np - P)              for Np ≥ P                        (5.55)
Again, the speedup of Algorithm A is independent of communication overhead in the limit
as the number of cities in the TSP approaches infinity.
Observation 5.6.2: It is interesting to observe that the first term of both the numerator and
the denominator in Equations 5.53 and 5.54 is a quadratic function of N, the
number of cities in the TSP, whereas the communication overhead factor in the
denominator (the term with τ) is linear in N and quadratic in Np. Since the number of cities in
the TSP is much greater than the number of processors, i.e. N >> Np, in the limit as N
approaches infinity the N² terms dominate in both the numerator and the denominator, and the
speedup of Algorithm A is independent of the communication overhead term.
Observation 5.6.3: Consider Equation 5.56, which is the second case of Equation 5.54:

    S = (3βγm²·Np·N² + βT1*·m·Np²) / (3βγm²(2m - 1)·N² + τβ·Np(Np - m)·N + T1*·m·Np²),   P = Np/m, ∀ m ≥ 1   (5.56)

Let m = 1, or Np = P, in Equation 5.56. Then Equation 5.57 results. As we can see, the effect of the
communication overhead factor τβNp(Np - 1)N is to reduce the speedup.
    S = (3βγ·Np·N² + βT1*·Np²) / (3βγ·N² + τβ·Np(Np - 1)·N + T1*·Np²)               (5.57)

Observation 5.6.4:
To see the above effect more clearly, suppose there is no
communication overhead factor (τ = 0). Thus, Equation 5.57 is further reduced to
Equation 5.58. Obviously, Equation 5.58 is approximately a linear function of Np as will
be shown later.
    S = (3βγ·Np·N² + βT1*·Np²) / (3βγ·N² + T1*·Np²)                                 (5.58)
In Observation 5.6.2, we mentioned that the speedup is a function of the
square of the number of cities N, and that the communication overhead factor is a linear
function of N. However, we have not elaborated much on where these terms come from
and what they mean. Consider the execution time of Step 3A or Step 3C, namely tA in
Equation 5.5 or Equation 5.59, and the average communication overhead, namely Oa in
Equation 5.18. As we can see, tA is O(N²) whereas Oa is O(N). So, as N approaches
infinity, tA increases much faster than Oa; tA thus dominates the speedup in Equation
5.51. Hence, in the limit as N approaches infinity, the speedup equation is independent
of the communication overhead factor. What this means is that if there exists a reliable
"stand-alone" computer with massive memory to store the data and extensive computer
time to execute Algorithm A for a long period, then the speedup of Algorithm A
approaches an ideal value; otherwise, the speedup is affected by the communication
overhead factor τβNp(Np - 1)N. One may wonder about the quadratic term in Np in the
communication overhead factor. Let us examine this by substituting N = NsP into
Equation 5.5 for tA and into Equation 5.54 for the speedup of Algorithm A. We thus have
the two following equations,
    tA = γ(N/P)² = γ·Ns²                                                            (5.59)
and
    S = (3βγm·Ns²·Np + βT1*·m²) / (3βγm(2m - 1)·Ns² + τβ·Ns(Np - m) + T1*·m²),   P = Np/m, ∀ m ≥ 1   (5.60)
Let us further observe the above equations.
Observation 5.6.5: Notice that both tA in Equation 5.59 and the speedup in Equation 5.60
are now quadratic in Ns, which is the number of cities in a subtour.
Observation 5.6.6: In the limit as Ns approaches infinity in Equation 5.60, the speedup
asymptotically approaches Equation 5.61, which is equivalent to Equations 5.12,
5.32, and 5.42, and all previous observations about the characteristics of this equation are
also applicable here. As N approaches infinity in Equation 5.54, Equation 5.61 also
holds.
    lim_{Ns→∞} S = lim_{N→∞} S = { (k/(k + 1))·Np   for P = k·Np, ∀ k > 1
                                 { Np/(2m - 1)      for P = Np/m, ∀ m ≥ 1           (5.61)
Observation 5.6.7: From Equation 5.60, we can see that the communication overhead
factor is now linear rather than quadratic in Np.
Let us examine the special case of Equation 5.60 where m = 1, or Np = P. Then
Equation 5.60 reduces to Equation 5.62, with and without the communication
overhead factor.
    S = { (3βγ·Ns²·Np + βT1*) / (3βγ·Ns² + τβ·Ns(Np - 1) + T1*)   for τ ≠ 0
        { (3βγ·Ns²·Np + βT1*) / (3βγ·Ns² + T1*)                    for τ = 0        (5.62)
Observation 5.6.8: If Np = 1 and β = 1, then the speedup S = 1, as expected.
Observation 5.6.9: If Np is large, then the speedup S is approximately equal to (3γNs/τ),
which is equivalent to Equation 5.44.
Observation 5.6.10: As we can see from Equation 5.62, the effect of the communication
overhead factor (τβNs(Np - 1)) is to degrade the speedup of Algorithm A.
Observation 5.6.11(a): For 0.5 ≤ β ≤ 1.0, τ ≠ 0, and an arbitrary number of idle
processors, the bounds on the speedup of Algorithm A are

    (1.5γ·Ns²·Np + 0.5T1*) / (1.5γ·Ns² + 0.5τ·Ns(Np - 1) + T1*) ≤ S ≤ (3γ·Ns²·Np + T1*) / (3γ·Ns² + τ·Ns(Np - 1) + T1*),   τ ≠ 0   (5.63)
Observation 5.6.11(b): Similarly, for 0.5 ≤ β ≤ 1.0, τ = 0, and an arbitrary number of idle
processors, the bounds on the speedup of Algorithm A are of the following form,

    (1.5γ·Ns²·Np + 0.5T1*) / (1.5γ·Ns² + T1*) ≤ S ≤ (3γ·Ns²·Np + T1*) / (3γ·Ns² + T1*),   τ = 0   (5.64)
Observation 5.6.11(c): If we substitute Equation 5.47 for β(Np) into Equation 5.62,
Equation 5.65 results. Note that Equation 5.65 always lies on or within the bounds of
Equations 5.63 and 5.64, respectively.
    S = { (3γ·Ns²·Np² + T1*·Np) / (3γ·Ns²·Np + τ·Ns·Np(Np - 1) + T1*(2Np - 1))   for τ ≠ 0
        { (3γ·Ns²·Np² + T1*·Np) / (3γ·Ns²·Np + T1*(2Np - 1))                      for τ = 0   (5.65)
Observation 5.6.11(d): In comparison, we can see that the communication overhead factor
has shifted the bounds on the speedup of Algorithm A to a different range, from the range in
Equation 5.64 (no overhead) down to the range in Equation 5.63. Of course, the range in
Equation 5.63 is strictly less than the range in Equation 5.64 for all τ ≠ 0.
Observation 5.6.11(e): As one would expect, if there is no communication overhead (τ =
0), the speedup of Algorithm A in Equations 5.62 and 5.64 is a linear function of Np.
Observation 5.6.12: As Ns approaches infinity in the limit in Equations 5.63, 5.64, and
5.65, the speedup approaches Np, resulting in ideal speedup.
Observation 5.6.13: If no sequential part of Algorithm A is involved, i.e. T1* = 0, the
speedup of Equation 5.64 reduces to S = Np, resulting in ideal speedup.
Let's consider the bounds on speedup for the worst idle processors case, when
there are (Np - 1) idle processors. Substituting Equation 5.59 for tA into Equation 5.52,
after some simplification, the following relation is obtained,
    S = { (3βγk·Ns²·Np² + βT1*·Np) / (3βγ((k + 1)·Np - 1)·Ns² + τβk·Ns·Np(k·Np - 1) + T1*·Np)   for P = k·Np, ∀ k > 1
        { (3βγm·Ns²·Np² + βT1*·m²·Np) / (3βγm²(2Np - 1)·Ns² + τβ·Ns·Np(Np - m) + T1*·m²·Np)     for P = Np/m, ∀ m ≥ 1   (5.66)
As usual, let us consider the special case of processor allocation where m = 1, or
Np = P. Thus, Equation 5.66 reduces to Equation 5.67, with and without the
communication overhead factor.
    S = { (3βγ·Ns²·Np² + βT1*·Np) / (3βγ(2Np - 1)·Ns² + τβ·Ns·Np(Np - 1) + T1*·Np)   for τ ≠ 0
        { (3βγ·Ns²·Np² + βT1*·Np) / (3βγ(2Np - 1)·Ns² + T1*·Np)                       for τ = 0   (5.67)
Observation 5.6.14(a): From Equation 5.67, for 0.5 ≤ β ≤ 1.0, τ ≠ 0, and (Np - 1) idle
processors, the following bounds on the speedup of Algorithm A hold,

    (1.5γ·Ns²·Np² + 0.5T1*·Np) / (1.5γ(2Np - 1)·Ns² + 0.5τ·Ns·Np(Np - 1) + T1*·Np) ≤ S ≤ (3γ·Ns²·Np² + T1*·Np) / (3γ(2Np - 1)·Ns² + τ·Ns·Np(Np - 1) + T1*·Np)   (5.68)
Observation 5.6.14(b): From Equation 5.68, for 0.5 ≤ β ≤ 1.0, τ = 0, and (Np - 1) idle
processors, the bounds on speedup in Equation 5.69 result,

    (1.5γ·Ns²·Np² + 0.5T1*·Np) / (1.5γ(2Np - 1)·Ns² + T1*·Np) ≤ S ≤ (3γ·Ns²·Np² + T1*·Np) / (3γ(2Np - 1)·Ns² + T1*·Np),   τ = 0   (5.69)
Of course, we can see that most of the previous observations on the speedup of
Algorithm A with an arbitrary number of idle processors apply to the worst case as
well, where (Np - 1) processors are idle.
Let us establish the general bounds within which all speedups of Algorithm A must
lie. It can easily be argued that the lower bound (the worst case possible) on the speedup
of Algorithm A occurs where β = 0.5 and (Np - 1) processors are idle, and that the upper
bound (the best case possible) on the speedup occurs where β = 1.0 and no processor is
idle. The lower bound on the speedup of Algorithm A is thus just the
lower bound in Equation 5.68, and the upper bound on the speedup of Algorithm A is the
upper bound in Equation 5.63. Hence, the general bounds on speedup of Algorithm A are
established as shown in Equation 5.70,
    (1.5γ·Ns²·Np² + 0.5T1*·Np) / (1.5γ(2Np - 1)·Ns² + 0.5τ·Ns·Np(Np - 1) + T1*·Np) ≤ S ≤ (3γ·Ns²·Np + T1*) / (3γ·Ns² + τ·Ns(Np - 1) + T1*)   (5.70)
If there is no communication overhead, the general bounds in Equation 5.70 are
reduced to Equation 5.71.
    (1.5γ·Ns²·Np² + 0.5T1*·Np) / (1.5γ(2Np - 1)·Ns² + T1*·Np) ≤ S ≤ (3γ·Ns²·Np + T1*) / (3γ·Ns² + T1*)   (5.71)
and if both the communication overhead and the sequential parts of
Algorithm A are absent, then the bounds on speedup in Equation 5.70 reduce to
    Np² / (2Np - 1) ≤ S ≤ Np,   for τ = 0 and T1* = 0.                              (5.72)
As one would expect, if Np independent subtours are executed on an Np-processor
system, the upper bound is the ideal speedup Np, whereas the lower bound is
given by the first term in Equation 5.72. Note that as Np approaches infinity, the lower
bound in Equation 5.72 approaches 0.5·Np (half the ideal speedup), and the upper bound approaches
infinity; if Np = 1, then the speedup is 1.
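The general bounds of Equation 5.70 can also be evaluated numerically. The C sketch below does so as reconstructed above, treating γ, τ, T1* (the sequential-step time), Ns, and Np as inputs; the parameter names and the values used in main() are assumptions for illustration, not results from the thesis.

#include <stdio.h>

/* Evaluate the lower and upper bounds of Equation 5.70 on the speedup of
 * Algorithm A (lower: beta = 0.5 and (Np - 1) idle processors; upper:
 * beta = 1.0 and no idle processor).                                      */
void speedup_bounds(double gamma, double tau, double t1, double ns, double np,
                    double *lower, double *upper)
{
    double ns2 = ns * ns;

    *lower = (1.5 * gamma * ns2 * np * np + 0.5 * t1 * np) /
             (1.5 * gamma * (2.0 * np - 1.0) * ns2 +
              0.5 * tau * ns * np * (np - 1.0) + t1 * np);

    *upper = (3.0 * gamma * ns2 * np + t1) /
             (3.0 * gamma * ns2 + tau * ns * (np - 1.0) + t1);
}

int main(void)
{
    double lo, hi;
    speedup_bounds(1.0, 0.1, 50.0, 10.0, 10.0, &lo, &hi);  /* illustrative values */
    printf("speedup bounds: %.2f <= S <= %.2f\n", lo, hi);
    return 0;
}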
In this chapter, the speedup of the Parallel Simulated Annealing Algorithm
(Algorithm A) was analyzed. Communication overheads as the principal factors which
degrade the performance of the Parallel Simulated Annealing Algorithm were discussed. It
was shown that each processor in an Np-processor system must handle on average
O(N) messages, and that the overhead due to data communication is O(N) or O(NpNs) units of
time, i.e. Equation 5.18 or Equation 5.60. The speedup of Algorithm A is a linear function
of the number of processors Np if there is no communication overhead (τ = 0), and is equal
to ideal speedup Np if both communication overhead and the sequential parts of Algorithm
A are not present, i.e. Observation 5.6.13. General bounds on speedup of the Algorithm A
were established in Equation 5.70.
In the next chapter, a computational study of the Synchronous Parallel Simulated
Annealing Algorithm is investigated for the two different annealing schedules and for two
different neighborhood structures presented in Chapter 4. With variation of certain
parameters, the quality of solutions can be obtained and analyzed, and questions such as
"Which annealing schedule or which neighborhood structure provides consistent quality of
solutions for the TSP?" can be addressed.
CHAPTER 6
EMPIRICAL ANALYSIS OF THE PARALLEL SIMULATED ANNEALING ALGORITHM
6.1 Introduction
In Chapters 2 and 3, the underlying foundation of the Classical Simulated
Annealing Algorithm was laid. In Chapter 4, the Parallel Simulated Annealing Algorithm
was designed. In the last chapter, the speedup of the Parallel Simulated Annealing
Algorithm was analyzed. In this chapter, computational results of the Parallel Simulated
Annealing Algorithm are analyzed, discussed, compared and evaluated.
Thus far, empirical results for the Classical Simulated Annealing Algorithm are
reported by a number of authors for various combinatorial optimization problems. Most of
these results focus on the quality of the final solutions and the corresponding running times
obtained by solving a given TSP instance. However, empirical results for the Parallel
Simulated Annealing Algorithm are still in their infancy. Thus, besides the work of Kim
[Kim86], the computational results presented in this chapter represent one of the first
attempts in investigating the performance of the Parallel Simulated Annealing Algorithm
empirically.
The methodology under investigation is analyzed and discussed in Section 6.2.
Since it is well-known that the performance of any Simulated Annealing Algorithm is
dependent upon the annealing schedule, two different annealing schedules discussed in
Section 3.2.2, namely Equation 3.12 and Equation 3.13, are selected to examine the
behavior of the algorithm. Thus far, a good numerical value for the constant c has not been
reported. Therefore, for the annealing schedule in Equation 3.12, a specifically good
choice for the value c is determined by comparing the quality of solutions for different
values of c. Similarly, for the annealing schedule in Equation 3.13, a specially good
choice for the value d is determined by evaluating the quality of solutions for different
values of d. Then, this d is compared with that of Kim's. In this way, the results of two
annealing schedules can be systematically analyzed and evaluated in Section 6.3, and "How
will different annealing schedules affect the behavior of the Parallel Simulated Annealing
Algorithm?" can be addressed. In this analysis, the most effective annealing schedule is
obtained.
In order to illustrate the assertion that the Parallel Simulated Annealing
Algorithm is more powerful than Local Optimization, an assertion which was discussed in
Section 2.2 and repeated throughout this thesis, an experimental study is conducted and
illustrated in Section 6.4. Since it is equally well-known that neighborhood structures have
major impacts on the performance of the Simulated Annealing Algorithm, two specific
neighborhood structures, namely Citywise Exchange and Edgewise Exchange, are selected
to investigate the performance of the algorithm; the quality of solutions of these structures
are compared and evaluated in Section 6.5. In this way, questions such as "How much
effect does the neighborhood structure have on the overall performance of the Parallel
Simulated Annealing Algorithm?" can be addressed.
6.2 Analysis Methodology
In the computational study, the performance of the Synchronous Parallel Simulated
Annealing Algorithm (Algorithm A) is measured in terms of the quality of the final
solutions. The results are presented in Sections 6.3 - 6.5. The purpose of the experiments
is to provide useful computational results of the quality of solutions and running times.
Since the Parallel Simulated Annealing Algorithm is a probabilistic hill climbing
algorithm, the quality of solutions is measured by sampling C(σⁿ)B at regular intervals
(maximum iterations / number of samples to be taken) over a number of runs with different
random seeds for the random number generator.
In this way, the average-case
performance analysis or ensemble averaging discussed in Chapter 3 can be ideally
investigated.
Let Copt be the cost value of the globally minimal configuration; then, the landmark
paper of Beardwood et al. [Beard59] has shown that
    lim_{N→∞} Copt = θ·N·√N                                                         (6.1)

where θ is an unknown constant (numerical estimates by means of Monte Carlo
experiments yield θ ≈ 0.749) and N is the number of cities in the tour.
As was seen from Chapter 3, the quality of the final solution is defined to be the
difference in cost value between the final or minimum configuration and the globally
minimal configuration. However, Equation 6.1 is applicable only for N near infinity, and
we don't really know the value of Copt when N is small. Thus, in this chapter, let us
define the quality of the final solution to be the minimum value of the best cost Cmin, i.e.
Cmin = min{C(σⁿ)B} for some n. Note that n is the iteration at which the minimum
cost occurs during one run of Algorithm A. These minimum values and iterations are
tabulated in Tables 6.1 and 6.2.
The software for Algorithm A is written in "C", provided in
Appendix A, and is simulated on a uniprocessor of the FTPP, which consists of 16 32-bit Motorola 68020 processors at 25 MHz with Motorola 68881 Floating Point
Coprocessors and other important features described in Section 4.2. All floating point
operations are performed by the coprocessor, namely the computations of distances and the
calculations of temperatures. The execution of the program follows the steps in Algorithm
A as outlined in Section 4.5.1. Some of the properties of the software are discussed
below, and the reader is encouraged to refer to Appendix A for more detailed
understanding. The inputs to the program are the number of cities, num_nodes, which is
equivalent to N; the number of subtours, npe, which is equivalent to P; the maximum
number of iterations, max ito; a random seed for a map, mapseed; and a random seed for
the algorithm, seed; the number of runs of the algorithm, num_runs; and the number of
samples, sample_pts. Another input variable, depth, is for the annealing schedule, where
depth can be used either for the variable c in Equation 3.12 or for the variable d in
Equation 3.13 appropriately. At any given iteration, the outputs of the program are the
temperature value T, perturbed cost C(an), best cost C(0n)B, perturbed tour a n , and best
tour (an)B. At the end of each experiment, the average values of C(a n ) and C(an)B and the
standard of deviations, i.e. C(T) and a , over the number of runs can be the outputs. Due
to the computational intensity of the TSP, Algorithm A is simulated with only one run for
each experiment.
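For reference, the in.dat input file reproduced in Appendix A supplies exactly these inputs; read with the fscanf() calls of input(), its four lines decode as:

    10 50 500.0        npe = 10, num_nodes = 50, max_ito = 500.0
    10 10              mapseed = 10, seed = 10
    5.0                depth = 5.0
    1 100              num_runs = 1, sample_pts = 100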
Although the algorithm described above and in Section 4.5.1 is applicable to any
TSP instance whose size is divisible by the number of subtours, i.e.
(num_nodes/npe) or (N/P) must be an integer, all numerical data presented in this chapter
are obtained by applying the Parallel Simulated Annealing Algorithm to two different
instances of the TSP, one instance with 50 cities and the other with 100 cities. These
medium-size optimization problems are considered proper representatives of a broad
class of combinatorial optimization problems to which the analysis developed throughout
this thesis can be successfully applied, for the following reasons: (1) there are many
different values of the cost function, (2) there are many local minima, and (3) the cost
function is reasonably smooth, i.e. no clustering.
These TSP instances are symmetric instances whose Euclidean
distances are defined on the NxN square. The coordinates (x,y) of the cities in a
2-dimensional plane are generated at the beginning of the program based upon the input
mapseed for a random number generator. These coordinates are stored in memory and the
Euclidean distances are computed. The timing results used in calculating the running time are
obtained by calling the system function time() provided by the Unix Operating System.
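The pattern is simply a difference of two time() readings; a minimal, self-contained sketch (run_algorithm_a() is a hypothetical stand-in for one complete run) is:

    #include <stdio.h>
    #include <time.h>

    extern void run_algorithm_a(void);     /* hypothetical: one full run of Algorithm A */

    int main(void)
    {
        time_t start  = time(NULL);        /* seconds since the epoch, before the run */
        run_algorithm_a();
        time_t finish = time(NULL);        /* seconds since the epoch, after the run  */
        printf("elapsed real time: %ld seconds\n", (long)(finish - start));
        return 0;
    }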
6.3 Annealing Schedule Analysis
In this section, a computational study of two annealing schedules, namely T(k+1)
= cT(k) (Equation 3.12) and Tk = d/logk (Equation 3.13), is performed. The 50-cities
TSP instance considered in this analysis is a set of random vertex locations on a square;
it is processed as 10 subtours, each of which consists of 5 cities. From this analysis,
the design choices for the constants of the annealing schedules can be obtained and
evaluated. Algorithm A with the Citywise Exchange discussed in Section 4.3.1 is used
throughout this comparison.
As briefly discussed in Chapter 4, the
temperature value at the nth iteration is computed in Step A1 of Algorithm A by the central
coordinator and is passed to each processor.
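For concreteness, a minimal sketch of the two schedules under comparison (Equations 3.12 and 3.13; the base-10 logarithm follows the Appendix A implementation of Equation 3.13) is:

    #include <math.h>

    /* Equation 3.12:  T(k+1) = c * T(k), i.e. T(k) = T0 * c^k  (geometric cooling) */
    double geometric_schedule(double t0, double c, int k)
    {
        return t0 * pow(c, (double)k);
    }

    /* Equation 3.13:  T(k) = d / log(k)  (logarithmic cooling), defined for k >= 2;
       log10() is used here as in the Appendix A routine anneal_schedule(). */
    double logarithmic_schedule(double d, int k)
    {
        return d / log10((double)k);
    }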
Let us consider the first annealing schedule, T(k+1) = cT(k); Figure 6.1 plots the
temperature versus time (iterations) at various values of c and T(0) = 20.0. As was
mentioned in Chapter 4, this annealing schedule has been shown to provide good solutions
with the Simulated Annealing Algorithm. To examine how the behavior of the Simulated
Annealing Algorithm depends upon the annealing schedule, Figure 6.2 through Figure 6.7
plot the perturbed cost C(σn) and the best cost C(σn)B versus time for a sample run of
Algorithm A with c = 0.94, 0.95, 0.96, 0.97, 0.98, and 0.99, respectively, and Figure
6.8 and Figure 6.9 plot the combined perturbed costs and the combined best costs. Note that the best
cost functions allow some uphill movements in the cost values when the temperature values
are high, because the best cost C(σn)B is updated whenever (ΔC < 0) or (r < e^(-ΔC/T)). As the temperature
values decrease, the frequency of accepting uphill movements in the costs is reduced.
Maps of the best tours for a sample run at various iterations for c = 0.94 are shown in
Figure 6.10 through Figure 6.14, while maps of the best tour for c = 0.94, 0.95, 0.96,
0.97, 0.98, and 0.99 are shown in Figure 6.14 through Figure 6.19. From the best cost
figures and Figure 6.1, it is interesting to note how the annealing schedule, particularly the
value of c, has influenced the behavior of the Parallel Simulated Annealing Algorithm. If
the rate of temperature reduction, or cooling, is too slow, i.e. c = 0.99, both the perturbed
cost function and the best cost function converge fairly slowly. On the other hand, if the
annealing schedule increases its cooling rate, i.e. the value of c is decreased to 0.94, both the
perturbed cost function and the best cost function converge much faster, but the quality of
the final solution deteriorates. This supports Kirkpatrick's claim that the annealing
schedule is merely a control parameter. From Table 6.1, as the value of c increases, the
number of iterations that Algorithm A requires to converge to the minimum cost value
increases. Note that the iteration in Table 6.1 is taken when the value of the cost
function is minimal. Note also that the number of iterations that Algorithm A requires to
converge to the best minimum cost value at c = 0.96 is less than half that at c = 0.99.
[Figure 6.1: Temperature versus time (iterations) for T(k+1) = cT(k) at various values of c, T(0) = 20.0.]
[Figures 6.2 - 6.7: Perturbed cost C(σn) and best cost C(σn)B versus time for c = 0.94, 0.95, 0.96, 0.97, 0.98, and 0.99.]
[Figures 6.8 - 6.9: Combined perturbed costs and combined best costs versus time for the six values of c.]
[Figure 6.10: Map of the best tour at 1st iteration for T(k+1) = cT(k), c = 0.94, T(0) = 20.0, N = 50, and Best Cost = 1376.23.]
[Figure 6.11: Map of the best tour at 15th iteration for T(k+1) = cT(k), c = 0.94, T(0) = 20.0, N = 50, and Best Cost = 742.42.]
[Figure 6.12: Map of the best tour at 35th iteration for T(k+1) = cT(k), c = 0.94, T(0) = 20.0, N = 50, and Best Cost = 529.52.]
[Figure 6.13: Map of the best tour at 60th iteration for T(k+1) = cT(k), c = 0.94, T(0) = 20.0, N = 50, and Best Cost = 342.27.]
[Figure 6.14: Map of the best tour at 93rd iteration for T(k+1) = cT(k), c = 0.94, T(0) = 20.0, N = 50, and Best Cost = 319.95.]
[Figure 6.15: Map of the best tour at 122nd iteration for T(k+1) = cT(k), c = 0.95, T(0) = 20.0, N = 50, and Best Cost = 299.06.]
[Figure 6.16: Map of the best tour at 230th iteration for T(k+1) = cT(k), c = 0.96, T(0) = 20.0, N = 50, and Best Cost = 292.27.]
[Figure 6.17: Map of the best tour at 286th iteration for T(k+1) = cT(k), c = 0.97, T(0) = 20.0, N = 50, and Best Cost = 293.79.]
[Figure 6.18: Map of the best tour at 277th iteration for T(k+1) = cT(k), c = 0.98, T(0) = 20.0, N = 50, and Best Cost = 295.45.]
[Figure 6.19: Map of the best tour at 407th iteration for T(k+1) = cT(k), c = 0.99, T(0) = 20.0, N = 50, and Best Cost = 296.85.]
     c        Iteration       Cmin
    0.94          93         319.997
    0.95         122         299.064
    0.96         230         292.267
    0.97         286         293.791
    0.98         277         295.450
    0.99         470         296.845

Table 6.1: Minimum Costs or Quality of Final Solutions for Various Values of c for T(k+1) = cT(k), T(0) = 20.0 and N = 50.
[Figure 6.20: Minimum Costs or Quality of Final Solutions Versus c for T(k+1) = cT(k).]
It is more interesting to investigate which value of c provides the minimum cost or the best
quality of the final solution Cmin. As can be seen from Figure 6.20 and Table 6.1, the
annealing schedule with c = 0.96 provides the best solution, i.e. Cmin = 292.267 units.
When the solution of this annealing schedule is compared with the solutions for other values of c, the
minimum cost values at c = 0.94, 0.95, 0.97, 0.98, and 0.99 are respectively equal to
319.997, 299.064, 293.791, 295.450, and 296.845 units, which are respectively 27.73,
6.797, 1.524, 3.183, and 4.578 units higher than at c = 0.96. In a comparison of the maps of
the best tours, Figure 6.16 at c = 0.96 also provides the most appealing tour with respect to
its counterparts in Figure 6.14 at c = 0.94, Figure 6.15 at c = 0.95, Figure 6.17 at c
= 0.97, Figure 6.18 at c = 0.98, and Figure 6.19 at c = 0.99.
Now let us consider the second annealing schedule Tk = d/logk, whose temperature
values versus time are shown in Figure 6.21 for different values of d. From the theoretical
viewpoint, Geman and Geman required d ≥ L for convergence, where L is the product of the
cardinality of the configuration space (here equal to 50!/2) and some constant A.
Independent of the value of A, L is extremely large, and the cooling according to Equation 3.13
becomes very slow. Moreover, Mitra, Romeo & Sangiovanni-Vincentelli [Mit85] required
d ≥ rX for convergence, and the values of r and X are impossible to determine a priori
computationally. Hajek's condition on d poses a similar difficulty [Haj85]. Therefore, a
computational study to investigate how the value of d influences the behavior of our
algorithm is required.
The behavior of Algorithm A with different values of d can best be seen from
Figure 6.22 through Figure 6.25. These figures plot the perturbed costs C(σn) and the
best costs C(σn)B versus time (iterations) for d = 5, 10, 15, and 20, respectively, and
Figure 6.26 and Figure 6.27 summarize these perturbed costs and best costs.
To examine closely how the value of d influences the behavior of Algorithm A, note from
Figure 6.27 that the best cost with d = 5 provides the minimal cost function with respect to
other best cost functions, i.e. the best costs with d = 10, 15, and 20. At d = 5, Figures 6.22
and 6.27 indicate a stable best cost function, regularly allowing both uphill and downhill
movements in the cost values. As d increases, the behavior of Algorithm A becomes more
irregular. At d = 20, Figures 6.25 and 6.27 show many uphill movements in the cost
functions. From these figures, one can deduce that the performance of Algorithm A is very
dependent on the annealing schedule Tk = d/logk and is extremely sensitive to the variation
of the values of d. If the rate of temperature reduction, or cooling, is too slow, i.e. d >>
20, the cost functions worsen and never seem to converge to a "near-optimal" solution.
Conversely, if the cooling rate is too fast, i.e. d << 5, the cost functions may not show any
uphill movements in cost; thus, the algorithm may have a tendency to get
stuck at some local minimum. Between these two extreme cases, the cooling rate is gradual
enough that it allows the algorithm to converge with regular uphill and downhill
movements in the cost functions. Thus, this value of d is likely to be around 5, in contrast
with the value found in [Kim86], which is 20. Although experiments to "fine-tune" the
value of d are not conducted in this thesis, in a future research effort it would be very
interesting to see how the variation of d between 1 and 5 affects the performance of Algorithm
A.
In addition to the cost functions, the maps of the best tours are invaluable tools in
examining the performance of the Parallel Simulated Annealing Algorithm. The maps of
the best tours for a sample run at d = 5 are shown in Figure 6.28 through Figure 6.33, and
maps at d = 10, 15, and 20 are respectively provided in Figure 6.34 through Figure 6.36.
Since the quality of the final solution is inversely related to the frequency of crossings in
the best tour, i.e. the fewer the crossings, the better the tour and solution, it can be
observed that the best tour map in Figure 6.33 with d = 5 provides the best tour compared
to the maps in Figure 6.34 with d = 10, Figure 6.35 with d = 15, and Figure 6.36 with d =
20, respectively. Since each tour is associated with a minimum cost, the plot of the
minimum costs or the quality of the final solutions versus d in Figure 6.37 and Table 6.2
confirms that at d = 5 the cost value is minimal. Hence, the annealing schedule Tk = d/logk
with d = 5, with the best cost = 285.54 units, provides the best solution.
Let us compare the performance of Algorithm A using the two different annealing
schedules discussed above. As was determined, T(k+1) = cT(k) with c = 0.96 provides
the best solution among the solutions for all other values of c. As was determined in the
previous paragraph, Tk = d/logk with d = 5 provides the most competitive solution with the
Simulated Annealing Algorithm with respect to other values of d. To investigate which
annealing schedule, T(k+1) = cT(k) with c = 0.96 or Tk = d/logk with d = 5, provides the
better solution to the Traveling Salesman Problem, three criteria are examined. First, the
perturbed cost and best cost functions plotted in Figure 6.38 and Figure 6.39 for the two
different annealing schedules are considered. As can be seen from Figure 6.39, the best
cost function with d = 5 initially converges to a good solution faster than that with c =
0.96, but it may converge to some local minimum. The best cost function for c = 0.96
reaches its minimum cost value at the 230th iteration while the best cost function for d = 5
converges to its minimum cost value at the 293rd iteration. Although the best cost function
with d = 5 converges to its best solution more slowly than that with c = 0.96, it allows many
uphill cost movements, thus allowing the algorithm to escape from local minima. This
analysis intuitively supports the theoretical fact that, as time goes to infinity, the algorithm
converges to an optimal solution with an annealing schedule of the form Tk = d/logk. Secondly, by
comparison of the TSP tours in Figure 6.16 for c = 0.96 and Figure 6.33 for d = 5,
one can observe that their characteristics are quite similar. Finally, the
minimum cost and the quality of the final solution of the annealing schedule with c = 0.96
are 292.267 units and 286.97 units respectively, while the minimum cost and the quality of
the final solution of the annealing schedule with d = 5 are respectively 285.54 units and
280.244 units; the minimum cost and the quality of the final solution with d = 5 are thus 6.727
units and 6.726 units lower, respectively. Hence, the annealing schedule Tk = d/logk with
d = 5 provides the better solution to the Traveling Salesman Problem.
[Figure 6.21: Temperature versus time (iterations) for Tk = d/logk at various values of d.]
[Figures 6.22 - 6.25: Perturbed cost C(σn) and best cost C(σn)B versus time for d = 5, 10, 15, and 20.]
[Figures 6.26 - 6.27: Combined perturbed costs and combined best costs versus time for d = 5, 10, 15, and 20.]
[Figure 6.28: Map of the Best Tour at 1st Iteration for Tk = d/logk, d = 5, N = 50 and Best Cost = 1376.23.]
[Figure 6.29: Map of the Best Tour at 10th Iteration for Tk = d/logk, d = 5, N = 50 and Best Cost = 596.85.]
[Figure 6.30: Map of the Best Tour at 43rd Iteration for Tk = d/logk, d = 5, N = 50 and Best Cost = 317.23.]
[Figure 6.31: Map of the Best Tour at 67th Iteration for Tk = d/logk, d = 5, N = 50 and Best Cost = 311.63.]
[Figure 6.32: Map of the Best Tour at 203rd Iteration for Tk = d/logk, d = 5, N = 50 and Best Cost = 307.54.]
[Figure 6.33: Map of the Best Tour at 293rd Iteration for Tk = d/logk, d = 5, N = 50 and Best Cost = 285.54.]
[Figure 6.34: Map of the Best Tour at 300th Iteration for Tk = d/logk, d = 10, N = 50 and Best Cost = 365.65.]
[Figure 6.35: Map of the Best Tour at 300th Iteration for Tk = d/logk, d = 15, N = 50 and Best Cost = 504.87.]
[Figure 6.36: Map of the Best Tour at 122nd Iteration for Tk = d/logk, d = 20, N = 50 and Best Cost = 514.24.]

     d       Iteration       Cmin
     5          293         285.54
    10          300         365.65
    15          300         504.87
    20          122         514.24

Table 6.2: Minimum Costs or Quality of Final Solutions for Various Values of d.

[Figure 6.37: Minimum Costs or Quality of Final Solutions Versus d for Tk = d/logk.]
[Figures 6.38 - 6.39: Perturbed cost and best cost functions versus time for the two annealing schedules, T(k+1) = cT(k) with c = 0.96 and Tk = d/logk with d = 5.]
Unless explicitly indicated otherwise, the annealing schedule Tk = d/logk with d = 5
is therefore assumed in the next section and subsequent sections.
6.4 Simulated Annealing Versus Local Optimization
In the previous section, the annealing schedule Tk = d/logk with d = 5 was
shown to be better than the annealing schedule T(k+1) = cT(k), T(0) = 20.0, for all of the values of c considered. In
this section, using this annealing schedule, namely Tk = d/logk with d = 5, the
performance of Algorithm A using Simulated Annealing and of Algorithm A
using Local Optimization is investigated.
Recall that Simulated Annealing accepts the next configuration with probability 1 if
ΔC < 0 and accepts it with probability exp(-ΔC/T) if ΔC > 0. On the other hand, Local
Optimization accepts the next configuration only if ΔC < 0 and rejects it otherwise. In
order to implement Algorithm A using Local Optimization, i.e. without annealing, only
Step A6 is changed. By removing the factor (r < exp{-ΔC/Tn}) from both Equations 4.20
and 4.21, Algorithm A becomes a Local Optimization Algorithm.
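The difference between the two acceptance rules can be summarized in a short sketch (illustrative code, not Step A6 itself; uniform01() is a hypothetical stand-in for drand48()):

    #include <math.h>

    extern double uniform01(void);   /* hypothetical: uniform random number on [0,1) */

    /* Simulated Annealing: always accept downhill moves; accept an uphill move
       with probability exp(-delta_c / temp). */
    int accept_annealing(double delta_c, double temp)
    {
        if (delta_c < 0.0)
            return 1;
        return uniform01() < exp(-delta_c / temp);
    }

    /* Local Optimization: the Metropolis factor is removed, so only downhill
       moves are ever accepted. */
    int accept_local(double delta_c)
    {
        return delta_c < 0.0;
    }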
[Figure 6.40: Map of the Best Tour using Local Optimization at 108th Iteration for Tk = d/logk, d = 5, N = 50 and Best Cost = 327.93.]
[Figure 6.41: Perturbed and best cost functions versus time for Algorithm A using Local Optimization.]
[Figures 6.42 - 6.44: Perturbed and best costs using Local Optimization compared with perturbed and best costs using Simulated Annealing.]
The experiments for both Simulated Annealing and Local Optimization are
conducted using the annealing schedule Tk = d/logk, d = 5. Note that the annealing
schedule is still used for the local Simulated Annealing, i.e. for annealing the subtours. In this
way, the behavior of Simulated Annealing can be compared with that of Local
Optimization. The map of the best tour using Local Optimization is shown in Figure 6.40
while its associated cost functions are plotted in Figure 6.41. Figure 6.42 through Figure
6.44 plot different combinations of the perturbed cost and best cost using Local Optimization and
the perturbed cost and best cost using Simulated Annealing. The best tour map using
Simulated Annealing, shown in Figure 6.33, appears much more appealing than
the best tour map using Local Optimization, shown in Figure 6.40. The best cost of
Algorithm A with Simulated Annealing is 285.54 units, which occurred at the 293rd
iteration, while the best cost with Local Optimization is 327.93 units, which occurred at the
108th iteration; Algorithm A with the Simulated Annealing technique converges to a much
better solution than Algorithm A with the Local Optimization technique, but it has to iterate
much longer than Local Optimization. This observation is also clearly seen in Figure 6.44.
Notice that Algorithm A with Local Optimization converges fairly quickly but gets
stuck at the first local minimum, at the 108th iteration, with the best cost equal to 327.93
units. Algorithm A with Simulated Annealing continues to iterate, escaping this local
minimum and reaching a lower (better) best cost value. Both the ability to avoid being entrapped
in local minima and the convergence to a better solution confirm our intuition (Chapter 2), noted
throughout this thesis, that the Simulated Annealing Algorithm is much more
powerful than Local Optimization.
The running time of Algorithm A for a 50-cities TSP instance, as
investigated in this section and the last, is on the average about 13 hours in real time.
6.5 Citywise Versus Edgewise Exchanges
In the last section, the usefulness of Simulated Annealing over Local Optimization
was shown. In this section, the question "How much impact does the neighborhood
structure have on the overall performance of the Parallel Simulated Annealing Algorithm?" is
addressed. For this particular investigation, two neighborhood structures, namely
Citywise Exchange and Edgewise Exchange, are selected. Furthermore, a 100-cities TSP
instance is used; the experiments are run independently for each structure, and the results are
compared. This 100-cities TSP instance is processed as 20 subtours, each of which consists of 5
cities.
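As a rough illustration of the two moves on an array of city indices (assuming, consistently with Section 4.3 and the Appendix A routines, that Citywise Exchange swaps two selected cities while Edgewise Exchange, Lin's 2-opt, reverses the path segment between them):

    /* Citywise Exchange: swap the cities at positions i and j of the (sub)tour. */
    void citywise_move(int tour[], int i, int j)
    {
        int tmp = tour[i];
        tour[i] = tour[j];
        tour[j] = tmp;
    }

    /* Edgewise Exchange (2-opt): reverse the segment between positions i and j,
       which replaces the two edges that enter and leave the segment. */
    void edgewise_move(int tour[], int i, int j)
    {
        while (i < j) {
            int tmp = tour[i];
            tour[i] = tour[j];
            tour[j] = tmp;
            i++;
            j--;
        }
    }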
Maps of the best tours using Citywise Exchange for a sample run of Algorithm A
are shown in Figure 6.45 through Figure 6.49. For comparison purposes, a map of the
best tour using Edgewise Exchange is also provided in Figure 6.50. As can be seen from
Figure 6.49 and Figure 6.50, the map of the best tour using Citywise Exchange is
intuitively more appealing than the map of the best tour using Edgewise Exchange. This
difference is confirmed by their corresponding best cost values; the best cost using Citywise
Exchange is 865.09 units while the best cost using Edgewise Exchange is 1087.16 units.
Moreover, Citywise Exchange allows the algorithm regular uphill and downhill
movements in both the perturbed cost and best cost functions, as shown in Figure 6.51,
while still allowing the algorithm to converge to a good solution. On the other hand, Edgewise
Exchange seems to allow the algorithm only downhill movements in the cost functions, as
shown in Figure 6.52. Because of these downhill-only movements, the algorithm
converges to a high cost value. This comparison can be seen in Figure 6.53 and Figure
6.54, where the best cost functions of the two neighborhood structures are plotted versus time and
versus temperature, respectively. It is interesting to note, from Figure 6.54, the
divergence point of these two functions, which occurs at a temperature of about 7.15 units.
[Figure 6.45: Map of the Best Tour using Citywise Exchange at 1st Iteration for Tk = d/logk, d = 5, N = 100 and Best Cost = 5517.64.]
[Figure 6.46: Map of the Best Tour using Citywise Exchange at 10th Iteration for Tk = d/logk, d = 5, N = 100 and Best Cost = 1778.22.]
[Figure 6.47: Map of the Best Tour using Citywise Exchange at 40th Iteration for Tk = d/logk, d = 5, N = 100 and Best Cost = 1236.59.]
[Figure 6.48: Map of the Best Tour using Citywise Exchange at 94th Iteration for Tk = d/logk, d = 5, N = 100 and Best Cost = 992.75.]
[Figure 6.49: Map of the Best Tour using Citywise Exchange at 436th Iteration for Tk = d/logk, d = 5, N = 100 and Best Cost = 865.09.]
[Figure 6.50: Map of the Best Tour using Edgewise Exchange at 492nd Iteration for Tk = d/logk, d = 5, N = 100 and Best Cost = 1087.16.]
[Figure 6.51: Citywise Exchange perturbed and best costs versus time for Tk = d/logk at d = 5 and N = 100.]
[Figure 6.52: Edgewise Exchange perturbed and best costs versus time for Tk = d/logk at d = 5 and N = 100.]
[Figure 6.53: Best cost functions of the two neighborhood structures versus time.]
[Figure 6.54: Best cost functions of the two neighborhood structures versus temperature.]
For this particular study, it is safe to conclude that the performance of Algorithm A using
Citywise Exchange exceeds its performance using Edgewise Exchange (Lin's 2-Opt
Algorithm) from three different perspectives: the maps of the TSP tours, the perturbed and best cost
functions, and the minimum cost values.
From this analysis, one can see that the performance of the Parallel Simulated
Annealing Algorithm depends heavily upon the neighborhood structure(s) being used.
Although further investigation of the above neighborhood structures is not conducted in this
thesis, comparing the performance of Citywise Exchange with some other neighborhood
structure, or with a combination of neighborhood structures, would be a very interesting subject for
further study.
The running time of Algorithm A for a 100-cities TSP instance is on the average
about 45 hours in real time, which is more than triple the running time for a 50-cities TSP
instance.
In this chapter, we have seen how strongly the performance of the Parallel
Simulated Annealing Algorithm depends on the chosen annealing schedule and its
constants; this was experimentally examined in Section 6.3. With a properly chosen
annealing schedule, good solutions can be obtained for many combinatorial optimization
problems. In Section 6.4, the usefulness of Simulated Annealing over Local Optimization
was demonstrated. The performance of the Parallel Simulated Annealing Algorithm
depends not only on the annealing schedule but also on the neighborhood structure, as
illustrated in Section 6.5. Finally, the computation times can be very extensive for some
large combinatorial optimization problems. For example, doubling the size of the TSP
instance from 50 cities to 100 cities increases the running time from 13 hours to 45
hours on the average, which is more than triple.
CHAPTER 7

SUMMARY AND CONCLUSIONS
In Chapter 1, the Traveling Salesman Problem was defined and introduced.
Practical applications of the TSP were also briefly covered. The historical relationship of
the TSP to the field of combinatorial optimization was reviewed, and the methodology for
solving the TSP was reviewed and introduced. In Chapter 2, the historical development of
the classical Simulated Annealing was studied, and the Simulated Annealing Algorithm was
outlined. In Chapter 3, certain key mathematical concepts which underlie the
Simulated Annealing Algorithm were examined. In Chapter 4, the Synchronous Parallel
Simulated Annealing Algorithm was designed (Algorithm A in Section 4.5.1). In this
candidate parallel algorithm (Algorithm A), the central coordinator begins each iteration
by selecting the temperature value, randomly generating a starting city, and partitioning the
current tour into P subtours. Then, it delivers the temperature value and the P partitioned
subtours to the Np processing elements in the system. Each of these processors is responsible
for perturbing its subtour, computing the cost of the perturbed subtour, and performing the
local annealing process as in Step A3. When each of these processors has finished its
tasks on its subtour, the central coordinator reconstructs the new tour from these
"better" subtours, computes its new cost, and performs the global annealing process.
Then, the central coordinator repeats the next iteration until some "maximum iterations"
stopping criterion has been satisfied. In Chapter 4, two neighborhood structures were also
outlined and discussed.
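A compressed sketch of the iteration just summarized (illustrative names, not the Appendix A identifiers; the full implementation is given in Appendix A) is:

    /* Illustrative helpers standing in for Steps A1-A6: */
    extern double anneal_temperature(int n);
    extern int    random_starting_city(void);
    extern void   partition_tour(int start_city, int P);
    extern void   anneal_subtour(int p, double T);
    extern void   rebuild_tour(int P);
    extern void   global_anneal(double T);

    void algorithm_a(int max_iterations, int P)
    {
        for (int n = 2; n <= max_iterations; n++) {
            double T = anneal_temperature(n);      /* select the temperature          */
            int start = random_starting_city();    /* random starting city            */
            partition_tour(start, P);              /* split the current tour into P   */

            for (int p = 0; p < P; p++)            /* each processor perturbs and     */
                anneal_subtour(p, T);              /* locally anneals its subtour     */

            rebuild_tour(P);                       /* reconstruct the new tour and    */
            global_anneal(T);                      /* perform the global annealing    */
        }
    }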
In Chapter 5, the speedup of Algorithm A was analyzed for two different cases of
processor allocation, namely the unlimited processor allocation (Np ≥ P) and
the limited processor allocation (Np < P), and for two different cases of idle
processors, a special case of which is that there are at most (Np - 1) idle processors. It was
shown that there is little benefit in partitioning the tour into a number of subtours P much beyond the
number of allocated processors Np, or in allocating a number of processors Np much
beyond the number of partitioned subtours P; the maximum speedup occurs when the
number of allocated processors is equal to the number of subtours, i.e. Np = P
(Observations 5.2.1 - 5.2.3). Communication takes place in Algorithm A between the
central coordinator and the processors, and between the processors themselves; interprocessor
communication occurs when processors exchange cities with one another. It was found
that, on the average, a processor in the network handles O(N) messages, and the
communication overhead is O(N), or O(NpNs) units of time (a term proportional to
Ns(Np - 1) from Equation 5.18), at each iteration. The speedup of Algorithm A is not, in general, independent of
the communication overhead factor. However, as the number of cities N or Ns increases, the communication
overhead factor becomes less dominant in the speedup equation; as the number of cities
N or Ns approaches infinity, the speedup of Algorithm A asymptotically
approaches a linear function of the number of allocated processors Np (Equation 5.61
and Observation 5.6.6) and is independent of the communication overhead factor,
which is proportional to Np(Np - 1)N or to Ns(Np - 1). For a large number of allocated processors Np, the
speedup is approximately proportional to Ns, i.e. O(Ns). For the arbitrary idle processors case, the
bounds on the speedup of Algorithm A for all Ns and Np were shown to be in the range of
Equation 5.63 if the communication overhead factor were taken into consideration and in
the range of Equation 5.64 otherwise. For the worst idle processors case, the bounds on
speedup of Algorithm A were shown to be in the range of Equation 5.68 if the
communication overhead factor were taken into consideration and in the range of Equation
5.69, otherwise. For any case of idle processors, the general bounds on speedup of
Algorithm A for all Ns and Np were shown to be in the range of Equation 5.70 if the
communication overhead factor were taken into consideration and in the range of Equation
5.71 otherwise.
In Chapter 6, a computational study was conducted for two instances of the TSP,
namely a 50-cities instance and a 100-cities instance. Using the 50-cities TSP instance, in Section 6.3,
experiments were performed for two annealing schedules, namely T(k+1) = cT(k) and Tk =
d/logk, and the results were compared and evaluated. For the annealing schedule T(k+1) =
cT(k), experiments were performed for T(0) = 20.0 and for different values of c in the
range 0.94 ≤ c ≤ 0.99. It was found that the annealing schedule T(k+1) = cT(k) with c
= 0.96 provided the best solution relative to the other values of c. The quality of
the final solution, or the minimum cost Cmin, was found to be 292.267 units. For the
annealing schedule Tk = d/logk, experiments were performed for d = 5, 10, 15, and 20,
and the results were compared and evaluated. It was found that the annealing schedule Tk
= d/logk with d = 5 provides the best solution with respect to the other values of d. The
quality of the final solution, or the minimum cost Cmin, was 285.54 units. In a comparison of
the results of the two annealing schedules, the minimum cost value or the quality of the
final solution of Tk = d/logk with d = 5 is 6.727 units lower than that of T(k+1) = cT(k)
with c = 0.96; thus, the annealing schedule Tk = d/logk with d = 5 provided a better
solution than the annealing schedule T(k+1) = cT(k). In general, the temperature
should be lowered slowly; otherwise, the cost function has a tendency to get stuck at
higher cost values, possibly at a local minimum. Similarly, experiments were performed for
Simulated Annealing and Local Optimization in Section 6.4, and the results were compared
and evaluated. The results indicated that Simulated Annealing yielded a much better solution
than Local Optimization, but it had to iterate much longer. In Section 6.5, a 100-cities
TSP instance was used to obtain experimental results for two neighborhood
structures, namely Citywise Exchange and Edgewise Exchange. It was found that the best
cost using Citywise Exchange was 865.09 units while the best cost using Edgewise
Exchange was 1087.16 units. Together with the other criteria, it was concluded that
Algorithm A using Citywise Exchange outperformed Algorithm A using Edgewise Exchange.
During the course of this study, the author has encountered several "interesting"
and "challenging" open research problems concerning the Traveling Salesman Problem and the
Parallel Simulated Annealing Algorithm. Among these, the two areas below are suggested for
further investigation:
1. A rigorous theoretical treatment of the average-case performance analysis for the
Parallel Simulated Annealing Algorithm.
2. A computational study of Algorithm B for the Traveling Salesman Problem.
APPENDIX A

SIMULATION PROGRAM FOR A SYNCHRONOUS PARALLEL SIMULATED ANNEALING ALGORITHM
The software program for the Parallel Simulated Annealing Algorithm is given in
the following pages. It is written in "C" on the Unix Operating System. In the simulation
program, five math library subroutines are accessed by including <math.h>: drand48(),
which returns non-negative double-precision values uniformly distributed over the
interval [0,1); pow(), which performs x^y operations; exp(), which performs e^x operations; log10(),
which performs the logarithmic operations for the annealing schedule; and sqrt(), which performs square root
operations. Timing information is obtained by calling the function time() provided by the
Unix Operating System; this time() function measures the execution time of the program in real time. The structure of the program follows the general steps outlined in Section 4.5.1.
/*****************************************************************/
/*   SYNCHRONOUS PARALLEL SIMULATED ANNEALING ALGORITHM          */
/*   FOR THE TRAVELING SALESMAN PROBLEM                          */
/*   written in "C", July 15, 1989.                              */
/*****************************************************************/
#define INDEBUG                 /** input() debug constant **/
#define MAXNODES     500        /** maximum nodes, subtours and iterations **/
#define MAXSUBTOURS  20
#define MAXITO       500

#define NOINIT                  /** states of the subtour **/
#define INIT         3
#define ACCEPTED                /** states of annealing **/
#define NOTACCEPTED  9
#define FREE         10         /** status of the processors **/
#define BUSY
#define AVAILABLE               /** states of the intersubtour pairwise exchange **/
#define RESERVED     19
#define FALSE
#define TRUE
#define NONODE
#define NODE         +1
/************************** STRUCTURES **********************/
/** structure to save xy coordinates of a random vertex map **/
struct dsp_map_struct {
    float x_coord;
    float y_coord;
};

/** message structure between central coordinator and processors **/
struct message_struct {
    int   current_subtour[MAXNODES];    /* current nodes in subtour */
    int   perturb_subtour[MAXNODES];    /* the perturbed subtour */
    int   node_state[MAXNODES];         /* states of the nodes */
    int   reserved_node[MAXNODES];      /* reserved nodes */
    int   status;                       /* status of the processor */
    int   state_subtour;                /* defined as above */
    int   state_tour;
    int   cardinality;                  /* number of nodes in a subtour */
    float temperature;
    float cost_curr_subtour;            /* cost of the current subtour */
    float cost_perturb_subtour;         /* cost of the perturbed subtour */
    float cost_subtour;                 /* accepted cost of subtour */
    float chg_in_cost;                  /* change in cost of subtour */
    float rng;                          /* random generator */
    float accept_prob;                  /* acceptance probability */
};
/** processing element structure **/
struct tsp_pe_struct {
    struct message_struct msg;
};

/** central coordinator structure **/
struct tsp_cc_struct {
    float cost_init_tour[20];
    int   soln_tour[MAXNODES];          /** σ **/
    float cost_soln_tour;               /** C(σ) **/
    int   best_tour[MAXNODES];          /** σB **/
    float cost_best_tour;               /** C(σ)B **/
    int   optimal_tour[MAXNODES];       /** min{σB} **/
    float cost_opt_tour[20];            /** min{C(σ)B} **/
    float opt_temperature[20];
    float opt_iteration[20];
    float iteration;
    float rng_cc;
    float accept_prob_cc;
    float chg_in_cost_cc;
    float temperature;
    float state_tour;
    int   not_acc_cnt;
};

/***** global variables ******/
extern int   init_run_time, final_run_time, elapsed_run_time;
extern int   npe;                   /** number of processing elements **/
extern int   max_subtours;          /** maximum # of subtours formed **/
extern int   num_nodes, num_edges;  /** number of vertices, nodes in TSP **/
extern int   seed, mapseed;         /** seeds for random number generator **/
extern int   num_runs, run_num;     /** # of runs of program **/
extern int   sample_pts;            /** # of sample points to be taken **/
extern int   SamplePts;             /** running index for sample_pts **/
extern float max_ito;               /** stopping criterion, on maximum iterations **/
extern int   iter;
/***** structures ***************/
extern struct tsp_cc_struct   central_coord;   /* central coordinator */
extern struct tsp_pe_struct   proc_pe[];       /* p processing elements */
extern struct dsp_map_struct  map_nodes[];     /* map */

/***** annealing schedule externals ******/
extern float depth;          /** T = depth/log(k)  or  T = (depth)^k T(0) **/

/******* functions ***********/
extern int   step_A3(), step_A10(), step_A5();    /** ALGORITHM A main steps **/
extern float anneal_schedule();
extern int   init(), input(), output();
extern int   time(), results();
extern int   update_stats();

extern int   best_history[][MAXNODES];
extern int   soln_history[][MAXNODES];
extern float cost_history[][40];
extern float uphill_history[][40];

#include "defs.c"
#include "external.c"
/************************* BEGIN OF GLOBAL.C ******************/
/**      Allocate storage space for global parameters        **/
/****************************************************************/
int    init_run_time, final_run_time, elapsed_run_time;
int    npe;
float  max_ito;
int    max_subtours;
int    num_nodes, num_edges;
int    num_runs, run_num;
float  depth;
int    seed, mapseed;
int    sample_pts;
int    SamplePts;

struct tsp_cc_struct  central_coord;            /** 1 central coordinator **/
struct tsp_pe_struct  proc_pe[MAXSUBTOURS];     /** p processing elements **/
struct dsp_map_struct map_nodes[MAXNODES];      /** coordinates per node **/

int   best_history[MAXNODES][MAXNODES];
int   soln_history[MAXNODES][MAXNODES];
float cost_history[MAXNODES][40];
float uphill_history[MAXNODES][40];
/******************* END OF GLOBAL.C ***********************/
/********************** BEGIN OF IN.DAT ***************/
10 50 500.0
10 10
5.0
1 100

#define INDAT
#ifdef INDAT
    printf("\t npe \t num_nodes \t max_ito\n");
    printf("\t %d \t\t %d \t\t %f\n", npe, num_nodes, max_ito);
    printf("\t mapseed \t seed\n");
    printf("\t %d \t\t\t %d\n", mapseed, seed);
    printf("\t depth\n");
    printf("\t %f\n", depth);
    printf("num_runs \t sample_pts\n");
    printf("%d \t %d\n", num_runs, sample_pts);
#endif
/********************* END OF IN.DAT ******************/
#include <stdio.h>
#include <math.h>
#include "defs.c"
#include "external.c"

/************************* BEGIN OF INPUT.C *****************/
/** Inputs variables from file in.dat                       **/
/**************************************************************/
input()
{
    struct pair_struct *prs;                /* pointers to structures */
    FILE *fopen(), *fp;                     /* fp = pointer to FILE type */
    static char infile[] = "in.dat";

    /** get input via file name "in.dat" **/
    if( (fp = fopen(infile,"r")) == NULL ){ /* standard i/o library function */
        printf("\n error on reading input file! \n");
        exit();
    }
    fscanf(fp, "%d %d %f\n",
           &npe,                    /** maximum # of processors **/
           &num_nodes,              /** number of nodes in TSP **/
           &max_ito);               /** stopping criterion on iterations **/
    fscanf(fp, "%d %d\n",
           &mapseed,                /** seed for map generation **/
           &seed);                  /** seed for random number generator **/
    fscanf(fp, "%f\n", &depth);     /** annealing schedule, d **/
    fscanf(fp, "%d %d\n",
           &num_runs,               /** # of runs of program **/
           &sample_pts);            /** # of sample points taken **/
    fclose(fp);

#define INDEBUG
#ifdef INDEBUG
    printf("\t INPUT DATA \n\n");
    printf("\t npe \t num_nodes \t max_ito\n");
    printf("\t %d \t\t %d \t\t %f \n", npe, num_nodes, max_ito);
    printf("\t mapseed \t seed\n");
    printf("\t %d \t\t %d\n", mapseed, seed);
    printf("\t depth\n");
    printf("\t %f\n", depth);
    printf("\t num_runs \t sample_pts\n");
    printf("\t %d \t\t %d\n", num_runs, sample_pts);
#endif
    return;
}
/************************** END OF INPUT.C ******************/

#include <stdio.h>
#include <math.h>
#include "defs.c"
#include "external.c"
/**************************** BEGIN OF MAP.C *****************/
/** this subroutine generates a random vertex map in a 50x50 square **/
/**************************************************************/
make_map()
{
    struct dsp_map_struct *map;     /* map points to dsp_map_struct */
    int    i, node, node1, node2;
    float  chg_x, chg_y, dist;
    double sqrt(), drand48();
    static char mapf[] = "map.dat";
    FILE   *fp, *fopen();
/*** for every node or city; num_nodes = max # of cities in TSP ***/
for( node = 0; node < num_nodes; node++ )(
/**pointer to map structure **/
map = &map_nodes[node];
/**get a random x coordinate for a node **/
map->x_coord = drand48(mapseed) * num_nodes;
/**get a random y coordinate for a node **/
map->y_coord = drand48(mapseed) * num_nodes;
#define XYDEBUG
#ifdef XYDEBUG
if ( num_nodes < MAXNODES )(
if( (fp = fopen(mapf,"w") ) == NULL )(
printf(" map.dat failed to open\n ");
exit();
/*** for every node or city; num_nodes = max # of cities in TSP ***/
fprintf(fp,'\nNODE NUMBER, X AND Y COORDINATE POSITION\n\n");
fprintf(fp,"Node Number \t X Coordinate \t Y Coordinate \n");
for( node = 0; node < numnodes; node++ ){
/**pointer to map structure **/
map = &map_nodes[node];
/**get a random x coordinate for a node **/
map->x_coord = drand48(mapseed) * num_nodes;
/**get a random y coordinate for a node **/
map->y_coord = drand48(mapseed) * num_nodes;
fprintf(fp,"%d \t %.2f \t %.2f\n",node,map->x_coord,map->y_coord);
}
#endif XYDEBUG
#ifdef MAPDEBUG
fprintf(fp,"\n DISTANCE MATRIX \n\n");
i= 0;
for(nodel = 0; node 1 < num_nodes; nodel++) {
for(node2 = 0; node2 < num_nodes; node2++){
chg_x = map_nodes[nodel].x_coord map_nodes[node2].x_coord;
chg_y = map_nodes[nodel].y_coord map_nodes[node2].y_coord;
dist = sqrt( (chg_x * chgx) + (chg_y * chg_y) );
if( i !=0)
if( (i % 10) ==0 )
fprintf(fp,'"\n"); /**next line **/
fprintf(fp," %.2f ",dist);
i++;
I
fprintf(fp,' n\n");
)
#endif
fclose(fp);
}else{
printf('"\n MAP.C: num_nodes must be less than MAXNODES \n");
exit();
return;
/************************** END OF MAP.C **********************/
#include <stdio.h>
#include <math.h>
#include "defs.c"
#include "external.c"
/******************* BEGIN OF INIT.C ******************/
/** initialize all the necessary parameters for ALGORITHM A **/
init()
{
    struct tsp_cc_struct *cc;
    struct tsp_pe_struct *pe;
    double drand48(), sqrt();
    float  xdelta, ydelta, distance;
    int    node, i, node1, node2;
    int    tour_loc[MAXNODES];
    static char init_tourf[] = "init_tour.dat";
    FILE   *fp, *fopen();
cc = &central_coord;
seed += 10;
/** generate a random initial tour generating a # between 0 & num_nodes **/
/** and assigning it to each elt of the soln_tour[] **/
    for (i = 0; i < num_nodes; i++)
        tour_loc[i] = NONODE;

    i = 0;
    while (i != num_nodes){
        node = drand48(seed) * num_nodes;
        if ( tour_loc[node] == 1 )
            continue;
        cc->soln_tour[node] = i;
        tour_loc[node] = 1;
        i++;
    }
    /** calculate initial cost of tour by calculating the sum of the distances **/
    /** of the map_nodes array whose indices are controlled by soln_tour[]     **/
    cc->cost_soln_tour = 0.0;
    for( node = 0; node < num_nodes; node++){
        node1 = cc->soln_tour[node];        /* a randomly generated node */
        node2 = cc->soln_tour[(node + 1 + num_nodes)%num_nodes];   /* node1 + 1, wrapping at num_nodes */
        xdelta = map_nodes[node1].x_coord - map_nodes[node2].x_coord;
        ydelta = map_nodes[node1].y_coord - map_nodes[node2].y_coord;
        distance = sqrt( (xdelta * xdelta) + (ydelta * ydelta) );
        cc->cost_soln_tour += distance;     /* get the cost of the initial tour */
    }

    /** Save the cost of the initial and optimal tours for each run **/
    cc->cost_init_tour[run_num] = cc->cost_soln_tour;
    cc->cost_opt_tour[run_num] = cc->cost_soln_tour;
#ifdef INITDEBUG
if( (fp = fopen(init_tourf,"w") ) == NULL){
printf(" init_tour.dat failed to open \n");
exit();
}else{
fprintf(fp,'"n\n THE RANDOM INITIAL TOUR IS \nLn");
for (node = 0; node < num_nodes; node++)(
if (node != 0 )
if ((node % 10)== 0)
fprintf(fp,'\n"); /** next line **/
fprintf(fp," %d\t",cc->soln_tour[node]);
fprintf(fp,'\n\n");
fprintf(fp,'\n\n TOTAL COST OF THE INITIAL TOUR = %f\n",cc->cost_soln_tour);
fclose(fp);
#endif
    /** initialize the best_tour **/
    cc->cost_best_tour = cc->cost_soln_tour;    /* Let best cost = initial cost */
    for ( node = 0; node < num_nodes; node++ )
        cc->best_tour[node] = cc->soln_tour[node];

    /** initialize the initial statistics **/
    cc->iteration = 1.0;        /** initial time before any execution **/
    opt_tour_record();
    update_stats();

    /** initialize other variables **/
    cc->iteration = 2.0;        /* At iteration = n = 2 */
    cc->temperature = anneal_schedule(cc->iteration);
    opt_tour_record();

    num_edges = num_nodes;      /* Let num of edges = num of nodes */
    max_subtours = npe;   /* max_subtours = npe = P = max # of parallel processes */
    return;
}
/************************* END OF INIT.C ************************/
#include <stdio.h>
#include <math.h>
#include "defs.c"
#include "external.c"
/********************* BEGIN OF TOUR1.C *********************/
/** step_A1() tasks: (1). establishes a message channel between the  **/
/**                       central coordinator and every processor    **/
/**                  (2). selects the annealing temperature          **/
/**                  (3). partitions the tsp tour into subtours      **/
/**************************************************************/
step_A1()
{
    struct tsp_cc_struct *cc;
    struct tsp_pe_struct *pe;
    int    i, k, proc, node, snode, node1, node2, cardinality;
    int    iter, InitRanSeed, InitRanNode;
    float  xdelta, ydelta, distance, init_cost;
    double sqrt(), drand48();
    int    offset_tour_loc[MAXNODES];

    cc = &central_coord;

    /** Generate the initial random node **/
    iter = cc->iteration;
    InitRanSeed = (iter + num_nodes) % num_nodes;
    InitRanNode = drand48(seed + InitRanSeed) * num_nodes;

    /** Offset the tour to begin at the initial random node **/
    for ( i = 0; i < num_nodes; i++ ){
        offset_tour_loc[i] = cc->soln_tour[(InitRanNode + i + num_nodes) % num_nodes];
        printf("offset_tour_loc[%d] = cc->soln_tour[%d] = %d\n",
               i, ((InitRanNode + i + num_nodes) % num_nodes), cc->soln_tour[i]);
    }

    /** Get the offset tour **/
    for (i = 0; i < num_nodes; i++)
        cc->soln_tour[i] = offset_tour_loc[i];
/*** for each process in system, make a message between the central ***/
/*** coordinator and processes. Partition the tsptour into subtours.***/
node = 0;
cardinality = num_nodes/npe; /* cardinality = # nodes in subtour */
for( proc = 0; proc < npe; proc++ ){
pe = &proc_pe[proc];
pe->msg.state_subtour = INIT;
pe->msg.status = FREE;
pe->msg.cardinality = cardinality;
/** All nodes are available **/
for (k = 0; k < cardinality; k++)
pe->msg.node_state[k] = AVAILABLE;
/** Update the temperature at every iteration **/
        if ( cc->iteration < 2.0 ){
            cc->iteration = 2.0;
        }else{
            cc->temperature = pe->msg.temperature = anneal_schedule(cc->iteration);
        }
        if( pe->msg.temperature == 0 ){
            printf(" tour1.c: The temperature can not be ZERO\n");
            exit();
        }
/**partition the tsp tour into 'npe' or P subtours **/
for ( snode = 0; snode < cardinality; snode++){
node = ( node + num_nodes ) % num_nodes;
pe->msg.current subtour[snode] = cc->soln_tour[node];
node++;
/**compute the initial cost of the subtour **/
init_cost = 0.0;
for ( snode = 0; snode < cardinality-1; snode++)(
node 1 = pe->msg.current_subtour[snode];
node2 = pe->msg.current_subtour[snode+ 1];
xdelta = map_nodes[nodel].x_coord - map_nodes[node2].x_coord;
ydelta = map_nodes[nodel].y_coord - map_nodes[node2].y coord;
/**eucledian distance, cost of edge **/
distance = sqrt( ( xdelta * xdelta ) + ( ydelta * ydelta) );
init_cost += distance;
pe->msg.cost_subtour = init_cost;
#ifdef T1DEBUG
printf("\n\t TOUR1=A1(): msg number = %d\n",proc);
if ( pe->msg.state_subtour == INIT)
printf('Nt\t state_subtour = INIT\n");
if ( pe->msg.status == FREE )
printf('\t\t processor %d status = FREE\n",proc);
printf('Nti temperature = %f\n",pe->msg.temperature);
printf('\t\t cardinality = %d\n",pe->msg.cardinality);
printf('"t\t cost_subtour = %f\n",pe->msg.cost_subtour);
printf('\t\t subtour is ...
\n");
for ( i = 0; i < cardinality; i++)(
if( i !=0)
if( (i%10)== 0)
printf("\n");
printf("%d\t",pe->msg.current_subtour[i]);
}
printf('M"\n");
#endif T1DEBUG
    } /* for loop */

    return;
} /* step_A1() */
/**************** END OF TOUR1.C ***********************/
#include <math.h>
#include <stdio.h>
#include "defs.c"
#include "external.c"

/*********** annealing schedule computation **********/
/** The following function computes the temperature for  **/
/** the different annealing schedules                    **/
/**      Input:  time or iteration                       **/
/**      Output: temperature value                       **/
float anneal_schedule(curr_time)
float curr_time;        /* curr_time = cc->iteration */
{
    struct tsp_cc_struct *cc;
    float  prob, tempiter, anneal_tmp;
    double pow(), log10();

    cc = &central_coord;

    /****** Inverse log annealing schedule  T = d/log(t) ******/
    anneal_tmp = depth / log10(curr_time);

    /****** linear schedule with various scalings ******/
    if ( curr_time <= 2 ){
        anneal_tmp = 20.0;
    }else{
        anneal_tmp = 20.0 * pow(depth, curr_time);
    }

    return(anneal_tmp);
}
#include <stdio.h>
#include <math.h>
#include "defs.c"
#include "external.c"
/*********************** BEGIN OF SUBTOUR.C ************/
/* This subroutine performs STEP A3 of Algorithm A.                      */
/* Code for the p subtours or processors in the parallel computer;       */
/* performs a Tij or Lij interchange, the changes in costs, the          */
/* interprocessor communication, and the annealing test.                 */
/*************************************************************************/
step_A3()
{
    int proc;
#ifdef CITYWISE
/** Step 3A for Citywise Exchanges and annealing in subtours **/
for( proc = 0; proc < max_subtours; proc++ )
citywise_exchange(&proc_pe[proc],proc);
#endif
#define EDGEWISE
#ifdef EDGEWISE
/** Step 3A Edgewise Exchanges and annealing in subtours **/
for( proc = 0; proc < max_subtours; proc++ )
edgewise_exchange(&proc_pe[proc],proc);
#endif
/**Step 3B for Interprocessor Communication **/
    if (npe != 1 ){
        for( proc = 0; proc < max_subtours; proc++ )
            Intercommunication(&proc_pe[proc],proc);
    }
#ifdef CITYWISE
    /** Step 3C for Citywise Exchanges and annealing in subtours, **/
    /** to sort out the subtours                                  **/
    for ( proc = 0; proc < max_subtours; proc++)
        citywise_exchange(&proc_pe[proc],proc);
#endif
#define EDGEWISE
#ifdef EDGEWISE
/** Step 3C for Edgewise Exchanges and annealing in subtours **/
for( proc = 0; proc < max_subtours; proc++ )
edgewise_exchange(&proc_pe[proc],proc);
#endif
    return;
}
/** Each processing element performs its Tij exchange and internal annealing **/
/*****************************************************************/
citywise_exchange(pe, proc)
struct tsp_pe_struct *pe;       /* pointer to processor 'proc' data */
int    proc;
{
    struct tsp_cc_struct  *cc;
    struct dsp_map_struct *map;         /* pointer to xy coordinates of map */
    float  xdelta, ydelta, distance;    /* map variables */
    float  chg_in_cost;                 /* chg in cost for Lij or Tij */
    float  temp, accept_prob, rng;
    double sqrt(), pow();               /* for the cost of subtours */
    double drand48(), exp();            /* return non-negative double-precision
                                           values uniformly distributed over [0,1) */
    int    i, j, node1, node2, pair1, pair2, node, next_node;
    int    cardinality;                 /* cardinality of subtour */
    int    p, k, nodei, nodej, nodei_prime, nodej_prime;
    int    ireserved, jreserve, jreserved, proci, procj;
    int    jreserve_seed, no_other_avail_node;
cc = &central_coord;
pe = &proc_pe[proc];
cardinality = pe->msg.cardinality = num_nodes/npe;
/** NODEWISE EXCHANGE **/
for ( nodei = 0; nodei < cardinality; nodei++ ){
current_subtour_cost(&proc_pe[proc],proc);
/** set up for NODEWISE EXCHANGE **/
for ( i = 0; i < cardinality; i++)
pe->msg.perturb_subtour[i] = pe->msg.current_subtour[i];
nodej = drand48(seed) * cardinality;
        while (nodei == nodej) {
            nodej = drand48(seed) * cardinality;
        }
        nodei_prime = ( ( nodei < nodej ) ? nodei : nodej );  /* min(nodei,nodej) */
        nodej_prime = ( ( nodei > nodej ) ? nodei : nodej );  /* max(nodei,nodej) */
for (k = 0; k < nodei; k++ )
pe->msg.perturb_subtour[k] = pe->msg.current_subtour[k];
-196-
Appendix A: Simulation Program
M. Tran
for ( k = nodei_prime; k <= nodej_prime ;k++)
pe->msg.perturb_subtour[k] = pe->msg.current_subtour[k];
for ( k = nodej_prime+1; k < cardinality; k++ )
pe->msg.perturb_subtour[k] = pe->msg.current_subtour[k];
#ifdef SUBTOURS
printf(" INTRASUBTOUR RESULTS for processor %d\n",proc);
current_subtours(&proc_pe[proc],proc);
#endif
/** compute cost of (ordered) permuted_subtour[] **/
perturbed_subtour_cost(&proc_pe[proc],proc);
/** ANNEALING THE SUBTOUR **/
intra_subtour anneal(&proc_pe[proc],proc);
#ifdef SUBDEBUG
printf(" INTRASUBTOURS RESULTS for processor %d\n",proc);
subtour_result(&proc_pe[proc],proc);
#endif
)/* nodei */
#ifdef SUBDEBUG
printf(" INTRASUBTOURS RESULTS for processor %d\n",proc);
subtour_result(&proc_pe[proc],proc);
#endif
return;
/** Each processing element performs its Lij exchange and internal annealing **/
/*****************************************************************************/
edgewise_exchange(pe, proc)
struct tsp_pe_struct *pe;              /* pointer to processor 'proc' data */
int proc;
{
    struct tsp_cc_struct *cc;
    struct dsp_map_struct *map;        /* pointer to xy coordinates of map */
    float xdelta, ydelta, distance;                /* map variables */
    float chg_in_cost;                             /* chg in cost for Lij or Tij */
    float temp, accept_prob, rng;
    double sqrt(), pow();                          /* for the cost of subtours */
    double drand48(), exp();           /* drand48() returns a non-negative
                                        * double-precision floating-point value
                                        * uniformly distributed over [0,1) */
    int i, j, node1, node2, pair1, pair2, node, next_node;
    int cardinality;                               /* cardinality of subtour */
    int p, k, nodei, nodej, nodei_prime, nodej_prime;
    int ireserved, jreserve, jreserved, proci, procj;
    int jreserve_seed, no_other_avail_node;

    cc = &central_coord;
    pe = &proc_pe[proc];
    cardinality = pe->msg.cardinality = num_nodes/npe;

    for (nodei = 0; nodei < cardinality; nodei++) {
        current_subtour_cost(&proc_pe[proc], proc);

        /** set up for the EDGEWISE EXCHANGE **/
        for (i = 0; i < cardinality; i++)
            pe->msg.perturb_subtour[i] = pe->msg.current_subtour[i];
        nodej = drand48(seed) * cardinality;
        while (nodei == nodej) {
            nodej = drand48(seed) * cardinality;
        }
        nodei_prime = ((nodei < nodej) ? nodei : nodej);   /* min(nodei,nodej) */
        nodej_prime = ((nodei > nodej) ? nodei : nodej);   /* max(nodei,nodej) */

        /* copy the leading portion of the subtour */
        for (k = 0; k < nodei_prime; k++)
            pe->msg.perturb_subtour[k] = pe->msg.current_subtour[k];
        /* reverse the segment between nodei_prime and nodej_prime (Lij exchange) */
        for (k = 0; k <= (nodej_prime - nodei_prime); k++)
            pe->msg.perturb_subtour[nodei_prime+k] =
                pe->msg.current_subtour[nodej_prime-k];
        /* copy the trailing portion of the subtour */
        for (k = nodej_prime+1; k < cardinality; k++)
            pe->msg.perturb_subtour[k] = pe->msg.current_subtour[k];

#ifdef SUBTOURS
        printf(" INTRASUBTOUR RESULTS for processor %d\n", proc);
        current_subtours(&proc_pe[proc], proc);
#endif
        /** compute cost of the (ordered) perturbed subtour **/
        perturbed_subtour_cost(&proc_pe[proc], proc);

        /** ANNEALING THE SUBTOUR **/
        intra_subtour_anneal(&proc_pe[proc], proc);

#ifdef SUBDEBUG
        printf(" INTRASUBTOUR RESULTS for processor %d\n", proc);
        subtour_result(&proc_pe[proc], proc);
#endif
    } /* nodei */

#ifdef SUBDEBUG
    printf(" INTRASUBTOUR RESULTS for processor %d\n", proc);
    subtour_result(&proc_pe[proc], proc);
#endif
    return;
}
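To make the index arithmetic of the Lij (edgewise) move concrete, the following standalone example, which is illustrative and not part of the thesis program, applies the same reversal loop to a six-city subtour; the array contents and positions chosen are arbitrary.

#include <stdio.h>

/* Standalone illustration (not from the original appendix) of the Lij
 * segment reversal used in edgewise_exchange().                        */
int main()
{
    int current[6] = {0, 1, 2, 3, 4, 5};
    int perturb[6];
    int i = 1, j = 4;               /* reverse the segment between positions 1 and 4 */
    int k;

    for (k = 0; k < 6; k++)         /* start from a copy of the subtour    */
        perturb[k] = current[k];
    for (k = 0; k <= (j - i); k++)  /* same loop as in edgewise_exchange() */
        perturb[i + k] = current[j - k];

    for (k = 0; k < 6; k++)
        printf("%d ", perturb[k]);  /* prints: 0 4 3 2 1 5 */
    printf("\n");
    return 0;
}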
/** INTERPROCESSOR COMMUNICATION **/
Intercommunication(pe, proc)
struct tsp_pe_struct *pe;              /* pointer to processor 'proc' data */
int proc;
{
    struct tsp_cc_struct *cc;
    struct tsp_pe_struct *pei;
    struct tsp_pe_struct *pej;
    struct dsp_map_struct *map;        /* pointer to xy coordinates of map */
    float xdelta, ydelta, distance;                /* map variables */
    float chg_in_cost;                             /* chg in cost for Lij or Tij */
    float temp, accept_prob, rng;
    double sqrt(), pow();                          /* for the cost of subtours */
    double drand48(), exp();           /* drand48() returns a non-negative
                                        * double-precision floating-point value
                                        * uniformly distributed over [0,1) */
    int i, j, node1, node2, pair1, pair2, node, next_node;
    int cardinality;                               /* cardinality of subtour */
    int p, k, nodei, nodej, nodei_prime, nodej_prime;
    int ireserved, jreserve, jreserved, proci, procj;
    int jreserve_seed, no_other_avail_node;

    cc = &central_coord;
    pe = &proc_pe[proc];
    cardinality = pe->msg.cardinality = num_nodes/npe;
    proci = proc;
    pei = &proc_pe[proci];
    for (ireserved = 0; ireserved < cardinality; ireserved++) {
        wait_with_timeout(&proc_pe[proci]);
        pei->msg.node_state[ireserved] = RESERVED;
        pei->msg.reserved_node[ireserved] = pei->msg.current_subtour[ireserved];
        pei->msg.status = FREE;

        for (procj = 0; procj < npe; procj++) {
            if (procj != proci) {
                pej = &proc_pe[procj];
                wait_with_timeout(&proc_pe[procj]);

                /** Exchange reserved_node[ireserved] with 'cardinality'
                 ** randomly generated jreserved nodes **/
                for (j = 0; j < cardinality; j++) {
                    for (jreserve = 0; jreserve < cardinality; jreserve++)
                        if (pej->msg.node_state[jreserve] != RESERVED) {
                            jreserve_seed = seed;
                            jreserved = drand48(jreserve_seed) * cardinality;
                            no_other_avail_node = 0;
                            while ((jreserved < jreserve) ||
                                   (pej->msg.node_state[jreserved] == RESERVED)) {
                                jreserve_seed += 5;
                                jreserved = drand48(jreserve_seed) * cardinality;
                                if (no_other_avail_node == num_nodes) {
                                    printf(" proc %d may not have another available node\n", procj);
                                    jreserved = jreserve;
                                    break;
                                }
                                no_other_avail_node += 1;
                            } /** while **/
                            pej->msg.node_state[jreserved] = RESERVED;
                            pej->msg.reserved_node[jreserved] =
                                pej->msg.current_subtour[jreserved];
#ifdef SUB
                            printf("pei->msg.reserved_node[%d] = %d of processor %d\n",
                                   ireserved, pei->msg.reserved_node[ireserved], proci);
                            printf("pej->msg.reserved_node[%d] = %d of processor %d\n\n",
                                   jreserved, pej->msg.reserved_node[jreserved], procj);
#endif
                            break;
                        } /** if pej **/

                    if (pej->msg.node_state[cardinality-1] == RESERVED) {
                        printf(" No available node in processor %d\n", procj);
                        goto NextProcessor;
                    }
                    pej->msg.status = FREE;
                    wait_with_timeout(&proc_pe[proci]);

                    /************ pei SUBTOUR ANNEALING ************/
                    /** Set up to perturb the current_subtour **/
                    for (k = 0; k < cardinality; k++)
                        pei->msg.perturb_subtour[k] = pei->msg.current_subtour[k];
                    wait_with_timeout(&proc_pe[procj]);
                    /** Nodewise Exchange **/
                    pei->msg.perturb_subtour[ireserved] =
                        pej->msg.reserved_node[jreserved];
#ifdef SUB
                    printf("\n/** Nodewise Exchange **/\n");
                    printf("PEi: current and perturbed subtours \n");
                    current_subtours(&proc_pe[proci], proci);
#endif /* SUB */
                    pej->msg.status = FREE;
                    /** Get the current cost of the subtour **/
                    current_subtour_cost(&proc_pe[proci], proci);
                    /** Get the perturbed cost of the subtour **/
                    perturbed_subtour_cost(&proc_pe[proci], proci);
                    /** Annealing the subtour **/
                    inter_subtour_anneal(&proc_pe[proci], proci);

                    if (pei->msg.state_subtour == ACCEPTED) {
#ifdef SUB
                        printf("\npei->msg.state_subtour = ACCEPTED\n");
                        printf("pei: current and perturbed subtours \n");
                        current_subtours(&proc_pe[proci], proci);
#endif /* SUB */
                        wait_with_timeout(&proc_pe[procj]);

                        /************ pej SUBTOUR ANNEALING ************/
                        /** Set up to perturb the current_subtour **/
                        for (k = 0; k < cardinality; k++)
                            pej->msg.perturb_subtour[k] = pej->msg.current_subtour[k];
                        /** Nodewise Exchange **/
                        pej->msg.perturb_subtour[jreserved] =
                            pei->msg.reserved_node[ireserved];
#ifdef SUB
                        printf("\n/** Nodewise Exchange **/\n");
                        printf("PEj: current and perturbed subtours \n");
                        current_subtours(&proc_pe[procj], procj);
#endif /* SUB */
                        pei->msg.status = FREE;
                        /** Get the current cost of the subtour **/
                        current_subtour_cost(&proc_pe[procj], procj);
                        /** Get the perturbed cost of the subtour **/
                        perturbed_subtour_cost(&proc_pe[procj], procj);
                        /** Annealing the subtour **/
                        inter_subtour_anneal(&proc_pe[procj], procj);
                        if (pej->msg.state_subtour == ACCEPTED) {   /** pej ACCEPTED **/
#ifdef SUB
                            printf("\npej->msg.state_subtour = ACCEPTED\n");
                            printf("pej: current and perturbed subtours: \n");
                            current_subtours(&proc_pe[procj], procj);
#endif /* SUB */
                            wait_with_timeout(&proc_pe[proci]);
#ifdef SUB
                            printf("\n\nPEi: BEFORE THE ACTUAL EXCHANGE \n");
                            printf("PEi: current and perturbed subtours \n");
                            current_subtours(&proc_pe[proci], proci);
#endif /* SUB */
                            /************ ACTUAL EXCHANGE for Pei ************/
                            /* get the accepted subtour cost */
                            pei->msg.state_subtour = ACCEPTED;
                            pei->msg.cost_subtour = pei->msg.cost_perturb_subtour;
                            /* update the current_subtour[] */
                            cardinality = pei->msg.cardinality;
                            for (k = 0; k < cardinality; k++) {
                                pei->msg.current_subtour[k] = pei->msg.perturb_subtour[k];
                                pei->msg.node_state[k] = AVAILABLE;
                            }
                            /** Set up for the next NODEWISE EXCHANGE **/
                            pei->msg.node_state[ireserved] = RESERVED;
                            pei->msg.reserved_node[ireserved] =
                                pei->msg.current_subtour[ireserved];
#ifdef SUBDEBUG
                            printf("\t PEi: AFTER THE ACTUAL EXCHANGE \n");
                            printf("\t INTERSUBTOUR RESULTS for (ith) processor %d :\n", proci);
                            subtour_result(&proc_pe[proci], proci);
#endif
#ifdef SUB
                            printf("\n\nPEj: BEFORE THE ACTUAL EXCHANGE \n");
                            printf("PEj: current and perturbed subtours: \n");
                            current_subtours(&proc_pe[procj], procj);
#endif /* SUB */
                            /************ ACTUAL EXCHANGE for Pej ************/
                            pej->msg.state_subtour = ACCEPTED;
                            /* get the accepted subtour cost */
                            pej->msg.cost_subtour = pej->msg.cost_perturb_subtour;
                            /* update the current_subtour[] */
                            cardinality = pej->msg.cardinality;
                            for (k = 0; k < cardinality; k++) {
                                pej->msg.current_subtour[k] = pej->msg.perturb_subtour[k];
                                pej->msg.node_state[k] = AVAILABLE;
                            }
#ifdef SUBDEBUG
                            printf("\t PEj: AFTER THE ACTUAL EXCHANGE \n");
                            printf("\t INTERSUBTOUR RESULTS for (jth) processor %d :\n", procj);
                            subtour_result(&proc_pe[procj], procj);
#endif
                        } else {   /** pej NOTACCEPTED **/
                            pej->msg.state_subtour = NOTACCEPTED;
                            /** Pej wants the reserved node back **/
                            pej->msg.current_subtour[jreserved] =
                                pej->msg.reserved_node[jreserved];
                            /** Make all nodes of pej available for the Nodewise Exchange **/
                            for (k = 0; k < cardinality; k++)
                                pej->msg.node_state[k] = AVAILABLE;

                            /** Pej NOTACCEPTED, so set up for the next NODEWISE EXCHANGE **/
                            pei->msg.state_subtour = NOTACCEPTED;
                            for (k = 0; k < cardinality; k++)
                                pei->msg.node_state[k] = AVAILABLE;
                            /** Get the reserved node back **/
                            pei->msg.node_state[ireserved] = RESERVED;
                            pei->msg.current_subtour[ireserved] =
                                pei->msg.reserved_node[ireserved];
#ifdef SUBDEBUG
                            printf(" \t INTERSUBTOUR RESULTS for (ith) processor %d :\n", proci);
                            subtour_result(&proc_pe[proci], proci);
#endif
#ifdef SUBDEBUG
                            printf(" \t INTERSUBTOUR RESULTS for (jth) processor %d :\n", procj);
                            subtour_result(&proc_pe[procj], procj);
#endif
                        } /** pej NOTACCEPTED **/
                    } else {   /** pei NOTACCEPTED **/
                        pei->msg.state_subtour = NOTACCEPTED;
                        /** Set up for the next NODEWISE EXCHANGE with another processor **/
                        for (k = 0; k < cardinality; k++)
                            pei->msg.node_state[k] = AVAILABLE;
                        /** Get the reserved node back **/
                        pei->msg.node_state[ireserved] = RESERVED;
                        pei->msg.current_subtour[ireserved] =
                            pei->msg.reserved_node[ireserved];

                        /** Pei NOTACCEPTED, so set up for the next NODEWISE EXCHANGE **/
                        pej->msg.state_subtour = NOTACCEPTED;
                        /** Get the reserved node back **/
                        pej->msg.current_subtour[jreserved] =
                            pej->msg.reserved_node[jreserved];
                        /** Make all nodes available for Exchange **/
                        for (k = 0; k < cardinality; k++)
                            pej->msg.node_state[k] = AVAILABLE;
#ifdef SUBDEBUG
                        printf(" \t INTERSUBTOUR RESULTS for (ith) processor %d :\n", proci);
                        subtour_result(&proc_pe[proci], proci);
#endif
#ifdef SUBDEBUG
                        printf(" \t INTERSUBTOUR RESULTS for (jth) processor %d :\n", procj);
                        subtour_result(&proc_pe[procj], procj);
#endif
                    } /** pei NOTACCEPTED **/
                } /** for j **/
            } /** if procj **/
NextProcessor:  ;
            /** Last processor had no available node.  Check the next processor. **/
        } /** for procj **/
    } /** for ireserved **/
    return;
}
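The control flow of Intercommunication() is dominated by debug output and by the restore paths taken when a move is rejected; the underlying pairwise protocol is short. The sketch below is a condensed, hypothetical rendering of that protocol, not code from the thesis: the names pairwise_exchange and metropolis_accept, the argument list, and the use of caller-supplied cost changes are all illustrative, and a POSIX drand48() is assumed.

#include <math.h>
#include <stdlib.h>

/* Metropolis test: always accept a downhill move, accept an uphill move
 * with probability exp(-dcost/temperature).                              */
static int metropolis_accept(double dcost, double temperature)
{
    if (dcost <= 0.0)
        return 1;
    return (drand48() < exp(-dcost / temperature));
}

/* Tentatively swap the city at position a of subtour i with the city at
 * position b of subtour j; commit only if BOTH sides accept.  dcost_i and
 * dcost_j are the changes in subtour cost computed by the caller.         */
static int pairwise_exchange(int *subtour_i, int *subtour_j, int a, int b,
                             double dcost_i, double dcost_j, double temperature)
{
    if (metropolis_accept(dcost_i, temperature) &&
        metropolis_accept(dcost_j, temperature)) {
        int tmp = subtour_i[a];          /* both accept: commit the swap */
        subtour_i[a] = subtour_j[b];
        subtour_j[b] = tmp;
        return 1;
    }
    return 0;                            /* either side rejects: leave both subtours unchanged */
}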
/**************** FUNCTIONS FOR SUBTOUR ****************/
wait_with_timeout(pe)
struct tsp_pe_struct *pe;
{
    static int time_out = 0;

    /** the processor must have been busy with the present task **/
    while ((pe->msg.status == BUSY) && (time_out < 5))
        time_out += 1;                 /** do nothing but count **/

    /** Mark the processor BUSY for the next task **/
    return (pe->msg.status = BUSY);
}
/** Compute the cost of the current subtour **/
current_subtour_cost(pe, proc)
struct tsp_pe_struct *pe;              /* pointer to processor 'proc' data */
int proc;
{
    struct dsp_map_struct *map;        /* pointer to xy coordinates of map */
    float xdelta, ydelta, distance;                /* map variables */
    float current_cost;                            /* cost for a Tij interchange */
    double sqrt(), pow();                          /* for the cost of subtours */
    int node1, node2, node;
    int cardinality;                               /* cardinality of subtour */

    pe = &proc_pe[proc];
    cardinality = pe->msg.cardinality = num_nodes/npe;
    current_cost = 0.0;
    for (node = 0; node < cardinality-1; node++) {
        /* compute cost of subtour; these msg arrays hold nodes,  */
        /* or coordinates if map_nodes[] (from A1() = tour1.c)    */
        node1 = pe->msg.current_subtour[node];
        node2 = pe->msg.current_subtour[(node + 1 + cardinality) % cardinality];
        xdelta = map_nodes[node1].x_coord - map_nodes[node2].x_coord;
        ydelta = map_nodes[node1].y_coord - map_nodes[node2].y_coord;
        /** euclidean distance, cost of edge **/
        distance = sqrt((xdelta * xdelta) + (ydelta * ydelta));
        current_cost += distance;      /* total cost of current_subtour */
    }
    pe->msg.cost_curr_subtour = current_cost;
    return;
}
/** Compute the perturbed subtour cost **/
perturbed_subtour_cost(pe, proc)
struct tsp_pe_struct *pe;              /* pointer to processor 'proc' data */
int proc;
{
    struct dsp_map_struct *map;        /* pointer to xy coordinates of map */
    float xdelta, ydelta, distance;                /* map variables */
    float perturb_cost;                            /* cost for a Tij interchange */
    float chg_in_cost;                             /* chg in cost for Lij or Tij */
    double sqrt(), pow();                          /* for the cost of subtours */
    int node1, node2, node;
    int cardinality;                               /* cardinality of subtour */

    pe = &proc_pe[proc];
    cardinality = pe->msg.cardinality = num_nodes/npe;

    /** compute cost of the (ordered) perturbed subtour **/
    perturb_cost = 0.0;
    for (node = 0; node < cardinality-1; node++) {
        node1 = pe->msg.perturb_subtour[node];
        node2 = pe->msg.perturb_subtour[node+1];
        xdelta = map_nodes[node1].x_coord - map_nodes[node2].x_coord;
        ydelta = map_nodes[node1].y_coord - map_nodes[node2].y_coord;
        /** cost of an ordered edge of the perturbed subtour **/
        distance = sqrt((xdelta * xdelta) + (ydelta * ydelta));
        perturb_cost += distance;      /* total cost of perturbed subtour */
    }
    pe->msg.cost_perturb_subtour = perturb_cost;
    return;
}
/**** ANNEALING INSIDE A GIVEN SUBTOUR ****/
intra_subtour_anneal(pe, proc)
struct tsp_pe_struct *pe;              /* pointer to processor 'proc' data */
int proc;
{
    struct tsp_cc_struct *cc;
    struct dsp_map_struct *map;        /* pointer to xy coordinates of map */
    float xdelta, ydelta, distance;                /* map variables */
    float chg_in_cost;                             /* chg in cost for Lij or Tij */
    float temp, accept_prob, rng;
    double sqrt(), pow();                          /* for the cost of subtours */
    double drand48(), exp();           /* drand48() returns a non-negative
                                        * double-precision floating-point value
                                        * uniformly distributed over [0,1) */
    int i, node1, node2, node;
    int cardinality;                               /* cardinality of subtour */

    pe = &proc_pe[proc];
    cc = &central_coord;

    /******************** ANNEALING ********************/
    /** if the new change in cost, chg_in_cost, is < 0, accept it; **/
    /** otherwise apply the annealing (Metropolis) test            **/
    rng = drand48(seed);               /* uniform over [0,1) */
    pe->msg.rng = rng;                 /* used in the r < exp(-(chg_in_cost)/Tn) condition */
    chg_in_cost = pe->msg.cost_perturb_subtour - pe->msg.cost_curr_subtour;
                                       /* difference in cost */
    temp = pe->msg.temperature;
    if (temp == 0) {
        printf(" ERROR ===> CAN'T DIVIDE BY A ZERO TEMPERATURE !!!\n");
        exit();
    }
    accept_prob = ((chg_in_cost > 0) ? exp(-chg_in_cost/temp)
                                     : exp(chg_in_cost/temp));
    pe->msg.accept_prob = accept_prob;

    if ((pe->msg.cost_perturb_subtour < pe->msg.cost_curr_subtour)
        || (rng < accept_prob)) {
        pe->msg.state_subtour = ACCEPTED;          /** accept the perturbed tour **/
        pe->msg.cost_subtour = pe->msg.cost_perturb_subtour;
                                                   /* get the accepted subtour cost */
        pe->msg.chg_in_cost = chg_in_cost;         /* get the decremental cost */
        /* update the current_subtour[] */
        cardinality = pe->msg.cardinality;
        for (i = 0; i < cardinality; i++)
            pe->msg.current_subtour[i] = pe->msg.perturb_subtour[i];
    } else {
        pe->msg.state_subtour = NOTACCEPTED;
        pe->msg.cost_subtour = pe->msg.cost_curr_subtour;
        pe->msg.chg_in_cost = 0.0;
    }
    return;
}
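As a concrete check of the acceptance rule above (an illustrative calculation, not taken from the thesis runs): with temperature temp = 10 and an uphill change in cost chg_in_cost = 5, the acceptance probability is exp(-5/10) = exp(-0.5), or about 0.61, so the uniform draw rng accepts the worse subtour roughly three times out of five; at temp = 1 the same move is accepted with probability exp(-5), or about 0.007, which is how lowering the temperature gradually freezes out uphill moves.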
/**** ANNEALING BETWEEN SUBTOURS ****/
inter_subtour_anneal(pe, proc)
struct tsp_pe_struct *pe;              /* pointer to processor 'proc' data */
int proc;
{
    struct tsp_cc_struct *cc;
    struct dsp_map_struct *map;        /* pointer to xy coordinates of map */
    float xdelta, ydelta, distance;                /* map variables */
    float chg_in_cost;                             /* chg in cost for Lij or Tij */
    float temp, accept_prob, rng;
    double sqrt(), pow();                          /* for the cost of subtours */
    double drand48(), exp();           /* drand48() returns a non-negative
                                        * double-precision floating-point value
                                        * uniformly distributed over [0,1) */
    int i, node1, node2, node;
    int cardinality;                               /* cardinality of subtour */

    pe = &proc_pe[proc];
    cc = &central_coord;

    /******************** ANNEALING ********************/
    /** if the new change in cost, chg_in_cost, is < 0, accept it; **/
    /** otherwise apply the annealing (Metropolis) test            **/
    rng = drand48(seed);               /* uniform over [0,1) */
    pe->msg.rng = rng;                 /* used in the r < exp(-(chg_in_cost)/Tn) condition */
    chg_in_cost = pe->msg.cost_perturb_subtour - pe->msg.cost_curr_subtour;
                                       /* difference in cost */
    temp = pe->msg.temperature;
    if (temp == 0) {
        printf(" ERROR ===> CAN'T DIVIDE BY A ZERO TEMPERATURE !!!\n");
        exit();
    }
    accept_prob = ((chg_in_cost > 0) ? exp(-chg_in_cost/temp)
                                     : exp(chg_in_cost/temp));
    pe->msg.accept_prob = accept_prob;

    if ((pe->msg.cost_perturb_subtour < pe->msg.cost_curr_subtour)
        || (rng < accept_prob)) {
        pe->msg.state_subtour = ACCEPTED;          /** accept the perturbed tour **/
        pe->msg.chg_in_cost = chg_in_cost;         /* get the decremental cost */
    } else {
        pe->msg.state_subtour = NOTACCEPTED;
        pe->msg.chg_in_cost = chg_in_cost;         /* record the change in cost */
    }
    return;
}
/** States of the tours **/
current_subtours(pe, proc)
struct tsp_pe_struct *pe;              /* pointer to processor 'proc' data */
int proc;
{
    int i, cardinality;

    pe = &proc_pe[proc];
    cardinality = pe->msg.cardinality;

    printf(" the current subtour is....\n");
    for (i = 0; i < cardinality; i++) {
        printf(" %d\t", pe->msg.current_subtour[i]);
        if (i != 0)
            if ((i % 10) == 0)
                printf("\n");          /** new line **/
    }
    printf("\n the perturbed subtour is ....\n");
    for (i = 0; i < cardinality; i++) {
        printf(" %d\t", pe->msg.perturb_subtour[i]);
        if (i != 0)
            if ((i % 10) == 0)
                printf("\n");          /** new line **/
    }
    printf("\n");
    return;
}
/** Results of the subtours **/
subtour_result(pe, proc)
struct tsp_pe_struct *pe;              /* pointer to processor 'proc' data */
int proc;
{
    int i, cardinality;

    pe = &proc_pe[proc];
    cardinality = pe->msg.cardinality;

    printf("\t SUBTOUR RESULT: for processor %d ", proc);
    if (pe->msg.status == FREE)
        printf(" FREE \n");
    else
        printf(" BUSY \n");
    printf("\t\t RNG            = %f\n", pe->msg.rng);
    printf("\t\t PWR            = %f\n", pe->msg.accept_prob);
    printf("\t\t temperature    = %f\n", pe->msg.temperature);
    printf("\t\t cardinality    = %d\n", pe->msg.cardinality);
    printf("\t\t current_cost   = %f\n", pe->msg.cost_curr_subtour);
    printf("\t\t perturb_cost   = %f\n", pe->msg.cost_perturb_subtour);
    printf("\t\t cost_subtour   = %f\n", pe->msg.cost_subtour);
    printf("\t\t chg_in_cost    = %f\n", pe->msg.chg_in_cost);
    if (pe->msg.chg_in_cost < 0.0)
        printf("\t\t CHG IN COST < 0\n");
    if (pe->msg.state_subtour == INIT)
        printf("\t\t subtour_state  = INIT\n");
    else if (pe->msg.state_subtour == ACCEPTED)
        printf("\t\t subtour_state  = ACCEPTED\n");
    else
        printf("\t\t subtour_state  = NOT ACCEPTED\n");

    printf(" the new subtour is....\n");
    for (i = 0; i < cardinality; i++) {
        if (i != 0)
            if ((i % 10) == 0)
                printf("\n");          /** new line **/
        printf(" %d\t", pe->msg.current_subtour[i]);   /* from A1() */
    }
    printf("\n\n");
    return;
}
#include <stdio.h>
#include <math.h>
#include "defs.c"
#include "external.c"
/**************** BEGIN OF TOUR2.C ********************/
/** The following function performs steps A5, A6, and A7.        **/
/** Tasks: (1) Reconstruct the new tour from the subtours        **/
/**        (2) Compute the cost of the new tour                  **/
/**        (3) Perform the global annealing                      **/
/**        (4) Check for the stopping condition                  **/
int step_A5()
{
    struct tsp_cc_struct *cc;
    struct tsp_pe_struct *pe;
    int i, iter, proc, cardinality, rotate_tour;
    int node, node1, node2;
    float xdelta, ydelta, distance;
    float tempcc, accept_probcc, rngcc;
    float chg_in_costcc, min_cost;
    double drand48();

    cc = &central_coord;

    /** read the processor updates and reconstruct the tour **/
    reconstruct_tour();

    /**************** ANNEALING GLOBALLY ****************/
    /** compute the cost of the new tour **/
    cost_current_tour();

    /** Find the best cost of the tour and the best tour **/
    global_annealing();
#ifdef TOURDEBUG
    tour_results();
#endif
    /**** record of the optimal tour ****/
    opt_tour_record();
#define T2DEBUG
#ifdef T2DEBUG
    if (cc->cost_best_tour < cc->cost_opt_tour[run_num])
        tour_results();
#endif /* T2DEBUG */

    /** increment iterations and call statistics **/
    update_stats();                    /* update the statistical data of the tour */
    cc->iteration += 1.0;              /** next iteration **/

    /** Set up for the next iteration **/
    if (cc->state_tour != ACCEPTED)
        for (i = 0; i < num_nodes; i++)
            cc->soln_tour[i] = cc->best_tour[i];

    /** check for stopping criterion: step_A7 & step_A8 **/
    /** # of iterations > maximum # of iterations **/
    if (cc->iteration > max_ito)
        return(TRUE);
    return(FALSE);                     /** continue to iterate **/
} /** tsp **/
/******************** END OF TOUR2.C ********************/
/************ FUNCTIONS CALLED BY STEP_A5() BEGIN ************/
/** reconstruct the entire TSP tour from the updated subtours **/
reconstruct_tour()
{
    struct tsp_cc_struct *cc;
    struct tsp_pe_struct *pe;
    int i, cardinality;
    int node, proc;

    cc = &central_coord;

    /** read the processor updates and reconstruct the tour **/
    node = 0;
    cardinality = num_nodes/npe;

    /** reconstruct the TSP tour **/
    for (proc = 0; proc < npe; proc++) {
        pe = &proc_pe[proc];
        for (i = 0; i < cardinality; i++) {
            node = (node + num_nodes) % num_nodes;
            cc->soln_tour[node++] = pe->msg.current_subtour[i];
        }
    }
    return;
}
/**************** cost of the current tour ****************/
cost_current_tour()
{
    struct tsp_cc_struct *cc;
    struct tsp_pe_struct *pe;
    int proc, node, node1, node2;
    float xdelta, ydelta, distance;

    cc = &central_coord;
    cc->cost_soln_tour = 0;
    for (proc = 0; proc < npe; proc++) {
        pe = &proc_pe[proc];
        cc->cost_soln_tour += pe->msg.cost_subtour;
    }
    return;
}
/**************** Global Annealing of the TSP tour ****************/
global_annealing()
{
    struct tsp_cc_struct *cc;
    int i;
    float tempcc, accept_probcc, rngcc;
    float chg_in_costcc;
    double drand48();

    cc = &central_coord;
    cc->rngcc = rngcc = drand48(seed);
    chg_in_costcc = cc->chg_in_costcc = cc->cost_soln_tour - cc->cost_best_tour;
    tempcc = cc->temperature;
    if (tempcc == 0) {
        printf(" ERROR ===> CAN'T DIVIDE BY A ZERO TEMPERATURE !!!\n");
        exit();
    }
    accept_probcc = ((chg_in_costcc > 0) ? exp(-chg_in_costcc/tempcc)
                                         : exp(chg_in_costcc/tempcc));
    cc->accept_probcc = accept_probcc;

    if ((cc->cost_soln_tour < cc->cost_best_tour) || (rngcc < accept_probcc)) {
        cc->state_tour = ACCEPTED;
        cc->not_acc_cnt = 0;
        cc->chg_in_costcc = chg_in_costcc;
        cc->cost_best_tour = cc->cost_soln_tour;
        /** Update the TSP tour **/
        for (i = 0; i < num_nodes; i++)
            cc->best_tour[i] = cc->soln_tour[i];
    } else {
        cc->state_tour = NOTACCEPTED;
        cc->not_acc_cnt += 1;
        cc->chg_in_costcc = chg_in_costcc;
    }
    return;
}
/******************** update_stats() ********************/
update_stats()
{
    struct stats_struct *data;
    struct tsp_cc_struct *cc;
    int i, j;
    unsigned itose, k;
    FILE *fopen(), *fp;
    static char bestf[] = "best.dat";
    int ito;

    cc = &central_coord;
    if (cc->iteration == 1) {
        SamplePts = 0;
        cc->state_tour = INIT;
        tour_results();
    }

    itose = cc->iteration;
    k = max_ito / sample_pts;
    if (((itose % k) == 0) || (itose == 1)) {
        /* history of the cost of best_tour */
        cost_history[SamplePts][run_num] = cc->cost_best_tour;
        /* history of the cost of soln_tour */
        uphill_history[SamplePts][run_num] = cc->cost_soln_tour;
        /****** history of the best_tour and the soln_tour ******/
        for (i = 0; i < num_nodes; i++) {
            best_history[SamplePts][i] = cc->best_tour[i];
            soln_history[SamplePts][i] = cc->soln_tour[i];
        }
#ifdef T2
        printf("uphill_history[%d][%d] = %f\t", SamplePts, run_num, cc->cost_soln_tour);
        printf("cost_history[%d][%d] = %f\n", SamplePts, run_num, cc->cost_best_tour);
#endif /* T2 */
        itose = cc->iteration;
        printf("SampPts = %d, iter = %d, cost_perb_tour = %f, cost_acc_tour = %f\n",
               SamplePts, itose, cc->cost_soln_tour, cc->cost_best_tour);
        SamplePts++;
    }

#define STDEBUG
#ifdef STDEBUG
    itose = cc->iteration;
    k = max_ito / sample_pts;
    if (((itose % k) == 0) || (itose == 1)) {
        if ((fp = fopen(bestf, "a+")) == NULL) {
            printf(" best.dat failed to open!!!\n");
            exit();
        }
        if (itose == 1) {
            fprintf(fp, " run_number = %d\n", run_num);
            fprintf(fp, "iteration\t temperature\tcost_soln_tour\t cost_best_tour\n\n");
            fprintf(fp, "%d \t %f\t %f\t %f\n",
                    1, cc->temperature, cc->cost_soln_tour, cc->cost_best_tour);
        } else {
            fprintf(fp, "%f\t %f\t %f\t %f\n",
                    cc->iteration, cc->temperature, cc->cost_soln_tour, cc->cost_best_tour);
        }
        if (itose == max_ito)
            fprintf(fp, "\n\n");
        fclose(fp);
    }
#endif
    return;
}
/********** record of the optimal tour **********/
opt_tour_record()
{
    struct tsp_cc_struct *cc;
    int i;
    int num_opt_out, iter, opt_out;
    FILE *fopen(), *fp;
    static char optf[] = "opt.dat";

    cc = &central_coord;

    /** To keep track of the optimal tour **/
    if (cc->cost_best_tour < cc->cost_opt_tour[run_num]) {
        cc->cost_opt_tour[run_num] = cc->cost_best_tour;
        cc->opt_temperature[run_num] = cc->temperature;
        cc->opt_iteration[run_num] = cc->iteration;
        for (i = 0; i < num_nodes; i++)
            cc->optimal_tour[i] = cc->best_tour[i];
    }

#define OPTDEBUG
#ifdef OPTDEBUG
    /** number of optimum tours to be outputted **/
    num_opt_out = 5;
    iter = cc->iteration;
    opt_out = max_ito/num_opt_out;
    if (((iter % opt_out) == 0) || (iter == 1)) {
        if ((fp = fopen(optf, "a+")) == NULL) {
            printf("****** Optimal.dat failed to open *****\n");
            exit();
        }
        if (cc->iteration == 1.0) {
            fprintf(fp, "number of pe       = %d\n", npe);
            fprintf(fp, "number of nodes    = %d\n", num_nodes);
            fprintf(fp, "mapseed            = %d\n", mapseed);
            fprintf(fp, "seed               = %d\n", seed);
            fprintf(fp, "annealing - depth  = %f\n", depth);
            fprintf(fp, "\n THE RANDOM INITIAL COST = %f\n", cc->cost_soln_tour);
            fprintf(fp, "\n\n THE RANDOM INITIAL TOUR .... \n");
            for (i = 0; i < num_nodes; i++) {
                if (i != 0)
                    if ((i % 10) == 0)
                        fprintf(fp, "\n");     /** next line **/
                fprintf(fp, "%d\t", cc->soln_tour[i]);
            }
            fprintf(fp, "\n");
        }
        fprintf(fp, " At %dth run\n", run_num);
        fprintf(fp, " The optimal tour cost = %f\n", cc->cost_opt_tour[run_num]);
        fprintf(fp, " occurs at the optimal iteration = %f\n",
                cc->opt_iteration[run_num]);
        fprintf(fp, " and at the optimal temperature = %f\n",
                cc->opt_temperature[run_num]);
        fprintf(fp, "\n");
        fprintf(fp, "THE OPTIMAL TOUR ....\n");
        for (i = 0; i < num_nodes; i++) {
            if (i != 0)
                if ((i % 10) == 0)
                    fprintf(fp, "\n");         /* line feed */
            fprintf(fp, "%d\t", cc->optimal_tour[i]);
        }
        fprintf(fp, "\n\n");
        fclose(fp);
    }
#endif
    return;
}
/**** results of the best tour or the current tour ****/
tour_results()
{
    struct tsp_cc_struct *cc;
    int i;

    cc = &central_coord;
    if (cc->state_tour == ACCEPTED) {
        printf("\n ITERATION ACCEPTED_TOUR RESULTS (in tour2.c):\n ");
        printf("\t\t rngcc              = %f\n", cc->rngcc);
        printf("\t\t accept_probcc      = %f\n", cc->accept_probcc);
        printf("\t\t temperature        = %f\n", cc->temperature);
        printf("\t\t opt_temperature    = %f\n", cc->opt_temperature[run_num]);
        printf("\t\t cost_opt_tour      = %f\n", cc->cost_opt_tour[run_num]);
        printf("\t\t opt_iteration      = %f\n", cc->opt_iteration[run_num]);
        printf("\t\t iteration          = %f\n", cc->iteration);
        printf("\t\t chg_in_costcc      = %f\n", cc->chg_in_costcc);
        if (cc->state_tour == INIT)
            printf("\t\t state_tour         = INIT \n");
        if (cc->state_tour == ACCEPTED)
            printf("\t\t state_tour         = ACCEPTED \n");
        else
            printf("\t\t state_tour         = NOT ACCEPTED \n");
        printf("\t\t not_acc_cnt        = %d\n", cc->not_acc_cnt);
        printf("\t\t cost_accepted_tour = %f\n", cc->cost_best_tour);
        printf("\t\t the accepted tour = ....\n");
        for (i = 0; i < num_nodes; i++) {
            if (i != 0)
                if ((i % 10) == 0)
                    printf("\n");              /** next line **/
            printf("%d\t", cc->best_tour[i]);
        }
        printf("\n\n");
    } else {
        printf("\n ITERATION PERTURBED_TOUR RESULTS (in tour2.c):\n ");
        printf("\t\t rngcc              = %f\n", cc->rngcc);
        printf("\t\t accept_probcc      = %f\n", cc->accept_probcc);
        printf("\t\t temperature        = %f\n", cc->temperature);
        printf("\t\t cost_opt_tour      = %f\n", cc->cost_opt_tour[run_num]);
        printf("\t\t opt_temperature    = %f\n", cc->opt_temperature[run_num]);
        printf("\t\t opt_iteration      = %f\n", cc->opt_iteration[run_num]);
        printf("\t\t iteration          = %f\n", cc->iteration);
        printf("\t\t chg_in_costcc      = %f\n", cc->chg_in_costcc);
        if (cc->state_tour == INIT)
            printf("\t\t state_tour         = INIT \n");
        if (cc->state_tour == ACCEPTED)
            printf("\t\t state_tour         = ACCEPTED \n");
        else
            printf("\t\t state_tour         = NOT ACCEPTED \n");
        printf("\t\t not_acc_cnt        = %d\n", cc->not_acc_cnt);
        printf("\t\t cost_perturbed_tour = %f\n", cc->cost_soln_tour);
        printf("\t\t the perturbed tour = ....\n");
        for (i = 0; i < num_nodes; i++) {
            if (i != 0)
                if ((i % 10) == 0)
                    printf("\n");              /** next line **/
            printf("%d\t", cc->soln_tour[i]);
        }
        printf("\n\n");
    } /* else */
    return;
}
/********** END OF FUNCTIONS CALLED BY STEP_A5() ***********/
#include <stdio.h>
#include <math.h>
#include "defs.c"
#include "external.c"

/******************** BEGIN OF OUTPUT.C ********************/
/** This function outputs the best possible tour or optimal tour of the TSP **/
/*****************************************************************************/
output()
{
    struct tsp_cc_struct *cc;
    int time();
    int upper, lower;
    int i, j;
    FILE *fopen(), *fp;
    static char outfil[] = "out.dat";

    cc = &central_coord;
    final_run_time = time(0);

#define OUTDEBUG
#ifdef OUTDEBUG
    if ((fp = fopen(outfil, "a+")) == NULL) {
        printf(" ****** Out.dat failed to open ***** \n");
        exit();
    }
    fprintf(fp, "\t\t\t** TRAVELING SALESMAN / ANNEALING ALGORITHM **\n");
    fprintf(fp, "number of nodes    = %d\n", num_nodes);
    fprintf(fp, "mapseed            = %d\n", mapseed);
    fprintf(fp, "seed               = %d\n", seed);
    fprintf(fp, "\n");
    fprintf(fp, "annealing - depth  = %f\n", depth);
    fprintf(fp, "init_run_time      = %d\n", init_run_time);
    fprintf(fp, "final_run_time     = %d\n", final_run_time);
    fprintf(fp, "elapsed_run_time   = %d\n", final_run_time - init_run_time);
    fprintf(fp, "\n\n");
    fprintf(fp, " At %dth run\n", run_num);
    fprintf(fp, " The optimal tour cost = %f\n", cc->cost_opt_tour[run_num]);
    fprintf(fp, " occurs at the optimal iteration = %f\n",
            cc->opt_iteration[run_num]);
    fprintf(fp, " and at the optimal temperature = %f\n",
            cc->opt_temperature[run_num]);
    fprintf(fp, "\n");
    fprintf(fp, "THE WINNING TOUR \n");
    for (i = 0; i < num_nodes; i++) {
        if ((i % 10) == 0)
            fprintf(fp, "\n");
        fprintf(fp, " %d\t", cc->optimal_tour[i]);
    }
    fprintf(fp, "\n\n");
    for (i = 0; i < sample_pts; i++)
        fprintf(fp, "%f\n", cost_history[i][run_num]);
    fprintf(fp, "\n\n");
    fclose(fp);
#endif
    return;
}
#include <stdio.h>
#include <math.h>
#include "defs.c"
#include "external.c"

/******************** BEGIN OF STAT.C ********************/
/** This function outputs the results of statistical data for every run **/
/*************************************************************************/
results()
{
    struct tsp_cc_struct *cc;
    static char total[] = "total.dat";
    FILE *fopen(), *fp;
    float init_tot_cost, best_tot_cost, soln_tot_cost;
    float opt_tot_cost, opt_tot_iter, opt_tot_temp;
    float opt_mean, iter_mean, temp_mean, soln_mean, best_mean, std_dev;
    float mean_differ;
    int itr, sample, iter;

    cc = &central_coord;
    if ((fp = fopen(total, "a+")) == NULL) {
        printf("total.dat failed to open\n");
        exit();
    }
    fprintf(fp, "\t **** INPUT DATA ****\n\n");
    fprintf(fp, "\t npe \t num_nodes \t max_ito \n");
    fprintf(fp, "\t %d \t %d \t %f\n\n", npe, num_nodes, max_ito);
    fprintf(fp, "\t mapseed \t seed \t depth \n");
    fprintf(fp, "\t %d \t %d \t %f\n\n", mapseed, seed, depth);
    fprintf(fp, "\t num_runs \t sample_pts\n");
    fprintf(fp, "\t %d \t %d \n", num_runs, sample_pts);
    fprintf(fp, "\n");

    /** Calculate the average optimal cost, or min {best cost}, iteration and temperature **/
    opt_tot_cost = 0.0;
    opt_tot_iter = 0.0;
    opt_tot_temp = 0.0;
    for (run_num = 0; run_num < num_runs; run_num++) {
        opt_tot_cost += cc->cost_opt_tour[run_num];
        opt_tot_iter += cc->opt_iteration[run_num];
        opt_tot_temp += cc->opt_temperature[run_num];
    }
    opt_mean = opt_tot_cost/num_runs;
    iter_mean = opt_tot_iter/num_runs;
    temp_mean = opt_tot_temp/num_runs;
    fprintf(fp, "The average optimal cost = %f\n", opt_mean);
    fprintf(fp, "occurs at the average temperature = %f\n", temp_mean);
    fprintf(fp, " and at the average iteration = %f\n\n", iter_mean);
    fprintf(fp, "\t\t COST_HISTORY STATISTICS \n");
    fprintf(fp, "Iter\t Soln_Mean\t Best_Mean \t Std_dev \n\n");

    /** Calculate the average initial cost **/
    init_tot_cost = 0.0;
    for (run_num = 0; run_num < num_runs; run_num++) {
        init_tot_cost += cc->cost_init_tour[run_num];
    }
    soln_mean = best_mean = init_tot_cost / num_runs;
    mean_differ = std_dev = 0.0;
    fprintf(fp, "%d \t %.2f \t %.2f\t %.2f\t %.2f\n",
            1, soln_mean, best_mean, mean_differ, std_dev);

    /** Ensemble-average the best costs and the current costs **/
    for (sample = 0; sample < SamplePts; sample++) {
        best_tot_cost = 0.0;           /** must be reinitialized at every sample point **/
        soln_tot_cost = 0.0;
        for (run_num = 0; run_num < num_runs; run_num++) {
            /* total cost of the history of the cost_best_tour */
            best_tot_cost += cost_history[sample][run_num];
            /* total cost of the history of the cost_soln_tour */
            soln_tot_cost += uphill_history[sample][run_num];
        }
        /* The average over runs of the history of the cost_soln_tour */
        soln_mean = soln_tot_cost / num_runs;
        /* The average over runs of the history of the cost_best_tour */
        best_mean = best_tot_cost / num_runs;

        best_tot_cost = 0;
        for (run_num = 0; run_num < num_runs; run_num++) {
            /* sum of squared deviations of the cost_best_tour history */
            best_tot_cost += (cost_history[sample][run_num] - best_mean)
                           * (cost_history[sample][run_num] - best_mean);
        }
        std_dev = sqrt(best_tot_cost/num_runs);

        /** the iteration starts at 2 and advances by (max_ito/sample_pts) per sample **/
        iter = (max_ito/sample_pts)*sample;
        fprintf(fp, "%d \t %.2f \t %.2f \t %.2f\n",
                iter, soln_mean, best_mean, std_dev);
    } /* sample */
    fprintf(fp, "\n\n\n");
    fclose(fp);
    return;
}
/******************** END OF STAT.C ********************/
LIST OF DISTRIBUTION

INTERNAL:                   Copies   MS    ROOM
  Mua Tran                    20     3E    3443A
  Rick Harper                 10     3E    2466A
  Mark Busa                    1     3E    3441A
  Mark Dzwonczyk               1     3E    3440A
  Paul Mukai                   1     3E    3435A
  Ken Jaskowiak                1     3E    1452A
  Nghia Nguyen                 1     20    3130
  Tuan Le                      1     63    2325D
  John Deyst                   1     92    8105C
  James Cervantes              1     3B    3463
  Doug Fuhry                   1     2B    2463
  Roger Hain                   1     2B    2451
  Education Office             1     5-7   5286

EXTERNAL:
  Professor Wallace E. Vander Velde           MIT-33-109