Evolutionary Computing and the Traveling Salesman
Sam Mulder
11/2/2002
CS 447 – Advanced Topics in AI

Abstract

The Traveling Salesman Problem (TSP) is a very hard optimization problem in the field of operations research. It has been shown to be NP-complete and is an often-used benchmark for new optimization techniques. One of the main challenges with this problem is that standard, non-AI heuristic approaches are currently very effective and in wide use for the common fully connected, Euclidean variant considered here. In this paper we examine current evolutionary approaches to the TSP and introduce a new approach that takes advantage of the modern state of the art, as exemplified in the Chained Lin-Kernighan algorithm, to provide learning. We also discuss the limitations of evolutionary approaches in general as they relate to this problem, as well as potential areas where they may prove useful and not just academic. In the results section, we compare the new algorithm with traditional Chained Lin-Kernighan as well as a modern neural clustering approach.

Keywords – Traveling Salesman Problem (TSP), Evolutionary Computing, Evolutionary Algorithms, Divide-and-Conquer, Chained Lin-Kernighan, Adaptive Resonance Theory (ART), Self-Organizing Maps (SOM), Clustering

Table of Contents
I. Introduction
   A. Traveling Salesman Problem
   B. Heuristic Approaches
   C. Neural Clustering Approach
   D. Evolutionary Algorithms
II. Previous State of the Art
III. Proposed Algorithm
IV. Experimental Setup
V. Results
VI. Conclusions
VII. Bibliography

Introduction

Evolutionary Algorithms (EAs) are an exciting and rapidly expanding field of research in the artificial intelligence community. They attempt to model natural processes in order to evolve better solutions to problems using the basic operators of crossover and mutation. EAs have many advantages in trying to solve hard problems, since they evolve solutions from random starting points and require little domain knowledge about the error surface to be explored.

The Traveling Salesman Problem (TSP) is one of the most studied problems in the computer science literature. It is an example of the important class of problems known as NP-complete problems. An NP-complete problem is one that is solvable in polynomial time by a non-deterministic algorithm, but not necessarily by a deterministic one. Another way of considering this class of problems is to say that a correct solution may be checked in polynomial time, but no known algorithm can solve the problem in that time. These problems are interesting because any NP-complete problem can be reduced to any other in polynomial time, so finding a polynomial-time algorithm for one of them gives you a polynomial-time algorithm for all of them. They are also interesting because it has never been proven that NP-complete problems cannot be solved in polynomial time, so the question remains open.

To show more clearly where the class of NP-complete problems falls, consider two example problems and how they compare to the TSP. The first problem is to sort a list of integers. This problem takes on the order of O(n lg n) time, using "Big-Oh" notation to represent an asymptotic upper bound, where n is the number of elements in the list. The sorting problem is therefore solvable in polynomial time, since O(n lg n) is contained in O(n²). Many common algorithms are known to solve the problem within this bound, such as Merge Sort and Heap Sort. The second example problem is to generate every permutation of the integers in a list.
This clearly takes O(n!) time to solve, since every permutation must be generated. This problem is actually harder than the traveling salesman, though, because even a non-deterministic algorithm that always made the correct choices would take O(n!) time. Using the other description of NP-complete, it would take O(n!) time even to check whether we had a correct answer. The TSP falls between these two example problems in difficulty. The most general form of the TSP is to find a Hamiltonian cycle in an arbitrary graph. All known algorithms to solve this problem take greater than polynomial time, but if we have a solution we can check it in O(n) time. The more specific case of the TSP considered in this paper is also NP-complete, but is more complicated to check.

Now that we see why the TSP is interesting, what hope do EAs have of solving this problem? The first thing to consider is that for any practical instance of the TSP, an exact solution is intractable. Exact algorithms can become intractable even with fewer than 100 cities. Instead, heuristic algorithms are used to find approximate solutions. In this paper we consider how an EA may be used to augment standard heuristic approaches and create possibilities for greater parallelism in finding solutions.

Traveling Salesman Problem

Given a complete undirected graph G = (V, E) with non-negative integer costs c(u, v) for each edge in E, a Hamiltonian cycle over G is known as a tour. The most general form of the traveling salesman problem is to find any Hamiltonian cycle in an arbitrary graph; the more common form is the optimization problem of finding the shortest tour. Both of these problems have been proven to be NP-complete in [1]. These problems are very useful to consider because they map so easily to many real-world applications in a wide range of fields, from network routing to cryptography [2].

The variation of the TSP that we consider is even more limited. We look only at graphs that can be mapped to a two-dimensional Euclidean coordinate system where the edge weight between two nodes is the distance between them. This system implies that the triangle inequality holds, such that c(u,w) ≤ c(u,v) + c(v,w). We also require that the XY coordinates of each node be given. This variation can be shown to still be NP-complete and therefore to map onto the more general problem [3]. The Euclidean form of the TSP has been widely studied [4-7].

Heuristic Approaches

The current best results on large-scale TSP instances come from variations on an algorithm proposed by Lin and Kernighan in 1973 [8]. This algorithm takes a randomly selected tour and optimizes it by stages until a local minimum is found. The result is saved and a new random tour is selected to begin the process again. Given enough time, the optimal tour can be found for any instance of the problem, although in practice time constraints limit guaranteed optimality to small instances. Even with extremely large instances, however, reasonably short tours can be found [8].
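Lin-Kernighan and its relatives work by repeatedly exchanging edges of the current tour. As a point of reference for the discussion that follows, here is a minimal sketch, not taken from the paper's test-bed, of the representation assumed throughout (a tour stored as a permutation of city indices, with Euclidean distances computed on the fly) and of a single 2-opt edge exchange; all names and values are illustrative.

```python
import math
import random

def tour_length(tour, coords):
    """Total Euclidean length of a closed tour, where tour is a permutation of city indices."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = coords[tour[i]]
        x2, y2 = coords[tour[(i + 1) % len(tour)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def two_opt_exchange(tour, i, j):
    """Remove edges (tour[i], tour[i+1]) and (tour[j], tour[j+1]), reconnecting the tour
    by reversing the segment between them."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

# Tiny example: accept an exchange only if it shortens a random 10-city tour.
random.seed(0)
coords = [(random.random(), random.random()) for _ in range(10)]
tour = list(range(10))
candidate = two_opt_exchange(tour, 2, 6)
if tour_length(candidate, coords) < tour_length(tour, coords):
    tour = candidate
```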
The optimizations performed by the Lin-Kernighan (LK) algorithm use the concept of λ-optimality. If n is the number of cities in a TSP instance, then given any tour we can reach an optimal tour by swapping at most n edges. Using λ-optimality, we consider the best tour that can be reached by swapping only λ edges at a time. As λ grows, the chance that a λ-optimal tour is actually optimal increases. Unfortunately, the processing time required to find λ-optimal tours is intractable for large λ. The largest commonly used values for λ are 2 and 3, although some attempts have been made with λ as 4 or 5 [8].

The algorithm proposed by Lin and Kernighan examines a variable number of edge swaps. In brief, an initial 2-edge swap is chosen. The algorithm then examines whether it would be profitable to make a 3-edge swap, and continues increasing the number of edges as long as the total swap remains an improvement over the initial tour. This process is repeated, with each node considered in turn as a starting point for the initial 2-edge swap, until no further improvements can be found. The original algorithm then started over with a different random tour, always saving the best tour seen.

A popular variant, and the most effective known algorithm for large TSP instances, is Chained LK. With Chained variants of LK, instead of beginning with a random tour at each new iteration, the algorithm perturbs the previous tour by some amount and optimizes from there. This perturbation generally consists of a process known as a double bridge [8, 9], a 4-edge exchange that is never found by the basic LK algorithm and is illustrated below.

Figure - Double Bridge

This exchange has the effect of changing the global shape of the tour and starts the basic LK algorithm in a position that should find a different minimal tour, increasing the chance that the actual minimum tour is found [17].

Neural Clustering Approach

Previous research in this area focused on developing an algorithm based on neural clustering. This algorithm used Adaptive Resonance Theory and Self-Organizing Maps to first cluster the data, find solutions for each cluster, and then merge the resulting subtours into a single tour. This algorithm is briefly explained below.

Adaptive Resonance Theory (ART) was first introduced by Carpenter and Grossberg [17]. The unsupervised variants provide a simple, effective neural clustering algorithm. The original algorithm, designated ART1, used binary input sequences. The variation developed for this solution uses two-input integer sequences, where the integers are the XY coordinates of a node. In this form, ART is functionally similar to k-means clustering, except that k is increased dynamically as new patterns are introduced.

In the basic ART algorithm, patterns are introduced to a simple two-layer network. Each neuron in the first layer represents an input. Each neuron in the second layer represents a pattern recognized by the network. As new patterns are considered, the network calculates the distance between the new pattern and each of the known patterns. If the distance is less than the vigilance factor for the network, then that pattern resonates with the input. The nearest resonating pattern is chosen as the winner, and it is adjusted towards the input pattern by some learning rate. If no patterns resonate, a new neuron is created in the second layer corresponding to the new pattern. The size of the vigilance factor chosen indirectly determines the number of patterns created for a given dataset.
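A minimal sketch of this ART-style clustering step, assuming 2-D city coordinates as inputs; the vigilance and learning-rate values, and all names, are illustrative rather than taken from the actual implementation.

```python
import math

def art_cluster(points, vigilance, learning_rate=0.5):
    """ART-style online clustering of 2-D points: the nearest existing prototype within
    the vigilance radius resonates and is moved toward the input; otherwise a new
    prototype (cluster) is created at the input."""
    prototypes = []   # one prototype per cluster (second-layer neurons)
    assignment = []   # cluster index assigned to each presented point
    for (x, y) in points:
        best, best_dist = None, float("inf")
        for k, (px, py) in enumerate(prototypes):
            d = math.hypot(px - x, py - y)
            if d < best_dist:
                best, best_dist = k, d
        if best is not None and best_dist < vigilance:
            px, py = prototypes[best]
            # The winning prototype resonates and moves toward the input.
            prototypes[best] = (px + learning_rate * (x - px),
                                py + learning_rate * (y - py))
            assignment.append(best)
        else:
            # No resonance: create a new cluster at the input pattern.
            prototypes.append((float(x), float(y)))
            assignment.append(len(prototypes) - 1)
    return prototypes, assignment

# Example: two well-separated groups of cities end up in two clusters.
cities = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.8), (0.85, 0.9)]
print(art_cluster(cities, vigilance=0.3))
```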
Self-Organizing Maps were first introduced by Kohonen [18]. In their original form, they provide a neural technique for clustering data points scattered on a Euclidean surface. Neurons are laid out in a grid on the surface and then shifted towards the clusters. As each input point on the surface is considered, the closest neuron fires and its weights are adjusted to move it towards the point based on some learning rate. In this way, the neurons are drawn towards the nearest clusters of points. SOM differs from ART in that it begins with a fixed number of neurons that are already associated with points on the surface.

The main advantage of SOM for the TSP is the fact that the initial neurons can be laid out in an arbitrary fashion. By choosing a number of neurons equal to the number of nodes in a graph and setting them in a specific order, we can maintain a path between them that gives us a final route through the nodes after the network is trained. We use a modification of SOM that includes regeneration in order to adapt more effectively to the distribution of the nodes in the graph [16].

Our variation of SOM begins with the goal of matching one neuron to each node while maintaining an efficient path through the graph. To prevent some neurons from oscillating between nodes and others from remaining dormant, we inhibit a neuron after it has fired once in a given epoch. If the neuron would have fired a second time, we instead generate a new neuron at its location and shift it towards the new node. We insert the new neuron into the existing path based on its location relative to the previous neuron. If a neuron goes through an entire epoch without firing, it is removed. This process increases the density of neurons in areas of high node density while maintaining a constant number of neurons after each epoch. Since the new neurons are inserted into the path in the correct position, path crossovers are generally prevented. This allows us to skip the time-consuming crossover detection of previous variants in our algorithm.

The main neural algorithm considered in this paper first uses the ART technique described above to form clusters of cities. Within each cluster, the SOM algorithm described above is used to form an initial tour. After initial tour formation, an iteration of the original Lin-Kernighan algorithm is run to drive the tour to a local minimum. Tours are then merged together using a simple closest-match ring merge algorithm.
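For concreteness, the following is a compact sketch of a plain elastic-ring SOM of this general kind. It deliberately omits the inhibition and neuron regeneration/removal refinements described above (and used in [16, 25]); the parameter values and names are illustrative only.

```python
import math
import random

def ring_som_tour(coords, epochs=200, lr=0.8, decay=0.99, radius=2):
    """Plain elastic-ring SOM for the TSP: one neuron per city, kept in a fixed ring
    order.  Each epoch every city pulls its nearest neuron (and that neuron's ring
    neighbours) toward it; the tour is read off by ordering cities by the ring
    position of their winning neuron."""
    n = len(coords)
    neurons = [list(random.choice(coords)) for _ in range(n)]   # start on top of the data
    for _ in range(epochs):
        for (x, y) in coords:
            w = min(range(n), key=lambda i: math.hypot(neurons[i][0] - x, neurons[i][1] - y))
            for off in range(-radius, radius + 1):              # winner plus ring neighbours
                i = (w + off) % n
                g = lr * math.exp(-off * off / 2.0)             # weaker pull further along the ring
                neurons[i][0] += g * (x - neurons[i][0])
                neurons[i][1] += g * (y - neurons[i][1])
        lr *= decay                                             # shrink the learning rate each epoch
    winners = [min(range(n), key=lambda i: math.hypot(neurons[i][0] - x, neurons[i][1] - y))
               for (x, y) in coords]
    return sorted(range(n), key=lambda c: winners[c])           # city order around the ring

# Example: build an initial tour for 20 random cities.
random.seed(2)
pts = [(random.random(), random.random()) for _ in range(20)]
print(ring_som_tour(pts))
```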
Evolutionary Algorithms

Evolutionary Algorithms attempt to use what we know about how genetics works in real organisms in order to evolve solutions to problems using computers. The basic idea of an EA is to start with a population of random solutions and then use certain operations to change that population through generations, hopefully converging towards a solution. Two primary operators are used in EAs: crossover and mutation. Both of these operators contribute to either creating a new individual or changing an existing individual. At each iteration of the algorithm, individuals are created and/or changed, and a competition takes place to see which individuals get to be part of the population for the next iteration.

Crossover is a method for creating a new individual. In general, two or more parents are chosen from the existing population and combined in some way to create a child. A typical crossover implementation takes a certain number of genes from one parent and a certain number from the other. This form of recombination allows existing solutions to contribute to the formation of new solutions.

Mutation is a method for changing an existing individual. In general, each member of an existing population has some small chance of being mutated. If mutation occurs, a small change is randomly applied to that individual. This technique allows an EA to explore the problem space in directions not included in the initial population. The combination of crossover and mutation provides a partially directed search that still has the capability to eventually reach any point in the problem space.

Previous State of the Art

EAs have been applied to the TSP before, but have been generally unsuccessful compared to state-of-the-art heuristic algorithms. A good overview of the current state of the art in TSP optimization is given in [21]. EAs receive barely a passing mention there, because they have not been competitive on this type of problem. [13] and [22] present a good overview of straightforward implementations of genetic algorithms for the TSP. While they do give decent results on small problem instances (fewer than 100 cities), the times required to solve such instances are very large compared to other heuristic approaches.

Although discovered after the implementation of our proposed algorithm, the most promising research considered was [19]. While not directly considering the TSP, Land looked at using a combination of EAs and local search heuristics for a different type of optimization problem. Since there is a known local search heuristic for the TSP that has given good results for large problem instances, the Lin-Kernighan heuristic described above, it seems likely that this combination will prove beneficial for the TSP as well. A parallel implementation of a basic EA using Lin-Kernighan was also considered in [23]. That algorithm did not use the LK optimization during the evolutionary process or take advantage of the nature of the search space as described below. Still, it shows that parallel implementations for the TSP hold great promise.

Proposed Algorithm

Two algorithms are proposed in this paper: one that implements a straightforward EA, and one that may combine the EA approach with ART clustering to provide faster results. The problem representation used is a simple permutation of the cities. This allows us to use some basic operators to perform crossover and mutation.

Crossover is controlled by a crossover rate. When creating children, a random number is chosen between 0 and 1. If the number is less than the crossover rate, two parents are chosen; otherwise only one is chosen. Parents are chosen according to a uniform distribution, with all parents having an equal probability of being selected. The crossover operator behaves as follows. One parent is chosen to be the primary parent and one the secondary. A randomly selected segment of the primary parent is chosen to be fixed and retained. The cities in that segment are removed from the secondary parent, and the remaining cities are placed after the cities from the primary parent in their relative order. This process allows a sub-tour from the primary parent to be maintained intact and the relative ordering of the cities in the secondary parent to be preserved.

Mutation is applied based on a mutation rate. When creating children, a random number is chosen between 0 and 1. If the number is less than the mutation rate, then the mutation operator is applied to that child. The mutation operator randomly selects 8 cities from the tour. These 8 cities are then involved in a double bridge swap, as discussed in the section on heuristic algorithms. This modification assures that the learning algorithm will not simply settle back into the same tour.
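A minimal sketch of these two operators on permutation-encoded tours; the exact cut-point choices and names here are illustrative, not the paper's code.

```python
import random

def crossover(primary, secondary):
    """Keep a randomly chosen segment of the primary parent intact, then append the
    remaining cities in the order in which they appear in the secondary parent."""
    i, j = sorted(random.sample(range(len(primary)), 2))
    segment = primary[i:j + 1]
    kept = set(segment)
    return segment + [c for c in secondary if c not in kept]

def double_bridge(tour):
    """4-edge double-bridge move: cut the tour into segments A|B|C|D and reconnect
    them as A C B D, a perturbation the basic LK search cannot undo in one step."""
    a, b, c = sorted(random.sample(range(1, len(tour)), 3))
    return tour[:a] + tour[b:c] + tour[a:b] + tour[c:]

# Both operators preserve the permutation property.
p1, p2 = list(range(10)), random.sample(range(10), 10)
child = crossover(p1, p2)
mutant = double_bridge(child)
assert sorted(child) == sorted(mutant) == list(range(10))
```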
The number of children is decided by a parameter in the parameter file; a fixed number of children are generated for each iteration of the EA. A competition is then held to select the parents for the next generation. In this competition, the best n individuals from the total pool of parents and children are chosen, where n is the original number of parents.

Taking into account the extreme difficulty of the problem in general, some form of Lamarckian learning is desirable [20]. Before the competition to select new parents, a round of the original Lin-Kernighan optimization algorithm is run on each of the children created that round. This brings them to a local optimum. In general, it keeps the algorithm exploring a space of good solutions instead of flailing about randomly among the nearly n! bad ones. After learning has taken place, the competition is applied.

As described in [19] for other types of optimization problems, the search space of the TSP may be viewed as a surface with many basins. Using the Lin-Kernighan algorithm for local optimization allows us to drive a solution to the local optimum of a particular basin. In order for our search to be efficient, the mutation must be chosen such that it does not just move the solution around within the current basin, but instead jumps it to another basin. With the given optimization technique, basins may be defined in terms of the double bridge formation discussed above. By using a double bridge as our mutation operator, we guarantee that each mutation moves the search to a different basin. Viewed in this light, a variant of this algorithm with the mutation rate set to 1 and the crossover rate set to 0 may be seen as a parallel implementation of the popular Chained Lin-Kernighan variant described in [9], with the difference that instead of always applying the mutations to the best known tour, a population of tours is kept. Whether or not this is advantageous has yet to be determined.

Experimental Setup

The experimental test-bed uses the routines developed previously for testing neural network techniques. The problems are taken from a collection of well-tested, randomly distributed cities described by x,y coordinate pairs. The distances are not stored, but calculated on the fly. The tours are stored as permutations of the city list. This code base allows the easy inclusion of Lin-Kernighan optimization and neural techniques for comparison. A strict object-oriented approach is not followed, as the test-bed is currently more of a quick-and-dirty solution intended to generate proof-of-concept results than a final product. Further work may be done in this area to clean up and optimize the code. It should be noted that the included implementation of the Lin-Kernighan algorithm was coded from scratch using only the original paper on the subject. Other optimizations have been published, and publicly available software with much faster implementations can be found. A final version of this software would take advantage of the latest advances in this area.
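To tie the pieces together, the following is a condensed, self-contained sketch of the generation loop described in the Proposed Algorithm section, as it might be run in a test-bed like the one above. A simple repeated 2-opt pass stands in for the full Lin-Kernighan optimizer, the crossover rate is fixed at zero (the configuration that performed best in the Results), and population sizes and other values are illustrative rather than those of the actual implementation.

```python
import math
import random

def tour_length(tour, coords):
    return sum(math.hypot(coords[tour[i]][0] - coords[tour[(i + 1) % len(tour)]][0],
                          coords[tour[i]][1] - coords[tour[(i + 1) % len(tour)]][1])
               for i in range(len(tour)))

def local_search(tour, coords):
    """Cheap stand-in for Lin-Kernighan: first-improvement 2-opt passes until no gain."""
    def d(a, b):
        return math.hypot(coords[a][0] - coords[b][0], coords[a][1] - coords[b][1])
    n, improved = len(tour), True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b = tour[i], tour[i + 1]
                c, e = tour[j], tour[(j + 1) % n]
                if a == e:                       # degenerate move, skip
                    continue
                if d(a, c) + d(b, e) < d(a, b) + d(c, e) - 1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

def double_bridge(tour):
    a, b, c = sorted(random.sample(range(1, len(tour)), 3))
    return tour[:a] + tour[b:c] + tour[a:b] + tour[c:]

def evolve(coords, pop_size=10, children=10, generations=20, mutation_rate=1.0):
    """Lamarckian EA: mutate copies of parents with a double bridge, locally optimize
    each child, then keep the best pop_size tours from the pool of parents and children."""
    n = len(coords)
    population = [local_search(random.sample(range(n), n), coords) for _ in range(pop_size)]
    for _ in range(generations):
        kids = []
        for _ in range(children):
            child = list(random.choice(population))
            if random.random() < mutation_rate:
                child = double_bridge(child)
            kids.append(local_search(child, coords))          # Lamarckian learning step
        population = sorted(population + kids,
                            key=lambda t: tour_length(t, coords))[:pop_size]
    return population[0]

# Example: 30 random cities.
random.seed(1)
cities = [(random.random(), random.random()) for _ in range(30)]
best = evolve(cities)
print(round(tour_length(best, cities), 3))
```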
Results

Chained Lin-Kernighan (Concorde package) on a Sun Ultra 10

Problem Size    Tour Length    Time
100             7.43111E+06    0.24
1000            2.34148E+07    3.78
2000            3.26303E+07    8.36
8000            6.43398E+07    64.74
10000           7.20532E+07    100.54
20000           1.01468E+08    257.49
250000          3.58274E+08    5778.28

Clustered Divide and Conquer with SOM+ART+Lin-Kernighan on a PIII 933 MHz

Problem Size    Tour Length    Time    % off
100             8.20608E+06    0       10.43%
1000            2.77407E+07    3       18.47%
2000            3.86702E+07    5       18.51%
8000            7.66305E+07    63      19.10%
10000           8.61756E+07    68      19.60%
20000           1.20066E+08    189     18.33%
250000          4.28589E+08    2926    19.63%

EA + Lin-Kernighan on a PIII 933 MHz

Problem Size    Tour Length    Time    % off
100             7.40308E+06    78      -0.38%
1000            2.42913E+07    4235    3.74%

The results displayed here show a comparison between the three techniques discussed in this paper: standard Chained Lin-Kernighan as represented by the Concorde implementation in [24], the neuro-clustered divide-and-conquer approach developed in previous research [25], and the EA approach presented in this paper. Obviously the EA approach did not scale as well as the other two. This is because it was running the LK algorithm on a population of size 100. In a parallel implementation, this disadvantage may be minimized, because the Concorde package would not be able to take direct advantage of such parallelism. It should be noted that, in contrast to the neural approach, the EA actually found a better tour than the standard Concorde configuration did for the 100-city problem. The Concorde program could be modified to run the Chained Lin-Kernighan algorithm for a longer period of time in order to find this better result, but standard comparisons are made against the default configuration. On the 1000-city problem, the EA was a bit off, although still much closer than the divide-and-conquer technique. That gap reflects the inherent limitation that any divide-and-conquer approach places on tour quality. The numbers shown here are exact for the first two algorithms, which are deterministic, and are averages over 10 runs for the stochastic EA. The EA was run with a population of 100 and 100 children for 100 generations. The mutation rate was set to 1 and the crossover rate to 0. Raising the crossover rate or lowering the mutation rate only led to worse results, so it was determined that mutation was the only operator with a positive effect.

Conclusions

EAs will probably never provide the fastest possible solutions to the TSP. We have demonstrated, however, that in combination with local search techniques they are capable of finding very good solutions. The primary advantage of the EA in a combinatorial optimization environment may well prove to be as a model for parallel implementation. The parallel implementation described in [23] may be seen as an example of the power of using LK in conjunction with the parallel nature of algorithms made available through EAs.

In conclusion, we have demonstrated that a combination of EAs and local search techniques can provide good solutions to the TSP. In fact, for the 100-city problem, the solution found by our algorithm was better than that found by the Chained LK toolbox at [24] using default settings. The traditional Chained LK algorithm always searches from the best-known solution. It is possible that an EA approach that keeps a population of good solutions and makes improvements based on those may find the optimal solution faster. A parallel version will need to be implemented to make a better comparison.
The proposed algorithm for using the EA within the context of ART clustering has not yet been implemented. This will be an area for future research. Since the divide-and-conquer concept of the ART implementation reduces the complexity of the problem while placing some limitation on the quality of the results, we expect to see these same characteristics if it is implemented using the EA.

Bibliography

[1] T. Cormen, C. Leiserson, R. Rivest. Introduction to Algorithms, MIT Press, 1996, pp. 954-960.
[2] S. Lucks. "How Traveling Salespersons Prove Their Identity", 5th IMA Conference on Cryptography and Coding, Springer LNCS, pp. 142-149, obtainable through: http://th.informatik.uni-mannheim.de/People/Lucks/papers.html, no date given.
[3] C.H. Papadimitriou. "The Euclidean Traveling Salesman Problem is NP-Complete", Theoretical Computer Science 4, 1977, pp. 237-244.
[4] M.L. Braun, J.M. Buhmann. "The Noisy Euclidean Traveling Salesman Problem and Learning", Advances in Neural Information Processing Systems 14, MIT Press, Cambridge, MA, 2002.
[5] E.M. Cochrane, J.M. Cochrane. "Exploring competition and co-operation for solving the Euclidean Traveling Salesman Problem by using the Self-Organizing Map", Artificial Neural Networks, 7-10 September 1999, Volume 1, pp. 180-185.
[6] S. Arora. "Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems", Journal of the ACM, Vol. 45, No. 5, 1998, pp. 753-782.
[7] J. Gudmundsson and C. Levcopoulos. "Hardness result for TSP with neighborhoods", Technical Report LU-CS-TR:2000-216, Department of Computer Science, Lund University, Sweden, 2000.
[8] S. Lin, B.W. Kernighan. "An Effective Heuristic Algorithm for the Traveling Salesman Problem", Operations Research 21, 1973, pp. 498-516.
[9] D. Applegate, W. Cook, and A. Rohe. "Chained Lin-Kernighan for large traveling salesman problems", Technical Report, 2000. http://www.keck.caam.rice.edu/reports/chained lk.ps
[10] P. Moscato. "TSPBIB Home Page", http://www.densis.fee.unicamp.br/~moscato/TSPBIB_home.html, 2002.
[11] N. Vishwanathan, D. Wunsch. "A Hybrid Approach to the TSP", IEEE International Joint Conference on Neural Networks, Washington, DC, July 2001.
[12] P. Prinetto, M. Rebaudengo, M. Sonza Reorda. "Hybrid Genetic Algorithms for the Traveling Salesman Problem", Artificial Neural Nets and Genetic Algorithms Proceedings, Innsbruck, Austria, 1993.
[13] K. Bryant. "Genetic Algorithms and the Traveling Salesman Problem", Thesis, Dept. of Mathematics, Harvey Mudd College, December 2000.
[14] M. Dorigo, L.M. Gambardella. "Ant Colonies for the Traveling Salesman Problem", BioSystems, 1997.
[15] O. Mayumi, O. Takashi. "A New Heuristic Method of the Traveling Salesman Problem Based on Clustering and Stepwise Tour Refinements", IPSJ SIGNotes - Algorithms No. 45, 1995.
[16] A. Plebe. "Self-Organizing Map Approaches to the Traveling Salesman Problem", LFTNC 2001.
[17] G.A. Carpenter and S. Grossberg. "The ART of adaptive pattern recognition by a self-organizing neural network", IEEE Computer, pp. 47-88, March 1988.
[18] T. Kohonen. "The Self-Organizing Map", Proceedings of the IEEE 78, 1990, pp. 1464-1480.
[19] M. Land. "Evolutionary Algorithms with Local Search for Combinatorial Optimization", PhD Thesis, Computer Science & Engineering Dept., Univ. of California, San Diego, 1998.
[20] D. Ackley and M. Littman. "A Case for Lamarckian Evolution", Artificial Life III, Addison-Wesley, 1994, pp. 487-509.
[21] G. Gutin and A. Punnen, eds. The Traveling Salesman Problem and Its Variations, Kluwer Academic Publishers, Netherlands, 2002.
[22] P. Jog, J. Suh, D. Gucht. "Parallel Genetic Algorithms Applied to the Traveling Salesman Problem", SIAM Journal on Optimization, 1(4):515-529, 1991.
[23] R. Baraglia, J. Hidalgo, R. Perego. "A Parallel Hybrid Heuristic for the TSP", EvoWorkshop 2001, LNCS 2037, pp. 193-202, 2001.
[24] Concorde software. http://www.math.princeton.edu/tsp/concorde.html
[25] S. Mulder and D. Wunsch. "Large-Scale Traveling Salesman Problem via Neural Network Divide and Conquer", ICONIP, 2002.