The Traveling Salesman Problem in Theory & Practice
Lecture 7: Local Optimization
4 March 2014
David S. Johnson
dstiflerj@gmail.com
http://davidsjohnson.net
Seeley Mudd 523, Tuesdays and Fridays

Outline
1. Tour of the DIMACS TSP Challenge Website and other web resources
2. Basic local optimization heuristics and their implementations
• 2-Opt
• 3-Opt

Projects and Presentations
Please email me by 3/11:
– The planned subject for your project (survey paper, theoretical or experimental research project, etc.),
– The paper(s)/result(s) you plan to present in class, and
– Your preferred presentation date.
We have 7 more classes after this one: 3 more for me, 3 for presentations, and the last (4/29) for a wrap-up from me and 10-minute project descriptions from you.
Final project write-ups are due Friday 5/2.

DIMACS Implementation Challenge
• Initiated in 2000
• Major efforts wound down in 2002
• Still updateable (in theory)

Challenge Testbeds
All provided by means of instance generation code and specific seeds for the random number generator.

Running Time Normalization
Source code for Greedy and a generator for random Euclidean instances were provided for download.
Participants reported their running times for Greedy on the Test Battery instances.

Machine-Specific Correction Factors
[Figure: machine-specific correction factors; axis ticks at 10^3, 10^4, 10^5, 10^6, 10^7.]

A Tour of the Website

Local Optimization: 2-Opt
Basic Scheme
• Use a tour construction heuristic to build a starting tour. (Which heuristic?)
• While there exists a 2-opt move that yields a shorter tour (How do we determine this efficiently?):
– Choose one. (Which one?)
– Perform it. (With what data structure?)
Each choice can affect both running time and tour quality.

Determining the existence of an Improving 2-Opt move
• Naïve approach: Try all N(N-3)/2 possibilities.
• More sophisticated: If replacing tour edges (a,b) and (c,d) by (b,c) and (d,a) shortens the tour, then d(a,b) + d(c,d) > d(b,c) + d(d,a), so at least one of the following must be true: d(a,b) > d(b,c) or d(c,d) > d(d,a).
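The naïve approach above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the Challenge: the function names (`candidate_moves`, `find_improving_2opt`, `apply_2opt`), the list-based tour, and the `dist` callback are all assumptions made for the example. It enumerates every pair of non-adjacent tour edges, which gives exactly N(N-3)/2 candidate moves.

```python
import math

def candidate_moves(n):
    """Yield every pair (i, j) of non-adjacent tour edges (i, i+1) and (j, j+1)."""
    for i in range(n - 2):
        # For i == 0, stop before j == n-1: edge (n-1, 0) shares city 0 with edge (0, 1).
        for j in range(i + 2, n - 1 if i == 0 else n):
            yield i, j

def find_improving_2opt(tour, dist):
    """Return (i, j) for the first improving 2-opt move found, or None if there is none."""
    n = len(tour)
    for i, j in candidate_moves(n):
        u, v = tour[i], tour[i + 1]
        x, y = tour[j], tour[(j + 1) % n]
        # Deleting edges (u,v) and (x,y) forces the new edges (u,x) and (v,y).
        if dist(u, x) + dist(v, y) < dist(u, v) + dist(x, y) - 1e-12:
            return i, j
    return None

def apply_2opt(tour, i, j):
    """Perform the move by reversing the tour segment tour[i+1 .. j] in place."""
    tour[i + 1 : j + 1] = tour[i + 1 : j + 1][::-1]

# Hypothetical example: a four-city tour whose two diagonals cross.
pts = [(0, 0), (1, 1), (1, 0), (0, 1)]
dist = lambda a, b: math.dist(pts[a], pts[b])
tour = [0, 1, 2, 3]
move = find_improving_2opt(tour, dist)
if move is not None:
    apply_2opt(tour, *move)
```

Each call scans a quadratic number of candidates, which is why the pruning rule in the second bullet matters for large N.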
Suppose we consider each ordered pair (t1,t2) of adjacent tour vertices as a candidate for the first deleted edge in an improving 2-opt move. Then we may restrict our attention to candidates for the new neighbor t3 of t2 that satisfy d(t2,t3) < d(t1,t2). If the improving move is not caught when (t1,t2) = (a,b), it will be caught when (t1,t2) = (c,d).

Sequential Searching
For t1 going counterclockwise around the tour,
• For t2 a tour neighbor of t1,
• For all t3 with d(t2,t3) < d(t1,t2),
• For the unique t4 that will yield a legal 2-opt move,
• Test whether d(t1,t4)+d(t2,t3) is less than d(t1,t2)+d(t3,t4).
• If so, add 〈(t1,t2),(t4,t3)〉 to the list of improving moves.
• Otherwise, continue.
Note: For geometric instances where k-d trees have been constructed, we can find the acceptable t3’s using fixed-radius searches from t2 with radius d(t1,t2).

Which Improving Move to Make?
• Best Possible: Time consuming, not necessarily best choice for the long run.
• Best of those for the current choice of t1: Still not necessarily best in the long run, but significantly faster.
• Best of the first 8 new champions for the current choice of t1: Still faster.
• First found for the current choice of t1: Even faster, but not necessarily best (or fastest) in the long run.

[Jon Bentley’s Geometric Code]
            % Excess over Held-Karp Bound      Running Time in 150 MHz Seconds
Variant     10^3    10^4    10^5    10^6       10^3    10^4    10^5    10^6
Best         4.7     4.6     4.5     4.5       0.21     3.1    54.9    2285
8th          4.9     4.9     4.7     4.7       0.20     2.4    49.1    2344
First        6.1     6.0     5.8     5.7       0.17     2.2    47.5    2754

Don’t-Look-Bits
• One bit associated with each city, initially 0.
• If one fails to find an improving move for a given choice of t1, we set the don’t-look-bit for t1 to 1.
• If we find an improving move, we set the don’t-look-bits for t1, t2, t3, and t4 all to 0.
• If a given city’s don’t-look-bit is 1, we do not consider it for t1.
Costs perhaps 0.1% in tour quality, for a speedup of a factor of 2 or more.
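The pruned search and the don’t-look-bit rules above can be combined in a short Python sketch. Everything here is an illustrative assumption rather than Bentley’s code: the names `two_opt_dlb`, `tour_length`, `reverse_path`, and `improve` are made up, the tour is a plain list with a position array, the candidate t3’s are found by a brute-force scan standing in for a fixed-radius k-d tree search, and the "First" variant is used (the first improving move found for t1 is performed).

```python
import math

def tour_length(pts, tour):
    return sum(math.dist(pts[tour[i - 1]], pts[tour[i]]) for i in range(len(tour)))

def two_opt_dlb(pts, tour):
    """In-place 2-opt using the d(t2,t3) < d(t1,t2) rule and don't-look bits."""
    n = len(tour)
    pos = [0] * n                       # pos[c] = index of city c in tour
    for p, c in enumerate(tour):
        pos[c] = p
    dont_look = [False] * n             # one bit per city, initially 0

    def d(u, v):
        return math.dist(pts[u], pts[v])

    def reverse_path(i, j):
        """Reverse the tour path running forward (cyclically) from index i to index j."""
        for _ in range(((j - i) % n + 1) // 2):
            tour[i], tour[j] = tour[j], tour[i]
            pos[tour[i]], pos[tour[j]] = i, j
            i, j = (i + 1) % n, (j - 1) % n

    def improve(t1):
        """Perform the first improving move found for t1; return True if one was made."""
        for step in (1, -1):            # t2 is the successor, then the predecessor, of t1
            t2 = tour[(pos[t1] + step) % n]
            d12 = d(t1, t2)
            for t3 in range(n):         # brute-force stand-in for a fixed-radius search
                if t3 == t1 or t3 == t2 or d(t2, t3) >= d12:
                    continue
                t4 = tour[(pos[t3] - step) % n]   # the unique t4 giving a legal move
                if t4 == t2:
                    continue
                if d(t1, t4) + d(t2, t3) < d12 + d(t3, t4) - 1e-12:
                    if step == 1:
                        reverse_path(pos[t2], pos[t4])
                    else:
                        reverse_path(pos[t4], pos[t2])
                    for c in (t1, t2, t3, t4):
                        dont_look[c] = False      # endpoints of the move become active again
                    return True
        return False

    active = True
    while active:                       # sweep until every bit is set
        active = False
        for t1 in range(n):
            if dont_look[t1]:
                continue
            active = True
            while improve(t1):
                pass
            dont_look[t1] = True        # no improving move found for t1
    return tour
```

Calling `two_opt_dlb(pts, list(range(len(pts))))` on a list of coordinate pairs returns the improved tour. The result is not guaranteed to be fully 2-optimal, since a city whose bit is set is skipped until one of its tour edges changes.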
Don’t-look-bits also enable processing in “queue” order:
• Initially all cities are in the queue.
• When a city has its don’t-look-bit set to 1, it is removed from the queue.
• When a city not in the queue has its don’t-look-bit set to 0, it is added to the end of the queue.
• For the next city to try as t1, we pop off the element at the head of the queue.

Tour Representations
Must maintain a consistent ordering of the tour so that the following operations can be correctly performed.
1. Next(a) and Prev(a): Return the successor/predecessor of city a in the current ordering of the tour.
2. Between(a,b,c): Report whether, if one starts at city a and proceeds forward in the current tour order, one will encounter b before c. (This will be needed for 3-opt.)
3. Flip(a,b,c,d): If b = Next(a) and c = Next(d), update the tour to reflect the 2-opt move in which the tour edges (a,b) and (c,d) are replaced by (b,c) and (a,d). Otherwise, report “Invalid Move”.
See [Fredman, Johnson, McGeoch, & Ostheimer, “Data structures for traveling salesmen,” J. Algorithms 18 (1995), 432-479].

Array Representation
Tour: array of city indices, e.g. a b c d e f g h i j k l m n o p q r s t u v w x y z
City: array of tour indices (City[i] is the position of city ci in Tour)
Next(ci) = Tour[(City[i]+1) mod N]
Prev(ci) = Tour[(City[i]-1) mod N] (analogous)
Between(ci, cj, ck): (Straightforward)

Array Representation: Flip
Tour:           a b c d e f g h i j k l m n o p q r s t u v w x y z
Flip(f,g,p,q):  a b c d e f p o n m l k j i h g q r s t u v w x y z
Flip(x,y,c,d):  a z y d e f p o n m l k j i h g q r s t u v w x c b

Array Representation: Costs
• Next, Prev: O(1)
• Between: Θ(N)
• Flip: Θ(N)
Speed-up trick: If the segment to be flipped is longer than N/2, flip its complement.

Problem for Arrays
• For random Euclidean instances, 2-opt performs Θ(N) moves and, even if we always flip the shorter segment, the average length of the segment being flipped grows roughly as Θ(N^0.7) [Bentley, 1992].
• Doubly-linked lists suffer from the same problems.

Can we do better with other tour representations?
• We can in fact do much better (theoretically).
• By representing the tour using a balanced binary tree, we can reduce the (amortized) time for Between and Flip to Θ(log(N)) per operation, although the times for Next and Prev increase from constant to that amount. “Splay Trees” are especially useful in this context (and will be described in the next few slides).
• Significant further improvements are unlikely, however:
• Theorem [Fredman et al., 1995]. In the cell-probe model of computation, any tour representation must, in the worst case, take amortized time Ω(log(N)/loglog(N)) per operation.

Binary Tree Representation
• Cities are contained in a binary tree, with a bit at each internal node to tell whether the subtree rooted at that node should be reversed. (Bits lower down in the tree will locally undo the effect of bits at their ancestors.)
• To determine the tour represented by such a tree, simply push the reversal bits down the tree until they all disappear. An inorder traversal of the tree will then yield the tour.
• (To push a reversal bit at node x down one level, interchange the two children of x, complement their reversal bits, and turn off the reversal bit at x.)

Splay Trees
[Sleator & Tarjan, “Self-adjusting binary search trees,” J. ACM 32 (1985), 652-686]
• Every time a vertex is accessed, it is brought to the root (splayed) by a sequence of rotations (local alterations of the tree that preserve the inorder traversal).
• Each rotation causes the vertex that is accessed to move upward in the tree, until eventually it reaches the root.
• The precise operation of a rotation depends on whether the vertex is the right or left child of its parent and whether the parent is the right or left child of its own parent. The change does not depend on any global properties of the subtrees involved, such as depth, etc.
• All the standard binary tree operations can be implemented to run in amortized worst-case time O(log(N)) using splays.
• In our Splay Tree tour representation, the process of splaying is made slightly more difficult by the reversal bits. We handle these by preceding each rotation by a step that pushes the reversal bits down out of the affected area. Neither the presence of the reversal bits nor the time needed to clear them affects the amortized time bound for splaying by more than a constant factor.

Splay Tree Tour Operations
Next(a):
1. Splay a to the root of the tree.
2. Traverse down the tree (taking account of reversal bits) to find the successor of a.
3. Splay the successor to the root.
Prev(a): Handled analogously.
Between(a,b,c):
1. Splay b to the root, then a, then c. Note that [Sleator & Tarjan, 1985] shows that no rotation for a vertex x causes any vertex to increase its depth by more than 2. Thus, after these splays, c is the root (level 1), a is no deeper than level 3, and b is no deeper than level 5. They also show that if a is at level 3, then it is either the left child of a left child or the right child of a right child.
2. Clear all the reversal bits from the top 5 levels of the tree.
3. Traverse upward from b in its new position in the tree.
4. The answer is yes if
– we reach a first and arrive from the right, or
– we reach c first and arrive from the left.
Otherwise, it is no.

Splay Tree Flip(a,b,c,d)
• Splay d to the root, then splay b to the root, and push all reversal bits down out of the top three levels.
• There are four possibilities (T_i^R represents the subtree T_i with the reversal bit at its root complemented):
[Figure: the four possible configurations of b and d at the top of the tree, with the captions “Reverses the path from b to d.” and “Reverses the path from d to b.”]

Speedups (Lose theoretical guarantees for better performance in practice)
• No splays for Next and Prev: simply do tree traversals, taking into account the reversal bits.
• No need to splay b in the Between operation. Instead simply splay a and c, and then traverse up from b until either a or c is encountered (as before).
• Operation of Flip unchanged.
• Yields about a 30% speedup.

Advantages of Splay Trees
• Ease of implementing Flip compared to other balanced binary tree implementations.
• “Self-Organizing” properties: Cities most involved in the action stay relatively close to the root. And since typically most cities drop out of the action (get their don’t-look-bits set to 1 permanently) fairly early, this can significantly reduce the time per operation.
• Splay trees start beating arrays for random Euclidean instances on modern computers somewhere between N = 100,000 and N = 316,000. They are 40% faster when N = 1,000,000.
• For more sophisticated algorithms, like Lin-Kernighan (to be discussed later), the transition point is much earlier: Splay trees are 13 times faster when N = 100,000.

Beating Splay Trees in Practice: The Two-Level-Tree
[Figure: a two-level tree with approximately √N segments of length √N each.]

Splay Trees versus Two-Level Trees
• Two-Level Trees are 2-3 times faster for N = 10,000 (not counting preprocessing time), declining to 11% at N = 1,000,000.
• But does this matter?
• In 1995, the time for N = 100,000 was 3 minutes versus 5 (Lin-Kernighan).
• Today it is 2.1 seconds versus 3.8.
• What is this “preprocessing”?
• We switched implementations in order to be able to compare tour representations. See next slide.

The Neighbor-List Implementation
• Can handle non-geometric instances.
– TSP in graphs
– X-ray crystallography
– Video compression
– Converted versions of asymmetric TSP instances
• Can exploit geometry when it is present.
• Because of the trade-offs it makes, it may be 0.4% worse for 2-opt than Bentley’s purely geometric implementation, but it will be substantially faster for sophisticated algorithms like Lin-Kernighan, which otherwise would perform large numbers of fixed-radius searches.

The Neighbor-List Implementation
• Basic idea: Precompute, for each city, a list of the k closest other cities, ordered by increasing distance, and store the corresponding distances.
• If we set k = N, we should find tours as good as Bentley’s geometric code, but this would take Θ(N^2 log(N)) preprocessing time and Θ(N^2) space.
• Tradeoff: Take much smaller k (default is k = 20).
• For geometric instances, with a k-d tree constructed, we can compute the list for a given city in time “typically” O(log N + k log k).
• No longer need to do a fixed-radius search for t3 candidates. Merely examine cities on the list for city t2 in order until a city x with d(t2,x) > d(t1,t2) is reached.
• As soon as we find an improving move for a given t1, we perform it and go on to the next choice for t1 (first choice of an improving move rather than best, although given our ordering of t3 candidates, it should tend to be better than a random improving move).
• Requires Θ(kN) space, but this is not a problem on modern computers.
• Also allows variants on the make-up of the neighbor-list that might be useful for non-uniform geometric instances.

Problem with Non-Uniform Geometric Instances
Even if k = 80, the nearest neighbor graph (with an edge between two cities if either is on the other’s nearest neighbor list) is not connected.

Quad Neighbors
[Figure: quad neighbor lists for k = 16.]
• Pick k/4 nearest neighbors in each quadrant centered at our city c.
• If any quadrants have a shortfall, bring the total to k by adding the nearest remaining unselected cities, irrespective of quadrant.
• This guarantees that the graph of nearest neighbors will be connected.
• For N = 10,000 clustered instances, yielded a 1-3% improvement in tours under 2-opt, with no running time penalty (and no tour penalty for uniform data).

One More Thing… Starting Tours
N = 10,000 [Bentley, 1992]
Starting Tour           % Excess over HK   2-opt % Excess   Start Secs   2-opt Secs   Total Secs
Farthest Insertion            13.0              11.9             76           89          165
Farthest Addition+            13.2              11.8             38           52           90
Random Insertion              14.8              12.3             57           72          129
Random Addition               15.2              11.8             16           31           47
Approx. Christofides          14.9               6.7             24           40           64
Greedy                        15.7               5.8             14           30           44
Nearest Neighbor              24.2               8.7              4           27           31
Similar results for Savings under the neighbor-list implementation: Savings % Excess over HK: 11.8; 2-Opt % Excess with Savings Start: 8.6.

Explanation?
[Figure: 1000 runs on a fixed 1000-city instance using randomized versions of Greedy and Savings. X-axis is % excess for starting tour. Y-axis is % excess after 2-opting.]

Estimating Running-Time Growth Rate for 2-Opt (Neighbor List Implementation)
[Figure: three panels plotting Microseconds/N, Microseconds/(N log N), and Microseconds/N^1.25.]

Beyond 2-Opt
• 3-Opt: Look for improving 3-opt moves, where three edges are deleted and we choose the cheapest way to reconnect the segments into a tour. [Includes 2-opt moves as a special case. Naïve implementation is O(N^3) to find an improving move or confirm that none exists.]
• 2.5-Opt [Bentley, 1992]: When doing a ball search about t2 to find a potential t3 with d(t2,t3) < d(t1,t2), also consider the following three other possible moves:
– Insert t3 in the middle of edge {t1,t2},
– Insert t1 in the middle of the tour edge ending with t3, or
– Insert t1 in the middle of the tour edge beginning with t3.
Note that these are degenerate 3-opt moves.
• Or-Opt [Or, 1976]: Special case of 3-opt in which the moves are restricted to simply deleting a chain of 1, 2, or 3 consecutive tour vertices and inserting it elsewhere in the tour, possibly in the reverse direction for chains of 3 vertices. (Time O(N^2) to find an improving move or confirm that none exists.)
But the next theorem suggests that 3-Opt need not take Ω(N^3) in practice.

Partial Sum Theorem
If a sequence $x_1, x_2, \ldots, x_k$ has a positive sum $S > 0$, then there is a cyclic permutation $\pi$ of these numbers, all of whose prefix sums are positive, that is, for all $j$, $1 \le j \le k$, it satisfies
$$\sum_{i=1}^{j} x_{\pi(i)} > 0.$$
Proof: Suppose that our original sequence does not satisfy this constraint. Let $M$ denote the largest value such that $\sum_{i=1}^{j} x_i = -M$ for some $j$, and $h$ be the largest $j$ such that this holds.
We claim that the cyclic permutation that starts with $x_{h+1}$ is our desired permutation. By the maximality of $h$, we must have, for all $j$, $h < j \le k$, $\sum_{i=h+1}^{j} x_i > 0$. We also have $\sum_{i=h+1}^{k} x_i = M + S > M$. Since, by definition of $M$, we have $\sum_{i=1}^{j} x_i \ge -M$ for all $j$, $1 \le j \le h$, our chosen permutation will have all its prefix sums positive.
[Figure: the prefix sums plotted against the index, with the levels +M, 0, and -M, the indices 1, h, and k, and the positions π(1) and π(k) marked.]

[Figure: 3-opt search pseudocode, with the following annotations.]
• (G* will be the value of the best move found so far.)
• For each t1 in our neighbor-list implementation, we perform the first improving move found unless it is a 2-opt move, in which case we take the first extension found to a better 3-opt move, and if none is found, perform the 2-opt move.

Topological Issues
[Figure: valid topologies.] The choices for t5 are circled, for the cases where t4 precedes t3 (left) or follows t3 (right). [Note: One choice of t6 in the left case, two choices in the right.]
The Between(a,b,c) operation is needed in the second case to tell us which t5’s are valid. (Omitting this case costs about 0.2% in tour quality.)

• If G* > 0, this is more restrictive than the Theorem allows, but we’ve already found an improving move for this t1 and so can afford to be aggressive. This is a speed-up trick from [Lin & Kernighan, 1973].
• In neighbor-list implementation, perform move and go to next t1. (This annotation appears at two steps.)
• In neighbor-list implementation, if G* > 0, the current choices of t2, t3, t4 must represent an improving 2-opt move. Perform it and go to next t1.

Results
• Tour quality for Neighbor-List 3-opt with k = 20 is equivalent to that for Bentley’s geometric 3-opt (as opposed to 0.4% behind for 2-opt).
• Neighbor List Results (2-Level Tree Tour Representation):
N =                           10^3    10^4    10^5    10^6
2-Opt [20]  % Excess           4.9     5.0     4.9     4.9
2-Opt [20]  150 MHz Secs*     0.32     3.8    56.7     928
3-Opt [20]  % Excess           3.1     3.0     3.0     3.0
3-Opt [20]  150 MHz Secs*      3.8     4.6    66.1    1054
*Roughly half of the time is spent generating neighbor lists and the starting tour.
Time on a 3.06 GHz Intel Core i3 processor at N = 10^6: 25.4 sec (2-opt), 29.5 sec (3-opt)

Next Up
• 4-Opt
• Lin-Kernighan
• and beyond…
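Returning to the Partial Sum Theorem used earlier in this lecture to justify the 3-opt search: its proof is constructive, and a few lines of Python can check it on small inputs. This is only an illustrative sketch; the names `positive_rotation` and `rotate` are made up for the example. It finds the last index h at which the prefix sum attains its minimum value -M (treating the empty prefix as 0, so that h = 0 when the original order already works) and starts the cyclic permutation at the element after it.

```python
from itertools import accumulate

def positive_rotation(xs):
    """Return h such that xs[h:] + xs[:h] has only positive prefix sums."""
    if sum(xs) <= 0:
        raise ValueError("the theorem requires a positive total")
    prefix = [0] + list(accumulate(xs))   # prefix[j] = x_1 + ... + x_j
    low = min(prefix)                     # this is -M in the proof
    # h is the largest j whose prefix sum equals -M.
    return max(j for j, p in enumerate(prefix) if p == low)

def rotate(xs, h):
    """The cyclic permutation of xs that starts with x_{h+1}."""
    return xs[h:] + xs[:h]

# Hypothetical example sequence with total S = 3.
xs = [3, -5, 1, -2, 4, 2]
h = positive_rotation(xs)
rotated = rotate(xs, h)
prefix_sums = list(accumulate(rotated))
```

For this sequence the prefix sums are 3, -2, -1, -3, 1, 3, so M = 3 and h = 4; the rotation [4, 2, 3, -5, 1, -2] has prefix sums 4, 6, 9, 4, 5, 3, all positive.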