#!/usr/bin/env python
# coding: utf-8

# # The Traveling Salesperson Problem
#
# Consider the [*Traveling Salesperson Problem*](http://en.wikipedia.org/wiki/Traveling_salesman_problem):
#
# > *Given a set of cities and the distances between each pair of cities, what is the shortest possible tour that visits each city exactly once, and returns to the starting city?*
#
# In this notebook we will develop some solutions to the problem, and more generally show *how to think about* solving a problem like this. [Elsewhere](https://research.googleblog.com/2016/09/the-280-year-old-algorithm-inside.html) you can read about how the algorithms developed here are used in serious applications that millions of people rely on every day.
# *(Figure: an example tour.)*
# | *n* | expected time for `alltours_tsp(Cities(n))` |
# |----:|---------------------------------------------|
# | 10  | covering 10! tours = 2 secs                 |
# | 11  | 2 secs × 11! / 10! ≈ 22 secs                |
# | 12  | 2 secs × 12! / 10! ≈ 4 mins                 |
# | 14  | 2 secs × 14! / 10! ≈ 13 hours               |
# | 16  | 2 secs × 16! / 10! ≈ 200 days               |
# | 18  | 2 secs × 18! / 10! ≈ 112 years              |
# | 25  | 2 secs × 25! / 10! ≈ 270 billion years      |
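# To sanity-check the projections in this table, here is a small calculation (a sketch added here for illustration; it assumes the table's baseline of 2 secs for 10 cities and pure O(*n*!) scaling):

```python
from math import factorial

def projected_secs(n, base_n=10, base_secs=2):
    "Extrapolate the O(n!) run time of alltours_tsp from 2 secs at n = 10."
    return base_secs * factorial(n) / factorial(base_n)

projected_secs(11)                          # ≈ 22 secs
projected_secs(12) / 60                     # ≈ 4 mins
projected_secs(25) / (60 * 60 * 24 * 365)   # ≈ 270 billion years
```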
```
# [TCL]  33.23   87.62  Tuscaloosa,AL
# [FLG]  35.13  111.67  Flagstaff,AZ
# [PHX]  33.43  112.02  Phoenix,AZ
# ```
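# The two coordinate columns are latitude and longitude in degrees (note the longitudes are listed as positive degrees west). For actual distances between such points, the haversine great-circle formula is the usual tool. Here is a sketch (a helper of my own, not part of the notebook's code, applied to the Tuscaloosa and Phoenix rows above):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, R=6371):
    "Great-circle distance in km between two (lat, lon) points given in degrees."
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2)**2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)**2
    return 2 * R * asin(sqrt(a))

# Tuscaloosa, AL to Phoenix, AZ, using the coordinates listed above:
d = haversine_km(33.23, 87.62, 33.43, 112.02)   # roughly 2260 km
```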
# I also found a [blog post](http://www.randalolson.com/2015/03/08/computing-the-optimal-road-trip-across-the-u-s/) by Randal S. Olson, who chose 50 landmarks across the states and found a tour based on actual road-travel distances, not straight-line distances. His data looks like this:
```
# Mount Rushmore National Memorial, South Dakota 244, Keystone, SD	43.879102	-103.459067
# Toltec Mounds, Scott, AR	34.647037	-92.065143
# Ashfall Fossil Bed, Royal, NE	42.425000	-98.158611
# ```
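# A minimal sketch of how such lines might be parsed, assuming tab-separated fields as shown above (the `parse_landmarks` helper is hypothetical, not part of Olson's code):

```python
def parse_landmarks(text):
    "Parse 'Name<TAB>latitude<TAB>longitude' lines into (name, lat, lon) tuples."
    rows = []
    for line in text.strip().splitlines():
        name, lat, lon = line.rsplit('\t', 2)   # the name itself may contain commas
        rows.append((name, float(lat), float(lon)))
    return rows

sample = ("Toltec Mounds, Scott, AR\t34.647037\t-92.065143\n"
          "Ashfall Fossil Bed, Royal, NE\t42.425000\t-98.158611")
parse_landmarks(sample)
```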
# | Shortest Edge | Usage of edge      | Resulting Segments   |
# |---------------|--------------------|----------------------|
# |               |                    | A; B; C; D; E; F; G  |
# | CG            | Join C to G        | A; B; CG; D; E; F    |
# | DE            | Join D to E        | A; B; CG; DE; F      |
# | AC            | Join A to CG       | B; ACG; DE; F        |
# | GD            | Join ACG to D      | B; ACGDE; F          |
# | BF            | Join B to F        | BF; ACGDE            |
# | EF            | Join ACGDE to FB   | ACGDEFB              |
# Here is the code:

# In[79]:

def greedy_tsp(cities):
    """Go through edges, shortest first. Use edge to join segments if possible."""
    endpoints = {c: [c] for c in cities}  # A dict of {endpoint: segment}
    for (A, B) in shortest_edges_first(cities):
        if A in endpoints and B in endpoints and endpoints[A] != endpoints[B]:
            new_segment = join_endpoints(endpoints, A, B)
            if len(new_segment) == len(cities):
                return new_segment

# TO DO: functions: shortest_edges_first, join_endpoints

# **Note:** The `endpoints` dict serves two purposes. First, the keys of the dict are all the cities that are endpoints of some segment, making it possible to ask "`A in endpoints`" to see if city `A` is an endpoint. Second, the values of the dict are all the segments, making it possible to ask "`endpoints[A] != endpoints[B]`" to make sure that the two cities are endpoints of different segments, not of the same segment.
#
# The `shortest_edges_first` function is easy: generate all `(A, B)` pairs of cities, and sort by the distance between the cities. (Note: I use the conditional `if id(A) < id(B)` so that I won't have both `(A, B)` and `(B, A)` in my list of edges, and I won't ever have `(A, A)`.)

# In[80]:

def shortest_edges_first(cities):
    "Return all edges between distinct cities, sorted shortest first."
    edges = [(A, B) for A in cities for B in cities if id(A) < id(B)]
    return sorted(edges, key=lambda edge: distance(*edge))

# For the `join_endpoints` function, I first make sure that A is the last element of one segment and B is the first element of the other, reversing segments if necessary. Then I add the B segment onto the end of the A segment. Finally, I update the `endpoints` dict. This is a bit tricky! My first thought was that A and B are no longer endpoints, because they have been joined together in the interior of the segment. However, that isn't always true. If A was the endpoint of a 1-city segment, then when you join it to B, A is still an endpoint.
# I could have had complicated logic to handle the cases where A, B, both, or neither were 1-city segments, but I decided on a different tactic: first unconditionally delete A and B from the endpoints dict, no matter what. Then add the two endpoints of the new segment (which is `Asegment`) to the endpoints dict.

# In[81]:

def join_endpoints(endpoints, A, B):
    "Join B's segment onto the end of A's and return the segment. Maintain endpoints dict."
    Asegment, Bsegment = endpoints[A], endpoints[B]
    if Asegment[-1] is not A: Asegment.reverse()
    if Bsegment[0] is not B: Bsegment.reverse()
    Asegment.extend(Bsegment)
    del endpoints[A], endpoints[B]  # A and B are no longer endpoints
    endpoints[Asegment[0]] = endpoints[Asegment[-1]] = Asegment
    return Asegment

# Let's try out the `greedy_tsp` algorithm on the two USA maps:

# In[82]:

plot_tsp(greedy_tsp, USA_map)

# In[83]:

plot_tsp(greedy_tsp, USA_big_map)

# The greedy algorithm is worse than nearest neighbors, but it is fast (especially on the big map). Let's see if the *alteration* strategy can help:

# In[84]:

def altered_greedy_tsp(cities):
    "Run greedy TSP algorithm, and alter the results by reversing segments."
    return alter_tour(greedy_tsp(cities))

# In[85]:

plot_tsp(altered_greedy_tsp, USA_map)

# In[86]:

plot_tsp(altered_greedy_tsp, USA_big_map)

# That's the best result yet on the big map. Let's look at some benchmarks:

# In[87]:

algorithms = [altered_nn_tsp, altered_greedy_tsp, repeated_altered_nn_tsp]

benchmarks(algorithms)
print('-' * 100)
benchmarks(algorithms, Maps(30, 120))

# So overall, the altered greedy algorithm looks slightly better than the altered nearest neighbor algorithm and runs in about the same time. However, the repeated altered nearest neighbor algorithm does best of all (although it takes much longer).
#
# What about a repeated altered greedy algorithm? That might be a good idea, but there is no obvious way to do it.
# We can't just start from a sample of cities, because the greedy algorithm doesn't have a notion of starting city.

# # Visualizing the Greedy Algorithm

# I would like to see how the process of joining segments unfolds. Although I dislike copy-and-paste (because it violates the *[Don't Repeat Yourself](http://en.wikipedia.org/wiki/Don%27t_repeat_yourself)* principle), I'll copy `greedy_tsp` and make a new version called `visualize_greedy_tsp`, which adds one line to plot the segments several times as the algorithm is running:

# In[88]:

def visualize_greedy_tsp(cities, plot_sizes):
    """Go through edges, shortest first. Use edge to join segments if possible.
    Plot segments at specified sizes."""
    edges = shortest_edges_first(cities)  # A list of (A, B) pairs
    endpoints = {c: [c] for c in cities}  # A dict of {endpoint: segment}
    for (A, B) in edges:
        if A in endpoints and B in endpoints and endpoints[A] != endpoints[B]:
            new_segment = join_endpoints(endpoints, A, B)
            plot_segments(endpoints, plot_sizes, distance(A, B))  # <<<< NEW
            if len(new_segment) == len(cities):
                return new_segment

def plot_segments(endpoints, plot_sizes, dist):
    "If the number of distinct segments is one of plot_sizes, then plot segments."
    segments = set(map(tuple, endpoints.values()))
    if len(segments) in plot_sizes:
        for s in segments:
            plot_lines(s)
        plt.show()
        print('{} segments, longest edge = {:.0f}'.format(len(segments), dist))

# In[89]:

visualize_greedy_tsp(USA_map, (50, 25, 10, 5, 2, 1));

# # Divide and Conquer Strategy

# The next general strategy to consider is *divide and conquer*. Suppose we have an algorithm, like `alltours_tsp`, that is inefficient for large *n* (the `alltours_tsp` algorithm is O(*n*!) for *n* cities). So we can't apply `alltours_tsp` directly to a large set of cities. But we can divide the problem into smaller pieces, and then combine those pieces:
#
# 1. Split the set of cities in half.
# 2. Find a tour for each half.
# 3. Join those two tours into one.
# The trick is that when *n* is small, step 2 can be done directly. But when *n* is large, step 2 is done with a recursive call, breaking the half into two smaller halves.
#
# Let's work out by hand an example with a small map of just six cities. Here are the cities:

# In[90]:

cities = Cities(6)
plot_labeled_lines(cities)

# Step 1 is to divide this set in half. I'll divide it into a left half (blue circles) and a right half (black squares):

# In[91]:

plot_labeled_lines(list(cities), 'bo', [3, 4, 0], 'ks', [1, 2, 5])

# Step 2 is to find a tour for each half:

# In[92]:

plot_labeled_lines(list(cities), 'bo-', [0, 3, 4, 0], 'ks-', [1, 2, 5, 1])

# Step 3 is to combine the two halves. We do that by choosing an edge from each half to delete (the edges marked by red dashes) and replacing those two edges with two edges that connect the halves (the blue dash-dot edges). Note that there are two ways of connecting the new dash-dot lines; pick the one that yields the shorter tour.

# In[93]:

# One way to connect the two segments, giving the tour [1, 3, 4, 0, 2, 5]
plot_labeled_lines(list(cities), 'bo-', [0, 3, 4], 'ks-', [1, 2, 5],
                   'b-.', [0, 1], [4, 5], 'r--', [0, 4], [1, 5])

# Now we have a feel for what we have to do. Let's define the divide and conquer algorithm, which we will call `dq_tsp`. Like all `tsp` algorithms it gets a set of cities as input and returns a tour. If the size of the set of cities is 3 or less, then just listing the cities in any order produces an optimal tour. If there are more than 3 cities, then split the cities in half (using `split_cities`), find a tour for each half (using `dq_tsp` recursively), and join the two tours together (using `join_tours`):

# In[94]:

def dq_tsp(cities):
    """Find a tour by divide and conquer: if number of cities is 3 or less,
    any tour is optimal. Otherwise, split the cities in half, solve each half
    recursively, then join those two tours together."""
    if len(cities) <= 3:
        return Tour(cities)
    else:
        Cs1, Cs2 = split_cities(cities)
        return join_tours(dq_tsp(Cs1), dq_tsp(Cs2))

# TO DO: functions: split_cities, join_tours

# Let's verify that `dq_tsp` works for three cities:

# In[95]:

plot_tsp(dq_tsp, Cities(3))

# If we have more than 3 cities, how do we split them? My approach is to imagine drawing an axis-aligned rectangle that is just big enough to contain all the cities. If the rectangle is wider than it is tall, order all the cities by *x* coordinate and split that ordered list in half. If the rectangle is taller than it is wide, order and split the cities by *y* coordinate.

# In[96]:

def split_cities(cities):
    "Split cities vertically if map is wider; horizontally if map is taller."
    width, height = extent([c.x for c in cities]), extent([c.y for c in cities])
    key = 'x' if (width > height) else 'y'
    cities = sorted(cities, key=lambda c: getattr(c, key))
    mid = len(cities) // 2
    return frozenset(cities[:mid]), frozenset(cities[mid:])

def extent(numbers): return max(numbers) - min(numbers)

# Let's show that `split_cities` is working:

# In[97]:

Cs1, Cs2 = split_cities(cities)
plot_tour(dq_tsp(Cs1))
plot_tour(dq_tsp(Cs2))

# Now for the tricky part: joining two tours together. First we consider all ways of deleting one edge from each of the two tours. If we delete an edge from a tour we get a segment. We are representing segments as lists of cities, the same surface representation as tours, but there is a difference in interpretation: the tour `[0, 2, 5]` is a triangle of three edges, but the segment `[0, 2, 5]` consists of only two edges, from `0` to `2` and from `2` to `5`. The segments that result from deleting an edge from the tour `[0, 2, 5]` are:
```
# [0, 2, 5],    [2, 5, 0],    [5, 0, 2]
# ```
# You may recognize these as the *rotations* of the segment `[0, 2, 5]`. So any candidate combined tour consists of taking a rotation of the first tour and appending to it a rotation of the second tour, with one caveat: when we go to append the two segments, there are two ways of doing it: either keep the second segment as is, or reverse the second segment.
#
# **Note:** In Python, `sequence[::-1]` means to reverse the sequence.

# In[98]:

def join_tours(tour1, tour2):
    "Consider all ways of joining the two tours together, and pick the shortest."
    segments1, segments2 = rotations(tour1), rotations(tour2)
    tours = [s1 + s2
             for s1 in segments1
             for s in segments2
             for s2 in (s, s[::-1])]
    return shortest_tour(tours)

def rotations(sequence):
    "All possible rotations of a sequence."
    # A rotation is some suffix of the sequence followed by the rest of the sequence.
    return [sequence[i:] + sequence[:i] for i in range(len(sequence))]

# Let's see if it works, first on the 6-city example:

# In[99]:

plot_tsp(dq_tsp, Cities(6))

# That is indeed the optimal tour, achieved by deleting the two dashed red lines and adding the dotted blue lines.
#
# Now for the USA map:

# In[100]:

plot_tsp(dq_tsp, USA_map)

# Not quite as good as `altered_greedy_tsp`. Let's alter it!

# In[101]:

def altered_dq_tsp(cities):
    return alter_tour(dq_tsp(cities))

# In[102]:

plot_tsp(altered_dq_tsp, USA_map)

# Let's just remind ourselves how the algorithms behave on the standard test cases:

# In[103]:

algorithms = [nn_tsp, greedy_tsp, dq_tsp,
              altered_dq_tsp, altered_nn_tsp, altered_greedy_tsp,
              repeated_altered_nn_tsp]

benchmarks(algorithms)

# Of the non-altered algorithms (the first three lines), divide and conquer (`dq_tsp`) does best. But interestingly, divide and conquer is helped less by `alter_tour` than the greedy algorithm or the nearest neighbor algorithm is.
# Perhaps that is because divide and conquer constructs its tour by putting together pieces that are already good, so `alter_tour` is less able to improve it. Also, `dq_tsp` has a standard deviation that is much smaller than the other two; this suggests that `dq_tsp` is not producing really bad tours that can be easily improved by `alter_tour`. In any event, `altered_dq_tsp` is the worst of the `altered` algorithms, both in average tour length and in run time.
#
# `repeated_altered_nn_tsp` remains the best in tour length, although the worst in run time.

# # Shoulders of Giants: Minimum Spanning Tree Algorithm: `mst_tsp`
# *(Photo: Joseph Kruskal, from Wikipedia)*
# *(Photo: Held, Shareshian, and Karp, from the Computer History Museum)*
#
# *(Comic: xkcd 399)*
# Another algorithm that shows up with a literature search is the [Held-Karp Dynamic Programming Algorithm](http://en.wikipedia.org/wiki/Held%E2%80%93Karp_algorithm), named after giants [Michael Held](http://www.computerhistory.org/collections/catalog/102650390) and [Richard Karp](http://en.wikipedia.org/wiki/Richard_M._Karp). It is an algorithm for finding optimal tours, not approximate ones, so it is not appropriate for large *n*. But even in its simplest form, without any programming tricks, it can go quite a bit further than `alltours_tsp`. That is because `alltours_tsp` is O(*n*!), while the Held-Karp algorithm is only O(*n*² 2ⁿ). How did Held and Karp achieve this speedup? They noticed that `alltours_tsp` wastes a lot of time on permutations that can't possibly be optimal tours. Consider the following 10-city problem, with a 6-city segment shown:

# In[117]:

plot_labeled_lines(cross, 'r-', [0, 4, 1, 3, 2, 9])

# `alltours_tsp` would consider 4! = 24 different tours that start with those 6 cities. But that seems wasteful: there is no way that this segment could be part of an optimal tour, so why waste time on *any* continuation of it? The proof that this segment can never be part of an optimal tour comes down to two things. First, we demonstrate another tour that also starts in city 0 and ends in city 9, along the way goes through cities 1 through 4, and is shorter:

# In[118]:

plot_labeled_lines(cross, [0, 1, 2, 3, 4, 9])

# Second, we need this key property:
#
# > *Given a start city A, an end city C, and a set of middle cities Bs, then out of all the possible segments that start in A, end in C, and go through all and only the cities in Bs, only the shortest of those segments could ever be part of an optimal tour.*
#
# Of course, we don't know that the optimal tour goes through exactly those Bs cities before hitting C. But if it does, then we need only consider the permutation of Bs that leads to the shortest segment.
# So we can throw out the red zig-zag segment above, and keep the nice smooth blue segment.
#
# So far we have only been talking about segments. We know that the TSP is defined for tours, not segments. So even if we find the shortest possible segment, it might not be the shortest possible tour. But here's something we do know: a tour has to end somewhere. So just find the shortest segment from the start city, `A`, to every possible end city, `C`. That will give you *n* - 1 segments. Out of those, don't choose the shortest *segment*, but rather choose the shortest *tour*.
#
# That gives us our algorithm:

# In[119]:

def hk_tsp(cities):
    """The Held-Karp shortest tour of this set of cities.
    For each end city C, find the shortest segment from A (the start) to C.
    Out of all these shortest segments, pick the one that is the shortest tour."""
    A = first(cities)
    return shortest_tour(shortest_segment(A, cities - {A, C}, C)
                         for C in cities if C is not A)

# TO DO: function: shortest_segment(A, Bs, C)

# Now for `shortest_segment(A, Bs, C)`. It is defined to produce the shortest segment that starts in city `A`, ends in `C`, and visits some permutation of the `Bs` cities in the middle. If there are no `Bs` cities, then of course the shortest segment is to go directly from `A` to `C`. If there are `Bs` cities, then one of them has to be the last `B` city visited (just before visiting `C`). So for each `B`, find the shortest segment that first goes from `A`, through all the other `Bs` cities, then to `B`, and finally to `C`. Out of all these candidate segments, return the one with the minimum segment length.
#
# **Note:** the decorator `@functools.lru_cache` makes this a **dynamic programming** algorithm, which is a fancy name meaning that we cache the results of sub-computations because we will re-use them multiple times.

# In[120]:

@functools.lru_cache(None)
def shortest_segment(A, Bs, C):
    "The shortest segment starting at A, going through all Bs, and ending at C."
    if not Bs:
        return [A, C]
    else:
        segments = [shortest_segment(A, Bs - {B}, B) + [C] for B in Bs]
        return min(segments, key=segment_length)

def segment_length(segment):
    "The total of distances between each pair of consecutive cities in the segment."
    # Same as tour_length, but without distance(tour[0], tour[-1])
    return sum(distance(segment[i], segment[i-1])
               for i in range(1, len(segment)))

# That's all there is to it. Let's compare `alltours_tsp` with `hk_tsp` on 10-city tours:

# In[121]:

plot_tsp(alltours_tsp, Cities(10))

# In[122]:

plot_tsp(hk_tsp, Cities(10))

# We see that `hk_tsp` returns the optimal tour, and it is a lot faster. We can take `hk_tsp` into uncharted territory well beyond the reach of `alltours_tsp`:

# In[123]:

plot_tsp(hk_tsp, Cities(14))

# In[124]:

plot_tsp(hk_tsp, Cities(16))

# Not bad! In 11 seconds, we did what `alltours_tsp` would have taken an estimated 200 days to complete! Let's repeat the table of expected times, comparing the All Tours algorithm with the Held-Karp algorithm:
# | *n* | `alltours_tsp(Cities(n))`<br>expected time ≈ O(n!) | `hk_tsp(Cities(n))`<br>expected time ≈ O(n² 2ⁿ) |
# |----:|----------------------------------------------------|-------------------------------------------------|
# | 10  | 10! tours = 2 secs                                 | 0.1 secs                                        |
# | 11  | 2 secs × 11! / 10! ≈ 22 secs                       | 0.2 secs                                        |
# | 12  | 2 secs × 12! / 10! ≈ 4 mins                        | 0.4 secs                                        |
# | 14  | 2 secs × 14! / 10! ≈ 13 hours                      | 3 secs                                          |
# | 16  | 2 secs × 16! / 10! ≈ 200 days                      | 16² 2¹⁶ tours = 11 secs                         |
# | 18  | 2 secs × 18! / 10! ≈ 112 years                     | 11 secs × (18/16)² × 2^(18−16) ≈ 1 min          |
# | 25  | 2 secs × 25! / 10! ≈ 270 billion years             | 11 secs × (25/16)² × 2^(25−16) ≈ 4 hours        |
# | 50  | 2 secs × 50! / 10! ≈ 5 × 10⁵⁰ years                | 11 secs × (50/16)² × 2^(50−16) ≈ 58,000 years   |
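# The extrapolations in the right column can be reproduced with a few lines (a sketch added here; it assumes, per the table, 11 secs for 16 cities and the O(*n*² 2ⁿ) growth rate):

```python
def hk_secs(n, base_n=16, base_secs=11):
    "Extrapolate the Held-Karp run time from 11 secs at n = 16, scaling by n^2 * 2^n."
    return base_secs * (n / base_n)**2 * 2**(n - base_n)

hk_secs(18)           # ≈ 1 minute
hk_secs(25) / 3600    # ≈ 4 hours
```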
# So if we had some patience, we could find the optimal tour for a 25-city map, but we still can't handle the 50-city landmarks map. (There are refinements to Held-Karp that can handle 50-city maps, and could do it even with 1960s-era computing power.)
#
# We're starting to run out of tricks, but we have one more general strategy to consider.

# # Ensembles of Other Algorithms: `ensemble_tsp`

# When we have several optimization algorithms and we're not sure which is best, we can always try them all and take the best result. We will define `ensemble_tsp` to combine the algorithms we have previously developed. First, if the set of input cities is small enough, it solves the problem optimally with `hk_tsp`. If the set is too large, it tries a selection of algorithms, and chooses the best resulting tour. The result is guaranteed to be as good as or better than that of any of the component algorithms; but the run time is guaranteed to be longer than that of any of the component algorithms.

# In[125]:

ensemble = [altered_dq_tsp, altered_greedy_tsp, altered_mst_tsp, repeated_altered_nn_tsp]

def ensemble_tsp(cities, threshold=16, algorithms=ensemble):
    "Apply all algorithms to cities and take the shortest resulting tour."
    if len(cities) <= threshold:
        return hk_tsp(cities)
    else:
        return shortest_tour(tsp(cities) for tsp in algorithms)

# Let's go to the benchmarks:

# In[126]:

benchmarks(ensemble + [ensemble_tsp])

# In[127]:

benchmarks(ensemble + [ensemble_tsp], Maps(30, 120))

# In[128]:

benchmarks(ensemble + [ensemble_tsp], Maps(10, 250))

# So `ensemble_tsp` returns the shortest tours, but its run time is the slowest, as expected. It improves on `repeated_altered_nn_tsp` by less than 1%.

# # Further Explorations

# That's all I'm going to write for now. But there are still plenty of open questions for you to explore:
#
# * **Branch and Cut**: this is a technique to cut off a search early, when a partial solution is obviously not optimal.
# We saw how Held-Karp cuts off some permutations of cities when another permutation is better. A refinement on that is to keep track of, say, the best total length of the segment going through all the Bs cities. Then, any time a partial segment through some of the Bs cities exceeds that best total, we can stop right there, before even finishing all the Bs. With this technique, you can find optimal tours for around 50 cities.
# * **Linear programming**: Look up the topic "linear programming" and see how it applies to the TSP.
# * **Heuristic Algorithms**: There are many approaches for using heuristic estimates to find good (but not optimal) tours. For example, *ant colony optimization algorithms* make random choices of which edge to follow; the edges that occur in the best tours get reinforced with some virtual pheromones, and other ants tend to follow those pheromones. *Simulated annealing* takes its inspiration from metallurgy.
# * The **[Lin-Kernighan heuristic](http://akira.ruc.dk/~keld/research/LKH/LKH-1.3/DOC/LKH_REPORT.pdf)** is one of the best.
# * The **[Christofides algorithm](https://en.wikipedia.org/wiki/Christofides_algorithm)** gives a guarantee of 3/2 the optimal tour length (improving on the minimum-spanning-tree guarantee).
# * **`altered` as a function**: we defined a lot of one-line functions that just call another algorithm and then call `alter_tour` on the result. Can you write a function, `altered(func)`, which takes a TSP algorithm as an argument, and returns a TSP algorithm that runs the original algorithm and then calls `alter_tour`?
# * Why does the greedy approach in `mst` produce an optimal spanning tree, while `greedy_tsp` does not produce an optimal tour, even though the two algorithms have similar structure in the way they iterate over `shortest_edges_first`?
# * The code in this notebook was designed for clarity, not efficiency. Can you make the code faster?
# * [William Cook](http://www.math.uwaterloo.ca/tsp/) maintains a great page on the TSP.
# * William Cook also has a [draft chapter](http://www.math.uwaterloo.ca/~bico/papers/comp_chapter1.pdf) on Discrete Optimization featuring the TSP. His algorithms are in C, and he chooses a different set of algorithms to explore. I find his explanation very beautiful and concise.
# * If you are heavily into math, there's a [taxonomy](http://cstheory.stackexchange.com/questions/9241/approximation-algorithms-for-metric-tsp) of solutions.
# * What else are you interested in?
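# As a hint for the `altered`-as-a-function exploration above, here is one possible sketch of such a higher-order function. (The `alter_tour` below is a do-nothing stand-in so this snippet runs on its own; in the notebook, the real `alter_tour` would already be in scope and should be used instead.)

```python
def alter_tour(tour):   # stand-in so this snippet is self-contained;
    return tour         # the notebook's alter_tour improves the tour by reversing segments

def altered(tsp_algorithm):
    "Return a TSP algorithm that runs tsp_algorithm, then improves the result with alter_tour."
    def altered_tsp(cities):
        return alter_tour(tsp_algorithm(cities))
    altered_tsp.__name__ = 'altered_' + tsp_algorithm.__name__
    return altered_tsp

# With it, the one-line wrapper functions become single calls, e.g.:
# altered_greedy_tsp = altered(greedy_tsp)
altered_identity = altered(list)   # wraps the trivial "tour = list of cities" algorithm
```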