Using genetic algorithms to solve the traveling salesperson problem

The notebook is better viewed rendered as slides. You can convert it to slides and view them by:

This and other related IPython notebooks can be found at the course github repository:

Traveling Salesperson Problem (TSP):

Given a set of cities, and the distances between each pair of cities, find a tour of the cities with the minimum total distance. A tour means you start at one city, visit every other city exactly once, and then return to the starting city.

  • This is a well-known intractable problem, meaning that there are no efficient solutions that work for a large number of cities.
  • We can create an inefficient algorithm that works fine for a small number of cites (about a dozen).
  • We can also find a nearly-shortest tour over thousands of cities.
  • Actually, the fact there is no efficient algorithm is liberating:

    This means that we can use a very simple, inefficient algorithm and not feel too bad about it.

The vocabulary of the problem:

  • City: For the purpose of this exercise, a city is "atomic" in the sense that we don't have to know anything about the components or attributes of a city, just how far it is from other cities.
  • Cities: We will need to represent a set of cities; Python's set datatype might be appropriate for that.
  • Distance: We will need the distance between two cities. If A and B are cities. This could be done with a function, distance(A, B), or with a dict, distance[A][B] or distance[A, B], or with an array if A and B are integer indexes. The resulting distance will be a real number (which Python calls a float).
  • Tour: A tour is an ordered list of cities; Python's list or tuple datatypes would work.
  • Total distance: The sum of the distances of adjacent cities in the tour. We will probably have a function, total_distance(tour).

We are doing this demonstration as an IPython notebook. Therefore, we need to perform some initialization.

In [1]:
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx

import random, operator, time, itertools, math
import numpy

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.rc('text', usetex=True)
plt.rc('font', family='serif')
plt.rcParams['text.latex.preamble'] ='\\usepackage{libertine}\n\\usepackage[utf8]{inputenc}'

import seaborn
seaborn.set(style='whitegrid')
seaborn.set_context('notebook')

 First algorithm: find the tour with shortest total distance fro all possible tours

Generate all the possible tours of the cities, and choose the shortest one (the tour with the minimum total distance).

We can implement this as the Python function exact_TSP (TSP is the standard abbreviation for Traveling Salesperson Problem, and "exact" means that it finds the shortest tour, exactly, not just an approximation to the shortest tour). Here's the design philosophy we will use:

Write Python code that closely mirrors the English description of the algorithm. This will probably require some auxilliary functions and data structures; just assume we will be able to define them as well, using the same design philosophy.

In [2]:
def exact_TSP(cities):
    "Generate all possible tours of the cities and choose the shortest one."
    return shortest(alltours(cities))

def shortest(tours): 
    "Return the tour with the minimum total distance."
    return min(tours, key=total_distance)

Note 1: We have not yet defined the function total_distance, nor alltours.

Note 2: In Python min(collection,key=function) means to find the element x that is a member of collection such that function(x) is minimized. So shortest finds the tour whose total_distance in the minimal among the tours. So our Python code implements (and closely mimics) our English description of the algorithm. Now we need to define what a tour is, and how to measure total distance.

Representing Tours

  • A tour starts in one city, and then visits each of the other cities in order, before finally retirning to the start.
  • A natural representation of the set of available cities is a Python set, and a natural representation of a tour is a sequence that is a permutation of the set.
  • The tuple (1, 2, 3), for example, represents a tour that starts in city 1, moves to 2, then 3, and then returns to 1 to finish the tour.
In [3]:
alltours = itertools.permutations # The permutation function is already defined in the itertools module
cities = {1, 2, 3}
list(alltours(cities))
Out[3]:
[(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]

Representing Cities and Distance

Now for the notion of distance. We define total_distance(tour) as the sum of the distances between consecutive cities in the tour; that part is shown below and is easy (with one Python-specific trick: when i is 0, then distance(tour[0], tour[-1]) gives us the wrap-around distance between the first and last cities, because tour[-1] is the last element of tour).

In [4]:
def total_distance(tour):
    "The total distance between each pair of consecutive cities in the tour."
    return sum(distance(tour[i], tour[i-1]) 
               for i in range(len(tour)))

Distance between cities

Before we can define distance(A, B), the distance between two cities, we have to make a choice. In the fully general version of the TSP problem, the distance between two cities could be anything: it could be the amount of time it takes to travel between cities, the number of dollars it costs, or anything else.

How will we represent a two-dimensional point? Here are some choices, with their pros and cons:

  • Tuple: A point (or city) is a two-tuple of (x, y) coordinates, for example, (300, 0).

    • Pro: Very simple, easy to break a point down into components. Reasonably efficient.
    • Con: doesn't distinguish points from other two-tuples. If p is a point, can't do p.x or p.y.
  • class: Define City as a custom class with x and y fields.

    • Pro: explicit, gives us p.x accessors.
    • Con: less efficient because of the overhead of creating user-defined objects.

Distance between cities (contd)

  • complex: Python already has the two-dimensional point as a built-in numeric data type, but in a non-obvious way: as complex numbers, which inhabit the two-dimensional (real × complex) plane. We can make this use more explicit by defining "City = complex", meaning that we can construct the representation of a city using the same constructor that makes complex numbers.

    • Pro: most efficient, because it uses a builtin type that is already a pair of numbers. The distance between two points is simple: the absolute value of their difference.
    • Con: it may seem confusing to bring complex numbers into play; can't say p.x.
  • subclass: Define "class Point(complex): pass", meaning that points are a subclass of complex numbers.

    • Pro: All the pros of using complex directly, with the added protection of making it more explicit that these are treated as points, not as complex numbers.
    • Con: less efficient than using complex directly; still can't do p.x or p.y.
  • subclass with properties: Define "class Point(complex): x, y = property(lambda p: p.real), property(lambda p: p.imag)".

    • Pro: All the pros of previous approach, and we can finally say p.x.
    • Con: less efficient than using complex directly.

From possible alternatives Peter chose to go with complex numbers:

In [5]:
City = complex # Constructor for new cities, e.g. City(300, 400)
In [6]:
def distance(A, B): 
    "The Euclidean distance between two cities."
    return abs(A - B)
In [7]:
A = City(300, 0)
B = City(0, 400)
distance(A, B)
Out[7]:
500.0
In [8]:
def generate_cities(n):
    "Make a set of n cities, each with random coordinates."
    return set(City(random.randrange(10, 890), 
                    random.randrange(10, 590)) 
               for c in range(n))
In [9]:
cities8, cities10, cities100, cities1000 = generate_cities(8), generate_cities(10), generate_cities(100), generate_cities(1000)
cities8
Out[9]:
{(214+482j),
 (211+326j),
 (573+536j),
 (758+252j),
 (618+407j),
 (354+187j),
 (169+174j),
 (861+341j)}

A cool thing is to be able to plot a tour

In [10]:
def plot_tour(tour, alpha=1, color=None):
    # Plot the tour as blue lines between blue circles, and the starting city as a red square.
    plotline(list(tour) + [tour[0]], alpha=alpha, color=color)
    plotline([tour[0]], style='gD', alpha=alpha, size=10)
    # plt.show()
    
def plotline(points, style='bo-', alpha=1, size=7, color=None):
    "Plot a list of points (complex numbers) in the 2-D plane."
    X, Y = XY(points)
    
    if color:
        plt.plot(X, Y, style, alpha=alpha, markersize=size, color=color)
    else:
        plt.plot(X, Y, style, alpha=alpha, markersize=size)
    
def XY(points):
    "Given a list of points, return two lists: X coordinates, and Y coordinates."
    return [p.real for p in points], [p.imag for p in points]

We are ready to test our algorithm

In [11]:
tour = exact_TSP(cities8)
In [12]:
plot_tour(tour)

Improving the algorithm: Try All Non-Redundant Tours

The permutation (1, 2, 3) represents the tour that goes from 1 to 2 to 3 and back to 1. You may have noticed that there aren't really six different tours of three cities: the cities 1, 2, and 3 form a triangle; any tour must connect the three points of the triangle; and there are really only two ways to do this: clockwise or counterclockwise. In general, with $n$ cities, there are $n!$ (that is, $n$ factorial) permutations, but only $(n-1)!$, tours that are distinct: the tours 123, 231, and 312 are three ways of representing the same tour.

So we can make our TSP program $n$ times faster by never considering redundant tours. Arbitrarily, we will say that all tours must start with the "first" city in the set of cities. We don't have to change the definition of TSP—just by making alltours return only nonredundant tours, the whole program gets faster.

(While we're at it, we'll make tours be represented as lists, rather than the tuples that are returned by permutations. It doesn't matter now, but later on we will want to represent partial tours, to which we will want to append cities one by one; that can only be done to lists, not tuples.)

In [13]:
def all_non_redundant_tours(cities):
    "Return a list of tours, each a permutation of cities, but each one starting with the same city."
    start = first(cities)
    return [[start] + list(tour)
            for tour in itertools.permutations(cities - {start})]

def first(collection):
    "Start iterating over collection, and return the first element."
    for x in collection: return x

def exact_non_redundant_TSP(cities):
    "Generate all possible tours of the cities and choose the shortest one."
    return shortest(all_non_redundant_tours(cities))
In [14]:
all_non_redundant_tours({1, 2, 3})
Out[14]:
[[1, 2, 3], [1, 3, 2]]

Results of the improvement

In [15]:
%timeit exact_TSP(cities8)
1 loop, best of 3: 233 ms per loop
In [16]:
%timeit exact_non_redundant_TSP(cities8)
10 loops, best of 3: 31.8 ms per loop
In [17]:
%timeit exact_non_redundant_TSP(cities10)
1 loop, best of 3: 3 s per loop

It takes a few seconds on my machine to solve this problem. In general, the function exact_non_redundant_TSP() looks at $(n-1)!$ tours for an $n$-city problem, and each tour has $n$ cities, so the time for $n$ cities should be roughly proportional to $n!$. This means that the time grows rapidly with the number of cities; we'd need longer than the age of the Universe to run exact_non_redundant_TSP() on just 24 cities:

n citiestime
103 secs
123 secs × 12 × 11 = 6.6 mins
146.6 mins × 13 × 14 = 20 hours
243 secs × 24! / 10! = 16 billion years

There must be a better way... or at least we need to look for it until quantum computing comes around.

Approximate (Heuristic) Algorithms

  • The general, exact Traveling Salesperson Problem is intractable;
  • there is no efficient algorithm to find the tour with minimum total distance.
  • But if we restrict ourselves to Euclidean distance and if we are willing to settle for a tour that is reasonably short but not the shortest, then the news is much better.

We will consider several approximate algorithms, which find tours that are usually within 10 or 20% of the shortest possible and can handle thousands of cities in a few seconds.

Greedy Nearest Neighbor (greedy_TSP)

Here is our first approximate algorithm:

Start at any city; at each step extend the tour by moving from the previous city to its nearest neighbor that has not yet been visited.

This is called a greedy algorithm, because it greedily takes what looks best in the short term (the nearest neighbor) even when that won't always be the best in the long term.

To implement the algorithm I need to represent all the noun phrases in the English description:

  • start: a city which is arbitrarily the first city;
  • the tour: a list of cities, initialy just the start city);
  • previous city: the last element of tour, that is, tour[-1]);
  • nearest neighbor: a function that, when given a city, A, and a list of other cities, finds the one with minimal distance from A); and
  • not yet visited: we will keep a set of unvisited cities; initially all cities but the start city are unvisited).

Once these are initialized, we repeatedly find the nearest unvisited neighbor, C, and add it to the tour and remove it from unvisited.

In [18]:
def greedy_TSP(cities):
    "At each step, visit the nearest neighbor that is still unvisited."
    start = first(cities)
    tour = [start]
    unvisited = cities - {start}
    while unvisited:
        C = nearest_neighbor(tour[-1], unvisited)
        tour.append(C)
        unvisited.remove(C)
    return tour
In [19]:
def nearest_neighbor(A, cities):
    "Find the city in cities that is nearest to city A."
    return min(cities, key=lambda x: distance(x, A))

(In Python, as in the formal mathematical theory of computability, lambda is the symbol for function, so "lambda x: distance(x, A)" means the function of x that computes the distance from x to the city A. The name lambda comes from the Greek letter λ.)

We can compare the fast approximate greedy_TSP algorithm to the slow exact_TSP algorithm on a small map, as shown below. (If you have this page in a IPython notebook you can repeatedly run the cell, and see how the algorithms compare. Cities(9) will return a different set of cities each time. I ran it 20 times, and only once did the greedy algorithm find the optimal solution, but half the time it was within 10% of optimal, and it was never more than 25% worse than optimal.)

In [20]:
cities = generate_cities(9)
In [21]:
%timeit exact_non_redundant_TSP(cities)
1 loop, best of 3: 386 ms per loop
In [22]:
plot_tour(exact_non_redundant_TSP(cities))
In [23]:
%timeit greedy_TSP(cities)
10000 loops, best of 3: 39.4 ┬Ás per loop
In [24]:
plot_tour(greedy_TSP(cities))

greedy_TSP() can handle bigger problems

In [25]:
%timeit greedy_TSP(cities100)
100 loops, best of 3: 2.51 ms per loop
In [26]:
plot_tour(greedy_TSP(cities100))
In [27]:
%timeit greedy_TSP(cities1000)
1 loop, best of 3: 210 ms per loop
In [28]:
plot_tour(greedy_TSP(cities1000))

But... don't be greedy!

A greedy algorithm is an algorithm that follows the problem solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum. In many problems, a greedy strategy does not in general produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution in a reasonable time.

For many problmes greedy algorithms fail to produce the optimal solution, and may even produce the unique worst possible solution. One example is the traveling salesman problem mentioned above: for each number of cities, there is an assignment of distances between the cities for which the nearest neighbor heuristic produces the unique worst possible tour.

A thought on computational complexity

from XKCD

Check out Peter Norvig's IPython notebook on the traveling salesperson problem on more alternatives for the TSP.

Nature-inspired metaheuristics

  • We have seen in class some examples of nature-inspired metaheuristics.
  • They are an option in which we dedicate a little more computational effort in order to produce better solutions than greedy_TSP().

We will be using the DEAP library to code this tackle this problem using a genetic algorithm.

In [29]:
from deap import algorithms, base, creator, tools

Elements to take into account solving problems with genetic algorithms

  • Individual representation (binary, floating-point, etc.);
  • evaluation and fitness assignment;
  • selection, that establishes a partial order of individuals in the population using their fitness function value as reference and determines the degree at which individuals in the population will take part in the generation of new (offspring) individuals.
  • variation, that applies a range of evolution-inspired operators, like crossover, mutation, etc., to synthesize offspring individuals from the current (parent) population. This process is supposed to prime the fittest individuals so they play a bigger role in the generation of the offspring.
  • stopping criterion, that determines when the algorithm shoulod be stopped, either because the optimum was reach or because the optimization process is not progressing.

Hence a 'general' evolutionary algorithm can be described as

def evolutionary_algorithm():
    'Pseudocode of an evolutionary algorithm'    
    populations = [] # a list with all the populations

    populations[0] =  initialize_population(pop_size)
    t = 0

    while not stop_criterion(populations[t]):
        fitnesses = evaluate(populations[t])
        offspring = matting_and_variation(populations[t],
                                          fitnesses)
        populations[t+1] = environmental_selection(           
                                          populations[t],
                                          offspring)
        t = t+1

Some preliminaries for the experiment

We will carry out our tests with a 30-cities problem.

In [30]:
num_cities = 30
cities = generate_cities(num_cities)

The toolbox stored the setup of the algorithm. It describes the different elements to take into account.

In [31]:
toolbox = base.Toolbox()

Individual representation and evaluation

  • Individuals represent possible solutions to the problem.
  • In the TSP case, it looks like the tour itself can be a suitable representation.
  • For simplicity, an individual can be a list with the indexes corresponding to each city.
  • This will simplify the crossover and mutation operators.
  • We can rely on the total_distance() function for evaluation and set the fitness assignment as to minimize it.
In [32]:
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)

Let's now define that our individuals are composed by indexes that referr to elements of cities and, correspondingly, the population is composed by individuals.

In [33]:
toolbox.register("indices", numpy.random.permutation, len(cities))
toolbox.register("individual", tools.initIterate, creator.Individual,
                 toolbox.indices)
toolbox.register("population", tools.initRepeat, list, 
                 toolbox.individual)

Defining the crossover and mutation operators can be a challenging task.

There are various crossover operators that have been devised to deal with ordered individuals like ours.

  • We will be using DEAP's deap.tools.cxOrdered() crossover.
  • For mutation we will swap elements from two points of the individual.
  • This is performed by deap.tools.mutShuffleIndexes().
In [34]:
toolbox.register("mate", tools.cxOrdered)
toolbox.register("mutate", tools.mutShuffleIndexes, indpb=0.05)

Evaluation can be easily defined from the total_distance() definition.

In [35]:
def create_tour(individual):
    return [list(cities)[e] for e in individual]
In [36]:
def evaluation(individual):
    '''Evaluates an individual by converting it into 
    a list of cities and passing that list to total_distance'''
    return (total_distance(create_tour(individual)),)
In [37]:
toolbox.register("evaluate", evaluation)

We will employ tournament selection with size 3.

In [38]:
toolbox.register("select", tools.selTournament, tournsize=3)

Lets' run the algorithm with a population of 100 individuals and 400 generations.

In [39]:
pop = toolbox.population(n=100)
In [40]:
%%time 
result, log = algorithms.eaSimple(pop, toolbox,
                             cxpb=0.8, mutpb=0.2,
                             ngen=400, verbose=False)
CPU times: user 12 s, sys: 225 ms, total: 12.2 s
Wall time: 12.7 s

We can now review the results

The best individual of the last population:

In [41]:
best_individual = tools.selBest(result, k=1)[0]
print('Fitness of the best individual: ', evaluation(best_individual)[0])
Fitness of the best individual:  3224.2906445915123
In [42]:
plot_tour(create_tour(best_individual))

It is interesting to assess how the fitness of the population changed as the evolution process took place.

We can prepare an deap.tools.Statistics instance to specify what data to collect.

In [43]:
fit_stats = tools.Statistics(key=operator.attrgetter("fitness.values"))
fit_stats.register('mean', numpy.mean)
fit_stats.register('min', numpy.min)

We are all set now but lets run again the genetic algorithm configured to collect the statistics that we want to gather:

In [44]:
result, log = algorithms.eaSimple(toolbox.population(n=100), toolbox,
                                  cxpb=0.5, mutpb=0.2,
                                  ngen=400, verbose=False,
                                  stats=fit_stats)

Plotting mean and minimium fitness as evolution took place.

In [45]:
plt.figure(figsize=(11, 4))
plots = plt.plot(log.select('min'),'c-', log.select('mean'), 'b-')
plt.legend(plots, ('Minimum fitness', 'Mean fitness'), frameon=True)
plt.ylabel('Fitness'); plt.xlabel('Iterations');

How has the population evolved?

Ok, but how the population evolved? As TSP solutions are easy to visualize, we can plot the individuals of each population the evolution progressed. We need a new Statistics instance prepared for that.

In [46]:
pop_stats = tools.Statistics(key=numpy.copy)
pop_stats.register('pop', numpy.copy) # -- copies the populations themselves
pop_stats.register('fitness', # -- computes and stores the fitnesses
                   lambda x : [evaluation(a) for a in x]) 

Note: I am aware that this could be done in a more efficient way.

In [47]:
result, log = algorithms.eaSimple(toolbox.population(n=100), toolbox,
                                  cxpb=0.5, mutpb=0.2,
                                  ngen=400, verbose=False,
                                  stats=pop_stats)

Plotting the individuals and their fitness (color-coded)

In [48]:
def plot_population(record, min_fitness, max_fitness):
    '''
    Plots all individuals in a population. 
    Darker individuals have a better fitness.
    '''
    pop = record['pop']
    fits = record['fitness']
    index = sorted(range(len(fits)), key=lambda k: fits[k])

    norm=colors.Normalize(vmin=min_fitness,
                          vmax=max_fitness)
    sm = cmx.ScalarMappable(norm=norm, 
                            cmap=plt.get_cmap('PuBu'))
    
    for i in range(len(index)):
        color = sm.to_rgba(max_fitness - fits[index[i]][0])
        plot_tour(create_tour(pop[index[i]]), alpha=0.5, color=color)
In [49]:
min_fitness = numpy.min(log.select('fitness'))
max_fitness = numpy.max(log.select('fitness'))

We can now plot the population as the evolutionary process progressed. Darker blue colors imply better fitness.

In [50]:
plt.figure(figsize=(11,11))
for i in range(0, 12):
    plt.subplot(4,3,i+1)
    it = int(math.ceil((len(log)-1.)/15))
    plt.title('t=' + str(it*i))
    plot_population(log[it*i], min_fitness, max_fitness)
plt.tight_layout()