# Childish Random TSP Approximation¶

Author: Vincent D. Warmerdam @ http://koaning.github.io

This notebook will explain a dead simple approach to TSP that approximates an optimal solution and runs in linear time.

### Background TSP¶

If you need a reminder of what a TSP is and why it is such an interesting problem in computer science I can recommend this this video.

## Packages¶

Always load the packages first. In this case, both the R-stuff (for plotting) and the python-stuff.

In [663]:
from ggplot import *
import pandas as pd
import numpy as np
%pylab inline

Populating the interactive namespace from numpy and matplotlib

WARNING: pylab import has clobbered these variables: ['xlim', 'sign', 'ylim']
%matplotlib prevents importing * from pylab and numpy

In [664]:
%%R
library(ggplot2)


## Example Data¶

I'll be using ggplot for the visualisation so most of my data will always need to be transported via dataframes.

In [665]:
x = np.random.uniform(0,1,200)
y = np.random.uniform(0,1,200)
df = pd.DataFrame({'x':x, 'y':y})
%Rpush df

In [666]:
%%R -w 1200 -o df
ggplot() + geom_point(aes(x,y), data=df, size=5) + ggtitle("TSP points")


Theoretically there are a lot of possible paths to choose from. All permutations of this problem leaves us with $99!$ options. First, let's just visualize a random permutation.

In [667]:
df = pd.DataFrame({'x':x, 'y':y})
perm = np.random.permutation(np.arange(0,df.shape[0],1))
df['order'] = perm
df_perm = df.sort('order')
%Rpush df_perm

In [668]:
%%R -w 1200 -o df_perm
p = ggplot() + geom_path(aes(x,y), data=df_perm, size=1)
p + geom_point(aes(x,y), data=df_perm, size=4) + ggtitle("TSP points")


Just by looking at this plot I get the idea that we might be able to make things better. Let's define a cost function so we can start to compare different permutations.

In [669]:
def distance(df):
dx = df['x'] - df['x'].shift()
dy = df['y'] - df['y'].shift()
return sum(sqrt(dx**2+dy**2))

rand_scores = []
for i in range(1000):
df['order'] = np.random.permutation(np.arange(0,df.shape[0],1))
df = df.sort('order')
rand_scores.append(distance(df))

dist_df = pd.DataFrame({'dist': np.array(rand_scores)})

%Rpush dist_df

In [670]:
%%R -w 1200 -o dist_df
p = ggplot() + geom_histogram(aes(dist), data=dist_df, binwidth=0.5)
p + ggtitle("Histogram of Random RSP Distances")


We can see that there is a bit of spread in just 1000 samples. This will help us to give a general impression on how well a tsp allocation performs.

## Block-Heuristic¶

Let's approximate blocks in the 2d space we've created and go through them.

Just as an approximation, we could visit all the points in one block before we move on to the next one.

It is a simple heuristic that immediately removes a lot of expensive city connections because it both locally and globally puts a limit on distances that are travelled. It is a very simple, childlike algorithm, but as we will see, it has some nice characteristics.

The pandas cut method helps out a lot with the implementation.

In [671]:
dim = 6
df_perm = df
df_perm['xcut'] = pd.cut(df_perm.x, dim, labels=range(dim))
df_perm['ycut'] = pd.cut(df_perm.y, dim, labels=range(dim))
df_perm = df_perm.sort(['xcut','ycut'])
df_perm.order = np.arange(0, df_perm.shape[0], 1)
print(distance(df_perm))
%Rpush df_perm

            x         y  order  xcut  ycut
6    0.166565  0.088334      0     0     0
159  0.038724  0.134862      1     0     0
173  0.081720  0.092598      2     0     0
117  0.048155  0.071911      3     0     0
13   0.130920  0.090656      4     0     0
43   0.104134  0.155605      5     0     0
112  0.100763  0.303163      6     0     1
103  0.078253  0.200736      7     0     1
63   0.005040  0.188670      8     0     1
50   0.057423  0.220988      9     0     1
131  0.160861  0.220250     10     0     1
133  0.010323  0.267152     11     0     1
32   0.140946  0.435007     12     0     2
55   0.106370  0.369894     13     0     2
71   0.023560  0.439148     14     0     2

[15 rows x 5 columns]
25.7295874131

In [672]:
%%R -w 1200 -o df_perm
p = ggplot() + geom_path(aes(x,y), data=df_perm, size=1)
p + geom_point(aes(x,y), data=df_perm, size=4) + ggtitle("TSP points")


Already, by just binning all the points together in order, we have gotten a result that is far better that something we got out of a random sample.

Notice that we have one variable that we could tune in this case, namely the dimensions by which we are cutting up our search space. Let's turn that into a function and check what the performance does.

In [673]:
def boxer1(dim):
df_perm = df
df_perm['xcut'] = pd.cut(df_perm.x, dim, labels=range(dim))
df_perm['ycut'] = pd.cut(df_perm.y, dim, labels=range(dim))
df_perm = df_perm.sort(['xcut','ycut'])
df_perm.order = np.arange(0, df_perm.shape[0], 1)

pltr = pd.DataFrame({'x':range(1,50),
'y':[distance(boxer1(i)) for i in range(1,50)]})

%Rpush pltr

In [674]:
%%R -w 1200 -o pltr
p = ggplot() + geom_path(aes(x,y), data=pltr, size=1)
p + ggtitle("TSP performance over box dimensions")


There seems to be a sweet spot at having a dimension around 8, after that the results seem to be getting worce. Let's compare a solution of dimension 8 and a solution of dimension 40.

In [675]:
pltr = boxer1(8)
%Rpush pltr

In [676]:
%%R -w 1200 -o pltr
p = ggplot() + geom_path(aes(x,y), data=pltr, size=1)
p + geom_point(aes(x,y), data=pltr, size=4) + ggtitle("TSP points")

In [677]:
pltr = boxer1(40)
%Rpush pltr

In [678]:
%%R -w 1200 -o pltr
p = ggplot() + geom_path(aes(x,y), data=pltr, size=1)
p + geom_point(aes(x,y), data=pltr, size=4) + ggtitle("TSP points")