This is a follow up on Randy Olson work on solving TSPs in Python: Computing the optimal road trip across the U.S.. The code below is commented at length in my blog entry on Computing The Really Optimal Tour Across The USA On The Cloud With Python
Some of the code is a derivative work of Randy Olson's code, namely the list of points to visits, and the html template used for rendering the tour on Google Map.
Can we do better than Randy, i.e. can we compute the shortest tour visiting each one of the 50 landmarks he has selected?
The answer is yes. I will use these tools to compute it.
No need to present this, is it? We will use it for gathering data and rendering the result on a map.
Concorde is the best known algorithm to solve TSPs, see my post for more background on it.
We use CPLEX indirectly as Concorde is built on top of it. In all fairness, Concorde can also be used with an alternative solver called QSopt. Using CPLEX is faster, but the difference becomes noticeable only for very large TSPs. For a 50 city TSP lke Randy Olson's tour, using CPLEX or QSopt is similar. Let's admit that I have a strong bias towards CPLEX in general!
This is what makes Concorde and CPLEX easy to consume in Python. NEOS is a server delivering Software as a Service (SaaS). Actually, we could say that NEOS delivers Solvers as a Service because it provides quite a few optimization solvers. In particular, Concorde on NEOS is available at: http://neos.mcs.anl.gov/neos/solvers/co:concorde/TSP.html
Enough discussion, let's start.
Let's first import packages we'll use in this notebook. We use Python 2.7 here. if you use Python 3 then you should rename xmlrpclib into xmlrpc.client .
import googlemaps
import time
import xmlrpclib
import re
from jinja2 import Template
For ease of reuse, let's state the versions of what we are using here. We will use the convenient watermark package. It can be installed with pip install ipyext
. Note that the package name is not watermark, for some reason.
Once installed, it is used as a magic function.
%load_ext watermark
%watermark -a 'JFPuget' -nmv --packages googlemaps,jinja2,requests
JFPuget Sat Mar 12 2016 CPython 2.7.11 IPython 4.0.3 googlemaps 2.4.2 jinja2 2.8 requests 2.6.0 compiler : MSC v.1500 64 bit (AMD64) system : Windows release : 7 machine : AMD64 processor : Intel64 Family 6 Model 42 Stepping 7, GenuineIntel CPU cores : 8 interpreter: 64bit
We are set, let's start real work. We start with the list of cities to be visited. The list below is from Randy Olson article. You can paste your own list to get another tour. Note that all points must be understandable by Google Maps, hence you should try to locate each of those points on it before running the script.
USA_50_points = ["USS Alabama, Battleship Parkway, Mobile, AL",
"Grand Canyon National Park, Arizona",
"Toltec Mounds, Scott, AR",
"San Andreas Fault, San Benito County, CA",
"Cable Car Museum, 94108, 1201 Mason St, San Francisco, CA 94108",
"Pikes Peak, Colorado",
"The Mark Twain House & Museum, Farmington Avenue, Hartford, CT",
"New Castle Historic District, Delaware",
"White House, Pennsylvania Avenue Northwest, Washington, DC",
"Cape Canaveral, FL",
"Okefenokee Swamp Park, Okefenokee Swamp Park Road, Waycross, GA",
"Craters of the Moon National Monument & Preserve, Arco, ID",
"Lincoln Home National Historic Site Visitor Center, 426 South 7th Street, Springfield, IL",
"West Baden Springs Hotel, West Baden Avenue, West Baden Springs, IN",
"Terrace Hill, Grand Avenue, Des Moines, IA",
"C. W. Parker Carousel Museum, South Esplanade Street, Leavenworth, KS",
"Mammoth Cave National Park, Mammoth Cave Pkwy, Mammoth Cave, KY",
"French Quarter, New Orleans, LA",
"Acadia National Park, Maine",
"Maryland State House, 100 State Cir, Annapolis, MD 21401",
"USS Constitution, Boston, MA",
"Olympia Entertainment, Woodward Avenue, Detroit, MI",
"Fort Snelling, Tower Avenue, Saint Paul, MN",
"Vicksburg National Military Park, Clay Street, Vicksburg, MS",
"Gateway Arch, Washington Avenue, St Louis, MO",
"Glacier National Park, West Glacier, MT",
"Ashfall Fossil Bed, Royal, NE",
"Hoover Dam, NV",
"Omni Mount Washington Resort, Mount Washington Hotel Road, Bretton Woods, NH",
"Congress Hall, Congress Place, Cape May, NJ 08204",
"Carlsbad Caverns National Park, Carlsbad, NM",
"Statue of Liberty, Liberty Island, NYC, NY",
"Wright Brothers National Memorial Visitor Center, Manteo, NC",
"Fort Union Trading Post National Historic Site, Williston, North Dakota 1804, ND",
"Spring Grove Cemetery, Spring Grove Avenue, Cincinnati, OH",
"Chickasaw National Recreation Area, 1008 W 2nd St, Sulphur, OK 73086",
"Columbia River Gorge National Scenic Area, Oregon",
"Liberty Bell, 6th Street, Philadelphia, PA",
"The Breakers, Ochre Point Avenue, Newport, RI",
"Fort Sumter National Monument, Sullivan's Island, SC",
"Mount Rushmore National Memorial, South Dakota 244, Keystone, SD",
"Graceland, Elvis Presley Boulevard, Memphis, TN",
"The Alamo, Alamo Plaza, San Antonio, TX",
"Bryce Canyon National Park, Hwy 63, Bryce, UT",
"Shelburne Farms, Harbor Road, Shelburne, VT",
"Mount Vernon, Fairfax County, Virginia",
"Hanford Site, Benton County, WA",
"Lost World Caverns, Lewisburg, WV",
"Taliesin, County Road C, Spring Green, Wisconsin",
"Yellowstone National Park, WY 82190"]
We need the distance between each pair of cities we want to visit. We will use Google map distance matrix API here, but we could provide the distance matrix in any way that suits us. We will see later than we can import an existing TSPLIB file for instance.
We must enable the Google map distance matrix API on the developer console. We will use the Python client library(documentation). We imported it above via import googlemaps
.
First step is to get a google maps API key, see instructions. Once we have a key we paste it below. I removed my actual key and you should paste yours if you want to reuse this code. We initiailize the client connection to the service with the key.
google_key = "paste your key here"
gmaps = googlemaps.Client(key=google_key)
We are using the free version of the service. The free API is limited to 2500 pairwise distances per 24 hours. Given we have 50 points to visit, we need 50x49/2 = 1225 pairwise distances, hence we can call this only twice a day or so. The free API is also limited to 100 pairwise distances per 10 seconds. We therefore have to wait about 123 seconds for getting all our distances.
We use the default parameters for the distance matrix api: units=metric, mode=driving, departure_time=now.
We return the result as a string in the TSPLIB format. The string must represent the distance matrix in lower triangular form. If the cities are numbered from 0 to 49, this distance matrix must look like the following (I only show the first few lines):
0
1-0 0
2-0 2-1 0
3-0 3-1 3-2 0
4-0 4-1 4-2 4-3 0
where i-j
represents the distance from city i
to city j
.
def get_tsp_data(points):
distances = ''
try:
for i in range(0, len(points)):
for j in range(0,i+1):
if i == j:
distances += '0\n'
else:
result1 = gmaps.distance_matrix(origins=points[i],destinations=points[j])
d = result1['rows'][0]['elements'][0]['distance']['value']
distances += '%d ' % d
except Exception as e:
print 'Error in getting distance between %s and %s' % (points[i], points[j])
return distances
Let's try it!
distances = get_tsp_data(USA_50_points)
Does it look good?
print(distances[:100])
0 2677783 0 708032 2066041 0 3679695 1086596 3007532 0 3854200 1261100 3182037 213466 0 2223711 9742
Seems it is correct.
We can now prepare data for NEOS. We will use the NEOS API for Concorde. It is not a REST API as is now customary for web services, because NEOS was developed before REST APIs were invented! NEOS API uses XML-RPC. All we need to provide is a XML file that contains the data of the problem plus few parameters for Concorde.
We first need to wrap data to get a valid TSPLIB format. This format has few fixed input lines describing the problem before the distance matrix. The string below contains the template for files containing a lower triangular distance matrix
tsp_template = """
TYPE : TSP
DIMENSION : %i
EDGE_WEIGHT_TYPE : EXPLICIT
EDGE_WEIGHT_FORMAT : LOWER_DIAG_ROW
EDGE_WEIGHT_SECTION
%s
EOF
"""
We insert the distance matrix we computed via the % operator.
tsp_data = tsp_template % (50, distances)
We save it in case we ned to rerun it. This will avoids waiting 5 minutes to get the distance matrix from Google map next time
with open('usa_50.tsp','wb') as file:
file.write(tsp_data)
If for some reason we lost our Python kernel, or if we restart it, we can get the data back from the file.
with open('usa_50.tsp','rb') as file:
tsp_data = file.read()
We now have to insert that data in the right xml file to submit it to NEOS. As explained in my post, I could have carefully read the documentation. I preferred to use another nice way NEOS offers: it is possible to generate the xml file using the dry run option on the NEOS submission page for Concorde. For that, I created a file named fake.tsp with the two character string '%s' as content. This content is only there for helping processing with Python later. I then entered the path to that file on the submission page, and I checked the dry run option before submitting. This does not run the job. It produces an xml content that I stored in the string below. I removed my actual email and you should put yours instead if you want to reuse it.
base_xml = """
<document>
<category>co</category>
<solver>concorde</solver>
<inputType>TSP</inputType>
<priority>long</priority>
<email> *** insert your email here ***</email>
<dat2><![CDATA[]]></dat2>
<dat1><![CDATA[]]></dat1>
<tsp><![CDATA[%s]]></tsp>
<ALGTYPE><![CDATA[con]]></ALGTYPE>
<RDTYPE><![CDATA[fixed]]></RDTYPE>
<PLTYPE><![CDATA[no]]></PLTYPE>
<comment><![CDATA[]]></comment>
</document>
"""
We can now insert the tsp data, again via the % operator
tsp_xml = base_xml % tsp_data
We are now ready to log into NEOS, submit the job, and get the result.
NEOS_HOST="www.neos-server.org"
NEOS_PORT=3332
neos = xmlrpclib.ServerProxy("http://%s:%d" % (NEOS_HOST, NEOS_PORT))
(jobNumber,password) = neos.submitJob(tsp_xml)
We now wait until NEOS has computed the shortest tour. It shouldn't be long.
status="Waiting"
while status == "Running" or status=="Waiting":
time.sleep(1)
status = neos.getJobStatus(jobNumber, password)
msg = neos.getFinalResults(jobNumber, password).data
What does the result look like?
print(msg)
/home/neos5/bin/concorde.cplex -s 99 -f sample.tsp Host: thales Current process id: 21793 Using random seed 99 Problem Type: TSP Number of Nodes: 50 Explicit Lengths (CC_MATRIXNORM) Set initial upperbound to 22243542 (from tour) LP Value 1: 21512609.000000 (0.00 seconds) LP Value 2: 22243542.000000 (0.00 seconds) New lower bound: 22243542.000000 Final lower bound 22243542.000000, upper bound 22243542.000000 Exact lower bound: 22243542.000000 DIFF: 0.000000 Final LP has 66 rows, 164 columns, 518 nonzeros Optimal Solution: 22243542.00 Number of bbnodes: 1 Total Running Time: 0.03 (seconds) *** *** *** You chose the Concorde(CPLEX) solver *** *** Cities are numbered 0..n-1 and each line shows a leg from one city to the next followed by the distance rounded to integers*** 50 50 0 17 232538 17 23 332001 23 41 382077 41 2 223579 2 35 599175 35 42 645010 42 30 732619 30 5 906931 5 49 1012309 49 11 479311 11 43 886721 43 1 420685 1 27 390501 27 3 762531 3 4 213466 4 36 1046462 36 46 169763 46 25 693577 25 33 755032 33 40 582428 40 26 599792 26 15 549197 15 14 332988 14 22 393573 22 48 401354 48 12 490113 12 24 168248 24 13 390955 13 16 194584 16 34 307293 34 21 419167 21 44 1077780 44 28 191765 28 18 369946 18 20 441365 20 38 119309 38 6 144579 6 31 191481 31 37 150785 37 29 147965 29 7 84693 7 19 99052 19 8 57073 8 45 29759 45 47 418990 47 32 633579 32 39 727874 39 10 378291 10 9 384913 9 0 880363
By searching for the string "Optimal Solution: " we see that we find a tour of length 22,313.083 kilometers.
We must now parse the results in order to print it with city names. We first look for the start of the section where each leg of the tour is printed. We then parse the tour using Python regular expressions, which gives us the list of the cities visited by the tour. However, what we get are the indices of the cities, because this is all we provided to Concorde. We need to translate these indices back to the original location names. We finally need to close the loop, i.e. add the starting point at the end.
def get_tour (num_points, msg):
num_points = 50
optimal_length_mention = re.findall(r'Optimal Solution: (\d+)',msg)
optimal_length = int(optimal_length_mention[0])
indices = re.findall(r'(\d+) \d+ \d+', msg)
tour = [USA_50_points[int(i)] for i in indices]
tour.append(tour[0])
return (optimal_length,tour)
(optimal_length,tour) = get_tour(50, msg)
print("Optimal Length (m) : %s" % optimal_length)
print(tour)
Optimal Length (m) : 22243542 ['USS Alabama, Battleship Parkway, Mobile, AL', 'French Quarter, New Orleans, LA', 'Vicksburg National Military Park, Clay Street, Vicksburg, MS', 'Graceland, Elvis Presley Boulevard, Memphis, TN', 'Toltec Mounds, Scott, AR', 'Chickasaw National Recreation Area, 1008 W 2nd St, Sulphur, OK 73086', 'The Alamo, Alamo Plaza, San Antonio, TX', 'Carlsbad Caverns National Park, Carlsbad, NM', 'Pikes Peak, Colorado', 'Yellowstone National Park, WY 82190', 'Craters of the Moon National Monument & Preserve, Arco, ID', 'Bryce Canyon National Park, Hwy 63, Bryce, UT', 'Grand Canyon National Park, Arizona', 'Hoover Dam, NV', 'San Andreas Fault, San Benito County, CA', 'Cable Car Museum, 94108, 1201 Mason St, San Francisco, CA 94108', 'Columbia River Gorge National Scenic Area, Oregon', 'Hanford Site, Benton County, WA', 'Glacier National Park, West Glacier, MT', 'Fort Union Trading Post National Historic Site, Williston, North Dakota 1804, ND', 'Mount Rushmore National Memorial, South Dakota 244, Keystone, SD', 'Ashfall Fossil Bed, Royal, NE', 'C. W. Parker Carousel Museum, South Esplanade Street, Leavenworth, KS', 'Terrace Hill, Grand Avenue, Des Moines, IA', 'Fort Snelling, Tower Avenue, Saint Paul, MN', 'Taliesin, County Road C, Spring Green, Wisconsin', 'Lincoln Home National Historic Site Visitor Center, 426 South 7th Street, Springfield, IL', 'Gateway Arch, Washington Avenue, St Louis, MO', 'West Baden Springs Hotel, West Baden Avenue, West Baden Springs, IN', 'Mammoth Cave National Park, Mammoth Cave Pkwy, Mammoth Cave, KY', 'Spring Grove Cemetery, Spring Grove Avenue, Cincinnati, OH', 'Olympia Entertainment, Woodward Avenue, Detroit, MI', 'Shelburne Farms, Harbor Road, Shelburne, VT', 'Omni Mount Washington Resort, Mount Washington Hotel Road, Bretton Woods, NH', 'Acadia National Park, Maine', 'USS Constitution, Boston, MA', 'The Breakers, Ochre Point Avenue, Newport, RI', 'The Mark Twain House & Museum, Farmington Avenue, Hartford, CT', 'Statue of Liberty, Liberty Island, NYC, NY', 'Liberty Bell, 6th Street, Philadelphia, PA', 'Congress Hall, Congress Place, Cape May, NJ 08204', 'New Castle Historic District, Delaware', 'Maryland State House, 100 State Cir, Annapolis, MD 21401', 'White House, Pennsylvania Avenue Northwest, Washington, DC', 'Mount Vernon, Fairfax County, Virginia', 'Lost World Caverns, Lewisburg, WV', 'Wright Brothers National Memorial Visitor Center, Manteo, NC', "Fort Sumter National Monument, Sullivan's Island, SC", 'Okefenokee Swamp Park, Okefenokee Swamp Park Road, Waycross, GA', 'Cape Canaveral, FL', 'USS Alabama, Battleship Parkway, Mobile, AL']
We can now display the result. I started from Randal Olson code. However, his code must be be edited manually each time one wants to display a different tour. Python ecosystem includes a evry convenient package for this: the jinja2 package. We will use this package to parameterize Randy's html code.
The parameter we need to pass is a split of the tour in subtours of length at most 10 because of Google map API limit. For each subtour we will need its index, its starting point, its ending point, and all the intermediate waypoints in a list. We can construct this information via a simple utility function.
def create_routes(tour):
routes = []
length = len(tour)
i = 1
while length >= 2:
rlength = min(8,length - 2)
start = tour[0]
end = tour[rlength+1]
route = tour[1:rlength+1]
routes.append( (i, start, end, route) )
i += 1
tour = tour[rlength+1:]
length = len(tour)
return routes
We can now create the html template. i started from randy Olson's template, and I replaced all input parts with Jinja2 loops. These loops look like the following:
{% for n in routes %}
some Javascript code here
{% endfor %}
tsp_html = """
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="initial-scale=1.0, user-scalable=no">
<meta name="description" content="The correct optimal road trip across the U.S.">
<meta name="author" content="Randal S. Olson; modified by Jean-Francois Puget">
<title>The correct optimal road trip across the U.S.</title>
<style>
html, body, #map-canvas {
height: 100%;
margin: 0px;
padding: 0px
}
</style>
<script src="https://maps.googleapis.com/maps/api/js?v=3.exp&signed_in=true"></script>
<script>
{% for n in routes %}
var directionsDisplay{{n[0]}};
{% endfor %}
var markerOptions = {icon: "http://maps.gstatic.com/mapfiles/markers2/marker.png"};
var directionsDisplayOptions = {preserveViewport: true,
markerOptions: markerOptions};
var directionsService = new google.maps.DirectionsService();
var map;
function initialize() {
var center = new google.maps.LatLng(39, -96);
var mapOptions = {
zoom: 5,
center: center
};
map = new google.maps.Map(document.getElementById('map-canvas'), mapOptions);
{% for n in routes %}
directionsDisplay{{n[0]}}.setMap(map);
{% endfor %}
}
function calcRoute(start, end, routes) {
switch (start) {
{% for n in routes %}
case "{{n[1]}}":
directionsDisplay{{n[0]}} = new google.maps.DirectionsRenderer(directionsDisplayOptions);
break;
{% endfor %}
}
var waypts = [];
for (var i = 0; i < routes.length; i++) {
waypts.push({
location:routes[i],
stopover:true});
}
var request = {
origin: start,
destination: end,
waypoints: waypts,
optimizeWaypoints: false,
travelMode: google.maps.TravelMode.DRIVING
};
directionsService.route(request, function(response, status) {
if (status == google.maps.DirectionsStatus.OK) {
switch (start) {
{% for n in routes %}
case "{{n[1]}}":
directionsDisplay{{n[0]}}.setDirections(response);
break;
{% endfor %}
}
}
});
}
google.maps.event.addDomListener(window, 'load', initialize);
{% for n in routes %}
calcRoute("{{n[1]}}","{{n[2]}}",{{n[3]}});
{% endfor %}
</script>
</head>
<body>
<div id="map-canvas"></div>
</body>
</html>
"""
We can now create the html file via rendering the above template with the actual tour we computed.
tsp_template = Template(tsp_html)
r = tsp_template.render(routes = create_routes(tour))
with open('tsp.html','wb') as file:
file.write(r)
We leverage Jupyter %%HTML
magic function to display the result inside the notebook. We may have to use the zoom/unzoom buttons on the lower left corner to adjust the image to the actual cell size.
%%HTML
<iframe width=100% height="500px" src='tsp.html'></iframe>
This looks nice, isn't it?
The tour we computed is different from the one Bill Cook published. The difference is due to the fact that Google Maps returned a distance matrix that was a bit different from the one Bill used. Let us check that our code is correct by using the same distance matrix as Bill.
We will repeat the above steps, starting this time from the europe50 file Bill created.
In order to do that, let's wrap everything into a simple to use function that expects a file containing a distance matrix in TSPLIB format with extension tsp
.
def tsp(file_name):
with open(file_name+'.tsp', 'rb') as file:
tsp_xml = base_xml % file.read()
(jobNumber,password) = neos.submitJob(tsp_xml)
status="Waiting"
while status == "Running" or status=="Waiting":
print(status)
time.sleep(1)
status = neos.getJobStatus(jobNumber, password)
msg = neos.getFinalResults(jobNumber, password).data
(optimal_length,tour) = get_tour(50, msg)
print("Optimal Length (m) : %s" % optimal_length)
tsp_template = Template(tsp_html)
r = tsp_template.render(routes = create_routes(tour))
with open(file_name+'.html','wb') as file:
file.write(r)
Note that this function isn't safe at all. It does not check for any formatting error in the input file for instance. But it is good enough when we are using files from the TSPLIB as these files are well formated. For some reason Bill named his file 'europe50' ...
tsp('europe50')
Waiting Running Running Running Running Optimal Length (m) : 22015038
We get the same length as Bill, which is good. How does this tour looks like?
%%HTML
<iframe width=100% height="500px" src='europe50.html'></iframe>
Main difference with the one we found is around Virginia.
Let us try out function with the distance matrix we just got from Google Maps.
tsp('usa_50')
Waiting Running Running Running Running Optimal Length (m) : 22243542
How does it look like?
%%HTML
<iframe width=100% height="500px" src='usa_50.html'></iframe>
I hope you enjoyed this. If you want to learn more then my blog entry on Computing The Really Optimal Tour Across The USA On The Cloud With Python contains useful links.