Calculate distances between Iditarod checkpoints along the trail

The following is a brief tutorial on how to solve the problem, using popular Python scientific and geospatial tools: Given the track of a race trail and the location of checkpoints, what are the distances between any two successive checkpoints. The example is the Iditarod dog sled race: the trail is approximated as a sequence of route points, or rather, the straight-line segments that connect them -- this is called a linestring. The checkpoints are given separately. It is assumed that both are available as (separate) files in ESRI Shapefile format.

First some libraries that will be needed later.

In [1]:
import math
import fiona
from shapely.geometry import shape, LineString, Point
from collections import OrderedDict

Let's define some helper functions. The first calculates the 3D Euclidian distance between two points in space, with the points represented as lists or tuples. I'm sure there's something usitable in the math library, but it's not costing us much.

The second function does something quite generally useful. First of all, it's not a function at all, but a generator: when repeatedly called with a sequence variable or iterator as an argument, it successively returns pairs (i, i+1) of elements. For example, calling pairs([1, 2, 3, 4]) yields successively:
1, 2
2, 3
3, 4

We will use this function to iterate through pairs of successive checkpoints.

Last, getnearest() is a function that determines the point on a linestring that is closest to a given point external to the line string. Such a function is needed because the checkpoint locations aren't actually on the trail linestring (which is only given as a discrete list of route points with imagined straight lines between them). The linestring is here simply a sequence variable or iterator made up of objects that can be compared to the point object via an appropriate distance function. The method is simplistic, but good enough if the line strings aren't too long. The main limitation is that it won't give the correct result if the line doubles back on itself and runs sections of the trail multiple times (think of a loop or an out-and-back). It would not be hard to write a more general function, but for the Iditarod, this one is good enough.

In [2]:
def dist(P3D1, P3D2):
    sqr = (P3D1[0] - P3D2[0])*(P3D1[0] - P3D2[0]) 
    sqr += (P3D1[1] - P3D2[1])*(P3D1[1] - P3D2[1])
    sqr += (P3D1[2] - P3D2[2])*(P3D1[2] - P3D2[2])
    return math.sqrt(sqr)
In [3]:
def pairs(seq):
    i = iter(seq)
    prev = item =
    for item in i:
        yield prev, item
        prev = item
In [4]:
def getnearest(point, line, distancefunc=None):
    Get nearest index and point on a line for a given point external to the line
    point: some object representing a point in space 
    line: iterable composed of objects of the same type as point
    distancefunc: a function taking two points and returning a number
    returns: index, point on line that is nearest to the input point
    Not efficient for very long lines, because it searches along the line.
    if not distancefunc:
            distancefunc = lambda x, y: (x - y) * (x - y)
            # if no distance function provided and objects can't be subtracted and multiplied, use constant
            distancefunc = lambda x, y: 1
    # This would be inefficient for long lines, but ok for simple applications
    # making sure we have an iterator; if line is an iterator, this doesn't make a difference
    xline = iter(line)
    # set nearest to first line element
    nearest = next(xline)
    nidx = 0
    # iterate through the rest of the line
    for idx, pt in enumerate(xline):
        if distancefunc(pt, point) < distancefunc(nearest, point):
            nearest = pt
            nidx = idx
    return nidx, nearest    

In order to measure distances, we want to be working in a projection that operates in metres and is reasonably regular. We use Alaska Albers with the NAD83 datum (EPSG:3338) for this purpose. The KML files are converted to ESRI shapefiles and reprojected from Lat/Long (GPS coordinates with WGS 83 datum - they identify themselves as EPSG:6326).

In [5]:
data_track ='Iditarod_routes_AKAlbers.shp', 'r')
data_checkpoints ='Iditarod_chkp_AKAlbers.shp', 'r')

This makes the data accessible as Fiona Collection objects. The data_track object, however, contains records, which means the trail is provided in 3 separate lines strings. In this case, it turns out the segments are: Willow to Ophir (common to both the northern or southern Iditarod route); then the northern route Ophir to Kaltag; last the second common bit, Kaltag to Nome.

In [6]:
print type(data_track)
print len(data_track)
<class 'fiona.collection.Collection'>

Luckily, the records are all of type LineString, and they are ordered end-to-end, so that we can stitch them together. This is verified by printing the first and last data point from each and checking that a line's end point coincides (or is at least very close to) the next line's starting point. Indeed, they coincide, which simplifies matters. We can simply stitch them together and then convert the result in a shapely linestring shape object.

In [7]:
for record in data_track:
    print record['geometry']['type']
    print record['geometry']['coordinates'][0], record['geometry']['coordinates'][-1]
(201864.3672577657, 1311075.3757290987, 0.0) (-126708.67456065794, 1466405.3248210528, 0.0)
(-126709.72589937256, 1466377.6315675275, 0.0) (-227464.665379459, 1604444.2870315968, 0.0)
(-227464.665379459, 1604444.2870315968, 0.0) (-545332.8008364791, 1662252.0556106432, 0.0)
In [8]:
fullinecoords = []
for line in data_track:

Now we have a single list of coordinates for the Iditarod trail (northern route) that we can convert into a Shapely LineString object.

In [9]:
fulline = {
    'geometry': {
        'coordinates': fullinecoords,
        'type': 'LineString'    
    'id': 0,
    'properties': OrderedDict([(u'Name', u'Iditarod_northern'), (u'descriptio', None), (u'timestamp', None), (u'begin', None), (u'end', None), (u'altitudeMo', None), (u'tessellate', 1), (u'extrude', -1), (u'visibility', -1)])    
fulltrack = shape(fulline['geometry'])

1. Explicitly find nearest point on track to each checkpoint/other point of interest

The first method to calculate distances is the more manual one. It has two steps: Place every checkpoint actually on the track, but finding the closest track route point to it, and then summing up distances along the track.

In [10]:
chkp_ontrack = {}
for checkpoint in data_checkpoints:
    name = checkpoint['properties']['Name']
    coords = checkpoint['geometry']['coordinates']
    nearestidx, nearestpoint = getnearest(coords, fulltrack.coords, dist)
    chkp_ontrack[name] = {}
    chkp_ontrack[name]['idx'] = nearestidx
    chkp_ontrack[name]['coords'] = nearestpoint
chkp_ontrack = OrderedDict(sorted(chkp_ontrack.items(), key=lambda item: item[1]['idx']))
print chkp_ontrack
OrderedDict([(u'Willow', {'coords': (201864.3672577657, 1311075.3757290987, 0.0), 'idx': 0}), (u'Yentna', {'coords': (175541.922551426, 1310422.667657584, 0.0), 'idx': 16}), (u'Skwentna', {'coords': (148273.63926342092, 1335305.5819313568, 0.0), 'idx': 69}), (u'Finger Lake', {'coords': (101143.48518480505, 1335331.4841845965, 0.0), 'idx': 81}), (u'Rainy Pass', {'coords': (66366.61204266077, 1346355.6871907613, 0.0), 'idx': 102}), (u'Rohn', {'coords': (32414.164815141416, 1369057.6355107727, 0.0), 'idx': 148}), (u'Nikolai', {'coords': (-19254.89342679152, 1450344.9621536026, 0.0), 'idx': 184}), (u'McGrath', {'coords': (-80500.13741392425, 1443687.8063012932, 0.0), 'idx': 219}), (u'Takotna', {'coords': (-104457.12558471515, 1447988.9224941842, 0.0), 'idx': 226}), (u'Ophir', {'coords': (-126743.78601976226, 1466445.5738682337, 0.0), 'idx': 241}), (u'Cripple', {'coords': (-90772.02370913935, 1554895.7141499869, 0.0), 'idx': 296}), (u'Ruby', {'coords': (-70857.30227587471, 1642094.6524683584, 0.0), 'idx': 464}), (u'Galena', {'coords': (-139023.34706871994, 1644154.2649788973, 0.0), 'idx': 512}), (u'Nulato', {'coords': (-194114.96857541226, 1643433.5101878615, 0.0), 'idx': 553}), (u'Kaltag', {'coords': (-228603.29277456927, 1604009.7401093692, 0.0), 'idx': 580}), (u'Unalakeet', {'coords': (-332360.1309182973, 1562768.0305000974, 0.0), 'idx': 600}), (u'Shaktoolik', {'coords': (-347377.80681277846, 1619099.1299162228, 0.0), 'idx': 622}), (u'Koyuk', {'coords': (-337874.247600846, 1681707.5706750432, 0.0), 'idx': 626}), (u'Elim', {'coords': (-394343.0533200941, 1652837.1998616038, 0.0), 'idx': 661}), (u'Golovin', {'coords': (-430995.2088565377, 1649786.6383352731, 0.0), 'idx': 692}), (u'White Mountains', {'coords': (-447215.0767230113, 1667242.719142894, 0.0), 'idx': 697}), (u'Safety', {'coords': (-514874.63125333586, 1653411.267474647, 0.0), 'idx': 737}), (u'Nome', {'coords': (-545209.0742850348, 1662290.4894755771, 0.0), 'idx': 767})])
In [11]:
totaldist = 0
for chkpt, nextchkpt in pairs(chkp_ontrack):
    idx, nextidx = chkp_ontrack[chkpt]['idx'], chkp_ontrack[nextchkpt]['idx']
    partialtrack = LineString(fulltrack.coords[idx:nextidx+1])
    totaldist = totaldist + partialtrack.length
    print '%s --> %s' % (chkpt, nextchkpt)
    print '  Distance in km: %s' % format(partialtrack.length/1000, '.1f')
    print '  Distance in miles: %s' % format(partialtrack.length/1600, '.1f')
print 'Total distance summed up: %s km -- %s miles' % (format(totaldist/1000, '.1f'), format(totaldist/1600, '.1f'))
print 'Total distance from full track: %s km -- %s miles' % (format(fulltrack.length/1000, '.1f'), format(fulltrack.length/1600, '.1f'))
Willow --> Yentna
  Distance in km: 49.2
  Distance in miles: 30.7
Yentna --> Skwentna
  Distance in km: 48.7
  Distance in miles: 30.4
Skwentna --> Finger Lake
  Distance in km: 47.3
  Distance in miles: 29.6
Finger Lake --> Rainy Pass
  Distance in km: 46.8
  Distance in miles: 29.2
Rainy Pass --> Rohn
  Distance in km: 54.4
  Distance in miles: 34.0
Rohn --> Nikolai
  Distance in km: 86.5
  Distance in miles: 54.1
Nikolai --> McGrath
  Distance in km: 98.2
  Distance in miles: 61.4
McGrath --> Takotna
  Distance in km: 26.6
  Distance in miles: 16.6
Takotna --> Ophir
  Distance in km: 29.5
  Distance in miles: 18.4
Ophir --> Cripple
  Distance in km: 115.7
  Distance in miles: 72.3
Cripple --> Ruby
  Distance in km: 111.7
  Distance in miles: 69.8
Ruby --> Galena
  Distance in km: 82.3
  Distance in miles: 51.5
Galena --> Nulato
  Distance in km: 77.5
  Distance in miles: 48.4
Nulato --> Kaltag
  Distance in km: 57.1
  Distance in miles: 35.7
Kaltag --> Unalakeet
  Distance in km: 118.0
  Distance in miles: 73.7
Unalakeet --> Shaktoolik
  Distance in km: 66.0
  Distance in miles: 41.2
Shaktoolik --> Koyuk
  Distance in km: 68.8
  Distance in miles: 43.0
Koyuk --> Elim
  Distance in km: 74.0
  Distance in miles: 46.2
Elim --> Golovin
  Distance in km: 43.0
  Distance in miles: 26.9
Golovin --> White Mountains
  Distance in km: 23.7
  Distance in miles: 14.8
White Mountains --> Safety
  Distance in km: 78.5
  Distance in miles: 49.1
Safety --> Nome
  Distance in km: 35.2
  Distance in miles: 22.0

Total distance summed up: 1438.7 km -- 899.2 miles
Total distance from full track: 1439.1 km -- 899.4 miles

OK, this looks rather reasonable. Unalakleet is misspelled, because it was misspelled in the original KML file. The first leg seems to be quite a bit shorter than the Iditarod Trail Committee says, and looking at the track, there seems to be a rather long straight line at the beginning. We're therefore underestimating the first leg by quite a bit. For the other legs, we will always underestimate them by a few percent, given that we only have less than a route point per mile on the average: 771 route points for 900+ miles.

In [12]:

2. Using the shapely API to project the waypoints on the track

Using the Shapely library, there is a more direct way: the project method of a LineString object, which takes a Point object as an argument, calculates the distance from the start point to a point obtained by projecting the Point onto the LineString. This method is also more precise.

In [13]:
waypoints = {}
import itertools
for checkpoint in data_checkpoints:
    name = checkpoint['properties']['Name']
    chkptshape = Point(checkpoint['geometry']['coordinates'])
    a = fulltrack.project(chkptshape)
    waypoints[name] = a
waypoints = OrderedDict(sorted(waypoints.items(), key=lambda x: x[1]))
print waypoints
OrderedDict([(u'Willow', 0.0), (u'Yentna', 50714.33033776395), (u'Skwentna', 98456.95832563685), (u'Finger Lake', 154847.81950651482), (u'Rainy Pass', 194917.18510199385), (u'Rohn', 247108.63569004112), (u'Nikolai', 350916.59367488633), (u'McGrath', 432930.69745211414), (u'Takotna', 459168.7339948695), (u'Ophir', 491825.4313515935), (u'Cripple', 605267.813128124), (u'Ruby', 714936.960658677), (u'Galena', 799741.3740328938), (u'Nulato', 875872.404119346), (u'Kaltag', 932207.0987561418), (u'Unalakeet', 1048066.017263612), (u'Shaktoolik', 1122485.1410288452), (u'Koyuk', 1187335.6547237583), (u'Elim', 1258850.0709805435), (u'Golovin', 1301511.4599364558), (u'White Mountains', 1326225.760966473), (u'Safety', 1404359.4452045814), (u'Nome', 1438842.1274170913)])
In [14]:
prev = 0
for pt1, pt2 in pairs(waypoints):
    print '%s --> %s' % (pt1, pt2)
    print '  Distance in km: %s' % format((waypoints[pt2] - waypoints[pt1])/1000, '.1f')
    print '  Distance in miles: %s' % format((waypoints[pt2] - waypoints[pt1])/1600, '.1f')
print 'Total distance: %s km -- %s miles' % (format(waypoints[pt2]/1000, '.1f'), format(waypoints[pt2]/1600, '.1f'))
Willow --> Yentna
  Distance in km: 50.7
  Distance in miles: 31.7
Yentna --> Skwentna
  Distance in km: 47.7
  Distance in miles: 29.8
Skwentna --> Finger Lake
  Distance in km: 56.4
  Distance in miles: 35.2
Finger Lake --> Rainy Pass
  Distance in km: 40.1
  Distance in miles: 25.0
Rainy Pass --> Rohn
  Distance in km: 52.2
  Distance in miles: 32.6
Rohn --> Nikolai
  Distance in km: 103.8
  Distance in miles: 64.9
Nikolai --> McGrath
  Distance in km: 82.0
  Distance in miles: 51.3
McGrath --> Takotna
  Distance in km: 26.2
  Distance in miles: 16.4
Takotna --> Ophir
  Distance in km: 32.7
  Distance in miles: 20.4
Ophir --> Cripple
  Distance in km: 113.4
  Distance in miles: 70.9
Cripple --> Ruby
  Distance in km: 109.7
  Distance in miles: 68.5
Ruby --> Galena
  Distance in km: 84.8
  Distance in miles: 53.0
Galena --> Nulato
  Distance in km: 76.1
  Distance in miles: 47.6
Nulato --> Kaltag
  Distance in km: 56.3
  Distance in miles: 35.2
Kaltag --> Unalakeet
  Distance in km: 115.9
  Distance in miles: 72.4
Unalakeet --> Shaktoolik
  Distance in km: 74.4
  Distance in miles: 46.5
Shaktoolik --> Koyuk
  Distance in km: 64.9
  Distance in miles: 40.5
Koyuk --> Elim
  Distance in km: 71.5
  Distance in miles: 44.7
Elim --> Golovin
  Distance in km: 42.7
  Distance in miles: 26.7
Golovin --> White Mountains
  Distance in km: 24.7
  Distance in miles: 15.4
White Mountains --> Safety
  Distance in km: 78.1
  Distance in miles: 48.8
Safety --> Nome
  Distance in km: 34.5
  Distance in miles: 21.6

Total distance: 1438.8 km -- 899.3 miles