Handling GPS data with Python

Dr. Florian Wilhelm

Senior Data Scientist @ inovex


  • Project work and general interest
  • Hard to find information about Python libraries
  • Interest in the mathematical algorithms of that domain
  • Needed a subject for a EuroPython talk ;-)

GPX: GPS Exchange Format

  • common GPS data format
  • based on XML schema
  • describes waypoints, routes and tracks
  • GPS coordinates based GS-84 ellipsoid, height in meters

<img src="./gfx/route_track_mod.png" widht=60%>

General structure

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<gpx version="1.1" creator="Creator of the file">
  <metadata> <!-- metadata of the file --> </metadata>
  <wpt lat="xx.xxx" lon="yy.yyy"><!-- ... --></wpt>
  <!-- more waypoints -->
    <!-- attributes of the route -->
    <rtept lat="xx.xxx" lon="yy.yyy"><!-- ... --></rtept>
    <!-- more route points -->
  <!-- more routes -->
    <!-- attributes of the track -->
      <trkpt lat="xx.xxx" lon="yy.yyy"><!-- ... --></trkpt>
      <!-- more track points -->
    <!-- more track segments -->
  <!-- more tracks -->

Example from Polar Flow

<?xml version="1.0" encoding="UTF-8"?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="Polar Flow">
      <trkpt lat="53.560591" lon="9.9755985">
      <trkpt lat="53.560591" lon="9.9755985">
      <!-- ... many more points -->

gpxpy: GPX file parser

  • for reading and writing GPX files
  • licensed unter Apache 2.0
  • contains gpxinfo cli tool for basic stats
  • Python 3 compatible
  • written by Tomo Krajina
  • used by http://www.trackprofiler.com/
In [40]:
import gpxpy

with open('./gpx/hh_marathon.gpx') as fh:
    gpx_file = gpxpy.parse(fh)
In [41]:
segment = gpx_file.tracks[0].segments[0]
coords = pd.DataFrame([
        {'lat': p.latitude, 
         'lon': p.longitude, 
         'ele': p.elevation,
         'time': p.time} for p in segment.points])
coords.set_index('time', drop=True, inplace=True)
ele lat lon
2016-04-17 08:02:12 17.0 53.560591 9.975599
2016-04-17 08:02:13 17.0 53.560591 9.975599
2016-04-17 08:02:14 17.0 53.560561 9.975591

Plotting a track

In [42]:
plt.plot(coords['lon'].values, coords['lat'].values);

... and the actual GPS coordinates

In [43]:
plt.plot(coords['lon'].values, coords['lat'].values)
plt.plot(coords['lon'].values, coords['lat'].values, 'ro');

Ohh! Need to reduce the number of points!

In [44]:
plt.plot(coords['lon'].values, coords['lat'].values)
plt.plot(coords['lon'].values[::150], coords['lat'].values[::150], 'ro');

Simplifying GPS tracks

Most GPS sensors have a uniform sampling rate leading to overly many points on almost straight lines.

How can we reduce the number of points?

Ramer-Douglas-Peucker algorithm

In [45]:
simple_coords = rdp(coords[['lon', 'lat']].values, epsilon=1e-4)
print("{} points reduced to {}!".format(coords.shape[0], simple_coords.shape[0]))
plt.plot(simple_coords[:, 0], simple_coords[:, 1])
plt.plot(simple_coords[:, 0], simple_coords[:, 1], 'ro');
12072 points reduced to 193!

Elevation of a GPS track

In [47]:

... but what if the elevation was missing?

In [48]:
# Delete elevation data
for p in gpx_file.tracks[0].segments[0].points:
    p.elevation = None

srtm.py: add missing GPS elevation

In [49]:
import srtm
elevation_data = srtm.get_data()
elevation_data.add_elevations(gpx_file, smooth=True)
In [50]:
coords['srtm'] = [p.elevation for p in gpx_file.tracks[0].segments[0].points]

mplleaflet: interactive Leaflet web maps

  • converts a matplotlib plot into a pannable, zoomable slippy map
  • extremely easy to use
  • licensed under new-BSD
  • integrates well with Jupyter
  • Python 3 compatible
  • tracks should be simplified with Ramer-Douglas-Peucker algorithm first

First we start with a matplotlib plot...

In [51]:
fig = plt.figure()
plt.plot(simple_coords[:,0], simple_coords[:,1])
plt.plot(simple_coords[:,0], simple_coords[:,1], 'ro');

... and project it onto OpenStreetMap

In [52]:
import mplleaflet
mplleaflet.display(fig=fig) # .show(fig=fig) to display in new window
In [54]: