GeoPandas and OpenStreetMap

The GeoPandas osm module makes it easy to dynamically query and analyze arbitrary OpenStreetMap (OSM) data. It's a powerful combination.

This example finds the number of traffic lights that are along an arbitrary running route, using just a few lines of code. Note that OpenStreetMap doesn't have lots of traffic lights, but they are pretty good around Cambridge, MA where this example is located.

I used RunKeeper to download a recorded run from my phone. You can find the raw file here:

https://gist.githubusercontent.com/jwass/4ae78872e30a21b34a44/raw/e48b18797e2dd9fb653d38329df574fed1852057/RK_gpx%20_2014-07-10_2054.gpx

I removed a bunch of random points near the start and end so you can't exactly figure out where I live :)

In [1]:
import numpy as np
import geopandas as gpd
import geopandas.io.osm as osm

Load the trace

First load up the GPX file. Specifying the tracks layer loads the data as a single row containing a LineString or MultiLineString. For other uses you can specify layer=track_points to have a row for each recorded point.

In [2]:
df = gpd.read_file('RK_gpx_2014-07-10_2054.gpx', layer='tracks')
df
Out[2]:
cmt desc geometry link1_href link1_text link1_type link2_href link2_text link2_type name number src type
0 None None (LINESTRING (-71.106996 42.36279, -71.10707499... None None None None None None Running 7/10/14 8:54 pm None None None

Load the OpenStreetMap data

Now load all the OpenStreetMap traffic lights in the route's bounding box. Traffic lights in OSM are represented as nodes where the highway tag is traffic_signals. The query_osm function takes care of formulating the query and returning a GeoDataFrame ready to go with all the points.

In [3]:
df_lights = osm.query_osm('node', bbox=df.total_bounds, tags='highway=traffic_signals')
df_lights
Out[3]:
highway id traffic_signals geometry
0 traffic_signals 61283269 NaN POINT (-71.09488469999999 42.3601506)
1 traffic_signals 61283293 NaN POINT (-71.0965425 42.3561028)
2 traffic_signals 61317400 NaN POINT (-71.1135493 42.3622591)
3 traffic_signals 61318066 NaN POINT (-71.1046257 42.3535672)
4 traffic_signals 61321123 blinker POINT (-71.11036199999999 42.357534)
5 traffic_signals 61321134 blinker POINT (-71.10724089999999 42.3581054)
6 traffic_signals 61321277 NaN POINT (-71.09943370000001 42.3634087)
7 traffic_signals 61322052 NaN POINT (-71.110533 42.363275)
8 traffic_signals 61323022 NaN POINT (-71.0960005 42.3608275)
9 traffic_signals 61323032 NaN POINT (-71.0975461 42.361748)
10 traffic_signals 61327037 NaN POINT (-71.113873 42.3644915)
11 traffic_signals 61327067 NaN POINT (-71.101387 42.364001)
12 traffic_signals 61327121 NaN POINT (-71.0967089 42.3632016)
13 traffic_signals 61328029 blinker POINT (-71.1127706 42.3601424)
14 traffic_signals 61328070 NaN POINT (-71.11107869999999 42.3591161)
15 traffic_signals 61329632 NaN POINT (-71.10118799999999 42.363889)
16 traffic_signals 61329844 NaN POINT (-71.099593 42.362982)
17 traffic_signals 61331756 NaN POINT (-71.1095894 42.3582726)
18 traffic_signals 579060230 NaN POINT (-71.0935624 42.3590304)

Project both GeoDataFrames to the Massachusetts state plane (EPSG:26986) (Hopefully in the future GeoPandas will have geography methods and we can skip the projections).

Then just compute the distance from the lights to the line using distance().

In [4]:
epsg = 26986  # MA mainland state plane
df_p = df.to_crs(epsg=epsg)
df_lights_p = df_lights.to_crs(epsg=epsg)

df_lights['d'] = df_lights_p.distance(df_p['geometry'].iloc[0])
df_lights['d']
Out[4]:
0      13.632391
1       2.799502
2     382.609745
3      11.127261
4     150.686201
5     308.685223
6     129.537163
7     259.754943
8      17.562312
9      30.430179
10    555.348314
11     86.198503
12    206.003460
13    182.843900
14      5.179193
15     90.214907
16     83.550469
17    147.394822
18     12.741685
Name: d, dtype: float64

Visualization

Now visualize the data. I'm using geojsonio.py which opens http://geojsonio.io. We can take advantage of simplestyle-spec by setting a few properties on the GeoDataFrame to control plotting behavior

Use the marker-color property - traffic lights that are close to the route (< 20 meters) are green and the others will be red. The light at Massachusetts Ave. & Landsdowne St. is a false negative due to noise in the GPS trace.

You can click on each marker and inspect the d property to see how far it is from the route.

In [6]:
# Shouldn't need the 'set_geometry()' call below, this is a problem in GeoPandas
combined = df.append(df_lights).set_geometry('geometry')
combined['marker-color'] = np.where(combined['d'] <= 20, '#4daf4a', '#e41a1c')

import geojsonio
geojsonio.embed(combined.to_json(na='drop'))
Out[6]: