GeoSpatial Analysis of NYC Taxi data

Inspired by Geospatial Operations at Scale with Dask, we're going to look at the NYC Taxi dataset from a geospatial angle. This also helps us prove out recent performance improvements to GeoPandas (which was recently cythonized) and a new dask-geopandas parallel implementation.

In [1]:
from dask.distributed import Client
client = Client(processes=False)
client
Out[1]:

Client

Cluster

  • Workers: 1
  • Cores: 4
  • Memory: 16.69 GB

Load Taxi Zone data with GeoPandas

We will segment our data by the official taxi zones available here.

In [2]:
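import geopandas as gpd

# Read the taxi zone shapefile and reproject to WGS84 longitude/latitude (EPSG:4326)
zones = gpd.read_file('taxi_zones.shp').to_crs(epsg=4326)
# Store the repeated text columns as categoricals
zones['zone'] = zones.zone.astype('category')
zones['borough'] = zones.borough.astype('category')
zones.head()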
Out[2]:
   OBJECTID  Shape_Leng  Shape_Area  zone                     LocationID  borough        geometry
0         1    0.116357    0.000782  Newark Airport                    1  EWR            POLYGON ((-74.18445299999996 40.6949959999999,...
1         2    0.433470    0.004866  Jamaica Bay                       2  Queens         (POLYGON ((-73.82337597260663 40.6389870471767...
2         3    0.084341    0.000314  Allerton/Pelham Gardens           3  Bronx          POLYGON ((-73.84792614099985 40.87134223399991...
3         4    0.043567    0.000112  Alphabet City                     4  Manhattan      POLYGON ((-73.97177410965318 40.72582128133705...
4         5    0.092146    0.000498  Arden Heights                     5  Staten Island  POLYGON ((-74.17421738099989 40.56256808599987...
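
To make "segmenting by zone" concrete, here is a minimal pure-Python sketch of the underlying operation: testing which polygon contains a given longitude/latitude point. It is an illustration only, not how GeoPandas implements it; in practice a spatial join such as `geopandas.sjoin` does this with spatial indexing. The rectangular zones, the pickup coordinates, and the helper names `point_in_polygon` and `assign_zone` are all hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_zone(x, y, zones):
    """Return the name of the first zone containing the point, or None."""
    for name, ring in zones.items():
        if point_in_polygon(x, y, ring):
            return name
    return None


# Hypothetical rectangular "zones" in lon/lat; real taxi zones are irregular polygons
toy_zones = {
    "zone_a": [(-74.02, 40.70), (-73.98, 40.70), (-73.98, 40.74), (-74.02, 40.74)],
    "zone_b": [(-73.98, 40.70), (-73.94, 40.70), (-73.94, 40.74), (-73.98, 40.74)],
}

# Hypothetical pickup locations as (longitude, latitude)
pickups = [(-74.00, 40.72), (-73.96, 40.71), (-73.90, 40.80)]
labels = [assign_zone(lon, lat, toy_zones) for lon, lat in pickups]
print(labels)
```

A point that falls outside every zone gets `None`, which is the case to expect for pickups recorded outside the zone boundaries.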

Plot with matplotlib

We can easily plot with matplotlib, which is built into GeoPandas.

In [3]:
%matplotlib inline
zones.plot(column='borough', categorical=True)
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f5762442668>

Plot with Bokeh

For most of this notebook, however, we'll be using Bokeh.

In [4]:
from bokeh.io import output_notebook
from bokeh.models import GeoJSONDataSource, HoverTool, CategoricalColorMapper, LinearColorMapper
from bokeh.plotting import figure, show
from bokeh.palettes import Category10
output_notebook()
Loading BokehJS ...
In [5]:
geo_source = GeoJSONDataSource(geojson=zones.to_json())

factors = zones.borough.drop_duplicates()
color_mapper = CategoricalColorMapper(factors=factors.tolist(), palette=Category10[10])

fig = figure()
fig.patches(xs='xs', ys='ys', alpha=0.9, source=geo_source,
            color={'field': 'borough', 'transform': color_mapper})

hover = HoverTool(
    point_policy='follow_mouse',
    tooltips='<p><b>Borough</b>: @borough</p><p><b>Zone</b>: @zone</p>'
)
fig.add_tools(hover)

fig.xaxis.visible = False
fig.yaxis.visible = False
fig.grid.visible = False

show(fig)
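
The color mapper above pairs each borough with a palette entry by position, which is why a ten-color palette works for a handful of boroughs. The dependency-free sketch below mimics that lookup as we understand it: the i-th factor gets the i-th color, and anything unmatched falls back to a NaN color. The function name `map_color` is hypothetical, the palette is the first five Category10 colors hard-coded for illustration, and the factor list uses only the boroughs visible in the table output earlier.

```python
# First five colors of the Category10 palette, hard-coded so the sketch has no dependencies
palette = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

# Boroughs as they appear in the zones.head() output (illustrative subset)
factors = ['EWR', 'Queens', 'Bronx', 'Manhattan', 'Staten Island']


def map_color(value, factors, palette, nan_color='gray'):
    """Positional factor-to-color lookup with a fallback for unmatched values."""
    if value in factors:
        idx = factors.index(value)
        # A factor beyond the end of the palette has no color of its own
        if idx < len(palette):
            return palette[idx]
    return nan_color


colors = [map_color(b, factors, palette) for b in ['Queens', 'Manhattan', 'Atlantis']]
print(colors)
```

Because the pairing is positional, the order of `factors` decides which borough gets which color, so reordering the factor list recolors the map.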