Citibike is NYC's new(ish) and very popular bike share program. Citibike customers can take a bike from any station and leave it at any other station. It's natural to assume that after some time, certain areas of the city will end up with a high density of bikes, while other areas will have few. Moreover, this relative density is likely to fluctuate in both space and time.
The following is an attempt to visualize the approximate density of available bikes throughout the day, using Citibike's own system data available at: http://citibikenyc.com/system-data
# data handling and formatting
import pandas as pd
import numpy as np
import datetime as dt
from copy import copy, deepcopy
# plotting
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import matplotlib.figure as fig
import matplotlib.image as mpimg
# image creation / modification
import scipy.ndimage as ndi
from scipy import misc
from matplotlib.colors import LinearSegmentedColormap
# mapping / shapefile stuff
from mpl_toolkits.basemap import Basemap
from shapelib import ShapeFile
import dbflib
Set the path to the Citibike data file. This is the first available file on http://citibikenyc.com/system-data and spans all of July 2013, so not all of the current Citibike stations will be active in this file.
data_path = '2013-07-CitiBikeTripData.csv'
The parameter `time_increment` refers to the amount of time (in minutes) by which we step forward in each frame of the animation, `time_window_size` is the size (in minutes) of the time window we're looking at in each frame, and `number_of_days` is the number of days' worth of data that should be used in the visualization.
time_increment = 5
time_window_size = 10
number_of_days = 3 # the number of days worth of data that should be used in the visualization
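To make the windowing concrete, here is a quick sketch (using the parameter values above, and an assumed start time of midnight on July 1st) of how the frame windows line up: each window is 10 minutes wide and consecutive windows overlap by 5 minutes, so events persist across two frames.

```python
import datetime as dt

time_increment = 5     # minutes stepped forward per frame
time_window_size = 10  # minutes of data shown per frame

start = dt.datetime(2013, 7, 1)  # assumed first timestamp in the July 2013 file

def frame_window(i):
    # frame i covers [start + i*increment, start + i*increment + window_size)
    w_start = start + dt.timedelta(minutes=time_increment * i)
    w_end = w_start + dt.timedelta(minutes=time_window_size)
    return w_start, w_end

print(frame_window(0))  # midnight to 00:10
print(frame_window(1))  # 00:05 to 00:15 -- overlaps frame 0 by 5 minutes
```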
The boundary lat/lon values of the map are chosen to give a decent bounding box around lower Manhattan and much of northwestern Brooklyn. This is the area we'll be looking at.
lon0 = -74.0417
lon1 = -73.8982
lat0 = 40.6851
lat1 = 40.7721
The following parameters are for use with a grid-density gaussian filter algorithm, where `image_width` and `image_height` refer to the width and height (in pixels) of a grid (2D array) that we will use to represent the geographic area we're looking at. The `gaussian_filter_radius` parameter is the radius we'll be feeding into the `scipy.ndimage.gaussian_filter()` method.
image_width = 200
image_height = 200
gaussian_filter_radius = 3
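To get a feel for what the gaussian filter does to the grid, here is a small sketch on a toy 9×9 grid: a single docking event (one incremented pixel) is smeared into a smooth bump, which is what turns discrete events into a continuous-looking density surface.

```python
import numpy as np
import scipy.ndimage as ndi

grid = np.zeros((9, 9))
grid[4, 4] = 1.0  # one docking event at the center pixel

# smear the event out with a gaussian of sigma 3 in each direction
smoothed = ndi.gaussian_filter(grid, (3, 3))

# the peak stays at the event's pixel, but its value is now
# spread across the neighboring pixels
print(smoothed[4, 4], smoothed[0, 0])
```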
The data comes in the form of a large CSV file, so use Pandas' `read_csv` method to read the data.
# read in the undocking data
undocking_dataframe = pd.read_csv(data_path, usecols=[1,5,6], index_col=0, parse_dates=True)
# read in the docking data
docking_dataframe_unsorted = pd.read_csv(data_path, usecols=[2,9,10], index_col=0, parse_dates=True)
docking_dataframe = docking_dataframe_unsorted.sort_index()
We now have two dataframes, `undocking_dataframe` and `docking_dataframe`, each indexed by a timestamp (sorted earliest to latest) and containing a latitude and a longitude. See below for an example.
undocking_dataframe.head()
| starttime | start station latitude | start station longitude |
|---|---|---|
| 2013-07-01 00:00:00 | 40.753231 | -73.970325 |
| 2013-07-01 00:00:02 | 40.749718 | -74.002950 |
| 2013-07-01 00:01:04 | 40.730287 | -73.990765 |
| 2013-07-01 00:01:06 | 40.718939 | -73.992663 |
| 2013-07-01 00:01:10 | 40.734927 | -73.992005 |

5 rows × 2 columns
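Because both dataframes are indexed by sorted timestamps, pulling out a time window is just a matter of finding index positions with `searchsorted` — the same trick the animation loop further down relies on. A toy example with a made-up one-minute index:

```python
import pandas as pd

idx = pd.date_range('2013-07-01 00:00', periods=6, freq='1min')
df = pd.DataFrame({'lat': [40.75, 40.74, 40.73, 40.72, 40.71, 40.70]},
                  index=idx)

# position where this timestamp would be inserted to keep the index sorted
pos = df.index.searchsorted(pd.Timestamp('2013-07-01 00:02:30'))
print(pos)  # -> 3

# all rows in the window [00:01:00, 00:02:30)
start_pos = df.index.searchsorted(pd.Timestamp('2013-07-01 00:01:00'))
window = df.iloc[start_pos:pos]
print(len(window))  # -> 2
```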
Now we need a map on which to plot our geographic data. We use `mpl_toolkits.basemap.Basemap` to instantiate a map, using our latitude and longitude bounds.
m = Basemap(llcrnrlon=lon0, llcrnrlat=lat0, urcrnrlon=lon1, urcrnrlat=lat1, resolution='h', lat_0=lat0, lon_0=lon0)
To create the effect of a more detailed map, we add a background image to the basemap.
osm_image = misc.imread('map.png')
#plot it onto the map
m.imshow(osm_image, extent=[lon0, lon1, lat0, lat1], origin='upper', aspect='auto') # extent is [left, right, bottom, top]
<matplotlib.image.AxesImage at 0x10dbaa450>
Create a custom colormap for the `m.imshow()` method to use. This will display high numbers as red, low numbers as blue, and values near zero as transparent.
bwr_cmap = plt.get_cmap('bwr')
cdict_new = bwr_cmap._segmentdata.copy()
cdict_new['alpha'] = ((0.0, 1.0, 1.0),
# (0.25,1.0, 1.0),
(0.6, 0.4, 0.0),
# (0.75,1.0, 1.0),
(1.0, 1.0, 1.0))
blue_red1 = LinearSegmentedColormap('BlueRed1', cdict_new)
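The key trick above is the `'alpha'` entry in the segment data, which most colormaps don't carry. As a self-contained illustration (using a hand-written blue-to-red `cdict` rather than copying `bwr`'s private `_segmentdata`), values at the extremes stay opaque while values near the middle become transparent:

```python
from matplotlib.colors import LinearSegmentedColormap

# each tuple is (x, value_just_left_of_x, value_just_right_of_x)
cdict = {
    'red':   ((0.0, 0.0, 0.0), (0.5, 1.0, 1.0), (1.0, 1.0, 1.0)),
    'green': ((0.0, 0.0, 0.0), (0.5, 1.0, 1.0), (1.0, 0.0, 0.0)),
    'blue':  ((0.0, 1.0, 1.0), (0.5, 1.0, 1.0), (1.0, 0.0, 0.0)),
    # fully opaque at 0 and 1, fully transparent at 0.5
    'alpha': ((0.0, 1.0, 1.0), (0.5, 0.0, 0.0), (1.0, 1.0, 1.0)),
}
cmap = LinearSegmentedColormap('blue_clear_red', cdict)

_, _, _, a_low = cmap(0.0)   # opaque blue
_, _, _, a_mid = cmap(0.5)   # nearly transparent
_, _, _, a_high = cmap(1.0)  # opaque red
```

Calling a colormap with a float in [0, 1] returns an RGBA tuple, so the alpha channel can be inspected directly.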
The method here is adapted from this StackOverflow question: http://stackoverflow.com/questions/6652671/efficient-method-of-calculating-density-of-irregularly-spaced-points

As a slight modification, the algorithm is broken into three methods:

`grid_density_gaussian_filter_img` applies a gaussian filter to the input image (which is a 2D array) using `scipy.ndimage.gaussian_filter` and returns the result.

`iterate_image` takes a 2D array (`previous_img`) and the docking/undocking data (`positive_data` and `negative_data`). The 'grid density' algorithm simply increments a pixel in `previous_img` for each corresponding point in `positive_data`, and decrements a pixel in `previous_img` for each corresponding point in `negative_data`.

`decay_img` is an optionally-used method which takes the current image array (`current_img`) and reduces the absolute value of every nonzero pixel in the array by 1. The resulting image is returned. This prevents areas of really high density from swamping out all the other areas.
# return the gaussian filtered data for a given current_img
def grid_density_gaussian_filter_img(current_img, radius):
    r = radius
    # apply a gaussian filter to the image and return the resulting ndarray
    return ndi.gaussian_filter(current_img, (r, r))

# go through and bring all image data one step closer to zero
def decay_img(current_img):
    for index, value in np.ndenumerate(current_img):
        if value > 0:
            current_img[index] -= 1
        if value < 0:
            current_img[index] += 1
    return current_img
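To see the decay step in action on a toy array (restating `decay_img` from above so the snippet runs on its own):

```python
import numpy as np

def decay_img(current_img):
    # move every nonzero pixel one step toward zero, in place
    for index, value in np.ndenumerate(current_img):
        if value > 0:
            current_img[index] -= 1
        if value < 0:
            current_img[index] += 1
    return current_img

img = np.array([[3.0, -2.0],
                [0.0,  1.0]])
decayed = decay_img(img)
# 3 -> 2, -2 -> -1, 0 -> 0, 1 -> 0
```

For integer-valued grids this loop is equivalent to the vectorized `img - np.sign(img)`, which would be considerably faster on a 200×200 grid.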
# add 'positive_data' and subtract 'negative_data' from previous_img and return a new image with those values
def iterate_image(previous_img, x0, y0, x1, y1, w, h, positive_data, negative_data):
    imgw = previous_img.shape[0]
    imgh = previous_img.shape[1]
    new_img = copy(previous_img)
    # scale factors mapping map coordinates onto grid indices
    kx = (imgw - 1) / (x1 - x0)
    ky = (imgh - 1) / (y1 - y0)
    for x, y in positive_data:
        ix = int((x - x0) * kx)
        iy = int((y - y0) * ky)
        if 0 <= ix < imgw and 0 <= iy < imgh:
            new_img[iy][ix] += 1
    for x, y in negative_data:
        ix = int((x - x0) * kx)
        iy = int((y - y0) * ky)
        if 0 <= ix < imgw and 0 <= iy < imgh:
            new_img[iy][ix] -= 1
    return new_img
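A quick sanity check of the pixel-mapping logic on a tiny 4×4 grid (restating `iterate_image` so the snippet runs standalone): a point at the lower-left corner of the bounding box lands on pixel (0, 0), and a point at the upper-right corner lands on pixel (3, 3).

```python
import numpy as np
from copy import copy

def iterate_image(previous_img, x0, y0, x1, y1, w, h, positive_data, negative_data):
    imgw = previous_img.shape[0]
    imgh = previous_img.shape[1]
    new_img = copy(previous_img)
    kx = (imgw - 1) / (x1 - x0)  # map-units -> pixels, x axis
    ky = (imgh - 1) / (y1 - y0)  # map-units -> pixels, y axis
    for x, y in positive_data:
        ix = int((x - x0) * kx)
        iy = int((y - y0) * ky)
        if 0 <= ix < imgw and 0 <= iy < imgh:
            new_img[iy][ix] += 1
    for x, y in negative_data:
        ix = int((x - x0) * kx)
        iy = int((y - y0) * ky)
        if 0 <= ix < imgw and 0 <= iy < imgh:
            new_img[iy][ix] -= 1
    return new_img

grid = np.zeros((4, 4))
result = iterate_image(grid, 0.0, 0.0, 1.0, 1.0, 4, 4,
                       [(0.0, 0.0)],   # one docking at the lower-left corner
                       [(1.0, 1.0)])   # one undocking at the upper-right corner
print(result[0][0], result[3][3])  # -> 1.0 -1.0
```

Note that dockings and undockings cancel in aggregate, so the grid always sums to the net change in bikes over the window.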
To generate an animated heatmap of all the docking or undocking events in the Citibike system, one approach is to generate an array of objects of type `matplotlib.image.AxesImage` and then call Matplotlib's `animation.ArtistAnimation` method on that array to generate an animation. Below is the code for the hard part -- building the array of AxesImages from our data.
# find the earliest undocking timestamp in the database -- this will be the start time of the data used for the animation
initial_start_date_undocking = undocking_dataframe.index[0]
# ims will hold the series of images that make up the animation
ims = []
# determine the number of frames in the animation (integer division, so this also works under Python 3)
num_frames = (number_of_days*24*60) // time_increment
for i in range(num_frames):
    # find the indices of the next time window for undocking events
    next_start_date_undocking_index = undocking_dataframe.index.searchsorted( initial_start_date_undocking + dt.timedelta(minutes=time_increment*i) )
    next_end_date_undocking_index = undocking_dataframe.index.searchsorted( initial_start_date_undocking + dt.timedelta(minutes=(time_window_size + time_increment*i) ) )
    # grab the next time window for undocking events
    next_start_date_undocking = undocking_dataframe.index[next_start_date_undocking_index]
    next_end_date_undocking = undocking_dataframe.index[next_end_date_undocking_index]
    # find the indices of the next time window for re-docking events
    next_start_date_docking_index = docking_dataframe.index.searchsorted( next_start_date_undocking )
    next_end_date_docking_index = docking_dataframe.index.searchsorted( next_end_date_undocking )
    # grab the next time window for re-docking events
    next_start_date_docking = docking_dataframe.index[next_start_date_docking_index]
    next_end_date_docking = docking_dataframe.index[next_end_date_docking_index]
    # get the next few undocking events
    next_data_undocking = undocking_dataframe[next_start_date_undocking:next_end_date_undocking]
    x_udata = next_data_undocking['start station longitude']
    y_udata = next_data_undocking['start station latitude']
    x_undocking, y_undocking = m(x_udata, y_udata)
    # get the next few docking events
    next_data_docking = docking_dataframe[next_start_date_docking:next_end_date_docking]
    x_ddata = next_data_docking['end station longitude']
    y_ddata = next_data_docking['end station latitude']
    x_docking, y_docking = m(x_ddata, y_ddata)
    # get the new heatmap image
    if i == 0:
        new_heatmap_image = np.zeros((image_width, image_height))
    else:
        new_heatmap_image = iterate_image(heatmap_image, lon0, lat0, lon1, lat1, image_width, image_height, zip(x_docking,y_docking), zip(x_undocking,y_undocking))
    # periodically decay the image so long-lived hotspots don't swamp everything else
    if i % 100 == 0:
        new_heatmap_image = decay_img(new_heatmap_image)
    # generate the z dimension with the newly iterated image
    z_dimension = grid_density_gaussian_filter_img(new_heatmap_image, gaussian_filter_radius)
    # reset the heatmap_image variable
    heatmap_image = new_heatmap_image
    # produce the im from this data (extent is [left, right, bottom, top])
    heatmap_im = m.imshow(z_dimension, origin='lower', extent=[lon0, lon1, lat0, lat1], alpha=.4, vmin=-1, vmax=1, cmap=blue_red1, aspect='auto')
    # build the text string for displaying the time/date
    current_date = initial_start_date_undocking + dt.timedelta(minutes=time_increment*i)
    current_date_string = current_date.strftime('%I:%M%p on %a %b %d')
    current_date_plotted = plt.text(-73.9509, 40.6909, current_date_string, fontsize=14)
    # append all Artist instances to the array of drawn images
    ims.append([current_date_plotted, heatmap_im])
# set the figure size
plt.gcf().set_size_inches(15.0, 12.0)
plt.gcf().subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=None, hspace=None)
# run the animation
anim = animation.ArtistAnimation(plt.gcf(), ims, interval=50, blit=True, repeat=True)
# display the animation
plt.show()
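The `ArtistAnimation` pattern above, reduced to a minimal standalone sketch (random frames stand in for the heatmap data, and the Agg backend renders off-screen):

```python
import matplotlib
matplotlib.use('Agg')  # off-screen rendering; drop this line to get a window
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

fig, ax = plt.subplots()
ims = []
for i in range(3):
    # each frame is a list of the Artist objects drawn in that frame
    im = ax.imshow(np.random.rand(4, 4), animated=True)
    label = ax.text(0.5, 0.5, 'frame %d' % i)
    ims.append([im, label])

anim = animation.ArtistAnimation(fig, ims, interval=50, blit=True, repeat=True)
# anim.save('out.mp4') would write the animation to disk, given a working ffmpeg install
```

`ArtistAnimation` simply toggles the visibility of each frame's Artists in turn, which is why every Artist (the heatmap image and the timestamp text) must be appended to its frame's list.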