Diagnosing bike accidents on DC streets

The purpose of this notebook is to gain a deeper understanding of the nature of bike crashes, using the Crashes in DC and Crash Details Table datasets from DC. Here, we'll see:

  1. A map highlighting the magnitude of bike accidents in DC

  2. Assessing if all these accidents happen on dedicated bike lanes

  3. Learning some more about the nature of these bike crashes

But why should I care?

  • According to the American Community Survey, about 13,000 D.C. residents bike to work each day, and that number grows by about 1,200 new cyclists each year.
  • In 2015, bicyclists saw the largest percentage increase in fatalities of all roadway user groups, including drivers and pedestrians: 12%, to over 800 deaths nationwide.
  • Only 53% of all fatal bike crashes get reported in newspapers, and usually only as a single story about the crash itself.
In [1]:
import pandas as pd

The crash data for DC is collected in two separate tables:

  1. Crashes in DC - providing location and injury data for the crashes

  2. Crash Details Table - providing supplementary data about the people and vehicles involved in the crash

Datasets used here can be found on opendata.dc.gov
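Because the analysis draws on both tables, they eventually need to be joined. Below is a minimal sketch, assuming both tables share the CRIMEID column (it appears in the Crashes in DC table; check that the Crash Details Table uses the same key before relying on it). The toy rows and the PERSONTYPE column are hypothetical, made up for illustration.

```python
import pandas as pd

# Toy stand-in for 'Crashes in DC': one row per crash (values made up)
crashes = pd.DataFrame({
    'CRIMEID': [101, 102, 103],
    'WARD': ['Ward 1', 'Ward 2', 'Ward 6'],
    'TOTAL_BICYCLES': [1, 0, 1],
})

# Toy stand-in for the Crash Details Table: one row per person/vehicle.
# Assumption: it is keyed by CRIMEID; PERSONTYPE is a hypothetical column.
details = pd.DataFrame({
    'CRIMEID': [101, 101, 102, 103],
    'PERSONTYPE': ['Driver', 'Bicyclist', 'Driver', 'Bicyclist'],
})

# Left join keeps every crash; validate= raises if CRIMEID repeats in `crashes`
merged = crashes.merge(details, on='CRIMEID', how='left', validate='one_to_many')
```

A left join keeps crashes that have no detail rows, and `validate='one_to_many'` guards against accidentally duplicated crash records.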

Reading in the DC crash dataset

In [2]:
# Reading 'Crashes in DC' (raw string so the Windows backslashes are not read as escape sequences)
crash = pd.read_csv(r'D:\Data\Crashes_In_DC\Crashes_in_DC.csv', low_memory=False)
crash.shape
Out[2]:
(140773, 49)

Let's now explore the various fields in the crash table

In [3]:
crash.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 140773 entries, 0 to 140772
Data columns (total 49 columns):
X                             140106 non-null float64
Y                             140106 non-null float64
OBJECTID                      140773 non-null int64
CRIMEID                       140773 non-null int64
CCN                           140773 non-null object
REPORTDATE                    139736 non-null object
ROUTEID                       140144 non-null object
MEASURE                       140144 non-null float64
OFFSET                        140144 non-null float64
STREETSEGID                   140144 non-null float64
ROADWAYSEGID                  140144 non-null float64
FROMDATE                      140773 non-null object
TODATE                        0 non-null float64
MARID                         140773 non-null int64
ADDRESS                       140745 non-null object
LATITUDE                      140773 non-null float64
LONGITUDE                     140773 non-null float64
XCOORD                        140773 non-null float64
YCOORD                        140773 non-null float64
WARD                          140772 non-null object
EVENTID                       140773 non-null object
MAR_ADDRESS                   135852 non-null object
MAR_SCORE                     140773 non-null float64
MAJORINJURIES_BICYCLIST       140751 non-null float64
MINORINJURIES_BICYCLIST       140751 non-null float64
UNKNOWNINJURIES_BICYCLIST     140751 non-null float64
FATAL_BICYCLIST               140773 non-null int64
MAJORINJURIES_DRIVER          140751 non-null float64
MINORINJURIES_DRIVER          140751 non-null float64
UNKNOWNINJURIES_DRIVER        140751 non-null float64
FATAL_DRIVER                  140773 non-null int64
MAJORINJURIES_PEDESTRIAN      140751 non-null float64
MINORINJURIES_PEDESTRIAN      140751 non-null float64
UNKNOWNINJURIES_PEDESTRIAN    140751 non-null float64
FATAL_PEDESTRIAN              140773 non-null int64
TOTAL_VEHICLES                140751 non-null float64
TOTAL_BICYCLES                140773 non-null int64
TOTAL_PEDESTRIANS             140751 non-null float64
PEDESTRIANSIMPAIRED           140773 non-null int64
BICYCLISTSIMPAIRED            140773 non-null int64
DRIVERSIMPAIRED               140773 non-null int64
TOTAL_TAXIS                   140751 non-null float64
TOTAL_GOVERNMENT              140751 non-null float64
SPEEDING_INVOLVED             140751 non-null float64
NEARESTINTROUTEID             133240 non-null object
NEARESTINTSTREETNAME          133136 non-null object
OFFINTERSECTION               140144 non-null float64
INTAPPROACHDIRECTION          133240 non-null object
LOCERROR                      0 non-null float64
dtypes: float64(28), int64(10), object(11)
memory usage: 52.6+ MB
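The info() output shows that TODATE and LOCERROR have 0 non-null entries, so they carry no information. A short sketch of finding and dropping such columns programmatically; the toy frame only mimics that pattern, and on the real data you would use `crash` instead.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the two fully empty columns seen in crash.info()
df = pd.DataFrame({
    'OBJECTID': [1, 2, 3],
    'TODATE': [np.nan, np.nan, np.nan],
    'LOCERROR': [np.nan, np.nan, np.nan],
    'WARD': ['Ward 1', None, 'Ward 6'],
})

# Columns in which every value is missing
empty_cols = df.columns[df.isna().all()].tolist()

# Missing-value count per column, most incomplete first
missing = df.isna().sum().sort_values(ascending=False)

# how='all' drops only columns that are entirely empty
trimmed = df.dropna(axis=1, how='all')
```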
In [4]:
crash.head()
Out[4]:
X Y OBJECTID CRIMEID CCN REPORTDATE ROUTEID MEASURE OFFSET STREETSEGID ... BICYCLISTSIMPAIRED DRIVERSIMPAIRED TOTAL_TAXIS TOTAL_GOVERNMENT SPEEDING_INVOLVED NEARESTINTROUTEID NEARESTINTSTREETNAME OFFINTERSECTION INTAPPROACHDIRECTION LOCERROR
0 -77.012152 38.919710 18335710 26614138 11024141 2011-02-22T05:00:00.000Z 11000102 3151.632 0.0 10744.0 ... 0 0 0.0 0.0 0.0 11009032 ADAMS ST NW 58.541660 South NaN
1 -77.012158 38.915732 18335711 26614272 11020933 2011-02-16T05:00:00.000Z 11000102 2709.977 0.0 4878.0 ... 0 0 0.0 0.0 0.0 11075462 RHODE ISLAND AVE NW 0.076475 Northeast NaN
2 -77.012145 38.926289 18335712 27266870 17054267 2017-04-04T00:50:44.000Z 11000102 3881.933 0.0 11579.0 ... 0 0 0.0 1.0 0.0 11060582 MICHIGAN AVE NW 0.034064 South NaN
3 -77.012177 38.901874 18335713 25171264 14165295 2014-10-25T05:00:00.000Z 11000102 1171.651 0.0 1268.0 ... 0 0 0.0 0.0 0.0 11047772 I ST NW 60.933379 North NaN
4 -77.012174 38.900391 18335714 25143825 14052411 2014-04-16T05:00:00.000Z 11000102 1006.145 0.0 15414.0 ... 0 0 0.0 0.0 0.0 11042442 H ST NW 20.706903 North NaN

5 rows × 49 columns

1. A map highlighting the magnitude of bike accidents in DC

The dataset records around 140k crashes, mostly from 2015 through September 2017. With an alarming number like that, let's see how frequent bike accidents are.
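The date range can be checked directly from REPORTDATE, which holds ISO 8601 timestamps such as 2011-02-22T05:00:00.000Z (and is missing for about 1,000 rows, per crash.info()). A minimal sketch on the sample values from the crash.head() output above; on the full table you would pass crash['REPORTDATE'] instead.

```python
import pandas as pd

# Sample REPORTDATE strings taken from crash.head(); swap in crash['REPORTDATE']
report_dates = pd.Series([
    '2011-02-22T05:00:00.000Z',
    '2011-02-16T05:00:00.000Z',
    '2017-04-04T00:50:44.000Z',
    '2014-10-25T05:00:00.000Z',
    '2014-04-16T05:00:00.000Z',
])

# utc=True handles the trailing 'Z'; errors='coerce' turns malformed values into NaT
dates = pd.to_datetime(report_dates, utc=True, errors='coerce')

# Number of crashes per year, in chronological order (NaT rows are dropped)
per_year = dates.dt.year.value_counts().sort_index()
```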

Here, we'll plot a heatmap using Datashader (to rasterize the crash points) and Bokeh (to display them), to qualitatively gauge the frequency of bike crashes in DC.
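Conceptually, a Datashader count aggregation divides the x and y ranges into a grid of pixels and counts how many points land in each cell, much like a 2D histogram. The rough NumPy sketch below illustrates that idea; it is not Datashader's implementation, and the tiny 10 x 8 grid is chosen only for illustration. The coordinates are the five crashes from crash.head().

```python
import numpy as np

# Longitude/latitude pairs taken from crash.head()
lon = np.array([-77.012152, -77.012158, -77.012145, -77.012177, -77.012174])
lat = np.array([38.919710, 38.915732, 38.926289, 38.901874, 38.900391])

x_range = (-77.113633, -76.910012)  # DC bounds used for the plot
y_range = (38.812061, 38.993699)
width, height = 10, 8               # tiny "canvas" for illustration

# counts[i, j] = number of points in longitude bin i and latitude bin j
counts, x_edges, y_edges = np.histogram2d(
    lon, lat, bins=(width, height), range=[x_range, y_range])

# A Datashader aggregate is laid out as (y, x), so transpose to (height, width)
grid = counts.T
```

All five sample crashes sit on nearly the same longitude, so they fall into a single column of the grid; the real heatmap does the same with a 750-pixel-wide canvas and tens of thousands of points.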

In [5]:
# Importing necessary packages
from bokeh.models import BoxZoomTool
from bokeh.plotting import figure, output_notebook, show
import datashader as ds
# InteractiveImage comes from older Datashader releases; newer ones point to HoloViews for interactive plots
from datashader.bokeh_ext import InteractiveImage
from functools import partial
from datashader.utils import export_image
from datashader import transfer_functions as tf
from datashader.colors import colormap_select, Greys9, Hot, inferno
# Reversed grey palette with its last two entries dropped
Greys9_r = list(reversed(Greys9))[:-2]

We start by creating an empty Bokeh plot container whose axes span DC's bounding box.

In [6]:
output_notebook()

# DC bounding box: longitude (x) range and latitude (y) range
DC = x_range, y_range = ((-77.113633, -76.910012), (38.812061, 38.993699))

plot_width  = 750
plot_height = int(plot_width // 1.2)

# Empty figure with axes, grid lines and borders hidden so only the map image shows.
# plot_width/plot_height are Bokeh 2.x argument names (Bokeh 3 renamed them width/height).
def base_plot(tools='pan,wheel_zoom,reset', plot_width=plot_width, plot_height=plot_height, **plot_args):
    p = figure(tools=tools, plot_width=plot_width, plot_height=plot_height,
               x_range=x_range, y_range=y_range, outline_line_color=None,
               min_border=0, min_border_left=0, min_border_right=0,
               min_border_top=0, min_border_bottom=0, **plot_args)

    p.axis.visible = False
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None

    # Box zoom that preserves the plot's aspect ratio
    p.add_tools(BoxZoomTool(match_aspect=True))

    return p

options = dict(line_color=None, fill_color='blue', size=5)
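The plot height above is simply plot_width//1.2. At DC's latitude a degree of longitude covers less ground than a degree of latitude (by a factor of roughly cos 38.9° ≈ 0.78), so a hypothetical helper like the one below could pick a height that keeps the map's proportions. It relies on a simple equirectangular approximation; both the helper and that approximation are assumptions, not part of the original notebook.

```python
import math

def aspect_height(x_range, y_range, width):
    """Hypothetical helper: plot height that roughly preserves ground proportions.

    Equirectangular approximation: east-west degrees are scaled by the
    cosine of the mid-latitude before comparing them with north-south degrees.
    """
    lon_span = x_range[1] - x_range[0]
    lat_span = y_range[1] - y_range[0]
    mid_lat = math.radians((y_range[0] + y_range[1]) / 2)
    return round(width * lat_span / (lon_span * math.cos(mid_lat)))

DC_X = (-77.113633, -76.910012)
DC_Y = (38.812061, 38.993699)
height = aspect_height(DC_X, DC_Y, 750)  # roughly 860, versus 625 from 750//1.2
```

With this height the DC bounding box comes out slightly taller than wide, whereas a 750 x 625 canvas stretches the map horizontally. For a qualitative heatmap the distortion is minor, which may be why the simpler ratio was used.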

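Let's now create a column in the table, BIKE_TOTAL, that adds up all bicyclist injuries (major, minor, and unknown) and fatalities, plus impaired bicyclists. A nonzero value therefore marks a crash in which at least one bicyclist was hurt, killed, or impaired.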

In [7]:
# sum(axis=1) skips the NaNs in the injury columns; chaining + would turn those rows into NaN
crash['BIKE_TOTAL'] = crash[['MAJORINJURIES_BICYCLIST', 'MINORINJURIES_BICYCLIST', 'UNKNOWNINJURIES_BICYCLIST', 'FATAL_BICYCLIST', 'BICYCLISTSIMPAIRED']].sum(axis=1)
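The bicyclist injury columns have 22 missing values (140751 non-null out of 140773 in crash.info()), so how the columns are combined matters: element-wise `+` turns any row containing a NaN into NaN, while `DataFrame.sum(axis=1)` skips NaN by default. The sketch below contrasts the two on made-up rows and then computes the share of crashes with a nonzero total.

```python
import numpy as np
import pandas as pd

cols = ['MAJORINJURIES_BICYCLIST', 'MINORINJURIES_BICYCLIST',
        'UNKNOWNINJURIES_BICYCLIST', 'FATAL_BICYCLIST', 'BICYCLISTSIMPAIRED']

# Made-up rows; the last one has missing injury counts but a recorded fatality
toy = pd.DataFrame([
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [np.nan, np.nan, np.nan, 1, 0],
], columns=cols)

# Element-wise + propagates NaN, so the fatality in the last row is lost
with_plus = toy[cols[0]] + toy[cols[1]] + toy[cols[2]] + toy[cols[3]] + toy[cols[4]]

# sum(axis=1) skips NaN by default, so the fatality is kept
with_sum = toy[cols].sum(axis=1)

# Share of crashes in which at least one bicyclist was hurt, killed, or impaired
bike_share = (with_sum > 0).mean()
```

On the real table, `(crash['BIKE_TOTAL'] > 0).mean()` gives the corresponding share of all recorded crashes.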
In [8]:
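background = "black"
export = partial(export_image, export_path="export", background=background)
cm = partial(colormap_select, reverse=(background=="black"))

# Keep only crashes in which at least one bicyclist was hurt, killed, or impaired.
# ds.count('BIKE_TOTAL') on the full table would count every row with a non-missing
# BIKE_TOTAL, zeros included, i.e. essentially all crashes rather than bike crashes.
bike_crashes = crash[crash['BIKE_TOTAL'] > 0]

def create_image(x_range, y_range, w=plot_width, h=plot_height):
    # Bin the bike-crash points into a w x h pixel grid and count them per pixel
    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    agg = cvs.points(bike_crashes, 'X', 'Y', ds.count())
    # Histogram-equalized shading so both sparse and dense areas stay visible
    img = tf.shade(agg, cmap=Hot, how='eq_hist')
    # Dynamically spread pixels (by at most 4 px) so isolated crashes are easier to see
    return tf.dynspread(img, threshold=0.5, max_px=4)

p = base_plot(background_fill_color=background)
export(create_image(*DC), "DC_bikeCrashes")
InteractiveImage(p, create_image)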
Out[8]:
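The shading step uses `how='eq_hist'`, i.e. histogram equalization: each pixel's color depends on the rank of its count among all nonzero counts rather than on the raw count, so a few very dense areas do not wash out the rest of the map. This is also why the heatmap is only a qualitative gauge of frequency. Below is a simplified NumPy sketch of the idea; Datashader's own implementation differs in detail.

```python
import numpy as np

def eq_hist(counts):
    """Simplified histogram equalization: map each nonzero count to the
    fraction of nonzero cells whose count is less than or equal to it."""
    counts = np.asarray(counts, dtype=float)
    out = np.zeros_like(counts)
    mask = counts > 0                  # empty pixels stay at 0
    values = counts[mask]
    ranks = np.searchsorted(np.sort(values), values, side='right')
    out[mask] = ranks / values.size    # empirical CDF, in (0, 1]
    return out

counts = np.array([[0, 1], [1, 100]])
linear = counts / counts.max()         # a count of 1 maps to 0.01: nearly invisible
equalized = eq_hist(counts)            # 1 maps to about 0.667, 100 maps to 1.0
```

Under linear scaling, a pixel with a single crash is drawn at 1% intensity next to a pixel with 100; after equalization it sits at two thirds of the color scale.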