For a better experience, view this notebook in nbviewer here
Docker image used: jupyter/datascience-notebook
To connect to a dockerized Jupyter server:
docker pull jupyter/datascience-notebook
docker run -it -p host_port_number:8888 -v /your/host/directory:/home/jovyan/work --name your_container_name jupyter/datascience-notebook
docker exec your_container_name pip install package_name
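If the notebook server asks for a login token, it is printed in the container logs (assuming the image's default start command), which you can view with:
docker logs your_container_name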
Statistics from the World Health Organization (WHO) and the U.S. Department of Transportation show that approximately 1.35 million people worldwide, including more than 35,000 in the U.S., lose their lives to traffic collisions each year. In Los Angeles alone, more than 200 people are killed each year while moving around the city, and bicyclists, pedestrians, children, and older adults are particularly vulnerable on the city’s streets. Yet traffic deaths are predictable and preventable, and everyone has the right to move safely in their communities. The City of Los Angeles has therefore committed to the Vision Zero initiative, which aims to eliminate all traffic deaths and serious injuries by 2025. To work toward that vision, we first need to better understand existing collisions. Thanks to the California Highway Patrol (CHP), we have data for all reported traffic collisions in the City of Los Angeles from 2009 to 2018. This project is an exploratory data analysis of those collisions, with a light touch of machine learning using scikit-learn, aiming to answer questions such as:
The City of Los Angeles Data Portal provides up-to-date collision data from 2010 to the present, but with limited attributes for each incident. Although not used in this project, I connected that dataset to CARTO just to give a rough idea of what we are dealing with here.
%%HTML
# embed link
import os
import pandas as pd
import numpy as np
import googlemaps
from sklearn.cluster import DBSCAN
import sklearn.utils
from sklearn.preprocessing import StandardScaler
import cartoframes
from cartoframes.auth import set_default_context, Context
from cartoframes import Credentials
from cartoframes import Layer, BaseMap, styling
from cartoframes.viz import Map, basemaps
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns
%matplotlib inline
USERNAME = '----'
APIKEY = '----'
creds = Credentials(username=USERNAME, key=APIKEY)
cc = cartoframes.CartoContext(creds=creds)
plt.style.use('ggplot')
plt.rcParams['axes.facecolor'] = '#F5F5F5'
file_path = '/home/jovyan/work/'
file_name_v = 'victim_2009_2018'
input_data_v = os.path.join(file_path, file_name_v + '.csv')
input_data_v
'/home/jovyan/work/victim_2009_2018.csv'
file_path = '/home/jovyan/work/'
file_name = 'collision_2009_2018'
input_data = os.path.join(file_path, file_name+'.csv')
input_data
'/home/jovyan/work/collision_2009_2018.csv'
collision = pd.read_csv(input_data, parse_dates=(['COLLISION_DATE']))
collision.head()
CASE_ID | ACCIDENT_YEAR | PROC_DATE | JURIS | COLLISION_DATE | COLLISION_TIME | OFFICER_ID | REPORTING_DISTRICT | DAY_OF_WEEK | CHP_SHIFT | ... | COUNT_PED_KILLED | COUNT_PED_INJURED | COUNT_BICYCLIST_KILLED | COUNT_BICYCLIST_INJURED | COUNT_MC_KILLED | COUNT_MC_INJURED | PRIMARY_RAMP | SECONDARY_RAMP | LATITUDE | LONGITUDE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3918943 | 2009 | 20091202 | 1942 | 2009-03-23 | 1745 | 35046 | 1283 | 1 | 5 | ... | 0 | 0 | 0 | 0 | 0 | 0 | - | - | NaN | NaN |
1 | 3984219 | 2009 | 20090115 | 1942 | 2009-01-03 | 540 | 33829 | 1701 | 6 | 5 | ... | 0 | 0 | 0 | 0 | 0 | 0 | - | - | NaN | NaN |
2 | 3984274 | 2009 | 20090126 | 1942 | 2009-01-07 | 630 | 38614 | 152 | 3 | 5 | ... | 1 | 0 | 0 | 0 | 0 | 0 | - | - | NaN | NaN |
3 | 3984281 | 2009 | 20090126 | 1942 | 2009-01-08 | 50 | 32896 | 1985 | 4 | 5 | ... | 0 | 0 | 0 | 0 | 0 | 0 | - | - | NaN | NaN |
4 | 3984352 | 2009 | 20100914 | 1942 | 2009-01-11 | 1450 | 24517 | 2014 | 7 | 5 | ... | 0 | 0 | 0 | 0 | 1 | 0 | - | - | NaN | NaN |
5 rows × 76 columns
victim = pd.read_csv(input_data_v)
victim.head()
CASE_ID | PARTY_NUMBER | VICTIM_ROLE | VICTIM_SEX | VICTIM_AGE | VICTIM_DEGREE_OF_INJURY | VICTIM_SEATING_POSITION | VICTIM_SAFETY_EQUIP_1 | VICTIM_SAFETY_EQUIP_2 | VICTIM_EJECTED | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 3918943 | 2 | 1 | M | 25 | 4 | 1 | N | G | 0 |
1 | 3984219 | 1 | 1 | M | 19 | 1 | 1 | L | G | 0 |
2 | 3984274 | 2 | 3 | F | 58 | 1 | 9 | - | - | 0 |
3 | 3984281 | 1 | 1 | M | 29 | 1 | 1 | L | H | 0 |
4 | 3984352 | 1 | 2 | F | 998 | 0 | 3 | L | G | 0 |
collision.columns = map(str.lower, collision.columns)
collision.columns = collision.columns.str.replace(' ', '_')
collision.columns
Index(['case_id', 'accident_year', 'proc_date', 'juris', 'collision_date', 'collision_time', 'officer_id', 'reporting_district', 'day_of_week', 'chp_shift', 'population', 'cnty_city_loc', 'special_cond', 'beat_type', 'chp_beat_type', 'city_division_lapd', 'chp_beat_class', 'beat_number', 'primary_rd', 'secondary_rd', 'distance', 'direction', 'intersection', 'weather_1', 'weather_2', 'state_hwy_ind', 'caltrans_county', 'caltrans_district', 'state_route', 'route_suffix', 'postmile_prefix', 'postmile', 'location_type', 'ramp_intersection', 'side_of_hwy', 'tow_away', 'collision_severity', 'number_killed', 'number_injured', 'party_count', 'primary_coll_factor', 'pcf_code_of_viol', 'pcf_viol_category', 'pcf_violation', 'pcf_viol_subsection', 'hit_and_run', 'type_of_collision', 'mviw', 'ped_action', 'road_surface', 'road_cond_1', 'road_cond_2', 'lighting', 'control_device', 'chp_road_type', 'pedestrian_accident', 'bicycle_accident', 'motorcycle_accident', 'truck_accident', 'not_private_property', 'alcohol_involved', 'stwd_vehtype_at_fault', 'chp_vehtype_at_fault', 'count_severe_inj', 'count_visible_inj', 'count_complaint_pain', 'count_ped_killed', 'count_ped_injured', 'count_bicyclist_killed', 'count_bicyclist_injured', 'count_mc_killed', 'count_mc_injured', 'primary_ramp', 'secondary_ramp', 'latitude', 'longitude'], dtype='object')
victim.columns = map(str.lower, victim.columns)
victim.columns = victim.columns.str.replace(' ', '_')
victim.columns
Index(['case_id', 'party_number', 'victim_role', 'victim_sex', 'victim_age', 'victim_degree_of_injury', 'victim_seating_position', 'victim_safety_equip_1', 'victim_safety_equip_2', 'victim_ejected'], dtype='object')
to_drop = [
'juris', # all the same since city of Los Angeles
'proc_date', # process date
    'population', # same for all rows: incorporated, over 250,000
'officer_id',
'reporting_district',
'chp_shift',
'cnty_city_loc',
'special_cond',
'beat_type',
'chp_beat_type',
'city_division_lapd',
'chp_beat_class',
'beat_number',
'stwd_vehtype_at_fault',
'chp_vehtype_at_fault',
'caltrans_county',
'caltrans_district',
'weather_2', # only if a second description is necessary
'primary_ramp',
'secondary_ramp'
]
collision.drop(to_drop, axis=1, inplace=True)
collision.set_index('collision_date', inplace=True)
collision.head()
case_id | accident_year | collision_time | day_of_week | primary_rd | secondary_rd | distance | direction | intersection | weather_1 | ... | count_visible_inj | count_complaint_pain | count_ped_killed | count_ped_injured | count_bicyclist_killed | count_bicyclist_injured | count_mc_killed | count_mc_injured | latitude | longitude | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
collision_date | |||||||||||||||||||||
2009-03-23 | 3918943 | 2009 | 1745 | 1 | WESTERN AV | CENTURY BL | 55.0 | N | N | A | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN |
2009-01-03 | 3984219 | 2009 | 540 | 6 | TAMPA AV | PINE VALLEY AV | 420.0 | N | N | - | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN |
2009-01-07 | 3984274 | 2009 | 630 | 3 | 5TH ST | FLOWER ST | 0.0 | NaN | Y | A | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NaN | NaN |
2009-01-08 | 3984281 | 2009 | 50 | 4 | VAN NUYS BL | VESPER AV | 0.0 | NaN | Y | A | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN |
2009-01-11 | 3984352 | 2009 | 1450 | 7 | HARVARD BL | 3RD ST | 0.0 | NaN | Y | A | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | NaN | NaN |
5 rows × 55 columns
# day of week
dowdict = {
1: 'Monday',
2: 'Tuesday',
3: 'Wednesday',
4: 'Thursday',
5: 'Friday',
6: 'Saturday',
7: 'Sunday'
}
collision['day_of_week'].replace(
to_replace=[i for i in dowdict.keys()],
value=[i for i in dowdict.values()],
inplace=True
)
# type of collision
tocdict = {
'A': 'Head-On',
'B': 'Sideswipe',
'C': 'Rear End',
'D': 'Broadside',
'E': 'Hit Object',
'F': 'Overturned',
'G': 'Vehicle/Pedestrian',
'H': 'Other',
'-': 'Not Stated'
}
collision['type_of_collision'].replace(
to_replace=[i for i in tocdict.keys()],
value=[i for i in tocdict.values()],
inplace=True
)
# violation category
vcdict = {
'0': 'Unknown',
'1': 'Driving or Bicycling Under the Influence of Alcohol or Drug',
'2': 'Impeding Traffic',
'3': 'Unsafe Speed',
'4': 'Following Too Closely',
'5': 'Wrong Side of Road',
'6': 'Improper Passing',
'7': 'Unsafe Lane Change',
'8': 'Improper Turning',
'9': 'Automobile Right of Way',
'10': 'Pedestrian Right of Way',
'11': 'Pedestrian Violation',
'12': 'Traffic Signals and Signs',
'13': 'Hazardous Parking',
'14': 'Lights',
'15': 'Brakes',
'16': 'Other Equipment',
'17': 'Other Hazardous Violation',
'18': 'Other Than Driver (or Pedestrian)',
'19': None,
'20': None,
'21': 'Unsafe Starting or Backing',
'22': 'Other Improper Driving',
'23': 'Pedestrian or "Other" Under the Influence of Alcohol or Drug',
'24': 'Fell Asleep',
'- ': 'Not Stated'
}
collision['pcf_viol_category'].replace(
to_replace=[i for i in vcdict.keys()],
value=[i for i in vcdict.values()],
inplace=True
)
# collision severity
csdict = {
0: 'Property Damage',
1: 'Fatal',
2: 'Severe',
3: 'Other Visible Injuries',
4: 'Complaint of Pain'
}
collision['collision_severity'].replace(
to_replace=[i for i in csdict.keys()],
value=[i for i in csdict.values()],
inplace=True
)
# location type
ltdict = {
'H': 'Highway',
'I': 'Intersection',
'R': 'Ramp'
}
collision['location_type'].replace(
to_replace=[i for i in ltdict.keys()],
value=[i for i in ltdict.values()],
inplace=True
)
# weather
weatherdict = {
'A': 'Clear',
'B': 'Cloudy',
'C': 'Raining',
'D': 'Snowing',
'E': 'Fog',
'F': 'Other',
'G': 'Wind',
    '- ': 'Not Stated'
}
collision['weather_1'].replace(
to_replace=[i for i in weatherdict.keys()],
value=[i for i in weatherdict.values()],
inplace=True
)
# bike & ped collision
def bike_ped(row):
if (row.bicycle_accident == 'Y') | (row.pedestrian_accident == 'Y'):
return 'Y'
else:
return 'N'
collision['bike_ped_collision'] = collision.apply(bike_ped, axis=1)
# degree of injury
doidict = {
0: 'No Injury',
1: 'Killed',
2: 'Severe Injury',
3: 'Other Visible Injury',
4: 'Complaint of Pain',
5: 'Other',
6: 'Other',
7: 'Other'
}
victim.victim_degree_of_injury.replace(
to_replace=[i for i in doidict.keys()],
value=[i for i in doidict.values()],
inplace=True
)
# age 998 is not stated - replace with NaN
# age 999 is fatal fetus - replace with age 0 for now; there are 6 of them :(
victim['victim_age'].replace({998: np.nan, 999: 0}, inplace=True)
victim['victim_sex'].replace('-', np.nan, inplace=True)
ranges = [0, 600, 930, 1400, 1830, 2400]
labels = ['Early Morning', 'AM Peak', 'Midday', 'PM Peak', 'Evening']
collision['time_period'] = pd.cut(
collision.collision_time,
ranges,
labels=labels
).astype('category')
collision[['collision_time', 'time_period']].head()
collision_time | time_period | |
---|---|---|
collision_date | ||
2009-03-23 | 1745 | PM Peak |
2009-01-03 | 540 | Early Morning |
2009-01-07 | 630 | AM Peak |
2009-01-08 | 50 | Early Morning |
2009-01-11 | 1450 | PM Peak |
ranges = list(np.arange(0, 2401, 100))
labels = list(np.arange(0, 24))
collision['hour_of_day'] = pd.cut(
collision.collision_time,
ranges,
labels=labels
).astype('category')
collision[['collision_time', 'hour_of_day']].head()
collision_time | hour_of_day | |
---|---|---|
collision_date | ||
2009-03-23 | 1745 | 17 |
2009-01-03 | 540 | 5 |
2009-01-07 | 630 | 6 |
2009-01-08 | 50 | 0 |
2009-01-11 | 1450 | 14 |
# collision.reset_index().to_csv('collision_09_18_cleaned.csv', index=None)
print(
"The dataset has {:,} rows/collisions and {:,} attributes/columns.".format(
collision.shape[0],
collision.shape[1]
)
)
print(
"Among the {A:,} collisions: \n\
{B:,.0f} were fatal... \n\
{C:,.0f} involved a cyclist... \n\
{D:,.0f} involved a pedestrian...".format(
A = collision.shape[0],
B = len(collision[collision.collision_severity=='Fatal']),
C = len(collision[collision.bicycle_accident=='Y']),
D = len(collision[collision.pedestrian_accident=='Y'])
)
)
The dataset has 355,488 rows/collisions and 58 attributes/columns.
Among the 355,488 collisions:
1,885 were fatal...
21,663 involved a cyclist...
28,593 involved a pedestrian...
collision.collision_severity.unique()
array(['Complaint of Pain', 'Fatal', 'Property Damage', 'Other Visible Injuries', 'Severe'], dtype=object)
# agg(list(tuple())) to create new columns
# agg(dict()) to apply functions to existing columns
severe_fatal_year = collision.groupby('accident_year').collision_severity.agg([
('All Collisions', len),
('Severe Collisions', lambda x: ((x == 'Severe') | (x == 'Fatal')).sum()),
('Percent of Severe Collisions', lambda x: ((x == 'Severe') | (x == 'Fatal')).mean())
]
)
severe_fatal_year
All Collisions | Severe Collisions | Percent of Severe Collisions | |
---|---|---|---|
accident_year | |||
2009 | 35822 | 1109 | 0.030959 |
2010 | 34446 | 1038 | 0.030134 |
2011 | 33422 | 1057 | 0.031626 |
2012 | 33562 | 1145 | 0.034116 |
2013 | 31937 | 1114 | 0.034881 |
2014 | 33148 | 1133 | 0.034180 |
2015 | 36961 | 1142 | 0.030897 |
2016 | 38487 | 1425 | 0.037025 |
2017 | 39094 | 1480 | 0.037857 |
2018 | 38609 | 1459 | 0.037789 |
ax = severe_fatal_year[['All Collisions', 'Severe Collisions']].plot(
kind='bar',
rot=0,
figsize=(12, 6),
color=['navy', 'orangered']
)
ax.set_title('Collisions by Year (2009 - 2018)', fontsize=14)
ax.set_ylabel('Number of Collisions')
ax.set_xlabel(' ')
ax.set_ylim([0, 45000])
ax.legend(loc='upper left')
fmt = '{x:,.0f}'
ax.yaxis.set_major_formatter(mtick.StrMethodFormatter(fmt))
for tick in ax.yaxis.get_major_ticks():
tick.label.set_fontsize(12)
for tick in ax.xaxis.get_major_ticks():
tick.label.set_fontsize(12)
ax.axhline(severe_fatal_year['All Collisions'].mean(), color='red', linestyle='dashed', linewidth=1)
pass
fig = {
"data": [{"type": "bar",
"x": severe_fatal_year.index,
"y": severe_fatal_year['All Collisions'],
"name": "All",
"marker_color": "rgb(0, 0, 128)",
"hoverinfo": "y"
},
{"type": "bar",
"x": severe_fatal_year.index,
"y": severe_fatal_year['Severe Collisions'],
"name": "Severe",
"marker_color": "rgb(255, 69, 0)",
"hoverinfo": "y"
}],
"layout": {"title": {"text": "Collisions by Year (2009 - 2018)", "font_size": 20},
"xaxis": {"tickfont_size": 12,
"type": "category"},
"yaxis": {"title": "Nnumber of Collisions",
"tickfont_size": 12},
"barmode": "group",
"bargap": 0.25,
"bargroupgap": 0.1,
"plot_bgcolor": "#F5F5F5"
}
}
pio.show(fig)
data = [severe_fatal_year[i] for i in ['All Collisions', 'Severe Collisions']]
names = ["Collisions", "Severe Collisions"]
colors = ["rgb(0,0,128)", "rgb(255, 69, 0)"]
plot_data = zip(data, names, colors)
# Create figures and annotations
fig = go.Figure()
annotation_list = []
for item in plot_data:
fig.add_trace(
go.Scatter(
x=severe_fatal_year.index,
y=item[0],
name=item[1],
marker_color=item[2]),
)
    annotation_text = "{:,.0f} {} <br> happened in {}".format(
item[0].max(),
item[1],
item[0].idxmax())
annotation = [
dict(
x=item[0].idxmax(),
y=item[0].max(),
xref="x", yref="y",
text=annotation_text,
ax=0,
ay=-40)]
annotation_list.append(annotation)
# Create drop-downs and plot
fig.update_layout(
updatemenus=[
go.layout.Updatemenu(
active=0,
buttons=list([
dict(label="All Collisions",
method="update",
args=[{"visible": [True, False]},
{"title": "All Collisions from 2009 to 2018",
"annotations": annotation_list[0]}]),
dict(label="Severe Collisions",
method="update",
args=[{"visible": [False, True]},
{"title": "Severe Collisions from 2009 to 2018",
"annotations": annotation_list[1]}])]),
bordercolor="rgb(192,192,192)",
x=1,
xanchor="right",
y=1.12,
yanchor="top"
)],
plot_bgcolor="#F5F5F5",
legend_orientation="h",
title_text="Collisions by Year (2009 - 2018)")
bp_collision = collision[collision.bike_ped_collision=='Y'][['accident_year', 'bike_ped_collision', 'collision_severity']]
bp_collision = bp_collision.groupby('accident_year').agg({
'bike_ped_collision': 'count',
'collision_severity': lambda x: ((x=='Severe') | (x=='Fatal')).sum()
}
)
bp_collision['Percent BP Severe Collisions'] = bp_collision.collision_severity / bp_collision.bike_ped_collision
bp_collision.rename(columns={
'bike_ped_collision': 'Bike & Ped Collision',
'collision_severity': 'Severe Collisions'
}, inplace=True
)
bp_collision
Bike & Ped Collision | Severe Collisions | Percent BP Severe Collisions | |
---|---|---|---|
accident_year | |||
2009 | 4578 | 515 | 0.112495 |
2010 | 4710 | 515 | 0.109342 |
2011 | 4803 | 483 | 0.100562 |
2012 | 5058 | 537 | 0.106168 |
2013 | 5124 | 536 | 0.104606 |
2014 | 5133 | 516 | 0.100526 |
2015 | 5110 | 519 | 0.101566 |
2016 | 5074 | 638 | 0.125739 |
2017 | 5225 | 697 | 0.133397 |
2018 | 5151 | 695 | 0.134925 |
# Concatenate with all collisions
collision_compare = pd.concat([bp_collision, severe_fatal_year], axis=1)
collision_compare
Bike & Ped Collision | Severe Collisions | Percent BP Severe Collisions | All Collisions | Severe Collisions | Percent of Severe Collisions | |
---|---|---|---|---|---|---|
accident_year | ||||||
2009 | 4578 | 515 | 0.112495 | 35822 | 1109 | 0.030959 |
2010 | 4710 | 515 | 0.109342 | 34446 | 1038 | 0.030134 |
2011 | 4803 | 483 | 0.100562 | 33422 | 1057 | 0.031626 |
2012 | 5058 | 537 | 0.106168 | 33562 | 1145 | 0.034116 |
2013 | 5124 | 536 | 0.104606 | 31937 | 1114 | 0.034881 |
2014 | 5133 | 516 | 0.100526 | 33148 | 1133 | 0.034180 |
2015 | 5110 | 519 | 0.101566 | 36961 | 1142 | 0.030897 |
2016 | 5074 | 638 | 0.125739 | 38487 | 1425 | 0.037025 |
2017 | 5225 | 697 | 0.133397 | 39094 | 1480 | 0.037857 |
2018 | 5151 | 695 | 0.134925 | 38609 | 1459 | 0.037789 |
fig = {
"data": [{"type": "scatter",
"x": collision_compare.index,
"y": collision_compare['Percent BP Severe Collisions'],
"name": "% Severe bike & ped collisions",
"marker_color": "rgb(250, 128, 114)",
"hoverinfo": "y"
},
{"type": "scatter",
"x": collision_compare.index,
"y": collision_compare['Percent of Severe Collisions'],
"name": "% Severe collisions",
"marker_color": "rgb(255, 165, 0)",
"hoverinfo": "y"
}],
"layout": {"title": {"text": "Percent of Severe Collisions by Year (2009 - 2018)", "font_size": 20},
"xaxis": {"tickfont_size": 12,
"type": "category"},
"yaxis": {"title": "Ratio",
"tickfont_size": 12,
"tickformat": ",.0%"},
"barmode": "group",
"bargap": 0.25,
"bargroupgap": 0.1,
"legend_orientation": "h"
}
}
pio.show(fig)
# 10-year total
collision_dow = pd.Series(collision.groupby('day_of_week').count()['case_id'])
collision_dow = collision_dow.reindex([
'Monday',
'Tuesday',
'Wednesday',
'Thursday',
'Friday',
'Saturday',
'Sunday'
])
pal = sns.color_palette('Reds', 7)
colordict = {
'Friday': pal[6],
'Saturday': pal[5],
'Thursday': pal[4],
'Wednesday': pal[3],
'Tuesday': pal[2],
'Sunday': pal[0],
'Monday': pal[1]
}
ax = collision_dow.plot(figsize=(12, 6), kind='bar', color=[colordict[i] for i in collision_dow.index], width=0.5, rot=0)
ax.set_title('Collisions by Day of Week (10 year Total)', fontsize=14)
ax.set_ylim((40000, 60000))
ax.set_ylabel('Number of Collisions')
ax.set_xlabel(' ')
fmt = '{x:,.0f}'
ax.yaxis.set_major_formatter(mtick.StrMethodFormatter(fmt))
for tick in ax.yaxis.get_major_ticks():
tick.label.set_fontsize(12)
for tick in ax.xaxis.get_major_ticks():
tick.label.set_fontsize(12)
# Daily total distribution
collision_dow_measure = collision.groupby(
['collision_date', 'day_of_week'], as_index=False
).count()[['day_of_week', 'case_id']]
collision_dow_measure.rename(
columns={
'day_of_week': 'Day of Week',
'case_id': 'Number of Collisions per Day'
}, inplace=True
)
collision_dow_measure['Day of Week'] = pd.Categorical(
collision_dow_measure['Day of Week'],
categories=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
ordered=True
)
pal = sns.color_palette('Reds', 7)
colordict = {
'Friday': pal[6],
'Saturday': pal[5],
'Thursday': pal[4],
'Wednesday': pal[3],
'Tuesday': pal[2],
'Sunday': pal[0],
'Monday': pal[1]
}
plt.figure(figsize=(12, 6))
ax = sns.boxplot(x='Day of Week', y='Number of Collisions per Day', data=collision_dow_measure, palette=colordict)
ax.set_title('Collisions by Day of Week (Daily Total Distribution)', fontsize=14)
for tick in ax.yaxis.get_major_ticks():
tick.label.set_fontsize(12)
for tick in ax.xaxis.get_major_ticks():
tick.label.set_fontsize(12)
pass
severe_collision_dow = pd.Series(collision.groupby('day_of_week')['collision_severity'].agg(lambda x: ((x== 'Severe') | (x == 'Fatal')).mean()))
severe_collision_dow = severe_collision_dow.reindex([
'Monday',
'Tuesday',
'Wednesday',
'Thursday',
'Friday',
'Saturday',
'Sunday'
])
pal = sns.color_palette("Reds", 7)
colordict = {'Friday': pal[4],
'Saturday': pal[5],
'Thursday': pal[1],
'Wednesday': pal[3],
'Tuesday': pal[1],
'Sunday': pal[6],
'Monday': pal[0]}
ax = severe_collision_dow.plot(figsize=(10, 6), kind='bar', color=[colordict[i] for i in severe_collision_dow.index], width=0.5, rot=0)
ax.set_title('Percent Severe Collisions by Day of the Week over 10 years', fontsize=14)
ax.set_ylabel('Percentage of Severe Collisions')
plt.ylim([0, 0.05])
ax.set_xlabel(' ')
fmt = '{x:,.0%}'
ax.yaxis.set_major_formatter(mtick.StrMethodFormatter(fmt))
for tick in ax.yaxis.get_major_ticks():
tick.label.set_fontsize(12)
for tick in ax.xaxis.get_major_ticks():
tick.label.set_fontsize(12)
for rect in ax.patches:
y_value = rect.get_height()
x_value = rect.get_x() + rect.get_width() / 2
space = 5
va = 'bottom'
label = '{:.0%}'.format(y_value)
ax.annotate(label,
(x_value, y_value),
xytext=(0, space),
textcoords='offset points',
ha='center',
va=va)
pass
collision_dow_hod = collision.groupby(['hour_of_day', 'day_of_week'])['case_id'].count().unstack()
collision_dow_hod = collision_dow_hod[['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']]
temp = collision[(collision.collision_severity=='Fatal') | (collision.collision_severity=='Severe')]
collision_dow_hod_severe = temp.groupby(['hour_of_day', 'day_of_week'])['case_id'].count().unstack()
collision_dow_hod_severe = collision_dow_hod_severe[['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']]
data = [collision_dow_hod, collision_dow_hod_severe]
title = [
'Collision by Day of Week and Time of Day',
'Severe Collisions by Day of Week and Time of Day'
]
chart_data = zip(data, title)
for item in chart_data:
fig = {
"data": [{"type": "heatmap",
"x": item[0].columns,
"y": item[0].index,
"z": item[0].values.tolist(),
}],
"layout": {"title": {"text": item[1], "font_size": 20},
"xaxis": {"title": "Day of Week",
"tickfont_size": 12,
"type": "category"},
"yaxis": {"title": "Hour of Day",
"tickfont_size": 12,
"type": "category"},
"height": 650
}
}
pio.show(fig)
# by year
t1 = collision.groupby(['accident_year', 'type_of_collision'])['case_id'].count().unstack()
t1
type_of_collision | Broadside | Head-On | Hit Object | Not Stated | Other | Overturned | Rear End | Sideswipe | Vehicle/Pedestrian |
---|---|---|---|---|---|---|---|---|---|
accident_year | |||||||||
2009 | 10268 | 2836 | 2558 | 519 | 1201 | 158 | 8625 | 7384 | 2273 |
2010 | 9892 | 2848 | 2567 | 476 | 1177 | 101 | 8259 | 6871 | 2255 |
2011 | 9800 | 2799 | 2287 | 278 | 1139 | 134 | 8052 | 6778 | 2155 |
2012 | 9137 | 2705 | 2440 | 375 | 1500 | 125 | 7931 | 6981 | 2368 |
2013 | 8855 | 2656 | 2310 | 392 | 1421 | 147 | 7168 | 6644 | 2344 |
2014 | 9096 | 2651 | 2269 | 444 | 1519 | 117 | 7583 | 7027 | 2442 |
2015 | 10191 | 3201 | 2486 | 620 | 1518 | 158 | 8386 | 7875 | 2526 |
2016 | 10634 | 3375 | 2399 | 739 | 1237 | 178 | 8771 | 8510 | 2644 |
2017 | 10910 | 3586 | 2185 | 770 | 1250 | 184 | 8656 | 8800 | 2753 |
2018 | 10850 | 3423 | 2194 | 740 | 1297 | 193 | 8360 | 8723 | 2829 |
# by weather type (Clear and Cloudy dropped below to focus on adverse conditions)
t2 = collision.groupby(['type_of_collision', 'weather_1'])['case_id'].count().unstack()
t2.drop(['Clear', 'Cloudy'], axis=1, inplace=True)
t2
weather_1 | - | Fog | Other | Raining | Snowing | Wind |
---|---|---|---|---|---|---|
type_of_collision | ||||||
Broadside | 251.0 | 98.0 | 37.0 | 2574.0 | 5.0 | 56.0 |
Head-On | 127.0 | 65.0 | 18.0 | 971.0 | 3.0 | 20.0 |
Hit Object | 86.0 | 95.0 | 23.0 | 1226.0 | 1.0 | 33.0 |
Not Stated | 368.0 | 3.0 | 5.0 | 130.0 | NaN | 3.0 |
Other | 44.0 | 18.0 | 20.0 | 258.0 | 1.0 | 6.0 |
Overturned | 4.0 | 2.0 | 1.0 | 61.0 | NaN | 1.0 |
Rear End | 290.0 | 98.0 | 40.0 | 2554.0 | 9.0 | 39.0 |
Sideswipe | 267.0 | 97.0 | 66.0 | 1767.0 | 2.0 | 24.0 |
Vehicle/Pedestrian | 87.0 | 36.0 | 15.0 | 838.0 | 3.0 | 12.0 |
# by weather type in terms of percentages
t2.div(t2.sum(axis=1)/100, axis=0).round()
weather_1 | - | Fog | Other | Raining | Snowing | Wind |
---|---|---|---|---|---|---|
type_of_collision | ||||||
Broadside | 8.0 | 3.0 | 1.0 | 85.0 | 0.0 | 2.0 |
Head-On | 11.0 | 5.0 | 1.0 | 81.0 | 0.0 | 2.0 |
Hit Object | 6.0 | 6.0 | 2.0 | 84.0 | 0.0 | 2.0 |
Not Stated | 72.0 | 1.0 | 1.0 | 26.0 | NaN | 1.0 |
Other | 13.0 | 5.0 | 6.0 | 74.0 | 0.0 | 2.0 |
Overturned | 6.0 | 3.0 | 1.0 | 88.0 | NaN | 1.0 |
Rear End | 10.0 | 3.0 | 1.0 | 84.0 | 0.0 | 1.0 |
Sideswipe | 12.0 | 4.0 | 3.0 | 79.0 | 0.0 | 1.0 |
Vehicle/Pedestrian | 9.0 | 4.0 | 2.0 | 85.0 | 0.0 | 1.0 |
# by severity (Fatal and Severe only; Vehicle/Pedestrian collisions excluded)
t3 = collision.groupby(['type_of_collision', 'collision_severity'])['case_id'].count().unstack()
t3.drop(['Property Damage', 'Other Visible Injuries', 'Complaint of Pain'], axis=1, inplace=True)
t3.drop(['Vehicle/Pedestrian'], axis=0, inplace=True)
t3
collision_severity | Fatal | Severe |
---|---|---|
type_of_collision | ||
Broadside | 443 | 3144 |
Head-On | 144 | 1234 |
Hit Object | 205 | 624 |
Not Stated | 16 | 166 |
Other | 81 | 497 |
Overturned | 18 | 78 |
Rear End | 88 | 750 |
Sideswipe | 68 | 757 |
title = ['Collision Type by Year', 'Collision Type by Weather', 'Collision Type by Severity']
ylabel = ['Year', 'Weather', 'Severity']
t_data = [t1.T, t2, t3]
for t in zip(title, ylabel, t_data):
plt.figure(figsize = (8,6))
sns.heatmap(t[2], cmap='Blues', alpha=0.8)
plt.title(t[0])
plt.xlabel(' ')
plt.ylabel(' ')
collision.alcohol_involved.replace(np.nan, 'N', inplace=True) # values for [alcohol_involved] are either 'Y' or blank
al_dow = collision.groupby(['day_of_week', 'alcohol_involved'])['case_id'].count().unstack()
al_dow = al_dow.reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
al_dow.rename(columns={'Y': 'Alcohol Involved', 'N': 'None Reported',}, inplace=True)
al_dow
alcohol_involved | None Reported | Alcohol Involved |
---|---|---|
day_of_week | ||
Monday | 45211 | 3843 |
Tuesday | 46896 | 3326 |
Wednesday | 46810 | 3532 |
Thursday | 47233 | 4006 |
Friday | 51001 | 5289 |
Saturday | 44335 | 7158 |
Sunday | 39199 | 7649 |
al_dow_pct = al_dow.div(al_dow.sum(axis=1)/100, axis=0).round()
al_dow_pct
alcohol_involved | None Reported | Alcohol Involved |
---|---|---|
day_of_week | ||
Monday | 92.0 | 8.0 |
Tuesday | 93.0 | 7.0 |
Wednesday | 93.0 | 7.0 |
Thursday | 92.0 | 8.0 |
Friday | 91.0 | 9.0 |
Saturday | 86.0 | 14.0 |
Sunday | 84.0 | 16.0 |
ax = al_dow_pct.plot.bar(
stacked=True,
figsize=(12,6),
color = ['orangered', 'navy'],
y=['Alcohol Involved', 'None Reported'],
rot=0
)
ax.set_title('Collisions by Day of Week & Alcohol Involvement')
ax.set_ylabel('Percentage')
ax.set_xlabel('Day of Week (over 10 years)')
plt.legend(bbox_to_anchor = (1.01, 0.5), loc = 'center left')
fmt = '{x:,.0f}'
ax.yaxis.set_major_formatter(mtick.StrMethodFormatter(fmt))
for tick in ax.yaxis.get_major_ticks():
tick.label.set_fontsize(12)
for tick in ax.xaxis.get_major_ticks():
tick.label.set_fontsize(12)
plt.tight_layout()
# Number of recorded alcohol-involved incidents by day of week and time of day
al = collision[collision.alcohol_involved == 'Y']
al_dow_tod = al.groupby(['day_of_week', 'time_period'])['case_id'].count().unstack()
al_dow_tod = al_dow_tod.reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
al_dow_tod = al_dow_tod.T
al_dow_tod
day_of_week | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
---|---|---|---|---|---|---|---|
time_period | |||||||
Early Morning | 1193 | 687 | 828 | 912 | 1318 | 2775 | 3219 |
AM Peak | 205 | 223 | 228 | 202 | 219 | 235 | 311 |
Midday | 330 | 333 | 329 | 343 | 375 | 469 | 444 |
PM Peak | 718 | 677 | 694 | 763 | 878 | 929 | 1102 |
Evening | 1378 | 1387 | 1444 | 1777 | 2485 | 2728 | 2543 |
al_dow_tod_pct = al_dow_tod.div(al_dow_tod.sum(axis=1)/100, axis=0).round()
al_dow_tod_pct
day_of_week | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
---|---|---|---|---|---|---|---|
time_period | |||||||
Early Morning | 11.0 | 6.0 | 8.0 | 8.0 | 12.0 | 25.0 | 29.0 |
AM Peak | 13.0 | 14.0 | 14.0 | 12.0 | 13.0 | 14.0 | 19.0 |
Midday | 13.0 | 13.0 | 13.0 | 13.0 | 14.0 | 18.0 | 17.0 |
PM Peak | 12.0 | 12.0 | 12.0 | 13.0 | 15.0 | 16.0 | 19.0 |
Evening | 10.0 | 10.0 | 11.0 | 13.0 | 18.0 | 20.0 | 19.0 |
fig = {
"data": [{"type": "heatmap",
"x": al_dow_tod_pct.columns,
"y": al_dow_tod_pct.index,
"z": al_dow_tod_pct.values.tolist(),
}],
"layout": {"title": {"text": "% Alcohol Involved in Recorded Collisions by Day of Week & Time of Day", "font_size": 20},
"xaxis": {"title": "Day of Week",
"tickfont_size": 12,
"type": "category"},
"yaxis": {"title": "Time Period",
"tickfont_size": 12,
"type": "category"},
"height": 650
}
}
pio.show(fig)
severe_victim = victim[
(victim['victim_degree_of_injury'] == 'Killed') |\
(victim['victim_degree_of_injury'] == 'Severe Injury')
]
fatal_victim = victim[
(victim['victim_degree_of_injury'] == 'Killed')
]
fig = plt.figure(figsize=(10, 10))
ax1 = fig.add_subplot(211)
ax1 = victim['victim_age'].hist(bins=40, color='skyblue', edgecolor='grey')
ax2 = fig.add_subplot(212, sharex=ax1)
ax2 = fatal_victim['victim_age'].hist(bins=40, color='salmon', edgecolor='darkred')
for ax in [ax1, ax2]:
ax.set_ylabel('Number of Collisions')
fmt = '{x:,.0f}'
ax.yaxis.set_major_formatter(mtick.StrMethodFormatter(fmt))
for tick in ax.yaxis.get_major_ticks():
tick.label.set_fontsize(12)
for tick in ax.xaxis.get_major_ticks():
tick.label.set_fontsize(12)
ax1.set_title("Victim's Age Distribution")
ax1.axvline(victim['victim_age'].mean(), color='red', linestyle='dashed', linewidth=1)
ax1.text(victim['victim_age'].mean() + victim['victim_age'].mean()/10,
40000, 'Mean Age: {:.0f}'.format(victim['victim_age'].mean()), fontsize=12)
ax1.text(110, 3000, 'All Victims', fontsize=12)
ax2.axvline(fatal_victim['victim_age'].mean(), color='darkred', linestyle='dashed', linewidth=1)
ax2.text(fatal_victim['victim_age'].mean() + fatal_victim['victim_age'].mean()/10,
120, 'Mean Age: {:.0f}'.format(fatal_victim['victim_age'].mean()), fontsize=12)
ax2.text(110, 10, 'Killed Victims', fontsize=12)
ax2.set_xlabel('Age')
pass
victim.groupby('victim_sex')['case_id'].count()
victim_sex
F    215779
M    223444
Name: case_id, dtype: int64
victim_injury = victim.groupby(['victim_sex', 'victim_degree_of_injury'])['case_id'].count().unstack(level=0)
victim_injury.drop(index=['Complaint of Pain', 'No Injury', 'Other'], inplace=True)
victim_injury
victim_sex | F | M |
---|---|---|
victim_degree_of_injury | ||
Killed | 536 | 1440 |
Other Visible Injury | 29513 | 46603 |
Severe Injury | 3371 | 7600 |
fig = {
"data": [{"type": "bar",
"x": victim_injury.index,
"y": victim_injury['F'],
"name": "Female",
"marker_color": "rgb(0, 0, 128)",
"hoverinfo": "y"
},
{"type": "bar",
"x": victim_injury.index,
"y": victim_injury['M'],
"name": "Male",
"marker_color": "rgb(255, 69, 0)",
"hoverinfo": "y"
}],
"layout": {"title": {"text": "Degree of Injury by Gender (10-yr Total)", "font_size": 20},
"xaxis": {"tickfont_size": 12,
"type": "category"},
"yaxis": {"title": "Nnumber of Collisions",
"tickfont_size": 12},
"barmode": "group",
"bargap": 0.25,
"bargroupgap": 0.1,
"plot_bgcolor": "#F5F5F5"
}
}
pio.show(fig)
def condition(x):
if (x.victim_safety_equip_1 in ['C', 'E', 'G']) or (x.victim_safety_equip_2 in ['C', 'E', 'G']):
return 'Seat Belts'
else:
return 'No Seat Belts'
victim['safety'] = victim.apply(condition, axis=1)
victim.head()
case_id | party_number | victim_role | victim_sex | victim_age | victim_degree_of_injury | victim_seating_position | victim_safety_equip_1 | victim_safety_equip_2 | victim_ejected | safety | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3918943 | 2 | 1 | M | 25.0 | Complaint of Pain | 1 | N | G | 0 | Seat Belts |
1 | 3984219 | 1 | 1 | M | 19.0 | Killed | 1 | L | G | 0 | Seat Belts |
2 | 3984274 | 2 | 3 | F | 58.0 | Killed | 9 | - | - | 0 | No Seat Belts |
3 | 3984281 | 1 | 1 | M | 29.0 | Killed | 1 | L | H | 0 | No Seat Belts |
4 | 3984352 | 1 | 2 | F | NaN | No Injury | 3 | L | G | 0 | Seat Belts |
sb_injury = victim.groupby(['safety', 'victim_degree_of_injury'])['case_id'].count().unstack(level=0)
sb_injury.drop(index=['Complaint of Pain', 'No Injury', 'Other', 'Other Visible Injury'], inplace=True)
sb_injury
safety | No Seat Belts | Seat Belts |
---|---|---|
victim_degree_of_injury | ||
Killed | 1609 | 370 |
Severe Injury | 8000 | 3156 |
fig = {
"data": [{"type": "bar",
"x": sb_injury.index,
"y": sb_injury['No Seat Belts'],
"name": "No Seat Belts",
"marker_color": "rgb(0, 0, 128)",
"hoverinfo": "y"
},
{"type": "bar",
"x": sb_injury.index,
"y": sb_injury['Seat Belts'],
"name": "Seat Belts",
"marker_color": "rgb(255, 69, 0)",
"hoverinfo": "y"
}],
"layout": {"title": {"text": "Degree of Injury by Seat Belt Status (10-yr Total)", "font_size": 20},
"xaxis": {"tickfont_size": 12,
"type": "category"},
"yaxis": {"title": "Nnumber of Collisions",
"tickfont_size": 12},
"barmode": "group",
"bargap": 0.25,
"bargroupgap": 0.1,
"plot_bgcolor": "#F5F5F5"
}
}
pio.show(fig)
api_key = '----'
gmaps = googlemaps.Client(key=api_key)
fatal_severe_collision = collision[(collision.collision_severity == 'Fatal') | (collision.collision_severity == 'Severe')]
int_fatal_severe_collision = fatal_severe_collision[fatal_severe_collision.intersection == 'Y'].copy()  # copy so new columns can be added without a SettingWithCopyWarning
address = (
int_fatal_severe_collision.primary_rd
+ ' and '
+ int_fatal_severe_collision.secondary_rd
+ ', Los Angeles, California').tolist()
address_formatted = []
lat = []
lon = []
for add in address:
try:
geocode_result = gmaps.geocode(add)
ad = geocode_result[0]['formatted_address']
latitude = geocode_result[0]['geometry']['location']['lat']
longitude = geocode_result[0]['geometry']['location']['lng']
address_formatted.append(ad)
lat.append(latitude)
lon.append(longitude)
    except Exception:
        # append NaN so the later dropna(subset=['lat', 'lon']) removes failed geocodes
        address_formatted.append(np.nan)
        lat.append(np.nan)
        lon.append(np.nan)
int_fatal_severe_collision['lat'] = lat
int_fatal_severe_collision['lon'] = lon
int_fatal_severe_collision['address'] = address_formatted
int_fatal_severe_collision[['primary_rd', 'secondary_rd', 'lat', 'lon']].head()
primary_rd | secondary_rd | lat | lon | |
---|---|---|---|---|
0 | SAN PEDRO ST | 32ND ST | 33.715175 | -118.289782 |
1 | 22ND ST | VILLA CABRILLO MARINA | 33.717084 | -118.282379 |
2 | ALMA ST | 30TH ST | 33.718315 | -118.299416 |
3 | PACIFIC AV | 26TH ST | 33.720614 | -118.287991 |
4 | 25TH ST | MEYLER ST | 33.721563 | -118.296843 |
We need to specify two DBSCAN hyperparameters: eps, the maximum distance between two points for them to be considered part of the same neighborhood (here, 100 m), and min_samples, the minimum number of points required to form a cluster (here, 3).
Here, I use the haversine metric with the ball tree algorithm to calculate great-circle distances between points, so both epsilon and the collision location coordinates need to be converted to radians.
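As a quick sanity check of that unit conversion (an illustrative sketch, not part of the analysis pipeline): scikit-learn's haversine metric returns distances in radians on a unit sphere, so a physical distance in kilometers becomes radians after dividing by the Earth's mean radius of about 6371.0088 km.
# Illustrative only: converting a 100 m search radius to radians
earth_radius_km = 6371.0088
eps_km = 0.1  # 100 meters
eps_rad = eps_km / earth_radius_km
print(eps_rad)  # roughly 1.57e-05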
int_fatal_severe_collision.columns
Index(['case_id', 'accident_year', 'collision_time', 'day_of_week', 'primary_rd', 'secondary_rd', 'distance', 'direction', 'intersection', 'weather_1', 'state_hwy_ind', 'state_route', 'route_suffix', 'postmile_prefix', 'postmile', 'location_type', 'ramp_intersection', 'side_of_hwy', 'tow_away', 'collision_severity', 'number_killed', 'number_injured', 'party_count', 'primary_coll_factor', 'pcf_code_of_viol', 'pcf_viol_category', 'pcf_violation', 'pcf_viol_subsection', 'hit_and_run', 'type_of_collision', 'mviw', 'ped_action', 'road_surface', 'road_cond_1', 'road_cond_2', 'lighting', 'control_device', 'chp_road_type', 'pedestrian_accident', 'bicycle_accident', 'motorcycle_accident', 'truck_accident', 'not_private_property', 'alcohol_involved', 'count_severe_inj', 'count_visible_inj', 'count_complaint_pain', 'count_ped_killed', 'count_ped_injured', 'count_bicyclist_killed', 'count_bicyclist_injured', 'count_mc_killed', 'count_mc_injured', 'latitude', 'longitude', 'bike_ped_collision', 'time_period', 'hour_of_day', 'lat', 'lon', 'address'], dtype='object')
int_fatal_severe_collision.dropna(subset=['lat', 'lon'], inplace=True)
radius = 0.1/6371.0088 # 100 m (0.1 km) expressed in radians (the Earth's mean radius is 6371.0088 km)
min_event = 3
# The haversine metric expects coordinates in radians, so convert the lat/lon pairs
x = np.radians(np.column_stack((int_fatal_severe_collision['lat'], int_fatal_severe_collision['lon'])))
db = DBSCAN(eps=radius, min_samples=min_event, algorithm='ball_tree', metric='haversine').fit(x)
labels = np.where(db.labels_==-1, "Noise", "Cluster").tolist()
int_fatal_severe_collision['cluster'] = labels
int_fatal_severe_collision[['primary_rd', 'secondary_rd', 'lat', 'lon', 'cluster']].head()
primary_rd | secondary_rd | lat | lon | cluster | |
---|---|---|---|---|---|
0 | SAN PEDRO ST | 32ND ST | 33.715175 | -118.289782 | Noise |
1 | 22ND ST | VILLA CABRILLO MARINA | 33.717084 | -118.282379 | Noise |
2 | ALMA ST | 30TH ST | 33.718315 | -118.299416 | Noise |
3 | PACIFIC AV | 26TH ST | 33.720614 | -118.287991 | Noise |
4 | 25TH ST | MEYLER ST | 33.721563 | -118.296843 | Noise |
cc.write(int_fatal_severe_collision, 'fatal_severe_collision', lnglat=('lon', 'lat'), overwrite=True)
Params: encode_geom, geom_col and everything in kwargs are deprecated and not being used any more
Table successfully written to CARTO: https://chenliny1117.carto.com/dataset/fatal_severe_collision
<cartoframes.dataset.Dataset at 0x7f020eb8c9e8>
cc.map(layers=[Layer('fatal_severe_collision',
color={'column': 'cluster', 'scheme': styling.burg(2, bin_method='category')},
size=5.5),
BaseMap(source='light')],
interactive=True,
zoom=10,
lng=-118.4148,
lat=34.1038
)