How often do people use Toronto Bike Share together in pairs or groups? Where are they most likely to go? Does this tell us anything about tourist destinations in Toronto?
Of these pair/group trips, how many involve mixed membership types? This would indicate the type of trip: a local showing a non-local around, two (or more!) locals travelling together, or two visitors travelling together.
How far do people tend to bike?
import pandas as pd
from datetime import datetime
from datetime import timedelta
import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv("C:/Data/Projects/Toronto_Bikeshare/2016_Bike_Share_Toronto_Ridership_Q3.csv")
# Set the index to the trip id; this is useful later
df.set_index("trip_id",inplace = True)
df = df.sort_index(ascending = True)
df.head(5)
trip_id | trip_start_time | trip_stop_time | trip_duration_seconds | from_station_name | to_station_name | user_type |
---|---|---|---|---|---|---|
24008 | 7-1-16 0:00 | 7-1-16 0:08 | 505 | College St W / Huron St | Queens Park / Bloor St W | Member |
24009 | 7-1-16 0:00 | 7-1-16 0:10 | 603 | Wellington St W / Bay St | King St W / Spadina Ave | Member |
24010 | 7-1-16 0:00 | 7-1-16 0:42 | 2487 | Bay St / Queens Quay W (Ferry Terminal) | York St / Queens Quay W | Casual |
24011 | 7-1-16 0:01 | 7-1-16 0:07 | 399 | Trinity St /Front St E | Princess St / Adelaide St | Member |
24012 | 7-1-16 0:01 | 7-1-16 0:12 | 662 | Simcoe St / Queen St W | Queen St W / Spadina Ave | Member |
def parse_date(data):
    dt = datetime.strptime(data, "%m-%d-%y %H:%M")
    parsed_date = datetime.strftime(dt, "%m-%d-%y")
    return parsed_date

def to_datetime(data):
    dt = datetime.strptime(data, "%m-%d-%y %H:%M")
    return dt
df["trip_date"] = df["trip_start_time"].apply(parse_date)
df["trip_start_datetime"] = df["trip_start_time"].apply(to_datetime)
df.dropna(inplace = True)
df["trip_duration_seconds"] = df["trip_duration_seconds"].astype(int)
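The row-by-row `strptime` calls above could also be written with pandas' vectorised parser. The following is a minimal sketch, assuming every timestamp follows the `%m-%d-%y %H:%M` pattern seen in the sample rows; the `sample` frame is made up for illustration and is not part of the ridership data.

```python
import pandas as pd

# Hypothetical three-row sample in the same format as trip_start_time
sample = pd.DataFrame(
    {"trip_start_time": ["7-1-16 0:00", "7-1-16 0:08", "9-30-16 23:59"]}
)

# Parse the whole column at once instead of calling strptime per row
sample["trip_start_datetime"] = pd.to_datetime(
    sample["trip_start_time"], format="%m-%d-%y %H:%M"
)

# Zero-padded date string, matching what parse_date returns
sample["trip_date"] = sample["trip_start_datetime"].dt.strftime("%m-%d-%y")

print(sample)
```

On a column of a few hundred thousand rows this is usually noticeably faster than `apply`, and it yields a proper `datetime64` column.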
# Logic: same first and last station, and a similar start time (within 60 seconds of the next trip)
same_first_station = (df["from_station_name"] == (df.shift(-1)["from_station_name"]))
same_last_station = (df["to_station_name"] == (df.shift(-1)["to_station_name"]))
# The start time must lie between (next start - 60 s) and (next start + 60 s)
similar_start = (df["trip_start_datetime"] <= ((df.shift(-1)["trip_start_datetime"])+timedelta(seconds = 60))) & \
                (df["trip_start_datetime"] >= ((df.shift(-1)["trip_start_datetime"])-timedelta(seconds = 60)))
# Get index of the first trip in the potential pair/group
# Add one to these indices to get all the group trip id's
first_in_group_index = df[same_first_station & same_last_station & similar_start].index
add_to_index = first_in_group_index + 1
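# Get all group trips before splitting up by membership type
# Some trip_id + 1 labels may not exist in df, so keep only the labels that do
group_ids = first_in_group_index.union(add_to_index)
group_trips = df.loc[df.index.intersection(group_ids)].copy()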
keep_cols = ["from_station_name", "to_station_name", "user_type", "trip_date", "trip_duration_seconds"]
group_trips = group_trips[keep_cols]
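The pairing rule described above (same start station, same end station, start times within 60 seconds) can be checked on a small example. This is only a sketch: the `toy` frame, the station names `A`/`B`/`C`, and the helper names are invented for illustration. It also marks the second trip of each pair by position rather than by adding one to the trip id, which is an alternative that does not rely on consecutive ids.

```python
import pandas as pd

# Hypothetical trips: ids 1 and 2 form a pair, 3 and 4 do not
toy = pd.DataFrame(
    {
        "from_station_name": ["A", "A", "B", "A"],
        "to_station_name": ["C", "C", "C", "C"],
        "trip_start_datetime": pd.to_datetime(
            [
                "2016-07-01 00:00",
                "2016-07-01 00:01",
                "2016-07-01 00:01",
                "2016-07-01 00:30",
            ]
        ),
    },
    index=pd.Index([1, 2, 3, 4], name="trip_id"),
)

# Compare every row with the row that follows it
nxt = toy.shift(-1)
gap = (nxt["trip_start_datetime"] - toy["trip_start_datetime"]).abs()

is_first_of_pair = (
    (toy["from_station_name"] == nxt["from_station_name"])
    & (toy["to_station_name"] == nxt["to_station_name"])
    & (gap <= pd.Timedelta(seconds=60))
)

# The row after a "first" row is the second member of that pair
is_second_of_pair = is_first_of_pair.shift(1, fill_value=False)

first_ids = toy.index[is_first_of_pair.to_numpy()]
pair_ids = toy.index[(is_first_of_pair | is_second_of_pair).to_numpy()]

print(list(first_ids), list(pair_ids))
```

The last row has no successor, so its comparisons against missing values evaluate to `False` and it is never flagged.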
# Look at membership type
# Mixed group
mixed_type_index = group_trips[(group_trips["from_station_name"] == group_trips.shift(-1)["from_station_name"]) &
                               (group_trips["to_station_name"] == group_trips.shift(-1)["to_station_name"]) &
                               (group_trips["user_type"] != group_trips.shift(-1)["user_type"])].index
# Select both trips of each pair by label, keeping only trip ids present in df
mixed_trips = df.loc[df.index.intersection(mixed_type_index.union(mixed_type_index + 1))]
mixed_trips_index = mixed_trips.index
# Members only
member_index = group_trips[(group_trips["from_station_name"] == group_trips.shift(-1)["from_station_name"]) &
                           (group_trips["to_station_name"] == group_trips.shift(-1)["to_station_name"]) &
                           (group_trips["user_type"] == "Member") &
                           (group_trips.shift(-1)["user_type"] == "Member")].index
# Select both trips of each pair by label, keeping only trip ids present in df
member_trips = df.loc[df.index.intersection(member_index.union(member_index + 1))]
member_trips_index = member_trips.index
# Casual only
casual_index = group_trips[(group_trips["from_station_name"] == group_trips.shift(-1)["from_station_name"]) &
                           (group_trips["to_station_name"] == group_trips.shift(-1)["to_station_name"]) &
                           (group_trips["user_type"] == "Casual") &
                           (group_trips.shift(-1)["user_type"] == "Casual")].index
# Select both trips of each pair by label, keeping only trip ids present in df
casual_trips = df.loc[df.index.intersection(casual_index.union(casual_index + 1))]
casual_trips_index = casual_trips.index
# group_trips can still contain unpaired trips, for example when trip_id + 1 is missing from df.
# Keep only the trips that appear in member_trips, casual_trips, or mixed_trips.
# This gets rid of the single-trip cases.
paired_ids = casual_trips_index.union(member_trips_index).union(mixed_trips_index)
group_trips = group_trips.loc[group_trips.index.intersection(paired_ids)]
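The three filters above all ask the same question about each pair: are both riders members, both casual, or one of each? As a hedged sketch of that classification in one step, assume a hypothetical `pairs` frame with one row per pair and the two riders' types in separate columns (the column names are made up here, not taken from the dataset):

```python
import pandas as pd

# Hypothetical pairs: one row per pair, one column per rider
pairs = pd.DataFrame(
    {
        "first_user_type": ["Member", "Member", "Casual", "Casual"],
        "second_user_type": ["Member", "Casual", "Casual", "Member"],
    }
)

# Keep the shared type when both riders match, otherwise label the pair "Mixed"
same_type = pairs["first_user_type"] == pairs["second_user_type"]
pairs["pair_type"] = pairs["first_user_type"].where(same_type, "Mixed")

print(pairs["pair_type"].value_counts())
```

With a single `pair_type` column, the per-category counts and averages can come from one `value_counts` or `groupby` call instead of three separate frames.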
# Trip numbers
num_group_trips = len(group_trips)
num_mixed_trips = len(mixed_trips)
num_member_trips = len(member_trips)
num_casual_trips = len(casual_trips)
total_trips = len(df)
# Percentages
pct_group_trips = round(100*num_group_trips/total_trips)
pct_mixed_trips = round(100*num_mixed_trips/num_group_trips)
pct_member_trips = round(100*num_member_trips/num_group_trips)
pct_casual_trips = round(100*num_casual_trips/num_group_trips)
# Print summary statistics
print("There were {num_group_trips} group trips during Q3 2016, {pct_group_trips} percent of the total {total_trips} trips.".
format(num_group_trips = num_group_trips,
pct_group_trips = pct_group_trips,
total_trips = total_trips))
print("Of the {num_group_trips} group trips, {pct_mixed_trips} percent of these ({num_mixed_trips} trips) are of mixed membership type.".
format(num_group_trips = num_group_trips,
pct_mixed_trips = pct_mixed_trips,
num_mixed_trips = num_mixed_trips))
print("{pct_member_trips} percent ({num_member_trips} trips) are of the membership type.".
format(pct_member_trips = pct_member_trips,
num_member_trips = num_member_trips))
print("{pct_casual_trips} percent or ({num_casual_trips} trips) are of the casual type.".
format(pct_casual_trips = pct_casual_trips,
num_casual_trips = num_casual_trips))
There were 5942 group trips during Q3 2016, 2 percent of the total 367957 trips.
Of the 5942 group trips, 4 percent of these (242 trips) are of mixed membership type.
12 percent (724 trips) are of the membership type.
84 percent or (4977 trips) are of the casual type.
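Because the percentages are rounded to whole numbers, it can help to recompute them with one decimal place. The sketch below uses only the counts printed above; the variable names are illustrative. It also shows that the three subsets add up to 5943, one more than the 5942 group trips, which suggests that one trip may be counted in two categories.

```python
# Counts copied from the printed summary
total_trips = 367957
num_group_trips = 5942
counts = {"mixed": 242, "member": 724, "casual": 4977}

# Share of all trips that are group trips
pct_group = 100 * num_group_trips / total_trips

# Share of group trips in each membership category
shares = {name: 100 * n / num_group_trips for name, n in counts.items()}

print(f"Group trips: {pct_group:.1f}% of all trips")
for name, share in shares.items():
    print(f"{name}: {share:.1f}% of group trips")

# The categories overlap by this many trips
print("Sum of categories minus group trips:", sum(counts.values()) - num_group_trips)
```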
# How long is the average trip?
avg_mixed = round(mixed_trips["trip_duration_seconds"].agg("mean")/60)
avg_member = round(member_trips["trip_duration_seconds"].agg("mean")/60)
avg_casual = round(casual_trips["trip_duration_seconds"].agg("mean")/60)
print("The average mixed trip is {avg_mixed} minutes, \
while the average member trip is {avg_member} minutes. \
The average casual trip is {avg_casual} minutes.".
format(avg_casual = avg_casual, avg_member = avg_member, avg_mixed = avg_mixed))
The average mixed trip is 17 minutes, while the average member trip is 14 minutes. The average casual trip is 32 minutes.
# Where are the most popular starting and ending points?
def make_trip_df(data, input_col, output_col):
    # Get all the stations
    stations = data[input_col].value_counts().index.tolist()
    # Get number of trips
    values = data[input_col].value_counts().tolist()
    # Create dataframe of just stations and their total number of trips
    # Set index to the station name
    trip_df = pd.DataFrame({"station": stations, output_col: values})
    trip_df.set_index("station", inplace = True)
    return trip_df
mixed_departures = make_trip_df(mixed_trips, "from_station_name", "total_departures")
member_departures = make_trip_df(member_trips, "from_station_name", "total_departures")
casual_departures = make_trip_df(casual_trips, "from_station_name", "total_departures")
mixed_arrivals = make_trip_df(mixed_trips, "to_station_name", "total_arrivals")
member_arrivals = make_trip_df(member_trips, "to_station_name", "total_arrivals")
casual_arrivals = make_trip_df(casual_trips, "to_station_name", "total_arrivals")
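Since `value_counts` already returns station names as the index and trip counts as the values, the same table can be built without going through lists. This is a sketch of an equivalent helper; the name `station_counts` and the `toy_trips` frame are hypothetical.

```python
import pandas as pd

def station_counts(data, input_col, output_col):
    # value_counts gives one row per station, sorted by descending count
    counts = data[input_col].value_counts()
    # Name the index "station" and turn the Series into a one-column frame
    return counts.rename_axis("station").to_frame(name=output_col)

# Hypothetical trips for illustration
toy_trips = pd.DataFrame({"from_station_name": ["X", "X", "Y"]})
toy_departures = station_counts(toy_trips, "from_station_name", "total_departures")

print(toy_departures)
```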
from IPython.display import display_html
def display_side_by_side(*args):
    html_str=''
    for df in args:
        html_str+=df.to_html()
    display_html(html_str.replace('table','table style="display:inline"'),raw=True)
display_side_by_side(mixed_departures.head(5), mixed_arrivals.head(5))
station | total_departures |
---|---|
Elizabeth St / Edward St (Bus Terminal) | 8 |
Strachan Ave / Princes' Blvd | 8 |
York St / Queens Quay W | 8 |
Madison Ave / Bloor St W | 6 |
Augusta Ave / Denison Sq | 6 |

station | total_arrivals |
---|---|
Queen St W / Ossington Ave | 8 |
Dundas St / Yonge St | 8 |
York St / Queens Quay W | 6 |
161 Bleecker St (South of Wellesley) | 6 |
Bay St / Queens Quay W (Ferry Terminal) | 6 |
display_side_by_side(casual_departures.head(5), casual_arrivals.head(5))
station | total_departures |
---|---|
York St / Queens Quay W | 194 |
Bay St / Queens Quay W (Ferry Terminal) | 186 |
HTO Park (Queen's Quay W) | 154 |
Bremner Blvd / Rees St | 136 |
Queens Quay W / Lower Simcoe St | 130 |

station | total_arrivals |
---|---|
Bay St / Queens Quay W (Ferry Terminal) | 190 |
York St / Queens Quay W | 159 |
HTO Park (Queen's Quay W) | 148 |
Queens Quay W / Lower Simcoe St | 137 |
Dockside Dr / Queens Quay E (Sugar Beach) | 113 |
display_side_by_side(member_departures.head(5), member_arrivals.head(5))
station | total_departures |
---|---|
Dockside Dr / Queens Quay E (Sugar Beach) | 20 |
King St W / Douro St | 14 |
College St W / Markham St | 14 |
Queen St W / Portland St | 14 |
University Ave / King St W | 14 |

station | total_arrivals |
---|---|
Princess St / Adelaide St | 20 |
Beverly St / Dundas St W | 20 |
Euclid Ave / Bloor St W | 16 |
University Ave / Elm St | 16 |
Queen St W / Portland St | 14 |
For mixed trips, Yonge-Dundas Square, the waterfront, and Queen Street West are all popular. For casual trips, the waterfront dominates: nearly all of the top five departure and arrival stations are on Queens Quay. Member trips are somewhat more residential, which suggests that member group trips in the summer are mostly made by people who live in Toronto and are getting around their neighbourhood with friends rather than sightseeing.