How often do people use Toronto Bike Share together in pairs or groups? Where are they most likely to go? Does this tell us anything about tourist destinations in Toronto?

Of these pair/group trips, how many are of mixed membership types? This would tell us the types of trips- a local showing a non-local around, two (or more!) locals travelling together, two visitors travelling together.

How far do people tend to bike?

In [1]:

import pandas as pd
from datetime import datetime
from datetime import timedelta
import warnings
warnings.filterwarnings('ignore')

In [2]:

df = pd.read_csv("C:/Data/Projects/Toronto_Bikeshare/2016_Bike_Share_Toronto_Ridership_Q3.csv")
# Set index to be the trip id- is useful later
df.set_index("trip_id",inplace = True)
df = df.sort_index(ascending = True)
df.head(5)

Out[2]:

	trip_start_time	trip_stop_time	trip_duration_seconds	from_station_name	to_station_name	user_type
trip_id
24008	7-1-16 0:00	7-1-16 0:08	505	College St W / Huron St	Queens Park / Bloor St W	Member
24009	7-1-16 0:00	7-1-16 0:10	603	Wellington St W / Bay St	King St W / Spadina Ave	Member
24010	7-1-16 0:00	7-1-16 0:42	2487	Bay St / Queens Quay W (Ferry Terminal)	York St / Queens Quay W	Casual
24011	7-1-16 0:01	7-1-16 0:07	399	Trinity St /Front St E	Princess St / Adelaide St	Member
24012	7-1-16 0:01	7-1-16 0:12	662	Simcoe St / Queen St W	Queen St W / Spadina Ave	Member

In [3]:

def parse_date(data):
    dt = datetime.strptime(data, "%m-%d-%y %H:%M")
    parsed_date = datetime.strftime(dt, "%m-%d-%y")
    return parsed_date

def to_datetime(data):
    dt = datetime.strptime(data, "%m-%d-%y %H:%M")
    return dt

In [4]:

df["trip_date"] = df["trip_start_time"].apply(parse_date)
df["trip_start_datetime"] = df["trip_start_time"].apply(to_datetime)
df.dropna(inplace = True)
df["trip_duration_seconds"] = df["trip_duration_seconds"].astype(int)

In [5]:

# Logic: Same first and last station, and similar start time- start on the same day and within 60 seconds
same_first_station = (df["from_station_name"] == (df.shift(-1)["from_station_name"]))
same_last_station = (df["to_station_name"] == (df.shift(-1)["to_station_name"]))
similar_start = (df["trip_start_datetime"] <= ((df.shift(-1)["trip_start_datetime"])+timedelta(seconds = 60))) & \
                (df["trip_start_datetime"] <= ((df.shift(-1)["trip_start_datetime"])-timedelta(seconds = 60)))

In [6]:

# Get index of the first trip in the potential pair/group
# Add one to these indices to get all the group trip id's
first_in_group_index = df[same_first_station & same_last_station & similar_start].index
add_to_index = first_in_group_index + 1
# Get all group trips before splitting up by membership type
group_trips = df.ix[add_to_index | first_in_group_index]
group_trips.dropna(inplace =  True)
keep_cols = ["from_station_name", "to_station_name", "user_type", "trip_date", "trip_duration_seconds"]
group_trips = group_trips[keep_cols]

In [7]:

# Look at membership type
# Mixed group
mixed_type_index = group_trips[(group_trips["from_station_name"] == group_trips.shift(-1)["from_station_name"]) & 
            (group_trips["to_station_name"] == group_trips.shift(-1)["to_station_name"]) &
            (group_trips["user_type"] != group_trips.shift(-1)["user_type"])].index
mixed_trips = df.ix[(mixed_type_index+1) | mixed_type_index]
mixed_trips_index = mixed_trips.index
# Members only
member_index = group_trips[(group_trips["from_station_name"] == group_trips.shift(-1)["from_station_name"]) & 
            (group_trips["to_station_name"] == group_trips.shift(-1)["to_station_name"]) &
            (group_trips["user_type"] == "Member") & 
            (group_trips.shift(-1)["user_type"] == "Member")].index
member_trips = df.ix[(member_index+1) | member_index]
member_trips_index = member_trips.index
# Casual only
casual_index = group_trips[(group_trips["from_station_name"] == group_trips.shift(-1)["from_station_name"]) & 
            (group_trips["to_station_name"] == group_trips.shift(-1)["to_station_name"]) &
            (group_trips["user_type"] == "Casual") & 
            (group_trips.shift(-1)["user_type"] == "Casual")].index
casual_trips = df.ix[(casual_index+1) | casual_index]
casual_trips_index = casual_trips.index

In [8]:

# group_trips dataframe for some reason has some single trips- 
# filter the group_trip df by those that are in the member_trip, casual_trip, or mixed_trip df.
# This gets rid of the single line cases.
group_trips = group_trips.ix[casual_trips_index | member_trips_index | mixed_trips_index]

In [9]:

# Trip numbers
num_group_trips = len(group_trips)
num_mixed_trips = len(mixed_trips)
num_member_trips = len(member_trips)
num_casual_trips = len(casual_trips)
total_trips = len(df)
# Percentages
pct_group_trips = round(100*num_group_trips/total_trips)
pct_mixed_trips = round(100*num_mixed_trips/num_group_trips)
pct_member_trips = round(100*num_member_trips/num_group_trips)
pct_casual_trips = round(100*num_casual_trips/num_group_trips)
# Print summary statistics
print("There were {num_group_trips} group trips during Q3 2016, {pct_group_trips} percent of the total {total_trips} trips.".
     format(num_group_trips = num_group_trips, 
            pct_group_trips = pct_group_trips,
           total_trips = total_trips))
print("Of the {num_group_trips} group trips, {pct_mixed_trips} percent of these ({num_mixed_trips} trips) are of mixed membership type.".
     format(num_group_trips = num_group_trips,
           pct_mixed_trips = pct_mixed_trips,
           num_mixed_trips =  num_mixed_trips))
print("{pct_member_trips} percent ({num_member_trips} trips) are of the membership type.".
     format(pct_member_trips = pct_member_trips,
           num_member_trips =  num_member_trips))
print("{pct_casual_trips} percent or ({num_casual_trips} trips) are of the casual type.".
     format(pct_casual_trips = pct_casual_trips,
           num_casual_trips =  num_casual_trips))

There were 5942 group trips during Q3 2016, 2 percent of the total 367957 trips.
Of the 5942 group trips, 4 percent of these (242 trips) are of mixed membership type.
12 percent (724 trips) are of the membership type.
84 percent or (4977 trips) are of the casual type.

In [10]:

# How long is the average trip?
avg_mixed = round(mixed_trips["trip_duration_seconds"].agg("mean")/60)
avg_member = round(member_trips["trip_duration_seconds"].agg("mean")/60)
avg_casual = round(casual_trips["trip_duration_seconds"].agg("mean")/60)
print("The average mixed trip is {avg_mixed} minutes, \
while the average member trip is {avg_member} minutes. \
The average casual trip is {avg_casual} minutes.".
format(avg_casual = avg_casual, avg_member = avg_member, avg_mixed = avg_mixed))

The average mixed trip is 17 minutes, while the average member trip is 14 minutes. The average casual trip is 32 minutes.

In [11]:

# Where are the most popular starting and ending points? 
def make_trip_df(data, input_col, output_col):
    # Get all the stations
    stations = data[input_col].value_counts().index.tolist()
    # Get number of trips
    values = data[input_col].value_counts().tolist()
    # Create dataframe of just stations and their total number of trips
    # Set index to the station name
    trip_df = pd.DataFrame({"station": stations, output_col: values})
    trip_df.set_index("station", inplace = True)
    return trip_df

mixed_departures = make_trip_df(mixed_trips, "from_station_name", "total_departures")
member_departures = make_trip_df(member_trips, "from_station_name", "total_departures")
casual_departures = make_trip_df(casual_trips, "from_station_name", "total_departures")
mixed_arrivals = make_trip_df(mixed_trips, "to_station_name", "total_arrivals")
member_arrivals = make_trip_df(member_trips, "to_station_name", "total_arrivals")
casual_arrivals = make_trip_df(casual_trips, "to_station_name", "total_arrivals")

In [12]:

from IPython.display import display_html
def display_side_by_side(*args):
    html_str=''
    for df in args:
        html_str+=df.to_html()
    display_html(html_str.replace('table','table style="display:inline"'),raw=True)

In [13]:

display_side_by_side(mixed_departures.head(5), mixed_arrivals.head(5))

	total_departures
station
Elizabeth St / Edward St (Bus Terminal)	8
Strachan Ave / Princes' Blvd	8
York St / Queens Quay W	8
Madison Ave / Bloor St W	6
Augusta Ave / Denison Sq	6

	total_arrivals
station
Queen St W / Ossington Ave	8
Dundas St / Yonge St	8
York St / Queens Quay W	6
161 Bleecker St (South of Wellesley)	6
Bay St / Queens Quay W (Ferry Terminal)	6

In [14]:

display_side_by_side(casual_departures.head(5), casual_arrivals.head(5))

	total_departures
station
York St / Queens Quay W	194
Bay St / Queens Quay W (Ferry Terminal)	186
HTO Park (Queen's Quay W)	154
Bremner Blvd / Rees St	136
Queens Quay W / Lower Simcoe St	130

	total_arrivals
station
Bay St / Queens Quay W (Ferry Terminal)	190
York St / Queens Quay W	159
HTO Park (Queen's Quay W)	148
Queens Quay W / Lower Simcoe St	137
Dockside Dr / Queens Quay E (Sugar Beach)	113

In [15]:

display_side_by_side(member_departures.head(5), member_arrivals.head(5))

	total_departures
station
Dockside Dr / Queens Quay E (Sugar Beach)	20
King St W / Douro St	14
College St W / Markham St	14
Queen St W / Portland St	14
University Ave / King St W	14

	total_arrivals
station
Princess St / Adelaide St	20
Beverly St / Dundas St W	20
Euclid Ave / Bloor St W	16
University Ave / Elm St	16
Queen St W / Portland St	14

For mixed trips, Young-Dundas Square, the waterfront, and Queen Street West are all popular. For casual trips, more people seem to enjoy biking along the waterfront. For member trips, it's a bit more residential, which seems to imply that member group trips in the summer are more done by people who live in Toronto and are trying going around with their friend(s) in their neighbourhood, not so much for sightseeing purposes.