Exploratory Data Analysis¶

Abstract¶

We examine the NSW train Exit and Entry traffic for the Kings Cross, Newtown, Parramatta, Circular Quay, Museum, Central and Town Hall stations for evidence of changes due to the introduction of the NSW Lockout Law on 24-Feb-2014.

Using Bayesian Change Point detection we found:

No evidence of changes to Kings Cross or Parramatta Exit traffic from the introduction of the lockout law.
Evidence of strong growth in the Parramatta Friday night Exit traffic that is unrelated to the lockout laws and which has increased traffic by 200% since Jan-2013.
Evidence of changes in the Newtown Friday night Exit traffic as a result of the lockout laws, which has increased traffic by 300% since the law came into effect.

The Data¶

We were provided with train turnstile "validation" data by Transport for NSW. This data is a summary of Exit and Entry traffic by Station, Hour and Date.

The data covers the period 2013-02-01 to 2016-07-31, for Friday, Saturday and Sunday nights from 5PM to 2AM. Since this spans the transition from Magnetic tickets to OPAL cards the data is further divided by Source.

In the notebook 1_Data_Transformation we transform the data from its raw form to one more suitable for our analysis. Specifically, we need observations between midnight and the last train (2am) to have the Date of the previous day. This means all observations in the period 5pm-2am will have the same Date and be considered part of the same Night.
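The Date reassignment described above can be sketched as follows. This is a minimal illustration rather than the code in 1_Data_Transformation: the function name `night_date` and the `last_hour` parameter are hypothetical, and it assumes each observation carries a full timestamp.

```python
import pandas as pd

def night_date(timestamps, last_hour=2):
    """Map each timestamp to the Date of the Night it belongs to.

    Observations from midnight up to and including `last_hour` are moved
    back one calendar day; everything else keeps its own date.
    (Hypothetical helper, not taken from the notebook.)
    """
    ts = pd.to_datetime(pd.Series(timestamps))
    after_midnight = ts.dt.hour <= last_hour
    # Subtract one day only where the observation falls after midnight.
    shifted = ts.where(~after_midnight, ts - pd.Timedelta(days=1))
    return shifted.dt.normalize()

example = night_date(["2014-02-21 23:00", "2014-02-22 01:00", "2014-02-22 17:00"])
print(example.dt.strftime("%Y-%m-%d").tolist())
```

The 11PM and 1AM observations end up with the same Date, while the 5PM observation on the following day starts a new Night.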

In [86]:

import numpy as np
import pandas as pd
from datetime import timedelta, datetime

import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 2)

import seaborn as sns
sns.set(font_scale=0.6)

import outlier
import utilities as util

In [2]:

df_raw = pd.read_pickle("TrainValidationData/df_raw.pkl") 
stations= ['Kings Cross Station', 'Newtown Station', 'Parramatta Station',
           'Circular Quay Station', 'Museum Station',
           'Central Station', 'Town Hall Station']

In the notebook 2_Cleaning we examine the data in more detail. Unsurprisingly, some data is missing; however, we conclude the impact is minimal and we can work with this data.

Counting the number of whole days of missing data we find:

In [3]:

daily = (df_raw.query('Station in @stations').
            reset_index().
            pivot_table(index = 'Date', columns = ['Station'], values='Exit', aggfunc=sum).dropna(axis='columns', how='all'))
print("Number of days", len(daily))
for station in daily.columns:
    print("{:>22}: {:}".format(station, daily[station].isnull().sum()))

Number of days 549
       Central Station: 2
 Circular Quay Station: 13
   Kings Cross Station: 14
        Museum Station: 16
       Newtown Station: 58
    Parramatta Station: 6
     Town Hall Station: 0

We observe

Town Hall has no missing data and Central just two days.
Newtown is missing about 10% (58 of 549 days), which needs further investigation.

We examine the missing data more closely in 2_Cleaning and discover the missing data for:

Kings Cross and Parramatta is a few whole Saturdays, probably because the station was closed for station work or trackwork.
Newtown is largely the first 5 months of Saturday data, from Feb to June of 2013. This is probably a result of the station upgrade works started in 2013.

Counting the missing data by Hour of the day we get:

In [4]:

(df_raw.query('Station in @stations').
     reset_index().
     pivot_table(index = 'Date', columns = ['Station', 'Hour'], values='Exit', aggfunc=sum).
     dropna(axis='columns', how='all').
     isnull().
     sum().reset_index().
     pivot(index='Station', columns='Hour', values=0))

Out[4]:

Hour	5PM	6PM	7PM	8PM	9PM	10PM	11PM	12AM	1AM	2AM
Station
Central Station	2	2	2	2	2	5	6	6	6	33
Circular Quay Station	27	27	27	30	30	33	32	20	214	528
Kings Cross Station	27	26	28	29	29	30	29	23	134	499
Museum Station	32	36	36	35	36	119	118	127	290	536
Newtown Station	93	92	96	112	111	178	197	184	319	470
Parramatta Station	15	12	15	13	18	19	20	15	76	403
Town Hall Station	0	0	0	0	0	0	0	0	36	435

We observe that for most stations the amount of missing data does not change by Hour, whereas for Town Hall and Museum the count increases later at night, possibly indicating that the station closes early or that the gates are left open and people do not swipe off.

Analysis of Hourly Traffic¶

In this section we look at the traffic patterns at the hourly level. This data is created from the raw data by adding the Exit/Entry traffic for Magnetic Tickets and OPAL for each Station/Date/Hour.
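As a sketch of this aggregation step, the snippet below sums Entry and Exit across ticket Sources for each Station/Date/Hour. The column names follow the raw data shown in this notebook, but the numbers are made up for illustration and the exact code that builds df_hourly may differ.

```python
import pandas as pd

# Illustrative rows only; the values are invented.
raw = pd.DataFrame({
    "Station": ["Kings Cross Station"] * 4,
    "Date": pd.to_datetime(["2014-09-19"] * 4),
    "Hour": ["7PM", "7PM", "8PM", "8PM"],
    "Source": ["Magnetic tickets", "OPAL", "Magnetic tickets", "OPAL"],
    "Entry": [120, 80, 90, 70],
    "Exit": [200, 150, 160, 140],
})

# Add Magnetic and OPAL traffic together for each Station/Date/Hour.
hourly = (raw.groupby(["Station", "Date", "Hour"], sort=False)[["Entry", "Exit"]]
             .sum()
             .reset_index())
print(hourly)
```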

In [5]:

df_hourly = pd.read_pickle("TrainValidationData/df_hourly.pkl")

Hourly averages¶

The trellis plot below lists Nights from left to right and Stations from top to bottom, using a common x- and y-axis scale. Each plot displays the average Exit traffic by hour from 5PM to 2AM.

In [6]:

sns.set(font_scale=0.85)
df_hourly['Year']    = df_hourly.index.year

g = sns.factorplot(data=df_hourly, x='Hour', y='Exit', kind="bar", 
               row="Station", row_order=stations[:5], col = 'Night',
               size=1.75, aspect=2, ci = None, margin_titles = True, sharey=True)
util.set_titles(g);

We plot Central and Town Hall separately due to the disparity of the y-axis scales.

In [7]:

sns.set(font_scale=0.85)
df_hourly['Year']    = df_hourly.index.year

g = sns.factorplot(data=df_hourly, x='Hour', y='Exit', kind="bar", 
               row="Station", row_order=stations[5:], col = 'Night',
               size=1.75, aspect=2, ci = None, margin_titles = True, sharey=True)
util.set_titles(g);

We observe that:

Exit traffic declines by Hour, as expected.
The pattern of decline is different for each night.
The 5pm and 6pm Friday night traffic is much higher than on Saturday or Sunday and (with the exception of Newtown) is disproportionate to the rest of the Friday night. We suspect this is commuters heading home from work.

For this reason subsequent analysis only considers data from 7pm onwards.

Hourly time series¶

The following trellis plot lists Stations from left to right and Hours from top to bottom, starting at 7PM. The X and Y scales are common across all the plots to make comparison easy.

Each plot of the time series shows:

The Exit traffic (blue line).
A trend line (dotted green), calculated using linear regression (ordinary least squares). Note this is indicative only and does not indicate linear growth.
Outliers (red star), found using the Interquartile Range (IQR) method.
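The trend line and outlier markers listed above can be approximated with a few lines of NumPy. This is a sketch under assumptions: it fits the OLS line with `np.polyfit` and applies the usual 1.5 x IQR fences to the residuals around that line. The notebook's `outlier` and `utilities` modules may do this differently, and the sample values are invented.

```python
import numpy as np

def linear_trend(y):
    """Ordinary least squares line through y, evaluated at each observation."""
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)
    return intercept + slope * x

def iqr_outliers(values, k=1.5):
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Invented weekly Exit counts with one obvious spike.
exits = np.array([100., 104., 98., 110., 107., 400., 112., 115., 109., 118.])
trend = linear_trend(exits)
flags = iqr_outliers(exits - trend)   # assumption: test residuals, not raw values
print(np.flatnonzero(flags))
```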

In [8]:

util.plot_stations_hourly(df_hourly,stations[:3])

In [9]:

util.plot_stations_hourly(df_hourly,stations[3:5])

In [10]:

util.plot_stations_hourly(df_hourly,stations[5:])

From these plots we observe that each time series:

Has 2-3 outliers per year (5%). As these do not impact our analysis they have not been investigated further.
Shows growth over the years, with Newtown and Parramatta growing about 2 times faster than Kings Cross.
Is noisy. While the Exit traffic generally increases over time there are significant weekly variations (20-30%). The autocorrelation plot of the residuals (below) indicates the variations are non-periodic, i.e., random rather than seasonal.

In the notebook 3_Exploratory_Data_Analysis we look at autocorrelation plots for each of the stations and conclude that with the exception of Circular Quay there are no clear seasonal effects and the short term movement around the trend is random. For Circular Quay we see a yearly (52 week) effect which is likely to be due to NYE.
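To make the seasonal check concrete, the sketch below computes the standard sample autocorrelation at a given lag and applies it to a synthetic weekly series with one spike every 52 weeks. The function and the synthetic data are illustrative assumptions; `util.plot_autocorrelation` may compute and display this differently.

```python
import numpy as np

def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag (in observations)."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    if lag == 0:
        return 1.0
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic stand-in for a yearly event: one spike every 52 weeks over 4 years.
weeks = np.arange(208)
seasonal = np.where(weeks % 52 == 0, 10.0, 0.0)

print(autocorrelation(seasonal, 52))   # high: the series repeats yearly
print(autocorrelation(seasonal, 26))   # near zero: no half-year pattern
```

A clear peak at lag 52 and nothing elsewhere is the signature described for Circular Quay; a series with no seasonal effect would show no such peak.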

Analysis of Daily Traffic¶

In this section we examine the daily Exit traffic. Daily data is constructed for each Station/Date from the hourly data by adding together the Exit traffic between 7PM and the last train. We exclude Exit traffic before 7PM as it is heavily influenced by commuter traffic.
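A minimal sketch of this daily roll-up, assuming the Hour labels match those in the missing-data table above ("5PM" to "2AM") and that Date has already been set to the Night's date. The values are invented, and the code that produces df_daily may differ.

```python
import pandas as pd

# Hours kept for the daily total; 5PM and 6PM are excluded as commuter traffic.
NIGHT_HOURS = ["7PM", "8PM", "9PM", "10PM", "11PM", "12AM", "1AM", "2AM"]

# Invented hourly rows for a single Night.
hourly = pd.DataFrame({
    "Station": ["Newtown Station"] * 5,
    "Date": pd.to_datetime(["2014-02-21"] * 5),
    "Hour": ["5PM", "6PM", "7PM", "11PM", "1AM"],
    "Exit": [900, 800, 300, 120, 40],
})

daily = (hourly[hourly["Hour"].isin(NIGHT_HOURS)]
         .groupby(["Station", "Date"])["Exit"]
         .sum()
         .reset_index())
print(daily)
```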

In [11]:

df_daily = pd.read_pickle("TrainValidationData/df_daily.pkl")

The trellis plot below lists Stations from left to right and Nights from top to bottom.

In [12]:

util.plot_stations_daily(df_daily,stations[:5])

We plot Central and Town Hall separately due to the disparity of the y-axis scales.

In [13]:

util.plot_stations_daily(df_daily,stations[5:])

As with the hourly data we note there are some outliers. The proportion varies from 5% to 15%. We note that on Fridays Kings Cross has low traffic growth compared to Parramatta and Newtown.

In 3_Exploratory_Data_Analysis we evaluate the autocorrelation plots for all stations and conclude that with the exception of Circular Quay (shown below) there aren't clear seasonal effects and the movement around the trend is random. For Circular Quay we see a yearly (52 week) effect which is likely to be due to NYE.

In [14]:

ts = (df_daily.query('Station == "Circular Quay Station" and Night == "Friday"').
          groupby(level='Date')[['Exit', 'Entry']].sum())

plt.figure(figsize=(16,4))
util.plot_autocorrelation(outlier.residue(ts.Exit), "Circular Quay Friday Night Exit Traffic")


Change Points¶

To test if the traffic changed due to the lockout laws we use Bayesian Change Point (BCP) detection. This is well suited to our needs as the algorithm works by detecting changes in the underlying mechanism that generates the time series. The output is a probability that a change occurred at each point in the time series.
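To give a feel for how a change probability at each point can arise, here is a deliberately simplified sketch. It is not the BCP algorithm run in 4_Change_Point_Detection. It assumes exactly one change, a constant mean on either side, Gaussian noise with known standard deviation `sigma`, a Normal prior on each segment mean, and a uniform prior on the change location. All names and parameter values are illustrative.

```python
import numpy as np

def log_marginal(x, sigma, prior_mean, prior_sd):
    """Log marginal likelihood of a segment whose constant mean is integrated out."""
    n = len(x)
    d = x - prior_mean
    quad = (np.sum(d ** 2)
            - prior_sd ** 2 * np.sum(d) ** 2 / (sigma ** 2 + n * prior_sd ** 2)
            ) / sigma ** 2
    return (-0.5 * n * np.log(2 * np.pi * sigma ** 2)
            - 0.5 * np.log1p(n * prior_sd ** 2 / sigma ** 2)
            - 0.5 * quad)

def change_point_posterior(x, sigma, prior_mean, prior_sd):
    """Posterior over tau, the first index of the second segment."""
    x = np.asarray(x, dtype=float)
    taus = np.arange(1, len(x))
    log_post = np.array([
        log_marginal(x[:t], sigma, prior_mean, prior_sd)
        + log_marginal(x[t:], sigma, prior_mean, prior_sd)
        for t in taus
    ])
    log_post -= log_post.max()        # stabilise before exponentiating
    post = np.exp(log_post)
    return taus, post / post.sum()    # uniform prior on tau, so just normalise

# Synthetic series: flat at 0 for 30 points, then flat at 5.
series = np.concatenate([np.zeros(30), np.full(30, 5.0)])
taus, post = change_point_posterior(series, sigma=1.0, prior_mean=2.5, prior_sd=10.0)
print(taus[np.argmax(post)])
```

On a real series the posterior is spread over neighbouring dates rather than concentrated on one, which is why the plots in this notebook show peaks of varying height and width.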

To visualise the Change Points (see below) we plot the time series with the change probability directly below the time axes. We then manually annotate the plot with:

A grey dotted vertical line to show the date the lockout laws were introduced.
Orange vertical lines where each change is predicted, i.e., a peak in the change probability.
Trend lines between each change point.
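The trend lines between change points amount to fitting a separate line to each segment. The sketch below shows one way to do that with OLS; `segment_trends` is a hypothetical helper and not the implementation of `util.segment`, and the series is synthetic.

```python
import numpy as np
import pandas as pd

def segment_trends(series, change_dates):
    """Fit a separate OLS line to each stretch between change points.

    Returns a list of (first_date, last_date, slope_per_observation) tuples.
    """
    cuts = [series.index[0]] + [pd.Timestamp(d) for d in change_dates] + [None]
    results = []
    for start, stop in zip(cuts[:-1], cuts[1:]):
        if stop is None:
            part = series[series.index >= start]
        else:
            part = series[(series.index >= start) & (series.index < stop)]
        slope, _ = np.polyfit(np.arange(len(part)), part.to_numpy(), 1)
        results.append((part.index[0], part.index[-1], slope))
    return results

# Synthetic weekly Fridays: flat at 1000, then a jump to 1500 with growth of 20 per week.
dates = pd.date_range("2014-01-03", periods=20, freq="7D")
values = np.concatenate([np.full(10, 1000.0), 1500.0 + 20.0 * np.arange(10)])
trends = segment_trends(pd.Series(values, index=dates), ["2014-03-14"])
for first, last, slope in trends:
    print(first.date(), last.date(), round(slope, 2))
```

Comparing the slopes on either side of a change point is what lets us describe a segment as flat or growing in the sections that follow.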

Computing the Bayesian Change Points across all the Stations takes about 30 min. This is done in the notebook 4_Change_Point_Detection and the results are stored for later use.

In [15]:

df_daily_changes = pd.read_pickle("TrainValidationData/df_daily_changes.pkl")

Kings Cross¶

Friday¶

The BCP plot below indicates a change (30% probability) in Sept 2014, with the most probable date being 19-Sep. From the trend lines we see the Exit traffic up to 19-Sep-2014 (green) is flat, then jumps by about 500 Exits/day and is nearly flat from then on (red).

We specifically note there is no indication of a change immediately before or after the lockout law date.

In [162]:

ts, axes = util.plot_bcp(df_daily_changes, "Kings Cross Station", "Friday")
util.segment(ts.Exit, ['19-Sep-2014'], axes)

This is a surprising finding as it is almost 7 months after the lockout law. However, a visual inspection of the time series supports it, and as the probability (30%) is reasonably high it needs investigating.

One possible cause is the switch from the old Magnetic tickets to OPAL cards. This is plotted below.

In [165]:

ax = util.plot_source(df_raw, "Kings Cross Station", "Friday")
util.mark(['19-Sep-2014'], [ax]);
plt.legend(loc='lower left');

We observe the BCP algorithm detected a change when OPAL card usage exceeded Magnetic ticket usage. Finding this change is consistent with the way the BCP algorithm works, i.e., it finds changes in the underlying mechanism that generates the data. In this case the change in mechanism is the relative increase in OPAL card Exits vs Magnetic ticket Exits.

These two ticketing mechanisms produce subtly different Exit traffic, as OPAL forces riders to "tap off" whereas Magnetic tickets do not. This means that for a constant number of riders, as the ratio of OPAL to Magnetic tickets increases, the Exit traffic will increase due to higher reporting rates. This suggests the jump in Sep is due to the higher Exit reporting from OPAL, and the switch from flat to slow growth in the trend is probably an artifact of the relative increase in OPAL usage.
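The reporting-rate argument can be illustrated with a small calculation. The rates and rider count below are hypothetical, chosen only to show the direction of the effect; the notebook does not report the actual proportion of Magnetic ticket holders who are recorded on exit.

```python
def observed_exits(riders, opal_share, opal_rate=1.0, magnetic_rate=0.7):
    """Recorded Exits when only a fraction of riders are counted on exit.

    Both rates are hypothetical illustration values, not measured ones.
    """
    return riders * (opal_share * opal_rate + (1 - opal_share) * magnetic_rate)

# The same 2000 riders, before and after OPAL overtakes Magnetic tickets.
before = observed_exits(riders=2000, opal_share=0.2)
after = observed_exits(riders=2000, opal_share=0.8)
print(round(before), round(after))
```

With these assumed rates the recorded Exit traffic rises by several hundred even though the true number of riders is unchanged.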

Saturday¶

The BCP algorithm detects a low change probability (3%) spread across a 5 month period (Oct-2014 to Feb-2015), which is less clear than for Friday.

In [166]:

ts, axes = util.plot_bcp(df_daily_changes, "Kings Cross Station", "Saturday")
util.segment(ts.Exit, ['9-Oct-2014', '1-Mar-2015'], axes)

Examining the Source plot below we conclude the detected change is also due to the switch from Magnetic tickets to OPAL. The BCP plot has a lower, broader change probability profile because this switch is more gradual than on Fridays.

In [167]:

ax = util.plot_source(df_raw, "Kings Cross Station", "Saturday")
util.mark(['9-Oct-2014', '1-Mar-2015'], [ax])
plt.legend(loc='lower left');

Parramatta¶

Friday¶

From the BCP and Source plots below we conclude there isn't a change in Exit traffic from the lockout law, however the BCP algorithm has again found the crossover for Magnetic ticket and OPAL usage.

We note there appears to be real growth in Parramatta Exit traffic of around 500 per year, which over the three and a half years is 100% growth. This increase appears to be relatively constant from start to finish.
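As a quick check of this arithmetic, and assuming the figure of 500 refers to Exits per night gained each year, the numbers imply a starting level of roughly 1750 Exits per night. That baseline is inferred from the two figures in the text, not read from the data.

```python
growth_per_year = 500   # Exits per night gained each year (from the text)
years = 3.5             # span of the data (from the text)
total_growth = 1.00     # 100% growth over the period (from the text)

absolute_increase = growth_per_year * years
implied_baseline = absolute_increase / total_growth   # inferred, not measured
print(absolute_increase, implied_baseline)
```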

In [170]:

ts, axes = util.plot_bcp(df_daily_changes, "Parramatta Station", "Friday")
util.segment(ts.Exit, ['1-Aug-2014'], axes)

In [171]:

ax = util.plot_source(df_raw, "Parramatta Station", "Friday")
util.mark('1-Aug-2014', [ax])
plt.legend(loc='best');

Saturday¶

From the BCP and Source plots below we again conclude there isn't a change in Exit traffic from the lockout law; however, the BCP algorithm has again found the crossover for Magnetic ticket and OPAL usage.

In [172]:

ts, axes = util.plot_bcp(df_daily_changes, "Parramatta Station", "Saturday")
util.segment(ts.Exit, ['16-Aug-2014'], axes)

In [175]:

ax = util.plot_source(df_raw, "Parramatta Station", "Saturday")
util.mark('16-Aug-2014', [ax])
plt.legend(loc='best');

Newtown¶

Friday¶

The BCP plot is more complex than for other stations, with five significant change probability peaks. We conclude that:

The strong change probability peaks at 15-Feb-2013 and 10-May-2013 are due to the poor/missing data, probably from station works.
The change at 22-Aug-2014 is again due to OPAL usage overtaking Magnetic tickets.
The unmarked peak in Feb 2015 is due to a data issue that causes the wide, deep drop in the time series.
The peak at 21-Feb-2014 is evidence of a change from the lockout laws. Specifically, there is a modest change probability peak at which the trend lines change from flat to growth. We expect the probability peak would be more pronounced if it were not for the OPAL transition at 22-Aug-2014.

In short, we conclude that after the lockout law the traffic to Newtown on a Friday night has changed.

In [198]:

ts, axes = util.plot_bcp(df_daily_changes, "Newtown Station", "Friday")
util.segment(ts.Exit, ['10-May-2013','21-Feb-2014', '22-Aug-2014'], axes)

In [199]:

ax = util.plot_source(df_raw, "Newtown Station", "Friday")
util.mark(['10-May-2013','21-Feb-2014', '22-Aug-2014'], [ax])
plt.legend(loc='best');

Saturday¶

The BCP plot for Saturday is difficult to interpret due to the large amount of missing data.

In [200]:

ts, axes = util.plot_bcp(df_daily_changes, "Newtown Station", "Saturday")
util.mark('6-Sep-2014', axes)

outlier.plot(ts[:'30-Aug-2014'].Exit, ax=axes[0])
outlier.plot(ts['13-Sep-2014':].Exit, ax=axes[0])
axes[0].legend(loc='best', ncol = 3);

In [201]:

ax = util.plot_source(df_raw, "Newtown Station", "Saturday")

util.mark('6-Sep-2014', [ax])
plt.legend(loc='best');

In [52]:

stations

Out[52]:

['Kings Cross Station',
 'Newtown Station',
 'Parramatta Station',
 'Circular Quay Station',
 'Museum Station',
 'Central Station',
 'Town Hall Station']

Circular Quay Station¶

The BCP plot shows no evidence of changes in traffic from the lockout laws.

Curiously the time series has large (3x-8x) spikes between mid-May and early June which we can't explain.

In [87]:

ts, axes = util.plot_bcp(df_daily_changes, "Circular Quay Station", "Friday")
util.mark(['2-Feb-2013', '11-May-2013', '8-Jun-2013', 
                         '10-May-2014', '7-Jun-2014', '17-Oct-2014', 
                         '15-May-2015', '6-Jun-2015', '26-Sep-2015', 
                         '14-May-2016'], axes)
#util.segment(ts.Exit, ['19-Sep-2014'], axes)

In [88]:

ax = util.plot_source(df_raw, "Circular Quay Station", "Friday")
util.mark(['2-Feb-2013', '11-May-2013', '8-Jun-2013', '10-May-2014', '7-Jun-2014', '17-Oct-2014', '15-May-2015', '6-Jun-2015', '26-Sep-2015', "14-May-2016"], [ax])
plt.legend(loc='upper left');

In [89]:

print (df_daily_changes[df_daily_changes.Exit > 6000].
     query('Station == "Circular Quay Station" and Night == "Friday"').
     index)

df_raw.query("Station == 'Circular Quay Station' and index in ['2013-06-07', '2014-05-23', '2015-05-22', '2015-05-29', '2015-06-05', '2016-05-27', '2016-06-03', '2016-06-10', '2016-06-17']" )

DatetimeIndex(['2013-06-07', '2014-05-23', '2015-05-22', '2015-05-29',
               '2015-06-05', '2016-05-27', '2016-06-03', '2016-06-10',
               '2016-06-17'],
              dtype='datetime64[ns]', name='Date', freq=None)

Out[89]:

	Source	Bands	Station	Entry	Exit	Datetime	Hour	Night
Date
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	554	197	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	32	33	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	537	116	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	8	13	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	36	46	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	15	27	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	42	74	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	80	160	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	94	149	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	7	9	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	14	13	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	10	11	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	108	24	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	389	438	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	7	13	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	0	1	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	17 to 18	Circular Quay Station	0	1	2014-05-23 17:00:00	5PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	298	210	2014-05-23 18:00:00	6PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	43	18	2014-05-23 18:00:00	6PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	244	134	2014-05-23 18:00:00	6PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	7	14	2014-05-23 18:00:00	6PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	14	48	2014-05-23 18:00:00	6PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	38	36	2014-05-23 18:00:00	6PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	43	84	2014-05-23 18:00:00	6PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	79	125	2014-05-23 18:00:00	6PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	66	123	2014-05-23 18:00:00	6PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	6	8	2014-05-23 18:00:00	6PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	7	9	2014-05-23 18:00:00	6PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	6	2	2014-05-23 18:00:00	6PM	Friday
2014-05-23	Magnetic tickets	18 to 19	Circular Quay Station	48	6	2014-05-23 18:00:00	6PM	Friday
...	...	...	...	...	...	...	...	...
2013-06-07	Magnetic tickets	21 to 22	Circular Quay Station	17	16	2013-06-07 21:00:00	9PM	Friday
2013-06-07	Magnetic tickets	21 to 22	Circular Quay Station	54	22	2013-06-07 21:00:00	9PM	Friday
2013-06-07	Magnetic tickets	21 to 22	Circular Quay Station	19	11	2013-06-07 21:00:00	9PM	Friday
2013-06-07	Magnetic tickets	21 to 22	Circular Quay Station	9	2	2013-06-07 21:00:00	9PM	Friday
2013-06-07	Magnetic tickets	21 to 22	Circular Quay Station	250	120	2013-06-07 21:00:00	9PM	Friday
2013-06-07	Magnetic tickets	21 to 22	Circular Quay Station	132	104	2013-06-07 21:00:00	9PM	Friday
2013-06-07	Magnetic tickets	22 to 23	Circular Quay Station	51	2	2013-06-07 22:00:00	10PM	Friday
2013-06-07	Magnetic tickets	22 to 23	Circular Quay Station	18	2	2013-06-07 22:00:00	10PM	Friday
2013-06-07	Magnetic tickets	22 to 23	Circular Quay Station	11	8	2013-06-07 22:00:00	10PM	Friday
2013-06-07	Magnetic tickets	22 to 23	Circular Quay Station	51	6	2013-06-07 22:00:00	10PM	Friday
2013-06-07	Magnetic tickets	22 to 23	Circular Quay Station	16	6	2013-06-07 22:00:00	10PM	Friday
2013-06-07	Magnetic tickets	22 to 23	Circular Quay Station	6	1	2013-06-07 22:00:00	10PM	Friday
2013-06-07	Magnetic tickets	22 to 23	Circular Quay Station	216	27	2013-06-07 22:00:00	10PM	Friday
2013-06-07	Magnetic tickets	22 to 23	Circular Quay Station	107	61	2013-06-07 22:00:00	10PM	Friday
2013-06-07	Magnetic tickets	23 to 24	Circular Quay Station	35	0	2013-06-07 23:00:00	11PM	Friday
2013-06-07	Magnetic tickets	23 to 24	Circular Quay Station	8	1	2013-06-07 23:00:00	11PM	Friday
2013-06-07	Magnetic tickets	23 to 24	Circular Quay Station	16	5	2013-06-07 23:00:00	11PM	Friday
2013-06-07	Magnetic tickets	23 to 24	Circular Quay Station	35	0	2013-06-07 23:00:00	11PM	Friday
2013-06-07	Magnetic tickets	23 to 24	Circular Quay Station	11	2	2013-06-07 23:00:00	11PM	Friday
2013-06-07	Magnetic tickets	23 to 24	Circular Quay Station	4	2	2013-06-07 23:00:00	11PM	Friday
2013-06-07	Magnetic tickets	23 to 24	Circular Quay Station	152	11	2013-06-07 23:00:00	11PM	Friday
2013-06-07	Magnetic tickets	23 to 24	Circular Quay Station	105	31	2013-06-07 23:00:00	11PM	Friday
2013-06-07	Magnetic tickets	24 to 25	Circular Quay Station	12	0	2013-06-08 00:00:00	12AM	Friday
2013-06-07	Magnetic tickets	24 to 25	Circular Quay Station	2	0	2013-06-08 00:00:00	12AM	Friday
2013-06-07	Magnetic tickets	24 to 25	Circular Quay Station	7	1	2013-06-08 00:00:00	12AM	Friday
2013-06-07	Magnetic tickets	24 to 25	Circular Quay Station	8	1	2013-06-08 00:00:00	12AM	Friday
2013-06-07	Magnetic tickets	24 to 25	Circular Quay Station	3	0	2013-06-08 00:00:00	12AM	Friday
2013-06-07	Magnetic tickets	24 to 25	Circular Quay Station	31	0	2013-06-08 00:00:00	12AM	Friday
2013-06-07	Magnetic tickets	24 to 25	Circular Quay Station	44	2	2013-06-08 00:00:00	12AM	Friday
2013-06-07	Magnetic tickets	25 to 26	Circular Quay Station	1	0	2013-06-08 01:00:00	1AM	Friday

2308 rows × 8 columns

Museum Station¶

There is no evidence of change from the lockout laws and yet again the algorithm has identified the switchover from Magnetic tickets to OPAL.

There was a strong change probability peak (30%) on 26-Sep-2015 which we have not investigated.

In [90]:

ts, axes = util.plot_bcp(df_daily_changes, "Museum Station", "Friday")
util.segment(ts.Exit,['11-May-2013', '20-Sep-2014', '26-Sep-2015' ], axes)

In [91]:

ax = util.plot_source(df_raw, "Museum Station", "Friday")
util.mark(['11-May-2013', '20-Sep-2014', '26-Sep-2015' ], [ax])
plt.legend(loc='upper left');

Central Station¶

There is no evidence of change from the lockout laws and yet again the algorithm has identified the switchover from Magnetic tickets to OPAL.

There was a strong change probability peak (30%) on 26-Sep-2015 which we have not investigated.

In [92]:

ts, axes = util.plot_bcp(df_daily_changes, "Central Station", "Friday")
util.segment(ts.Exit, ['17-Oct-2014', '26-Sep-2015'], axes)

In [93]:

ax = util.plot_source(df_raw, "Central Station", "Friday")
util.mark(['19-Sep-2014'], [ax]);
plt.legend(loc='upper left');

Town Hall Station¶

There is no evidence of change from the lockout laws and again the algorithm has identified the switchover from Magnetic tickets to OPAL.

In [94]:

ts, axes = util.plot_bcp(df_daily_changes, "Town Hall Station", "Friday")
util.segment(ts.Exit, ['9-Oct-2014'], axes)

In [95]:

ax = util.plot_source(df_raw, "Town Hall Station", "Friday")
util.mark( ['9-Oct-2014'], [ax]);
plt.legend(loc='upper left');