In this notebook we will produce some visualisations of In the Spotlight performance data over time to see if we can begin to identify any trends.
As we begin to get into more complicated territory, we won't explain every function used in detail. However, hopefully there will be something here that most can follow.
We will again use pandas and plotly as our core Python libraries, both of which were introduced in previous notebooks.
import pandas
import plotly
Our input will again be the dataframe of performance data introduced in a previous notebook. The dataframe is loaded in the code block below.
import os
import sys
module_path = os.path.abspath(os.path.join('..', 'data', 'scripts'))
if module_path not in sys.path:
    sys.path.append(module_path)
from get_its_performances import get_performances_df
df = get_performances_df()
# Sets plotly to offline mode
plotly.offline.init_notebook_mode()
As a reminder of how this dataframe looks, we can run the head() function.
df.head()
 | title | date | genre | link | theatre | city | source |
---|---|---|---|---|---|---|---|
0 | Pageantry | NaN | NaN | http://access.bl.uk/item/viewer/ark:/81055/vdc... | Theatre Royal, Margate | Margate | https://api.bl.uk/metadata/iiif/ark:/81055/vdc... |
1 | The Hypocrite | NaN | Comedy | http://access.bl.uk/item/viewer/ark:/81055/vdc... | Theatre Royal, Margate | Margate | https://api.bl.uk/metadata/iiif/ark:/81055/vdc... |
2 | The Padlock | NaN | Musical Farce | http://access.bl.uk/item/viewer/ark:/81055/vdc... | Theatre Royal, Margate | Margate | https://api.bl.uk/metadata/iiif/ark:/81055/vdc... |
3 | The Village Lawyer | NaN | Farce | http://access.bl.uk/item/viewer/ark:/81055/vdc... | Theatre Royal, Margate | Margate | https://api.bl.uk/metadata/iiif/ark:/81055/vdc... |
4 | Death of Gen. Wolfe | NaN | Ballet | http://access.bl.uk/item/viewer/ark:/81055/vdc... | Theatre Royal, Margate | Margate | https://api.bl.uk/metadata/iiif/ark:/81055/vdc... |
As we begin looking at our date information more closely it might be useful to add separate columns for day, month and year to our dataframe so that we can plot other entities against these values.
We will also want to remove any rows that do not contain a date, or that contain an incomplete date, as is the case for many of the playbills. The following line of code checks each value in the date column against a regular expression and keeps only those rows that match the pattern identifying a complete date (YYYY-MM-DD). Missing values are treated as non-matches.
df = df[df.date.str.contains(r'\d{4}-\d{2}-\d{2}', na=False)].copy()
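To see what the pattern does and does not accept, here is a small sketch using Python's built-in re module on a few made-up date strings (the sample values are illustrative and are not taken from the dataset). Like str.contains, re.search looks for the pattern anywhere in the string.

```python
import re

# Same pattern as above: four digits, two digits, two digits, separated by hyphens
pattern = re.compile(r'\d{4}-\d{2}-\d{2}')

# Made-up sample values, for illustration only
samples = ['1829-04-30', '1829-04', '1829', '']

# Keep only the strings in which the pattern is found
complete = [s for s in samples if pattern.search(s)]
print(complete)
```

Only the first sample contains a full year, month and day, so it is the only one kept.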
The date column is then converted from strings to a datetime type.
df['date'] = pandas.to_datetime(df['date'])
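The conversion matters because the .dt accessor, which exposes the parts of a date, is only available on datetime columns. The following is a minimal sketch on a made-up series (the values are illustrative only), checking the type after conversion and reading the year from each value.

```python
import pandas

# Made-up sample values, for illustration only
dates = pandas.Series(['1829-04-30', '1830-11-23'])

# Convert the strings to datetime values
converted = pandas.to_datetime(dates)

# Confirm that the converted series now holds datetime values
print(pandas.api.types.is_datetime64_any_dtype(converted))

# The .dt accessor gives access to the individual parts of each date
years = list(converted.dt.year)
print(years)
```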
We are now ready to create our additional columns.
df['day'] = df['date'].dt.day.astype('int32')
df['month'] = df['date'].dt.month.astype('int32')
df['year'] = df['date'].dt.year.astype('int32')
df.head()
 | title | date | genre | link | theatre | city | source | day | month | year |
---|---|---|---|---|---|---|---|---|---|---|
194 | Wandering Boys: Or, the Castle of Olival | 1829-04-30 | NaN | http://access.bl.uk/item/viewer/ark:/81055/vdc... | Miscellaneous Plymouth theatres | Plymouth | https://api.bl.uk/metadata/iiif/ark:/81055/vdc... | 30 | 4 | 1829 |
198 | High Life Below Stairs | 1828-04-10 | Farce | http://access.bl.uk/item/viewer/ark:/81055/vdc... | Miscellaneous Plymouth theatres | Plymouth | https://api.bl.uk/metadata/iiif/ark:/81055/vdc... | 10 | 4 | 1828 |
202 | Jack Robinson and His Monkey | 1829-01-30 | NaN | http://access.bl.uk/item/viewer/ark:/81055/vdc... | Miscellaneous Plymouth theatres | Plymouth | https://api.bl.uk/metadata/iiif/ark:/81055/vdc... | 30 | 1 | 1829 |
205 | Invincibles; Ou Les Femmes Soldats | 1829-03-05 | NaN | http://access.bl.uk/item/viewer/ark:/81055/vdc... | Miscellaneous Plymouth theatres | Plymouth | https://api.bl.uk/metadata/iiif/ark:/81055/vdc... | 5 | 3 | 1829 |
208 | Devil to Pay | 1830-11-23 | Farce | http://access.bl.uk/item/viewer/ark:/81055/vdc... | Miscellaneous Plymouth theatres | Plymouth | https://api.bl.uk/metadata/iiif/ark:/81055/vdc... | 23 | 11 | 1830 |
We can now identify the days, months or years where most plays were performed. The following code block plots a chart of plays performed by month of the year.
date_part = 'month'
series = df[date_part].value_counts()
series.sort_index(inplace=True)
trace = plotly.graph_objs.Scatter(x=series.index, y=series)
fig = plotly.graph_objs.Figure(data=[trace])
plotly.offline.iplot(fig)
We can see that there appears to be a trend towards fewer performances during the middle of the year. However, with a relatively small dataset we should be careful about drawing any conclusions just yet (trends should become clearer as more data is collected).
Similar charts for the day or year can be produced by modifying the date_part variable above.
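One way to make this easier to repeat is to wrap the counting step in a small function. The sketch below is only a suggestion: count_by_date_part and its all_values parameter are hypothetical names that do not appear elsewhere in these notebooks, and the sample dataframe is made up. It also fills in any values that have no performances with zero, since value_counts() leaves them out and the line on the chart would otherwise skip over them.

```python
import pandas


def count_by_date_part(df, date_part, all_values=None):
    """Count rows per value of a date-part column (hypothetical helper)."""
    counts = df[date_part].value_counts().sort_index()
    if all_values is not None:
        # Include values that never occur, with a count of zero
        counts = counts.reindex(list(all_values), fill_value=0)
    return counts


# Made-up sample data, for illustration only
sample = pandas.DataFrame({'month': [4, 4, 1, 3, 11]})

# Count performances for every month of the year, including empty months
monthly = count_by_date_part(sample, 'month', all_values=range(1, 13))
print(monthly)
```

The returned series can be passed to plotly.graph_objs.Scatter in the same way as the series variable in the code block above.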
Work in progress!