#!/usr/bin/env python
# coding: utf-8

# # An Introduction to Visualising In the Spotlight Data Using Python
# 
# In this notebook we will introduce a way of producing visualisations of [*In the Spotlight*](https://www.libcrowds.com/collection/playbills) results data using Python.
# 
# [Plotly.py](https://plot.ly/d3-js-for-python-and-pandas-charts/) is a Python graphing library that can produce over 30 chart types, all of which can be viewed in Jupyter notebooks. We will use it here to produce pie and bar charts. In future notebooks we may go on to explore some more complex chart types.
# 
# We begin by importing the required Python libraries, pandas and plotly.

# In[24]:

import pandas
import plotly


# ## The dataset
# 
# Our input will be the dataframe of performance data introduced in a [previous notebook](intro_to_analysing_its_data_using_python.ipynb). Again, all we need to know about the code block below is that it loads our dataframe of performance data.

# In[25]:

import os
import sys

# Make the helper scripts in ../data/scripts importable
module_path = os.path.abspath(os.path.join('..', 'data', 'scripts'))
if module_path not in sys.path:
    sys.path.append(module_path)

from get_its_performances import get_performances_df
df = get_performances_df()

# Initialise plotly's offline notebook mode so that charts render inside the notebook
plotly.offline.init_notebook_mode()


# As a reminder of how this dataframe looks, we can call its `head()` method.

# In[26]:

df.head()


# ## Pie charts
# 
# Pie charts are perhaps the most straightforward type of visualisation to get started with: all we need is a list of unique labels and a matching list of counts for those labels. We can get both by using the `value_counts()` method, which was introduced in an [earlier notebook](intro_to_analysing_its_data_using_python.ipynb).
# 
# The `entity` variable defined below identifies the column that we are counting.
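# To make the shape of the `value_counts()` output concrete, here is a minimal sketch on a tiny, made-up dataframe. The `toy_df` and `counts` names and the genre values are hypothetical stand-ins for the real performance data; only the idea of counting the values in a single column carries over.

```python
import pandas

# A tiny, made-up dataframe standing in for the real performance data;
# only a 'genre' column is assumed here.
toy_df = pandas.DataFrame({
    'genre': ['comedy', 'farce', 'comedy', 'tragedy', 'comedy', 'farce']
})

# value_counts() returns a Series: the index holds each unique value and
# the values hold how often it occurs, sorted from most to least common.
counts = toy_df['genre'].value_counts()
print(counts)
```

# The index of `counts` supplies the unique labels and its values supply the counts, which is the same split the notebook makes with `index.tolist()` and `tolist()` when building chart inputs.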
# In[27]:

# The column whose values we want to count
entity = 'genre'
series = df[entity].value_counts()


# The output of the `value_counts()` method is a pandas [Series](https://pandas.pydata.org/pandas-docs/stable/dsintro.html#series), a one-dimensional labelled array capable of holding any data type. As with a dataframe, we can use the `head()` method on a series to display a quick snapshot of the data.

# In[28]:

series.head()


# We can now define the labels and values to be used for our chart. Below, we set a limit of 10 and take that many rows from the series (which `value_counts()` has already sorted from most to least frequent) as our *labels* and *values*.

# In[29]:

limit = 10
labels = series[:limit].index.tolist()  # the most common values in the column
values = series[:limit].tolist()        # how often each of them occurs


# The Plotly chart can then be generated and displayed using the code below.

# In[30]:

# Build a pie trace, wrap it in a figure and render it inline
trace = plotly.graph_objs.Pie(labels=labels, values=values)
fig = plotly.graph_objs.Figure(data=[trace])
plotly.offline.iplot(fig)


# The plotly library offers many options for styling these charts, such as hiding the legend or displaying additional information when hovering over particular areas. These options are beyond the scope of this introduction, but more details can be found in the [plotly documentation](https://plot.ly/).
# 
# To see the chart for a different column, such as title, try changing the `entity` variable above and re-running the cells that follow it. Note that the percentages shown are calculated from the slice of data defined by `limit`, rather than from the whole dataset.

# ## Bar charts
# 
# Bar charts can be produced with code very similar to that used for the pie chart above. Again, we just need a list of labels and a list of values. In fact, we will produce our first bar chart using the labels and values already defined above.

# In[31]:

# Labels go on the x-axis and counts on the y-axis
trace = plotly.graph_objs.Bar(x=labels, y=values)
fig = plotly.graph_objs.Figure(data=[trace])
plotly.offline.iplot(fig)


# Again, you can generate a chart for a different column by assigning a different value to the `entity` variable above and re-running the cells that follow it.
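# Returning to the earlier note that the pie chart percentages are calculated from the `limit` slice rather than from the whole dataset, the small sketch below makes the difference visible. A hypothetical series (`example_counts`, with made-up numbers) stands in for the output of `value_counts()`, and new variable names are used so that nothing defined earlier in the notebook is overwritten. The final step, folding the remaining rows into an 'Other' label, is one possible approach and is not part of the original notebook.

```python
import pandas

# Hypothetical counts standing in for the output of value_counts();
# the labels and numbers are made up for illustration.
example_counts = pandas.Series(
    [50, 30, 10, 6, 4],
    index=['comedy', 'farce', 'tragedy', 'pantomime', 'opera'],
)
example_limit = 3

# The first rows by position (value_counts() output is already sorted by count)
top = example_counts.iloc[:example_limit]

# Share of each label within the top slice: what the pie chart displays
share_of_slice = (top / top.sum() * 100).round(1)

# Share of each label within the whole dataset
share_of_total = (top / example_counts.sum() * 100).round(1)

print(pandas.DataFrame({'share_of_slice': share_of_slice,
                        'share_of_total': share_of_total}))

# Assumed alternative: fold the remaining rows into a single 'Other'
# slice so that the pie covers the whole dataset
other_total = int(example_counts.iloc[example_limit:].sum())
pie_labels = top.index.tolist() + ['Other']
pie_values = top.tolist() + [other_total]
```

# Here 'comedy' makes up about 55.6% of the top-three slice but 50% of all rows. The `pie_labels` and `pie_values` lists could be passed to `plotly.graph_objs.Pie` in the same way as `labels` and `values` above.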
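# ## Summary
# 
# In this notebook we began visualising our performance data using Python, using pandas to count the most common values in a column and Plotly to display them as pie and bar charts.

# In[ ]: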