In this project we will be working with a database called Chinook. This database is associated with a digital music shop. It contains information about employees, customers, purchases, and everything to do with the tracks that are on the platform (artists, albums, genres, playlists, etc.). The following is the schema for this Chinook database.
The aim of this project is to answer business questions to help Chinook make more informed decisions.
%%capture
%load_ext sql
%sql sqlite:///chinook.db
# Import necessary libraries
import sqlite3
import plotly.express as px, plotly.graph_objects as go
import pandas as pd
Check that a connection to the database was properly made and the contents of this database.
%%sql
SELECT name, type
FROM sqlite_master
WHERE type = 'table'
* sqlite:///chinook.db Done.
name | type |
---|---|
album | table |
artist | table |
customer | table |
employee | table |
genre | table |
invoice | table |
invoice_line | table |
media_type | table |
playlist | table |
playlist_track | table |
track | table |
There are 11 tables in the chinook database. The following are samples of four of these tables.
%%sql
/* customer */
SELECT *
FROM customer
LIMIT 3
* sqlite:///chinook.db Done.
customer_id | first_name | last_name | company | address | city | state | country | postal_code | phone | fax | support_rep_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Luís | Gonçalves | Embraer - Empresa Brasileira de Aeronáutica S.A. | Av. Brigadeiro Faria Lima, 2170 | São José dos Campos | SP | Brazil | 12227-000 | +55 (12) 3923-5555 | +55 (12) 3923-5566 | luisg@embraer.com.br | 3 |
2 | Leonie | Köhler | None | Theodor-Heuss-Straße 34 | Stuttgart | None | Germany | 70174 | +49 0711 2842222 | None | leonekohler@surfeu.de | 5 |
3 | François | Tremblay | None | 1498 rue Bélanger | Montréal | QC | Canada | H2G 1A7 | +1 (514) 721-4711 | None | ftremblay@gmail.com | 3 |
%%sql
/* employee */
SELECT *
FROM employee
LIMIT 3
* sqlite:///chinook.db Done.
employee_id | last_name | first_name | title | reports_to | birthdate | hire_date | address | city | state | country | postal_code | phone | fax | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Adams | Andrew | General Manager | None | 1962-02-18 00:00:00 | 2016-08-14 00:00:00 | 11120 Jasper Ave NW | Edmonton | AB | Canada | T5K 2N1 | +1 (780) 428-9482 | +1 (780) 428-3457 | andrew@chinookcorp.com |
2 | Edwards | Nancy | Sales Manager | 1 | 1958-12-08 00:00:00 | 2016-05-01 00:00:00 | 825 8 Ave SW | Calgary | AB | Canada | T2P 2T3 | +1 (403) 262-3443 | +1 (403) 262-3322 | nancy@chinookcorp.com |
3 | Peacock | Jane | Sales Support Agent | 2 | 1973-08-29 00:00:00 | 2017-04-01 00:00:00 | 1111 6 Ave SW | Calgary | AB | Canada | T2P 5M5 | +1 (403) 262-3443 | +1 (403) 262-6712 | jane@chinookcorp.com |
%%sql
/* invoice */
SELECT *
FROM invoice
LIMIT 3
* sqlite:///chinook.db Done.
invoice_id | customer_id | invoice_date | billing_address | billing_city | billing_state | billing_country | billing_postal_code | total |
---|---|---|---|---|---|---|---|---|
1 | 18 | 2017-01-03 00:00:00 | 627 Broadway | New York | NY | USA | 10012-2612 | 15.84 |
2 | 30 | 2017-01-03 00:00:00 | 230 Elgin Street | Ottawa | ON | Canada | K2P 1L7 | 9.9 |
3 | 40 | 2017-01-05 00:00:00 | 8, Rue Hanovre | Paris | None | France | 75002 | 1.98 |
%%sql
/* track */
SELECT *
FROM track
LIMIT 3
* sqlite:///chinook.db Done.
track_id | name | album_id | media_type_id | genre_id | composer | milliseconds | bytes | unit_price |
---|---|---|---|---|---|---|---|---|
1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 |
2 | Balls to the Wall | 2 | 2 | 1 | None | 342562 | 5510424 | 0.99 |
3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman | 230619 | 3990994 | 0.99 |
Chinook has recently signed a deal with a record label that specializes in artists from the USA. Furthermore, Chinook is looking to add three albums out of four potential new artists. The following are the artists and their genres.
Artist Name | Genre |
---|---|
Regal | Hip-Hop |
Red Tone | Punk |
Meteor and the Girls | Pop |
Slim Jim Bites | Blues |
In order to determine which three albums (artists) to add to the store, the popularity of each artist's genre has to be gauged. To achieve this we will find the genres that sell the most tracks in the USA.
%%sql
WITH usa_tracks_sold AS (
SELECT il.track_id
FROM invoice AS i
INNER JOIN invoice_line AS il
ON il.invoice_id = i.invoice_id
WHERE i.billing_country = 'USA'
),
genre_popularity AS (SELECT g.name AS genre,
COUNT(t.track_id) AS tracks_sold,
ROUND(COUNT(t.track_id) / CAST((SELECT COUNT(*)
FROM usa_tracks_sold) AS FLOAT), 3) AS percentage_sold
FROM track AS t
INNER JOIN genre AS g
ON g.genre_id = t.genre_id
INNER JOIN usa_tracks_sold
ON usa_tracks_sold.track_id = t.track_id
GROUP BY g.genre_id
ORDER BY tracks_sold DESC)
SELECT genre AS Genre,
tracks_sold AS 'Tracks Sold',
percentage_sold AS 'Percentage Sold'
FROM genre_popularity
* sqlite:///chinook.db Done.
Genre | Tracks Sold | Percentage Sold |
---|---|---|
Rock | 561 | 0.534 |
Alternative & Punk | 130 | 0.124 |
Metal | 124 | 0.118 |
R&B/Soul | 53 | 0.05 |
Blues | 36 | 0.034 |
Alternative | 35 | 0.033 |
Pop | 22 | 0.021 |
Latin | 22 | 0.021 |
Hip Hop/Rap | 20 | 0.019 |
Jazz | 14 | 0.013 |
Easy Listening | 13 | 0.012 |
Reggae | 6 | 0.006 |
Electronica/Dance | 5 | 0.005 |
Classical | 4 | 0.004 |
Heavy Metal | 3 | 0.003 |
Soundtrack | 2 | 0.002 |
TV Shows | 1 | 0.001 |
query = """WITH usa_tracks_sold AS (
SELECT il.track_id
FROM invoice AS i
INNER JOIN invoice_line AS il
ON il.invoice_id = i.invoice_id
WHERE i.billing_country = 'USA'
)
SELECT g.name AS genre,
COUNT(t.track_id) AS tracks_sold,
CAST(COUNT(t.track_id) AS FLOAT) / (SELECT COUNT(*)
FROM usa_tracks_sold) AS percentage_sold
FROM track AS t
INNER JOIN genre AS g
ON g.genre_id = t.genre_id
INNER JOIN usa_tracks_sold
ON usa_tracks_sold.track_id = t.track_id
GROUP BY g.genre_id
ORDER BY tracks_sold DESC"""
# Make connection to database
data = sqlite3.connect('chinook.db')
# Convert the query output into a dataframe
genre_pop = pd.read_sql_query(query, data)
genre_pop = genre_pop.append(genre_pop.loc[9:, ['tracks_sold', 'percentage_sold']].sum(),
ignore_index=True)
genre_pop.iloc[-1, 0] = 'Other'
genre_pop.drop(index=list(range(9, 17)), inplace=True)
# Create a pie plot of the results
fig = px.pie(genre_pop, names='genre', values='tracks_sold',
title='Genre Popularity in US Market',
color_discrete_sequence=px.colors.sequential.RdBu_r)
fig.update_layout(legend_title_text='Genre')
fig.show()
By far, the most popular genre is Rock. At 561 tracks sold, this genre comprises 53.4% of all tracks sold by Chinook in the United States. Following are Alternative & Punk and Metal, each with about 12% of tracks sold.
Using these results to filter the potential new artists, we can deduce that the three albums that should be added are those from Red Tone, Slim Jim Bites, and Meteor and the Girls. The genres of these three artists (Punk, Blues, and Pop, respectively) combine for just over 17% of Chinook's total sales. A Rock genre album would be the most desirable addition but this is not one of the options.
To check the performance of our sales staff, the sale totals will be calculated for each employee.
First, however, lets identify the Sales Support Agents.
%%sql
SELECT *
FROM employee
WHERE employee.title = 'Sales Support Agent'
* sqlite:///chinook.db Done.
employee_id | last_name | first_name | title | reports_to | birthdate | hire_date | address | city | state | country | postal_code | phone | fax | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | Peacock | Jane | Sales Support Agent | 2 | 1973-08-29 00:00:00 | 2017-04-01 00:00:00 | 1111 6 Ave SW | Calgary | AB | Canada | T2P 5M5 | +1 (403) 262-3443 | +1 (403) 262-6712 | jane@chinookcorp.com |
4 | Park | Margaret | Sales Support Agent | 2 | 1947-09-19 00:00:00 | 2017-05-03 00:00:00 | 683 10 Street SW | Calgary | AB | Canada | T2P 5G3 | +1 (403) 263-4423 | +1 (403) 263-4289 | margaret@chinookcorp.com |
5 | Johnson | Steve | Sales Support Agent | 2 | 1965-03-03 00:00:00 | 2017-10-17 00:00:00 | 7727B 41 Ave | Calgary | AB | Canada | T3B 1Y7 | 1 (780) 836-9987 | 1 (780) 836-9543 | steve@chinookcorp.com |
There are three Sales Support Agents: Jane Peacock, Margaret Park, and Steve Johnson.
Now, their sales totals will be calculated.
%%sql
WITH agent_totals AS (
SELECT e.employee_id,
SUM(i.total) AS total
FROM customer AS c
INNER JOIN invoice AS i
ON i.customer_id = c.customer_id
INNER JOIN employee AS e
ON e.employee_id = c.support_rep_id
GROUP BY e.employee_id
)
SELECT e.employee_id AS 'Employee ID',
e.first_name || ' ' || e.last_name AS 'Employee Name',
e.hire_date AS 'Hire Date',
ROUND(at.total, 2) AS 'Total Sales'
FROM agent_totals AS at
INNER JOIN employee AS e
ON e.employee_id = at.employee_id
ORDER BY hire_date
* sqlite:///chinook.db Done.
Employee ID | Employee Name | Hire Date | Total Sales |
---|---|---|---|
3 | Jane Peacock | 2017-04-01 00:00:00 | 1731.51 |
4 | Margaret Park | 2017-05-03 00:00:00 | 1584.0 |
5 | Steve Johnson | 2017-10-17 00:00:00 | 1393.92 |
employee_sales = pd.read_sql_query("""WITH agent_totals AS (
SELECT e.employee_id,
SUM(i.total) AS total
FROM customer AS c
INNER JOIN invoice AS i
ON i.customer_id = c.customer_id
INNER JOIN employee AS e
ON e.employee_id = c.support_rep_id
GROUP BY e.employee_id
)
SELECT e.employee_id,
e.first_name || ' ' || e.last_name AS employee_name,
e.hire_date,
ROUND(at.total, 2) AS total
FROM agent_totals AS at
INNER JOIN employee AS e
ON e.employee_id = at.employee_id
ORDER BY hire_date """, data)
fig = px.bar(employee_sales, x='employee_name', y='total', text='total',
title='Chinook: Total Sales Amount by Employee',
hover_data=['employee_id', 'total', 'hire_date'])
fig.update_xaxes(title_text='Employee')
fig.update_yaxes(title_text='Total', showticklabels=False)
fig.update_layout(paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)')
fig.update_traces(marker_color='rgb(207, 207, 207)')
fig.show()
The output displays that with a total sales amount of $1,731.51, Jane Peacock is the top Sales Support Agent. Following are Margaret Park and Steve Johnson, respectively.
One important thing to note is that all three employees were hired on different dates. Jane was hired first (April 2017), then Margaret (May 2017), and Steve last (October 2017). This may be one of the reasons why the same order is found in the total sales amount for these employees; the longer an employee has worked for Chinook, the more time he/she has had to make sales.
It would be interesting and informative to find out which country provides the most business for Chinook. To do this, a number of different statistics will be calculated for each country.
The statistics for these 24 countries will be displayed in a table with the following fields:
Field Name | Description |
---|---|
Number of Customers | The number of customers |
Total Sales Amount | The total amount spent by customers |
Average Sale Amount | The average amount per purchase by customers |
Average Customer Value | The average amount that customers have spent to date |
It is important to note that some countries have more than one customer, while others have only one. These latter countries are grouped into 'Other' in the results.
%%sql
/* Rename countries with only one customer as 'Other'. */
WITH country_map AS (SELECT country,
CASE WHEN COUNT(DISTINCT customer_id) > 1 THEN country
WHEN COUNT(DISTINCT customer_id) = 1 THEN 'Other'
END AS country_1
FROM customer AS c
GROUP BY country),
/* Join country_map table, group by country name, perform calculations, move 'Other' record to the bottom. */
country_stats AS (SELECT cp.country_1 AS country,
COUNT(DISTINCT c.customer_id) AS customers,
ROUND(SUM(i.total), 2) AS total_sales,
ROUND(AVG(i.total), 2) AS avg_sale_value,
ROUND(SUM(i.total) / COUNT(DISTINCT c.customer_id), 2) AS avg_customer_value,
CASE WHEN cp.country_1 = 'Other' THEN 1
ELSE 0
END AS sort
FROM customer AS c
INNER JOIN invoice AS i
ON i.customer_id = c.customer_id
LEFT JOIN country_map AS cp
ON cp.country = c.country
GROUP BY cp.country_1
ORDER BY sort, total_sales DESC)
SELECT country AS Country,
customers AS 'Number of Customers',
total_sales AS 'Total Sales Amount',
avg_sale_value AS 'Average Sale Amount',
avg_customer_value AS 'Average Customer Value'
FROM country_stats
* sqlite:///chinook.db Done.
Country | Number of Customers | Total Sales Amount | Average Sale Amount | Average Customer Value |
---|---|---|---|---|
USA | 13 | 1040.49 | 7.94 | 80.04 |
Canada | 8 | 535.59 | 7.05 | 66.95 |
Brazil | 5 | 427.68 | 7.01 | 85.54 |
France | 5 | 389.07 | 7.78 | 77.81 |
Germany | 4 | 334.62 | 8.16 | 83.66 |
Czech Republic | 2 | 273.24 | 9.11 | 136.62 |
United Kingdom | 3 | 245.52 | 8.77 | 81.84 |
Portugal | 2 | 185.13 | 6.38 | 92.57 |
India | 2 | 183.15 | 8.72 | 91.58 |
Other | 15 | 1094.94 | 7.45 | 73.0 |
query = """/* Rename countries with only one customer as 'Other'. */
WITH country_map AS (SELECT country,
CASE WHEN COUNT(DISTINCT customer_id) > 1 THEN country
WHEN COUNT(DISTINCT customer_id) = 1 THEN 'Other'
END AS country_1
FROM customer AS c
GROUP BY country),
/* Join country_map table, group by country name, perform calculations, move 'Other' record to the bottom. */
country_stats AS (SELECT cp.country_1 AS country,
COUNT(DISTINCT c.customer_id) AS customers,
ROUND(SUM(i.total), 2) AS total_sales,
ROUND(AVG(i.total), 2) AS avg_sale_value,
ROUND(SUM(i.total) / COUNT(DISTINCT c.customer_id), 2) AS avg_customer_value,
CASE WHEN cp.country_1 = 'Other' THEN 1
ELSE 0
END AS sort
FROM customer AS c
INNER JOIN invoice AS i
ON i.customer_id = c.customer_id
LEFT JOIN country_map AS cp
ON cp.country = c.country
GROUP BY cp.country_1
ORDER BY sort, total_sales DESC)
SELECT country AS Country,
customers AS 'Number of Customers',
total_sales AS 'Total Sales Amount',
avg_sale_value AS 'Average Sale Amount',
avg_customer_value AS 'Average Customer Value'
FROM country_stats"""
country_data = pd.read_sql_query(query, data)
# Create bar plot of data
fig = px.bar(country_data, x='Total Sales Amount', y='Country', title='Sales by Country',
hover_data=country_data.columns[1:],
text='Total Sales Amount',
orientation='h')
fig.update_xaxes(tickangle=90, showticklabels=False)
fig.update_traces(marker_color='rgb(207, 207, 207)')
fig.update_layout(plot_bgcolor='rgba(0,0,0,0)')
fig.show()
With \$1,040.49, the United States is the country that produces the most income for Chinook. Too, it has the most customers (13). Furthermore, in second comes Canada with \$535.59, about half of the income produced by US sales.
fig = px.scatter_matrix(country_data.iloc[:, 1:])
fig.update_layout(height=700, width=970, title='Country Statistics Correlations', title_x=.5)
fig.update_traces(showupperhalf=False, diagonal_visible=False)
fig.show()
The only significant correlation is between Number of Customers
and Total Sales Amount
. In fact, this is a very strong correlation (r-value = 0.99), as expected. However, we must be cautious of these results given the small sample size used in these calculations. Regardless, it makes sense for Chinook to ramp marketing efforts in countries with large populations such as India, United States, Brazil, and Germany as marketing efforts have the potential to reach more people. Too, an alternate strategy could be to target countries whose average lifetime customer value is high. For this strategy, countries like India, Portugal, and Czech Republic make sense. If only one country can be chosen, India makes logical sense as it has both a large population (+1.3 billion) and a high average lifetime customer value ($91.58).
Chinook management is deliberating whether they should purchase individual, popular tracks rather than whole albums from record companies. To aid in this decision, we have to figure out if the majority of customer purchases are individual tracks or whole albums.
Approach:
%%sql
WITH invoice_count AS (SELECT COUNT(invoice.invoice_id) AS num_invoices
FROM invoice),
album_or_not AS (
SELECT i.invoice_id,
CASE WHEN (
SELECT invoice_line.track_id
FROM invoice
INNER JOIN invoice_line
ON invoice_line.invoice_id = i.invoice_id
WHERE invoice.invoice_id = i.invoice_id
EXCEPT
SELECT track.track_id
FROM album
INNER JOIN track
ON track.album_id = album.album_id
WHERE album.album_id = a.album_id) IS NULL
AND
(SELECT track.track_id
FROM album
INNER JOIN track
ON track.album_id = album.album_id
WHERE album.album_id = a.album_id
EXCEPT
SELECT invoice_line.track_id
FROM invoice
INNER JOIN invoice_line
ON invoice_line.invoice_id = i.invoice_id) IS NULL THEN 'album'
ELSE 'single'
END AS album_or_not,
SUM(il.unit_price) AS invoice_total
FROM invoice AS i
INNER JOIN invoice_line AS il
ON il.invoice_id = i.invoice_id
INNER JOIN track AS t
ON t.track_id = il.track_id
INNER JOIN album AS a
ON a.album_id = t.album_id
GROUP BY i.invoice_id
)
SELECT aon.album_or_not AS 'Type of Purchase', COUNT(aon.album_or_not) AS 'Number of Invoices',
ROUND(COUNT(aon.album_or_not) / CAST((SELECT COUNT(i.invoice_id)
FROM invoice AS i) AS FLOAT), 2) AS 'Percentage of Invoices',
ROUND(SUM(aon.invoice_total), 2) AS 'Total Sales'
FROM album_or_not AS aon
GROUP BY aon.album_or_not
* sqlite:///chinook.db Done.
Type of Purchase | Number of Invoices | Percentage of Invoices | Total Sales |
---|---|---|---|
album | 114 | 0.19 | 1461.24 |
single | 500 | 0.81 | 3248.19 |
query = '''
WITH invoice_count AS (SELECT COUNT(invoice.invoice_id) AS num_invoices
FROM invoice),
album_or_not AS (
SELECT i.invoice_id,
CASE WHEN (
SELECT invoice_line.track_id
FROM invoice
INNER JOIN invoice_line
ON invoice_line.invoice_id = i.invoice_id
WHERE invoice.invoice_id = i.invoice_id
EXCEPT
SELECT track.track_id
FROM album
INNER JOIN track
ON track.album_id = album.album_id
WHERE album.album_id = a.album_id) IS NULL
AND
(SELECT track.track_id
FROM album
INNER JOIN track
ON track.album_id = album.album_id
WHERE album.album_id = a.album_id
EXCEPT
SELECT invoice_line.track_id
FROM invoice
INNER JOIN invoice_line
ON invoice_line.invoice_id = i.invoice_id) IS NULL THEN 'album'
ELSE 'single'
END AS album_or_not,
SUM(il.unit_price) AS invoice_total
FROM invoice AS i
INNER JOIN invoice_line AS il
ON il.invoice_id = i.invoice_id
INNER JOIN track AS t
ON t.track_id = il.track_id
INNER JOIN album AS a
ON a.album_id = t.album_id
GROUP BY i.invoice_id
)
SELECT aon.album_or_not AS 'Type of Purchase', COUNT(aon.album_or_not) AS 'Number of Invoices',
ROUND(COUNT(aon.album_or_not) / CAST((SELECT COUNT(i.invoice_id)
FROM invoice AS i) AS FLOAT), 2) AS 'Percentage of Invoices',
ROUND(SUM(aon.invoice_total), 2) AS 'Total Sales'
FROM album_or_not AS aon
GROUP BY aon.album_or_not'''
# Store query output in DataFrame
album_or_not = pd.read_sql_query(query, data)
# Create bar plot of DataFrame
fig = px.bar(album_or_not, x='Type of Purchase', y='Number of Invoices',
title='Number of Album vs Single Purchases',
text='Number of Invoices',
hover_data=['Number of Invoices', 'Percentage of Invoices', 'Total Sales'])
fig.update_traces(marker_color='rgb(207, 207, 207)')
fig.update_yaxes(showticklabels=False)
fig.update_layout(plot_bgcolor='rgba(0,0,0,0)')
fig.show()
As the figures suggest, there were 114 whole album purchases, comprising 19% of invoices. However, album purchases account for \$1,461.24 in revenue. This is 31% of the combined album and single track(s) sales total. Therefore, Chinook management should be cautious of the proposed purchasing strategy as it might cause the loss of 31% of revenue. I would recommend against this proposed strategy.
There are various ways in which an artist's popularity can be gauged. One way is to calculate the number of tracks purchased grouped by artist.
query = '''WITH most_tracks_purchased AS (SELECT ar.name AS artist_name,
COUNT(il.invoice_line_id) AS number_of_sold_tracks
FROM track AS t
INNER JOIN invoice_line AS il
ON il.track_id = t.track_id
INNER JOIN album AS al
ON al.album_id = t.album_id
INNER JOIN artist AS ar
ON ar.artist_id = al.artist_id
GROUP BY ar.artist_id, ar.name
ORDER BY number_of_sold_tracks DESC, artist_name
LIMIT 10)
SELECT artist_name AS 'Artist',
number_of_sold_tracks AS 'Tracks Sold'
FROM most_tracks_purchased'''
# Store the output of query into a DataFrame
tracks_sold = pd.read_sql_query(query, data)
# Create bar plot of DataFrame
fig = px.bar(tracks_sold, x='Tracks Sold', y='Artist', orientation='h', text='Tracks Sold',
title='Top 10 Number of Tracks Sold by Artist')
fig.update_xaxes(showticklabels=False)
fig.update_traces(marker_color='rgb(207, 207, 207)')
fig.update_layout(plot_bgcolor='rgba(0,0,0,0)')
fig.show()
Using this method, Queen is the most popular artist on the Chinook platform.
Too, this estimation can be performed by counting the number of playlists that an artist is a part of.
query = '''WITH most_playlists AS (SELECT ar.name AS artist_name,
COUNT(DISTINCT(pt.playlist_id)) AS number_of_appearances
FROM track AS t
INNER JOIN playlist_track AS pt
ON pt.track_id = t.track_id
INNER JOIN album AS al
ON al.album_id = t.album_id
INNER JOIN artist AS ar
ON ar.artist_id = al.artist_id
GROUP BY ar.artist_id, ar.name
ORDER BY number_of_appearances DESC
LIMIT 10)
SELECT artist_name AS 'Artist',
number_of_appearances AS 'Number of Playlists'
FROM most_playlists'''
# Store the output of query into a DataFrame
most_playlists = pd.read_sql_query(query, data)
# Create bar plot of DataFrame
fig = px.bar(most_playlists, x='Number of Playlists', y='Artist', orientation='h', text='Number of Playlists',
title='Top 10 Artists w/ Most Playlist Appearances')
fig.update_xaxes(showticklabels=False)
fig.update_traces(marker_color='rgb(207, 207, 207)')
fig.update_layout(plot_bgcolor='rgba(0,0,0,0)')
fig.show()
For this latter calculation, the most "popular" artist is Eugene Ormandy.
The two calculations provide significantly different results. However, considering that the former method is directly related to revenue earned by Chinook, it is more valid if the results were to be used for decision-making.
There might be tracks that have not sold much or at all. If we look farther out, there might be artists who make little to no sales for Chinook. If this is the case, Chinook might not be receiving returns that justify the cost of a particular artist or album.
In this section of the analysis, we will calculate the number of tracks for each artist that have not been purchased.
%%sql
WITH not_purchased AS (SELECT ar.name AS artist,
COUNT(DISTINCT t.track_id) AS tracks_not_sold
FROM track AS t
LEFT JOIN invoice_line AS il
ON il.track_id = t.track_id
INNER JOIN album AS al
ON al.album_id = t.album_id
INNER JOIN artist AS ar
ON ar.artist_id = al.artist_id
WHERE il.invoice_line_id IS NULL
GROUP BY ar.artist_id, ar.name),
artist_total AS (SELECT ar.name AS artist,
ROUND(SUM(il.unit_price), 2) AS sales_total,
COUNT(DISTINCT t.track_id) AS total_tracks
FROM track AS t
LEFT JOIN invoice_line AS il
ON il.track_id = t.track_id
INNER JOIN album AS al
ON al.album_id = t.album_id
INNER JOIN artist AS ar
ON ar.artist_id = al.artist_id
GROUP BY ar.artist_id, ar.name)
SELECT np.artist AS Artist,
np.tracks_not_sold AS 'Tracks Not Sold',
at.total_tracks - np.tracks_not_sold AS 'Tracks Sold',
at.total_tracks AS 'Total Tracks',
CASE WHEN at.sales_total IS NULL THEN 0
ELSE at.sales_total
END AS 'Sales Total'
FROM not_purchased AS np
INNER JOIN artist_total AS at
ON at.artist = np.artist
ORDER BY ROUND(CAST(tracks_not_sold AS FLOAT) / total_tracks, 2) DESC
LIMIT 10
* sqlite:///chinook.db Done.
Artist | Tracks Not Sold | Tracks Sold | Total Tracks | Sales Total |
---|---|---|---|---|
BackBeat | 12 | 0 | 12 | 0 |
Cláudio Zoli | 10 | 0 | 10 | 0 |
Various Artists | 56 | 0 | 56 | 0 |
Frank Zappa & Captain Beefheart | 9 | 0 | 9 | 0 |
Elis Regina | 14 | 0 | 14 | 0 |
Gene Krupa | 22 | 0 | 22 | 0 |
Toquinho & Vinícius | 15 | 0 | 15 | 0 |
Dennis Chambers | 9 | 0 | 9 | 0 |
João Suplicy | 14 | 0 | 14 | 0 |
Mônica Marianno | 12 | 0 | 12 | 0 |
The above output is a sample of the 145 artists who have tracks that have not sold. Of these 145 artists, 74 (51%) of them have not produced a single purchase on Chinook. Naturally, this means that their work has not produced any revenue for the company. Considering this, it makes it an easy decision to remove these artists if it would prevent future spending for the rights to their work.
However, a closer examination is needed for the other 71 artists who have produced purchases. This analysis will have to be on a more album-by-album basis to determine if any artist's work should be removed. For this, the figures of what is spent on acquiring albums is needed, but this data is not available in the database.
Through the performed analysis we have found answers to business questions that will aid Chinook management decision-making.
With regards to genre popularity, the analysis shows that with 53.4% of tracks sold, Rock is the most popular genre on the platform. Rock is followed by Alternative & Punk and Metal, each with about 12% of tracks sold. This is confirmed by the ten artists with the most tracks sold: Queen, Jimmi Hendrix, Red Hot Chili Peppers, Pearl Jam, AC/DC, Guns N' Roses, Foo Fighters, The Rolling Stones, and Metallica. However, considering the four artists that Chinook can potentially bring in, the recommendation is to bring in Red Tone, Slim Jim Bites, and Meteor and the Girls. This recommendation is based on the fact that the genres of these three artists (Punk, Blues, and Pop, respectively) are the most popular of the four options.
The sales totals attributed to each Sales Support Agent are as follows: Jane Peacock with \$1,731.51, Margaret Park with \$1,584.00, and Steve Johnson with \$1,393.92. It is interesting to note that the order in which these employees were hired is the same. In other words, Jane was hired first, then Margaret, and last Steve. Considering this, the sales totals, or rather, the order of sales totals makes sense.
When considering countries for which to expand marketing efforts in, it makes sense to target countries with large populations and high average lifetime customer values. One country that meets both of these criteria is India. India has a population of over 1.3 billion and customers average lifetime totals of \$91.58. Other countries, such as the US, Germany, Brazil, Portugal, and Czech Republic meet one, not both of the criteria.
Too, we have figured out that album sales make up 19% of all sales but, most importantly, 31% of total revenue. For this reason, it is not recommended to eliminate whole albums from Chinook in favor of single tracks only. Doing so can potentially lead to a loss of \$1,461.24.
Lastly, calculating the number of unsold tracks highlights some issues. There are 74 artists whose work has not produced one sale on the platform. In other words, this work has produced no revenue for Chinook. Clearly, it is not worthwhile to have these artists on the platform, especially if Chinook pays for the rights to their work on a rolling basis. Moreover, there are another 71 artists whose work includes a varying amount of tracks that have not been purchased. For these artists, a closer, album-by-album examination is needed to determine potential areas where investment is not seeing adequate return. However, to perform this analysis, data on amounts payed to record labels per album is required and this is currently not availble in the Chinook database.