For this project we will be querying the chinook database to answer certain hypothetical business questions. The questions we will be answering are as follow:
Here's a schema diagram of the database below to give you information of the structure of our database.
%%capture
%load_ext sql
%sql sqlite:///chinook.db
%%sql
SELECT
name,
type
FROM sqlite_master
WHERE type IN ("table","view");
* sqlite:///chinook.db Done.
name | type |
---|---|
album | table |
artist | table |
customer | table |
employee | table |
genre | table |
invoice | table |
invoice_line | table |
media_type | table |
playlist | table |
playlist_track | table |
track | table |
Here Chinbook has just signed a deal with a new record label that specialises with artist from the US and we are tasked with finding out the three albums out of four to add to our store. All of the four artists have no tracks in our store. We are going to find out the top selling genres in the US and use that to drive our decision on which three albums to add to the store. Below, we have the artist names and the genre of their albums.
Artist Name | Genre |
---|---|
Regal | Hip-Hop |
Red Tone | Punk |
Meteor and the Girls | Pop |
Slim Jim Bites | Blues |
%%sql
WITH us_records AS
(SELECT
c.country,
il.track_id
FROM customer AS c
LEFT JOIN invoice AS i
ON i.customer_id = c.customer_id
LEFT JOIN invoice_line AS il
ON i.invoice_id = il.invoice_id
WHERE country = 'USA'
),
us_genre_records AS
(SELECT
g.name AS genre,
COUNT(*) AS tracks_sold
FROM us_records AS ur
LEFT JOIN track AS t
ON t.track_id = ur.track_id
LEFT JOIN genre AS g
ON g.genre_id = t.genre_id
GROUP BY genre
)
SELECT
*,
ROUND(CAST(tracks_sold AS FLOAT) / (SELECT
SUM(tracks_sold)
FROM us_genre_records
), 3) AS perecntage_sold
FROM us_genre_records
ORDER BY tracks_sold DESC
LIMIT 10;
* sqlite:///chinook.db Done.
genre | tracks_sold | perecntage_sold |
---|---|---|
Rock | 561 | 0.534 |
Alternative & Punk | 130 | 0.124 |
Metal | 124 | 0.118 |
R&B/Soul | 53 | 0.05 |
Blues | 36 | 0.034 |
Alternative | 35 | 0.033 |
Latin | 22 | 0.021 |
Pop | 22 | 0.021 |
Hip Hop/Rap | 20 | 0.019 |
Jazz | 14 | 0.013 |
From our table we can see that from the four genre we have to pick from, Punk has sold the highest in the united states and has a market share of 12%, followed by Blues, Pop and just slightly behind Hip-Hop. Selecting an Album to have in our store, I think the Punk and Blues album are a must have and Then we can complete those with any of the Pop or Hip-Hop album as there isn't a significant difference in their market share.
The goal here is to find out which employee has performed the best and also to find out the underlying factor behind the employee's performance.
%%sql
WITH invoice_details AS
(SELECT *,
SUM(il.quantity) AS total_quantity_sold
FROM invoice_line AS il
INNER JOIN invoice AS i on il.invoice_id = i.invoice_id
GROUP BY il.invoice_id
),
customer_invoice AS
(SELECT *,
SUM(total) AS customer_total,
SUM(total_quantity_sold) AS total_quantity_purchased
FROM invoice_details AS iv
INNER JOIN customer AS c
ON c.customer_id = iv.customer_id
GROUP BY c.customer_id
)
SELECT e.first_name || " " || e.last_name AS employee_name,
e.title,
e.hire_date,
ROUND(SUM(ci.customer_total), 2) AS total_amount_sold,
SUM(ci.total_quantity_purchased) AS total_quantity_sold
FROM customer_invoice AS ci
INNER JOIN employee AS e
ON e.employee_id = ci.support_rep_id
GROUP BY e.employee_id ;
* sqlite:///chinook.db Done.
employee_name | title | hire_date | total_amount_sold | total_quantity_sold |
---|---|---|---|---|
Jane Peacock | Sales Support Agent | 2017-04-01 00:00:00 | 1731.51 | 1749 |
Margaret Park | Sales Support Agent | 2017-05-03 00:00:00 | 1584.0 | 1600 |
Steve Johnson | Sales Support Agent | 2017-10-17 00:00:00 | 1393.92 | 1408 |
The difference in sales between the different Sales Support Agent seems to correspond with the date they were hired. With the top sales agent being the agent hired the earliest. From our data, we can infer that the major factor behind an employee's sales performance is the time they have been at the company.
We are going to find out the average sales value customers from the different countries as well as the average order value. Note: countries that did not have more than 1 customer were grouped together as Others.
%%sql
WITH country_or_other AS
(SELECT
CASE
WHEN
(SELECT
count(*)
FROM customer
WHERE country = c.country
) = 1
THEN 'Others'
ELSE c.country
END AS country,
c.customer_id,
il.*
FROM customer AS c
INNER JOIN invoice AS i
ON i.customer_id = c.customer_id
INNER JOIN invoice_line AS il
ON il.invoice_id = i.invoice_id
)
SELECT
country,
customers,
total_sales,
average_customer_value,
average_order_value
FROM (SELECT
country,
count(DISTINCT customer_id) AS customers,
ROUND(SUM(unit_price), 2) AS total_sales,
ROUND(SUM(unit_price) / COUNT(DISTINCT customer_id), 2) AS average_customer_value,
ROUND(SUM(unit_price) / COUNT(DISTINCT invoice_id), 2) AS average_order_value,
CASE
WHEN country = 'Others'
THEN 1
ELSE 0
END AS sort
FROM country_or_other
GROUP BY country
ORDER BY sort, total_sales DESC
) ;
* sqlite:///chinook.db Done.
country | customers | total_sales | average_customer_value | average_order_value |
---|---|---|---|---|
USA | 13 | 1040.49 | 80.04 | 7.94 |
Canada | 8 | 535.59 | 66.95 | 7.05 |
Brazil | 5 | 427.68 | 85.54 | 7.01 |
France | 5 | 389.07 | 77.81 | 7.78 |
Germany | 4 | 334.62 | 83.66 | 8.16 |
Czech Republic | 2 | 273.24 | 136.62 | 9.11 |
United Kingdom | 3 | 245.52 | 81.84 | 8.77 |
Portugal | 2 | 185.13 | 92.57 | 6.38 |
India | 2 | 183.15 | 91.58 | 8.72 |
Others | 15 | 1094.94 | 73.0 | 7.45 |
Based on our data, we get the highest average order values in:
These are potential markets to keep an eye on. Since our data is a small sample size, a large ad campaign will be careless and so we should experiment with small ad campaugns in those markets to see if the resulting customers follow the trend from our data.
The management are currently considering changing their purchasing strategy to save money. The strategy they are considering is to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album.
We have been asked to find out what percentage of purchases are individual tracks vs whole albums, so that management can use this data to understand the effect this decision might have on overall revenue.
%%sql
WITH invoice_track_info AS
(SELECT
invoice_id,
MAX(track_id) AS track_id
FROM invoice_line
GROUP BY invoice_id
)
SELECT
album_purchase,
COUNT(invoice_id) AS invoices,
ROUND(CAST(COUNT(invoice_id) AS FLOAT) / (SELECT COUNT(*)
FROM invoice), 2) AS percent
FROM (SELECT it.*,
CASE
WHEN (SELECT
t.track_id
FROM track AS t
WHERE album_id = (
SELECT t2.album_id FROM track AS t2
WHERE t2.track_id = it.track_id
)
EXCEPT
SELECT il2.track_id
FROM invoice_line AS il2
WHERE il2.invoice_id = it.invoice_id
) IS NULL
AND
(SELECT il2.track_id
FROM invoice_line AS il2
WHERE il2.invoice_id = it.invoice_id
EXCEPT
SELECT t.track_id
FROM track AS t
WHERE t.album_id = (
SELECT t2.album_id FROM track AS t2
WHERE t2.track_id = it.track_id
)
) IS NULL
THEN 'Yes'
ELSE 'No'
END AS album_purchase
FROM invoice_track_info AS it
)
GROUP BY album_purchase ;
* sqlite:///chinook.db Done.
album_purchase | invoices | percent |
---|---|---|
No | 500 | 0.81 |
Yes | 114 | 0.19 |
19% of all purchases are album purchases. If we were to change strategies, only purchasing the most popular tracks in an album from record label, we will be losing 19% of our income stream. The current strategy is good as it is.
%%sql
SELECT
a.name AS artist_name,
COUNT(pt.playlist_id) AS number_of_playlist
FROM artist a
INNER JOIN album ab
ON ab.artist_id = a.artist_id
INNER JOIN track t
ON ab.album_id = t.album_id
INNER JOIN playlist_track pt
ON pt.track_id = t.track_id
GROUP BY a.name
ORDER BY number_of_playlist DESC
LIMIT 10 ;
* sqlite:///chinook.db Done.
artist_name | number_of_playlist |
---|---|
Iron Maiden | 516 |
U2 | 333 |
Metallica | 296 |
Led Zeppelin | 252 |
Deep Purple | 226 |
Lost | 184 |
Pearl Jam | 177 |
Faith No More | 145 |
Eric Clapton | 145 |
Lenny Kravitz | 143 |
Our table above shows us the 10 most common artists found in playlists of our customers. With the top 3 being Iron Maiden, U2 and Metallica.
Our goal was to answer certain hpothetical questions and so far we have been able to answer them. We have found out the following