The objective for this project is to analyse data from the chinook database, which has information about a fictional digital music shop - kind of like a mini-iTunes store.
The chinook database contains information about the artists, songs, and albums from the music shop, as well as information on the shop's employees, customers, and the customers purchases. This information is contained in eleven tables, click here to visit the github repository.
%%capture
%load_ext sql
%sql sqlite:///chinook.db
Preview data tables in the chinook db file.
%%sql
SELECT
name,
type
FROM sqlite_master
WHERE type IN ("table","view");
* sqlite:///chinook.db Done.
name | type |
---|---|
album | table |
artist | table |
customer | table |
employee | table |
genre | table |
invoice | table |
invoice_line | table |
media_type | table |
playlist | table |
playlist_track | table |
track | table |
The most popular music genre for customers purchasing music from the US is Rock
. The Rock
genre has 53% of sales from customer purchases while the Alternative & Punk
genre has 12% of sales followed closely behind by Metal
genre which has 11.7% of sales. Much further from the Top 3 genres in fourth place is the R&B/Soul
music genre which has 5% of total sales.
Based on global sales of tracks across different genres in the US, Chinook record store should purchase albums by the first three artists:
Red Tone - Punk
: Punk genre is ranked 2nd overall in sales (Recommended)Slim Jim Bites - Blues
: Blues genre is ranked 5th overall (Recommended)Meteor and the Girls - Pop
: Pop genre is ranked joint 7th overall with Latin
music (Recommended)The below artist narrowly missed the recommended shortlist, though hip-hop/rap might be ranked 8th overall in customer purchases from the US, the genre was only 0.19 per-cent off the 7th position.
Regal - Hip-Hop
: Hip Hop/Rap genre is ranked 8th overall%%sql
WITH popular_usa_genres AS
(
SELECT il.*,
c.country
FROM invoice_line il
INNER JOIN invoice i on il.invoice_id = i.invoice_id
INNER JOIN customer c on i.customer_id = c.customer_id
WHERE c.country = "USA"
)
SELECT g.name genre,
COUNT(pug.invoice_line_id) tracks_purchased,
round(CAST(COUNT(pug.invoice_line_id) AS FLOAT) / (
SELECT COUNT(*)
FROM popular_usa_genres
) * 100, 2) percentage_purchased
FROM popular_usa_genres pug
INNER JOIN track t on t.track_id = pug.track_id
INNER JOIN genre g on g.genre_id = t.genre_id
GROUP BY g.genre_id
ORDER BY tracks_purchased desc;
* sqlite:///chinook.db Done.
genre | tracks_purchased | percentage_purchased |
---|---|---|
Rock | 561 | 53.38 |
Alternative & Punk | 130 | 12.37 |
Metal | 124 | 11.8 |
R&B/Soul | 53 | 5.04 |
Blues | 36 | 3.43 |
Alternative | 35 | 3.33 |
Pop | 22 | 2.09 |
Latin | 22 | 2.09 |
Hip Hop/Rap | 20 | 1.9 |
Jazz | 14 | 1.33 |
Easy Listening | 13 | 1.24 |
Reggae | 6 | 0.57 |
Electronica/Dance | 5 | 0.48 |
Classical | 4 | 0.38 |
Heavy Metal | 3 | 0.29 |
Soundtrack | 2 | 0.19 |
TV Shows | 1 | 0.1 |
The below data displays total dollar amount of sales assigned to each Sales Support Agent within the organisation.
The highest performing Sales Agent employee is Jane Peacock with 36.77% of total sales followed closely by Margaret Park with 33.63%. Steve Johnson has the least percentage of sales recorded with 29.6%, roughly 7% off the top performing employee.
Based on the hire date for these employees, the data suggests that the employee performance in sales seems to have some correlation with the dates at which they had joined the company. Jane was the first sales support agent amongst the three to be hired, while Steve's percentage sales amount could be due to being hired at least five months later than the others.
%%sql
WITH rep_sales AS
(
SELECT
c.customer_id,
round(sum(i.total), 2) rep_sales_total,
c.support_rep_id
FROM customer c
INNER JOIN invoice i on i.customer_id = c.customer_id
GROUP BY i.customer_id
)
SELECT
e.first_name || ' ' || e.last_name employee_name,
SUM(rs.rep_sales_total) rep_sales_total,
round(CAST(SUM(rs.rep_sales_total) AS FLOAT) /
(
SELECT SUM(rep_sales_total)
FROM rep_sales
) * 100, 2) percentage_sales,
e.hire_date
FROM rep_sales rs
INNER JOIN employee e on e.employee_id = rs.support_rep_id
WHERE e.title in ('Sales Support Agent')
GROUP BY employee_name;
* sqlite:///chinook.db Done.
employee_name | rep_sales_total | percentage_sales | hire_date |
---|---|---|---|
Jane Peacock | 1731.51 | 36.77 | 2017-04-01 00:00:00 |
Margaret Park | 1584.0000000000002 | 33.63 | 2017-05-03 00:00:00 |
Steve Johnson | 1393.92 | 29.6 | 2017-10-17 00:00:00 |
The below shows sales data, for each country by calculating:
Because there are a number of countries with only one customer, these customers will be grouped in 'Other'.
The US might have the most customer sales overall, but the average order value per customer is only $80
(the second lowest total amount across the countries excluding those with only one customer grouped in 'other'). The data suggests that there might be opportunity for higher sales in Czech Republic and Portugal which have $136.62
and $92.57
average order values per customer, though this data should be used with caution due to the limited sample size being used for the analysis.
%%sql
WITH customer_sales AS
(
SELECT il.*,
CASE
WHEN (
SELECT count(*)
FROM customer
where country = c.country
) = 1 THEN "Other"
ELSE c.country
END AS country,
i.total,
c.customer_id
FROM invoice_line il
INNER JOIN invoice i on il.invoice_id = i.invoice_id
INNER JOIN customer c on i.customer_id = c.customer_id
)
SELECT
country,
count(distinct(customer_id)) total_customers,
round(SUM(unit_price), 2) total_sales,
round(SUM(unit_price) / count(distinct customer_id), 2) avg_sales_per_customer,
round(SUM(unit_price) / count(distinct invoice_id), 2) avg_order_value
FROM customer_sales cs
GROUP BY country
ORDER BY total_sales DESC;
* sqlite:///chinook.db Done.
country | total_customers | total_sales | avg_sales_per_customer | avg_order_value |
---|---|---|---|---|
Other | 15 | 1094.94 | 73.0 | 7.45 |
USA | 13 | 1040.49 | 80.04 | 7.94 |
Canada | 8 | 535.59 | 66.95 | 7.05 |
Brazil | 5 | 427.68 | 85.54 | 7.01 |
France | 5 | 389.07 | 77.81 | 7.78 |
Germany | 4 | 334.62 | 83.66 | 8.16 |
Czech Republic | 2 | 273.24 | 136.62 | 9.11 |
United Kingdom | 3 | 245.52 | 81.84 | 8.77 |
Portugal | 2 | 185.13 | 92.57 | 6.38 |
India | 2 | 183.15 | 91.58 | 8.72 |
The most popular artist used in playlists is Iron Maiden
while U2
and Metallica
come in 2nd and 3rd place overall.
%%sql
SELECT
a.artist_id,
a.name artist_name,
count(a.name) tracks_in_playlist
FROM playlist_track pt
INNER JOIN track t on t.track_id = pt.track_id
INNER JOIN album ab on ab.album_id = t.album_id
INNER JOIN artist a on a.artist_id = ab.artist_id
GROUP BY a.name
ORDER BY tracks_in_playlist DESC
LIMIT 10;
* sqlite:///chinook.db Done.
artist_id | artist_name | tracks_in_playlist |
---|---|---|
90 | Iron Maiden | 516 |
150 | U2 | 333 |
50 | Metallica | 296 |
22 | Led Zeppelin | 252 |
58 | Deep Purple | 226 |
149 | Lost | 184 |
118 | Pearl Jam | 177 |
82 | Faith No More | 145 |
81 | Eric Clapton | 145 |
100 | Lenny Kravitz | 143 |
There are a total of 3503
tracks stored in the track
table, of this, customers have purchased 1806
tracks from the library. There are still 1697
tracks yet to be purchased.
%%sql
WITH total_tracks AS
(
SELECT
count(distinct track_id) total_tracks,
(
SELECT
count(distinct track_id)
FROM invoice_line) purchased
FROM track
)
SELECT tt.*,
(tt.total_tracks - tt.purchased) non_purchased
FROM total_tracks tt;
* sqlite:///chinook.db Done.
total_tracks | purchased | non_purchased |
---|---|---|
3503 | 1806 | 1697 |
Displayed below are some of the tracks which are yet to be purchased from the store.
%%sql
SELECT
t.*
FROM track t
WHERE t.track_id not in (
SELECT
il.track_id
FROM invoice_line il
)
LIMIT 10;
* sqlite:///chinook.db Done.
track_id | name | album_id | media_type_id | genre_id | composer | milliseconds | bytes | unit_price |
---|---|---|---|---|---|---|---|---|
99 | Your Time Has Come | 11 | 1 | 4 | Cornell, Commerford, Morello, Wilk | 255529 | 8273592 | 0.99 |
101 | Be Yourself | 11 | 1 | 4 | Cornell, Commerford, Morello, Wilk | 279484 | 9106160 | 0.99 |
104 | Heaven's Dead | 11 | 1 | 4 | Cornell, Commerford, Morello, Wilk | 276688 | 9006158 | 0.99 |
106 | Man Or Animal | 11 | 1 | 4 | Cornell, Commerford, Morello, Wilk | 233195 | 7542942 | 0.99 |
107 | Yesterday To Tomorrow | 11 | 1 | 4 | Cornell, Commerford, Morello, Wilk | 273763 | 8944205 | 0.99 |
111 | Money | 12 | 1 | 5 | Berry Gordy, Jr./Janie Bradford | 147591 | 2365897 | 0.99 |
112 | Long Tall Sally | 12 | 1 | 5 | Enotris Johnson/Little Richard/Robert "Bumps" Blackwell | 106396 | 1707084 | 0.99 |
113 | Bad Boy | 12 | 1 | 5 | Larry Williams | 116088 | 1862126 | 0.99 |
114 | Twist And Shout | 12 | 1 | 5 | Bert Russell/Phil Medley | 161123 | 2582553 | 0.99 |
115 | Please Mr. Postman | 12 | 1 | 5 | Brian Holland/Freddie Gorman/Georgia Dobbins/Robert Bateman/William Garrett | 137639 | 2206986 | 0.99 |