** Connecting Jupyter Notebook to the Database File **
%%capture
%load_ext sql
%sql sqlite:///chinook.db
'Connected: None@chinook.db'
** Overview of Tables/Views **
%%sql
SELECT
name,
type
FROM sqlite_master
WHERE type IN ("table","view");
Done.
name | type |
---|---|
album | table |
artist | table |
customer | table |
employee | table |
genre | table |
invoice | table |
invoice_line | table |
media_type | table |
playlist | table |
playlist_track | table |
track | table |
%%sql
WITH most_popular_genre AS
(
SELECT
i.billing_country,
g.name genre_name,
g.genre_id,
SUM(il.quantity) total_no_tracks
FROM invoice i
INNER JOIN invoice_line il ON il.invoice_id = i.invoice_id
INNER JOIN track t ON t.track_id = il.track_id
INNER JOIN genre g ON g.genre_id = t.genre_id
WHERE billing_country = "USA"
GROUP BY g.genre_id
ORDER BY total_no_tracks DESC
)
SELECT
*,
CAST(total_no_tracks AS float)/(SELECT SUM(total_no_tracks) FROM most_popular_genre) Percentage_of_Total
FROM most_popular_genre mpg
Done.
billing_country | genre_name | genre_id | total_no_tracks | Percentage_of_Total |
---|---|---|---|---|
USA | Rock | 1 | 561 | 0.5337773549000951 |
USA | Alternative & Punk | 4 | 130 | 0.12369172216936251 |
USA | Metal | 3 | 124 | 0.11798287345385347 |
USA | R&B/Soul | 14 | 53 | 0.05042816365366318 |
USA | Blues | 6 | 36 | 0.03425309229305423 |
USA | Alternative | 23 | 35 | 0.03330161750713606 |
USA | Latin | 7 | 22 | 0.02093244529019981 |
USA | Pop | 9 | 22 | 0.02093244529019981 |
USA | Hip Hop/Rap | 17 | 20 | 0.019029495718363463 |
USA | Jazz | 2 | 14 | 0.013320647002854425 |
USA | Easy Listening | 12 | 13 | 0.012369172216936251 |
USA | Reggae | 8 | 6 | 0.005708848715509039 |
USA | Electronica/Dance | 15 | 5 | 0.004757373929590866 |
USA | Classical | 24 | 4 | 0.003805899143672693 |
USA | Heavy Metal | 13 | 3 | 0.0028544243577545195 |
USA | Soundtrack | 10 | 2 | 0.0019029495718363464 |
USA | TV Shows | 19 | 1 | 0.0009514747859181732 |
%%sql
WITH tracks_sold_in_usa AS
(
SELECT
i.billing_country country,
il.track_id,
il.quantity
FROM invoice_line il
INNER JOIN invoice i ON i.invoice_id = il.invoice_id
WHERE billing_country = 'USA'
)
SELECT
g.name genre,
SUM(quantity) tracks_sold,
CAST(SUM(quantity) AS float)/(SELECT SUM(quantity) FROM tracks_sold_in_usa) percentage_sold
FROM tracks_sold_in_usa ts
INNER JOIN track t ON t.track_id = ts.track_id
INNER JOIN genre g ON g.genre_id = t.genre_id
GROUP BY g.genre_id
ORDER BY tracks_sold DESC
Done.
genre | tracks_sold | percentage_sold |
---|---|---|
Rock | 561 | 0.5337773549000951 |
Alternative & Punk | 130 | 0.12369172216936251 |
Metal | 124 | 0.11798287345385347 |
R&B/Soul | 53 | 0.05042816365366318 |
Blues | 36 | 0.03425309229305423 |
Alternative | 35 | 0.03330161750713606 |
Latin | 22 | 0.02093244529019981 |
Pop | 22 | 0.02093244529019981 |
Hip Hop/Rap | 20 | 0.019029495718363463 |
Jazz | 14 | 0.013320647002854425 |
Easy Listening | 13 | 0.012369172216936251 |
Reggae | 6 | 0.005708848715509039 |
Electronica/Dance | 5 | 0.004757373929590866 |
Classical | 4 | 0.003805899143672693 |
Heavy Metal | 3 | 0.0028544243577545195 |
Soundtrack | 2 | 0.0019029495718363464 |
TV Shows | 1 | 0.0009514747859181732 |
Based on our research, the Top 10 Genres in USA (based on the number of tracks sold) are as follows:
Out of the four artists, I would recommend the following three artists:
%%sql
WITH customer_total AS
(
SELECT
customer_id,
SUM(total) total_spent
FROM invoice
GROUP BY customer_id
)
SELECT
e.first_name || " " || e.last_name sales_support_agent,
ROUND(SUM(ct.total_spent),2) total_sales,
e.title,
e.reports_to,
e.hire_date,
e.birthdate
FROM customer_total ct
INNER JOIN customer c ON c.customer_id = ct.customer_id
INNER JOIN employee e ON e.employee_id = c.support_rep_id
GROUP BY employee_id
Done.
sales_support_agent | total_sales | title | reports_to | hire_date | birthdate |
---|---|---|---|---|---|
Jane Peacock | 1731.51 | Sales Support Agent | 2 | 2017-04-01 00:00:00 | 1973-08-29 00:00:00 |
Margaret Park | 1584.0 | Sales Support Agent | 2 | 2017-05-03 00:00:00 | 1947-09-19 00:00:00 |
Steve Johnson | 1393.92 | Sales Support Agent | 2 | 2017-10-17 00:00:00 | 1965-03-03 00:00:00 |
From our analysis, we can see that Jane Peacock has the highest sales of 1731.51. This is followed by Margaret Park that made the total sales of 1584 and then Steve Johnson which made 1393.92 sales in total.
Margaret's sales is 147.51 less than Jane's while Steve's sales is 190.08 less than Margaret's.
As Jane has been working for the company the longest among the three, Jane has the most experience as a Sales Support Agent in Chinook. This could be a possible reason why Jane made the most sales. Margaret was hired a month later and then Steve was hired 5 months after Margaret. Again, the result table shows that the total sales made is relative to the order of the employee's hire date.
Another reason is that Jane is also the youngest among the three. She is probably able to relate to the customers better as the customers using Chinook might be of a similar age. Additionally, Jane might also be more knowledgeable since she might be using Chinook herself.
%%sql
WITH
country_or_other AS
(
SELECT
CASE
WHEN COUNT(DISTINCT c.customer_id) = 1 THEN "Other"
ELSE c.country
END
AS country,
COUNT(DISTINCT c.customer_id) total_customers,
SUM(i.total) total_sales,
COUNT(DISTINCT i.invoice_id) total_orders
FROM customer c
INNER JOIN invoice i ON i.customer_id = c.customer_id
GROUP BY c.country
),
total_country_sales_customer AS
(
SELECT
country,
SUM(total_customers) customers,
SUM(total_sales) total_sales,
SUM(total_orders) total_orders
FROM country_or_other
GROUP BY country
)
SELECT
country,
customers,
total_sales,
total_sales/customers avg_sales_per_customer,
total_sales/total_orders avg_order_value
FROM
(
SELECT
*,
CASE
WHEN country = "Other" THEN 1
ELSE 0
END
AS sort
FROM total_country_sales_customer tcsc
)
ORDER BY sort, total_sales DESC
Done.
country | customers | total_sales | avg_sales_per_customer | avg_order_value |
---|---|---|---|---|
USA | 13 | 1040.4899999999998 | 80.0376923076923 | 7.942671755725189 |
Canada | 8 | 535.5900000000001 | 66.94875000000002 | 7.047236842105265 |
Brazil | 5 | 427.68000000000006 | 85.53600000000002 | 7.0111475409836075 |
France | 5 | 389.0699999999999 | 77.81399999999998 | 7.781399999999998 |
Germany | 4 | 334.62 | 83.655 | 8.161463414634147 |
Czech Republic | 2 | 273.24000000000007 | 136.62000000000003 | 9.108000000000002 |
United Kingdom | 3 | 245.52 | 81.84 | 8.768571428571429 |
Portugal | 2 | 185.13000000000002 | 92.56500000000001 | 6.383793103448276 |
India | 2 | 183.14999999999998 | 91.57499999999999 | 8.72142857142857 |
Other | 15 | 1094.9399999999998 | 72.996 | 7.4485714285714275 |
Firstly, note that Czech Republic has an above average sales per customer of 136.62 compared to the rest of the countries whereby their values are lower than 100. Other than that, United Kingdom and India has a higher average order value compared to the rest of the countries (after Czech Republic). Therefore, there might be potential in these countries.
%%sql
WITH
invoices_info AS
(
SELECT
il.invoice_id,
il.invoice_line_id,
il.track_id,
t.album_id
FROM invoice_line il
INNER JOIN track t ON t.track_id = il.track_id
),
invoice_album AS
(
SELECT
invoice_id,
album_id
FROM invoices_info
GROUP BY invoice_id
),
album_or_not AS
(
SELECT
CASE
WHEN
(
(
SELECT t.track_id
FROM track t
WHERE t.album_id = ia.album_id
EXCEPT
SELECT
ii.track_id
FROM invoices_info ii
WHERE ii.invoice_id = ia.invoice_id
) IS NULL
AND
(
SELECT
ii.track_id
FROM invoices_info ii
WHERE ii.invoice_id = ia.invoice_id
EXCEPT
SELECT t.track_id
FROM track t
WHERE t.album_id = ia.album_id
) IS NULL
) THEN "Yes"
ELSE "No"
END
AS album_purchase,
invoice_id
FROM invoice_album ia
)
SELECT
album_purchase,
COUNT(invoice_id) number_of_invoices,
CAST(COUNT(invoice_id) as float)/(SELECT COUNT(*) FROM invoice) percentage
FROM album_or_not
GROUP BY album_purchase
Done.
album_purchase | number_of_invoices | percentage |
---|---|---|
No | 500 | 0.8143322475570033 |
Yes | 114 | 0.18566775244299674 |
The Chinook store should definitely continue to buy full albums from record companies as it accounts for 18% of their revenue. If they stop buying full albums, they would lose one fifth of their revenue.
%%sql
SELECT
a.artist_id,
ar.name artist_name,
SUM(DISTINCT playlist_id) no_of_playlist
FROM track t
INNER JOIN playlist_track pt ON pt.track_id = t.track_id
INNER JOIN album a ON a.album_id = t.album_id
INNER JOIN artist ar ON ar.artist_id = a.artist_id
GROUP BY a.artist_id
ORDER BY no_of_playlist DESC
LIMIT 10
Done.
artist_id | artist_name | no_of_playlist |
---|---|---|
226 | Eugene Ormandy | 68 |
208 | English Concert & Trevor Pinnock | 55 |
214 | Academy of St. Martin in the Fields & Sir Neville Marriner | 55 |
247 | The King's Singers | 53 |
248 | Berliner Philharmoniker & Herbert Von Karajan | 53 |
206 | Alberto Turco & Nova Schola Gregoriana | 41 |
207 | Richard Marlow & The Choir of Trinity College, Cambridge | 41 |
211 | Wilhelm Kempff | 41 |
212 | Yo-Yo Ma | 41 |
213 | Scholars Baroque Ensemble | 41 |
The artist that is used in the most playlist is Eugene Ormandy. He is used in a total of 68 playlists.
%%sql
WITH
tracks_purchased AS
(
SELECT
il.track_id
FROM invoice i
INNER JOIN invoice_line il ON il.invoice_id = i.invoice_id
GROUP BY il.track_id
)
SELECT
COUNT(*) tracks_purchased,
CAST(COUNT(*) as Float)/(SELECT COUNT(track_id) FROM track) percentage_purchased,
(SELECT COUNT(track_id) FROM track) - COUNT(*) not_purchased,
(SELECT COUNT(track_id) FROM track) total_tracks
FROM tracks_purchased
Done.
tracks_purchased | percentage_purchased | not_purchased | total_tracks |
---|---|---|---|
1806 | 0.5155580930630888 | 1697 | 3503 |
It is quite concerning that only 52% of the tracks available at the Chinook Store are purchsed. The Chinook Store might want to reconsider the range of tracks that are currently offered.
%%sql
WITH
tracks_sold AS
(
SELECT
g.name genre,
COUNT(il.quantity) tracks_sold,
CAST(COUNT(il.quantity) as float)/(SELECT COUNT(*) FROM invoice_line) percentage_sold
FROM invoice_line il
INNER JOIN track t ON t.track_id = il.track_id
INNER JOIN genre g ON g.genre_id = t.genre_id
GROUP BY genre
ORDER BY percentage_sold DESC
),
tracks_available AS
(
SELECT
g.name genre,
COUNT(track_id) total_tracks,
CAST(COUNT(track_id) as float)/(SELECT COUNT(*) FROM track) percentage
FROM track t
INNER JOIN genre g ON g.genre_id = t.genre_id
GROUP BY g.name
)
SELECT
ta.genre,
ts.tracks_sold,
ts.percentage_sold,
ta.total_tracks tracks_available,
ta.percentage percentage_available
FROM tracks_available ta
LEFT JOIN tracks_sold ts ON ts.genre = ta.genre
ORDER BY percentage_sold DESC;
Done.
genre | tracks_sold | percentage_sold | tracks_available | percentage_available |
---|---|---|---|---|
Rock | 2635 | 0.553920538154299 | 1297 | 0.3702540679417642 |
Metal | 619 | 0.13012402774858103 | 374 | 0.10676562946046246 |
Alternative & Punk | 492 | 0.10342652932520496 | 332 | 0.09477590636597202 |
Latin | 167 | 0.03510615934412445 | 579 | 0.1652868969454753 |
R&B/Soul | 159 | 0.033424427159974776 | 61 | 0.01741364544675992 |
Blues | 124 | 0.026066848854319948 | 81 | 0.02312303739651727 |
Jazz | 121 | 0.025436199285263823 | 130 | 0.03711104767342278 |
Alternative | 117 | 0.024595333193188986 | 40 | 0.011418783899514702 |
Easy Listening | 74 | 0.015556022703384485 | 24 | 0.006851270339708821 |
Pop | 63 | 0.013243640950178683 | 48 | 0.013702540679417643 |
Electronica/Dance | 55 | 0.01156190876602901 | 30 | 0.008564087924636025 |
Classical | 47 | 0.009880176581879336 | 74 | 0.0211247502141022 |
Reggae | 35 | 0.007357578305654824 | 58 | 0.016557236654296318 |
Hip Hop/Rap | 33 | 0.006937145259617406 | 35 | 0.009991435912075365 |
Heavy Metal | 8 | 0.0016817321841496743 | 28 | 0.00799314872966029 |
Soundtrack | 5 | 0.0010510826150935463 | 43 | 0.012275192691978304 |
TV Shows | 2 | 0.00042043304603741857 | 93 | 0.02654867256637168 |
Drama | 1 | 0.00021021652301870928 | 64 | 0.018270054239223524 |
Bossa Nova | None | None | 15 | 0.004282043962318013 |
Comedy | None | None | 17 | 0.004852983157293748 |
Opera | None | None | 1 | 0.00028546959748786756 |
Rock And Roll | None | None | 12 | 0.0034256351698544107 |
Sci Fi & Fantasy | None | None | 26 | 0.007422209534684556 |
Science Fiction | None | None | 13 | 0.003711104767342278 |
World | None | None | 28 | 0.00799314872966029 |
Firstly, we categorised the tracks by their genre to represent the range of tracks that are available. From the results, we can see that the top 4 genres that have the highest percentage of tracks available are also the same 4 genres that have the highest percentages of sales. These genres are Rock, Metal, Alternative & Punk and Latin. While the rest of the range of tracks in store are pretty reflective of their sales popularity, there are a couple of genres that we should take note of to reduce loss to the Chinook Store.
%%sql
SELECT
mt.name media_type,
COUNT(il.quantity) tracks_sold,
CAST(COUNT(il.quantity) as float)/(SELECT COUNT(*) FROM invoice_line) percentage_sold
FROM invoice_line il
INNER JOIN track t ON t.track_id = il.track_id
INNER JOIN media_type mt ON mt.media_type_id = t.media_type_id
GROUP BY mt.media_type_id
ORDER BY percentage_sold DESC
Done.
media_type | tracks_sold | percentage_sold |
---|---|---|
MPEG audio file | 4259 | 0.8953121715366827 |
Protected AAC audio file | 439 | 0.09228505360521337 |
Purchased AAC audio file | 35 | 0.007357578305654824 |
AAC audio file | 21 | 0.004414546983392895 |
Protected MPEG-4 video file | 3 | 0.0006306495690561278 |
Protected vs non-protected media types definitely have an effect on popularity. From our results table, MPEG audio file which is a non-protected media type is the most popular in sales with a percentage of 89.5%.