Using the Chinook database, we will answer a series of business related questions.
%%capture
%load_ext sql
%sql sqlite:///chinook.db
'Connected: None@chinook.db'
%%sql
SELECT
name,
type
FROM sqlite_master
WHERE type IN ("table","view");
Done.
name | type |
---|---|
album | table |
artist | table |
customer | table |
employee | table |
genre | table |
invoice | table |
invoice_line | table |
media_type | table |
playlist | table |
playlist_track | table |
track | table |
%%sql
SELECT *
FROM track
LIMIT 3;
Done.
track_id | name | album_id | media_type_id | genre_id | composer | milliseconds | bytes | unit_price |
---|---|---|---|---|---|---|---|---|
1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 |
2 | Balls to the Wall | 2 | 2 | 1 | None | 342562 | 5510424 | 0.99 |
3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman | 230619 | 3990994 | 0.99 |
%%sql
SELECT *
FROM customer
LIMIT 3;
Done.
customer_id | first_name | last_name | company | address | city | state | country | postal_code | phone | fax | support_rep_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Luís | Gonçalves | Embraer - Empresa Brasileira de Aeronáutica S.A. | Av. Brigadeiro Faria Lima, 2170 | São José dos Campos | SP | Brazil | 12227-000 | +55 (12) 3923-5555 | +55 (12) 3923-5566 | luisg@embraer.com.br | 3 |
2 | Leonie | Köhler | None | Theodor-Heuss-Straße 34 | Stuttgart | None | Germany | 70174 | +49 0711 2842222 | None | leonekohler@surfeu.de | 5 |
3 | François | Tremblay | None | 1498 rue Bélanger | Montréal | QC | Canada | H2G 1A7 | +1 (514) 721-4711 | None | ftremblay@gmail.com | 3 |
With Chinook records recently signing a deal with a new record label, they are assigned a task to select the first three albums to be added to the store from a list of four.
These four albums are the following
ARTIST NAME | ** GENRE ** |
---|---|
Regal | Hip-Hop |
Red Tone | Punk |
Meteor and the Girls | Pop |
Slim Jim Bites | Blues |
As the label specializes in artists from the US, they would want to focus their attention on the US market. Therefore, we want to know which of these four artists will be recommended to have their album available for purchase.
THOUGHT PROCESS
We need to find out exclusively US customers. So we need to use the "customer" table that is selected for "USA".
We need a COUNT of the number of tracks sold which can be represented by a count of the number of invoice_line (i.e. use COUNT(invoice_line_id)
Since genre is important, we need to include it into our query.
The path seems to be customer -> invoice -> invoice_line -> track -> genre
I want the main query to return
So this will put out something along the lines of
SELECT g.name AS genre, number_of_tracks, percentages
Likely to use a WITH clause. Possibly in this manner
WITH usa_only AS (SELECT c.*, COUNT(il.invoice_line_id) AS num_track, il.track_id FROM invoice_line AS il INNER JOIN invoice AS i ON i.invoice_id = il.invoice_id INNER JOIN customer AS c ON c.customed_id = i.customer_id WITH c.country = "USA")
Main query will look something like the following:
SELECT g.name AS genre, uo.num_track AS num_of_tracks, CAST(uo.num_track AS FLOAT) / SUM(t.track_id) AS percentage FROM usa_only AS uo INNER JOIN track AS t ON t.track_id = uo.track_id INNER JOIN genre AS g ON g.genre_id = t.genre_id GROUP BY 1 ORDER BY 2
(UPDATE NOTE): The above didn't work. Will need to re-work the WITH Clause as well as the main.
In the WITH Clause, only modify the invoice_line table to include only customers that were from the US.
In the main query, need to readjust to include genre name and rework the number_of_track + percentage columns. However, you will need to add a subquery for a sum total.
%%sql
WITH usa_only AS
(SELECT il.*
FROM invoice_line AS il
INNER JOIN invoice AS i ON i.invoice_id = il.invoice_id
INNER JOIN customer AS c ON c.customer_id = i.customer_id
WHERE c.country = "USA")
SELECT g.name AS genre,
COUNT(uo.invoice_line_id) AS "num_of_tracks",
ROUND(CAST(COUNT(uo.invoice_line_id) AS FLOAT) / (SELECT COUNT(*)
FROM usa_only)
*100, 2) AS "percentage"
FROM usa_only AS uo
INNER JOIN track AS t ON t.track_id = uo.track_id
INNER JOIN genre AS g ON g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 2 DESC;
Done.
genre | num_of_tracks | percentage |
---|---|---|
Rock | 561 | 53.38 |
Alternative & Punk | 130 | 12.37 |
Metal | 124 | 11.8 |
R&B/Soul | 53 | 5.04 |
Blues | 36 | 3.43 |
Alternative | 35 | 3.33 |
Latin | 22 | 2.09 |
Pop | 22 | 2.09 |
Hip Hop/Rap | 20 | 1.9 |
Jazz | 14 | 1.33 |
Easy Listening | 13 | 1.24 |
Reggae | 6 | 0.57 |
Electronica/Dance | 5 | 0.48 |
Classical | 4 | 0.38 |
Heavy Metal | 3 | 0.29 |
Soundtrack | 2 | 0.19 |
TV Shows | 1 | 0.1 |
** STATEMENT **
In observing the findings of exclusively US customers, we see that the Red Tone, Slim Jim Bites and Meteor & the Girls should be recommended based on how Punk, Pop and Blues music had a greater number of track sales compared to Hip Hop.
Interestingly, the combination of Punk + Pop + Blues only make up 17.89% of the total tracks sold. Seeing as rock music makes up a larger proportion of track sales, it may be a better business decision to focus marketing and musical direction to this genre.
As each customer is assigned a Sales Support Agent to aide in making their 1st purchase, we want to analyze the performance of each of these employees and see what can be done to bolster performance based on our collected information.
%%sql
SELECT *
FROM employee
LIMIT 3;
Done.
employee_id | last_name | first_name | title | reports_to | birthdate | hire_date | address | city | state | country | postal_code | phone | fax | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Adams | Andrew | General Manager | None | 1962-02-18 00:00:00 | 2016-08-14 00:00:00 | 11120 Jasper Ave NW | Edmonton | AB | Canada | T5K 2N1 | +1 (780) 428-9482 | +1 (780) 428-3457 | andrew@chinookcorp.com |
2 | Edwards | Nancy | Sales Manager | 1 | 1958-12-08 00:00:00 | 2016-05-01 00:00:00 | 825 8 Ave SW | Calgary | AB | Canada | T2P 2T3 | +1 (403) 262-3443 | +1 (403) 262-3322 | nancy@chinookcorp.com |
3 | Peacock | Jane | Sales Support Agent | 2 | 1973-08-29 00:00:00 | 2017-04-01 00:00:00 | 1111 6 Ave SW | Calgary | AB | Canada | T2P 5M5 | +1 (403) 262-3443 | +1 (403) 262-6712 | jane@chinookcorp.com |
THOUGHT PROCESS
Looking at the above, we want the title to only show "Sales Support Agent"
We may also want to include hire_date as that seems to be the only information that could be of relevance to us.
We want the name of the employee to be listed as the following:
first_name || " " || last_name AS "agent_name",
Additionally, we would also like to include the hire date as well as the total dollar amount of sales assigned to each agent.
| agent_name | hire_date | Total_sales |
We will need the following tables: Employee + customer + invoice as the invoice table contains the "total" column.
Looking at the process, we need to use the WITH clause to isolate only title = "Sales Support Agent", as well as left join these employees to customers. This will also need to include the 'customer_id" as this is the linker to the invoice table.
WITH sales_agent_only AS
(SELECT
e.first_name || " " || e.last_name AS 'name',
e.hire_date,
c.customer_id
FROM
employee AS e
LEFT JOIN
customer AS c ON c.support_rep_id = e.employee_id
WHERE
e.title = "Sales Support Agent")
In developing a main query, we will need to find out the sum dollar amount of sales which will need to be included as SUM(total) AS "total_sales" and also need to join with subquery in clause.
SELECT sao.name AS "sales_agent", sao.hire_date, SUM(i.total) AS "total_sales" FROM sales_agent_only AS sao INNER JOIN invoice AS i ON i.customer_id = sao.customer_id GROUP BY 1 ORDER BY 3 DESC;
%%sql
WITH sales_agent_only AS
(SELECT
e.first_name || " " || e.last_name AS 'name',
e.hire_date,
c.customer_id
FROM
employee AS e
LEFT JOIN
customer AS c ON c.support_rep_id = e.employee_id
WHERE
e.title = "Sales Support Agent")
SELECT
sao.name AS "sales_agent",
sao.hire_date,
ROUND(SUM(i.total), 2) AS "total_sales"
FROM
sales_agent_only AS sao
INNER JOIN
invoice AS i ON i.customer_id = sao.customer_id
GROUP BY 1
ORDER BY 3 DESC;
Done.
sales_agent | hire_date | total_sales |
---|---|---|
Jane Peacock | 2017-04-01 00:00:00 | 1731.51 |
Margaret Park | 2017-05-03 00:00:00 | 1584.0 |
Steve Johnson | 2017-10-17 00:00:00 | 1393.92 |
** STATEMENT **
Looking at the findings we see that Jane Peacock has the most 1st-time customer sales out of the three Sales Support Agent with 1731.51 dollars followed by Margaret Park and Steve Johnson. Interestingly enough there seemed to also be a correlation between hiring date and total sales as the agent with the longest tenure seem to also have the highest total sales amount.
In this task, we want to analyze the sales data for customers from each country. Specifically we are interested in calculating data for each country on
Keep in mind that some countries have only one customer and thus should be grouped into other countries with only one customer as well into a group called "other".
THOUGHT PROCESS
From the customer table, we would need to use "country" to group customers accordingly based on country. Furthermore, we need to include a logic that would state that countries with only 1 customer = "Other" otherwise it will be listed as from that country. This likely should be done through a subquery.
We will likely need the following three tables: customer + invoice + invoice_line. However, considering that we are looking at per unit amounts, the invoice table will not likely be needed directly seeing as the "total" column would refer to the total cost listed on an individual invoice.
We will need to inner join these three tables together. Likely through a subquery.
Working through this, we will first need to use a WITH clause to make a subquery that will link up the three tables together + define the name of the country as either country or "other"
WITH country_or_other AS
(SELECT
CASE
WHEN ???
ELSE ???
END as 'country',
c.customer_id,
il.*
FROM
invoice_line AS il
INNER JOIN
invoice AS i ON i.invoice_id = il.invoice_id
INNER JOIN
customer AS c ON c.customer_id = i.customer_id)
Looking at the example provided, the conditions will need to make it so that WHEN number of customer (i.e. COUNT(customer_id)) equals ONE, then country = "Other", ELSE country = country
CASE
WHEN (SELECT COUNT(*)
FROM customer
WHERE country = c.country) = 1 THEN "Other"
ELSE c.country
END AS "country"
Together, the WITH Clause looks like the following:
WITH country_or_other AS
(SELECT
CASE
WHEN (SELECT COUNT(*)
FROM customer
WHERE country = c.country) = 1 THEN "Other"
ELSE c.country
END AS "country",
c.customer_id,
il.*
FROM
invoice_line AS il
INNER JOIN
invoice AS i ON i.invoice_id = il.invoice_id
INNER JOIN
customer AS c ON c.customer_id = i.customer_id)
Looking to create a main subquery, we need the following columns
- name of country (likely refer to subquery)
- number of customer per country
- total sales amount per country
- avg total sale amount per customer
- avg total sale amount per invoice
So it should look like the following:
SELECT country,
num_of_customer(s),
total_sales,
avg_sale_per_customer,
avg_sale_per_invoice
FROM ???
GROUP BY 1
ORDER BY 1 ASC, 2 DESC
In order to calculate total sales per given customer, we will need a summarized table that contains (a) name of country, (b) sales info. Using our WITH clause subquery, we can formulate another subquery:
(SELECT country,
COUNT(customer_id),
SUM(unit_price) AS 'total_sales',
CAST(SUM(unit_price) AS FLOAT) / COUNT(customer_id) AS "avg_sale_per_customer",
CAST(SUM(unit_price) AS FLOAT) / COUNT(invoice_id) AS "avg_sale_per_invoice"
FROM country_or_other)
Currently with the above set-up, we will end up getting just a single value due to GROUP BY refers to the main query's SELECT and not of the subquery. Thus need to modify it by inclusing the ORDER BY and GROUP BY options in the subquery.
(SELECT country,
COUNT(customer_id),
SUM(unit_price) AS 'total_sales',
CAST(SUM(unit_price) AS FLOAT) / COUNT(customer_id) AS "avg_sale_per_customer",
CAST(SUM(unit_price) AS FLOAT) / COUNT(invoice_id) AS "avg_sale_per_invoice"
FROM country_or_other
GROUP BY 1
ORDER BY 2 DESC)
In running this set up, we run into a few problems. Firstly, we end up getting hundreds-to-thousands of customers. Furthermore in the last two columns gives us the average unit sale of each item as opposed to per customer or per invoice. Knowing that the "Other" column should have been made up of only 1 customer from the various countries (which is a total of 195), it is likely a result of the same customer within these countries being counted multiple times. Thus we need a way to only include entries for a customer as one entry for that country. This is solved by using the DISTINCT statement.
(SELECT country,
COUNT(DISTINCT customer_id),
ROUND(SUM(unit_price), 2) AS 'total_sales',
ROUND(CAST(SUM(unit_price) AS FLOAT) / COUNT(DISTINCT customer_id), 2) AS "avg_sale_per_customer",
ROUND(CAST(SUM(unit_price) AS FLOAT) / COUNT(DISTINCT invoice_id), 2) AS "avg_sale_per_invoice"
FROM country_or_other
GROUP BY 1
ORDER BY 2 DESC)
Lastly, we need a way to make it so that the order of the COUNTRY should include the "Other" group at the bottom of the list despite is lexiographic order. Using the example provided, it looks like we need to include a logic of sorts to state then WHEN the name of the country is "Other" we assign it a value of 1 whilst the rest is assigned a value of 0 and assign this as a column of sorts. Then use this column as part of the ORDER BY statement.
(SELECT country,
COUNT(DISTINCT customer_id),
ROUND(SUM(unit_price), 2) AS 'total_sales',
ROUND(CAST(SUM(unit_price) AS FLOAT) / COUNT(DISTINCT customer_id), 2) AS "avg_sale_per_customer",
ROUND(CAST(SUM(unit_price) AS FLOAT) / COUNT(DISTINCT invoice_id), 2) AS "avg_sale_per_invoice",
CASE
WHEN country = "Other" THEN 1
ELSE 0
END AS 'sort'
FROM country_or_other
GROUP BY 1
ORDER BY 6 ASC, 2 DESC)
%%sql
WITH country_or_other AS
(SELECT
CASE
WHEN (SELECT COUNT(*)
FROM customer
WHERE country = c.country) = 1 THEN "Other"
ELSE c.country
END AS "country",
c.customer_id,
il.*
FROM
invoice_line AS il
INNER JOIN
invoice AS i ON i.invoice_id = il.invoice_id
INNER JOIN
customer AS c ON c.customer_id = i.customer_id)
SELECT country,
num_customer,
total_sales,
avg_sale_per_customer,
avg_sale_per_invoice
FROM (SELECT
country,
COUNT(DISTINCT customer_id) AS 'num_customer',
ROUND(SUM(unit_price),2) AS 'total_sales',
ROUND(CAST(SUM(unit_price) AS FLOAT) / COUNT(DISTINCT customer_id), 2) AS "avg_sale_per_customer",
ROUND(CAST(SUM(unit_price) AS FLOAT) / COUNT(DISTINCT invoice_id), 2) AS "avg_sale_per_invoice",
CASE
WHEN country = "Other" THEN 1
ELSE 0
END AS 'sort'
FROM
country_or_other
GROUP BY 1
ORDER BY 6 ASC, 2 DESC)
Done.
country | num_customer | total_sales | avg_sale_per_customer | avg_sale_per_invoice |
---|---|---|---|---|
USA | 13 | 1040.49 | 80.04 | 7.94 |
Canada | 8 | 535.59 | 66.95 | 7.05 |
Brazil | 5 | 427.68 | 85.54 | 7.01 |
France | 5 | 389.07 | 77.81 | 7.78 |
Germany | 4 | 334.62 | 83.66 | 8.16 |
United Kingdom | 3 | 245.52 | 81.84 | 8.77 |
Czech Republic | 2 | 273.24 | 136.62 | 9.11 |
India | 2 | 183.15 | 91.58 | 8.72 |
Portugal | 2 | 185.13 | 92.57 | 6.38 |
Other | 15 | 1094.94 | 73.0 | 7.45 |
** STATEMENT **
Looking at the data, we see that the countries with the greatest lifetime sales total per customer are:
1) Czech Republic
2) Portugal
3) India
However, as it relates to sales total per given order, this would be:
1) Czech Republic
2) United Kingdom
3) India
These should be viewed as potential markets to initiate marketing campaigns for new releases from artists. Whilst informative, there are a few things to keep in mind.
Thus a small campaign could be worthwhile in order to see if these trends hold.
Looking at the store setup, customers may make a purchase either as purchasing a whole album or through a collection of 1+ individual track. However the store doesn't let customer purchase purchase a whole album and then add individual tracks to that same invoice order. Instead, these customers are charged the same price as if they had purchased each of those tracks separately.
FOR EXAMPLE: Say that an album (like has 12 tracks each) costs 8.00 dollars and a track is each 1.00 dollar. So if someone were to buy an album it's 8.00 dollars while 12 individual tracks would be 12.00 dollars. However, if they bought the album + 3 tracks, instead of it being 11.00 dollars, it will instead be 15.00 dollars.
So managament wants to consider changing their purchasing strategy to make it more money-friendly by considering to purchase only the most popular tracks from each album instead of every track. So we want to find out what percentage of purchases are individual tracks and which are whole albums.
Some things to consider is that
1) Albums that have only 1 or 2 tracks are likely to be purchased by customers as part of a collection of individual tracks
2) Customers may decide to manually select every track from an album and then add a few individual tracks from other albums to their purchase
However, we can safely ignore these cases.
So ultimately we want to
identify whether each invoice has all the tracks from an album by getting the list of tracks from an invoice and compare it to the list of tracks from an album
find the album to compare the purchase to by looking up the album that one of the purchased tracks belong to
Ultimately we want a query that catagorizes each invoice as either an album purchase or not and then calculate the following:
1) number of invoices
2) percentage of invoices
** THOUGHT PROCESS **
1) I want to include a query that will produce the following table
album_purchase | Number of invoices | Percentage of invoices |
---|---|---|
Yes | ???? | ???? |
No | ???? | ???? ? |
2) Considering that we need track information + invoice information, it is likely that the tables that we need will be track + invoice_line + invoice
3) Broadly looking at a list of track titles with album name and artist, we can see that
a) some of these tracks made from the same artist are found in various albums (likely from a mix album or live recording)
EX. the song "2 Minutes to Midnight" by Iron Maiden is found on 6 different albums
EX. the song "We are the Champions" by Queen is found on the original + greatest hits album
b) some of the tracks may have the same name but are made by different artists (possibly as cover or due to generic naming) EX. the song "Believe" has been recorded by Spyro Gyra, Lenny Kravitz and Smashing Pumpkins EX. the song "Don't Look Back" has been recorded by Lenny Kravitz and Heroes
4) Looking at the COUNT of track_id (i.e. the individual identifier for each track), there are 3503 individual tracks
5) Looking at the COUNT of album_id (i.e. the individual identifier for each album), there are 347 different albums where the 3503 individual tracks stem from
6) Looking at the COUNT of invoice_id (i.e. the individual invoices sent for each customer purchase), there are 4757 purchases made.
7) Ultimately to determine if an individual purchased either an album or 1+ tracks, the album_id must be the same for each of the tracks purchased
EX. If I bought the AC/DC album "Let There Be Rock" only, which has 8 tracks, then on my invoice I should have each of the track display a single album_id
SAMPLE INVOICE LINE
(TRACK NAME) (ALBUM NAME) (ARTIST) (t.ID) (a.ID)
Go Down Let There Be Rock AC/DC 15 4
Dog Eat Dog Let There Be Rock AC/DC 16 4
Let There Be Rock Let There Be Rock AC/DC 17 4
Bad Boy Boogie Let There Be Rock AC/DC 18 4
Problem Child Let There Be Rock AC/DC 19 4
Overdose Let There Be Rock AC/DC 20 4
Hell Ain't A Bad Place To Be Let There Be Rock AC/DC 21 4
Whole Lotta Rosie Let There Be Rock AC/DC 22 4
So potentially we can use this as part of a CASE STATEMENT
CASE
WHEN COUNT(DISTINCT album_id) = 1 THEN 'album'
ELSE 'track'
END AS "album_or_track"
8) Potential problem with the above statement is that an individual may just have purchased a large number of tracks from a given album but not necessarily the album itself because there may be one trash song in the album that's not worthwhile to own
EX. Say I wanted to purchase a particular John Lennon album (12 tracks long) post-Beetles breakup.However, I know that 3 tracks were a Yoko Ono-inspired experiment (i.e. trash), so I chose not to buy it outright. By excluding these 3 tracks, this purchase would not be considered an album purchase since it wasn't the whole album.
As such, we would need to make an amendment to the above CASE statement to ensure that the number of tracks purchased for an order matches the total number of tracks for a given album
COUNT(DISTINCT track_id) = (SELECT album_id, COUNT(*) FROM track GROUP BY album_id)
9) To determine the percentage that either an album-only or tracks-only purchase makes up of total sales, I would need the total number of invoices available using the following query:
SELECT COUNT(invoice_id)
FROM invoice_line;
Keeping in mind that I need this as a float, I will need to make the following adjustments:
CAST(SELECT COUNT(invoice_id) FROM invoice_line AS FLOAT)
10) So far, the main query will look like the following:
SELECT
purchase_type,
??? AS 'num_invoices',
ROUND(CAST(num_invoices AS FLOAT) / CAST(SELECT COUNT(invoice_id) FROM invoice_line AS FLOAT), 2) AS "percentage"
FROM
subquery with WITH CLAUSE
GROUP BY
1
11) Ultimately, I would need to make 1 or more subqueries that can ultimately be referred to prior to making the major query. In either case, the plan is to be able to JOIN the necessary tables into one so that this doesn't have to be done in the major query.
Make 1 subquery
WITH purchase_category AS
(
SELECT ???
???
???
FROM ???
INNER JOIN ???
[INNER JOIN] ???
GROUP BY ???
)
Knowingly, I need (a) track table and (b) invoice_line table. However, seeing as I don't need specific info pertaining to an actual album nor the actual invoice itself. We'll focus on these two tables.
WITH purchase_category AS
(
SELECT
il.invoice_id,
CASE
WHEN COUNT(DISTINCT t.album_id) = 1
AND
COUNT(DISTINCT t.track_id) = (SELECT COUNT(*)
FROM track
GROUP BY album_id) THEN 'album'
ELSE 'track'
END AS 'album_or_track'
FROM
track AS t
INNER JOIN
invoice_line AS il ON il.track_id = t.track_id
GROUP BY 1)
12) Making the main query, we want to reference the with clause subquery
SELECT
album_or_track,
COUNT(*) AS 'num_invoices',
ROUND(CAST(COUNT(*) AS FLOAT) / (SELECT COUNT(*)
FROM invoice), 2) AS 'percentage'
FROM
purchase_category
GROUP BY
1
Combined, it will be
WITH purchase_category AS
(
SELECT
il.invoice_id,
CASE
WHEN COUNT(DISTINCT t.album_id) = 1
AND
COUNT(DISTINCT t.track_id) = (SELECT COUNT(*)
FROM track
GROUP BY album_id) THEN 'album'
ELSE 'track'
END AS 'album_or_track'
FROM
track AS t
INNER JOIN
invoice_line AS il ON il.track_id = t.track_id
GROUP BY 1)
SELECT
album_or_track,
COUNT(*) AS 'num_invoices',
ROUND(CAST(COUNT(*) AS FLOAT) / (SELECT COUNT(*)
FROM invoice), 2) AS 'percentage'
FROM
purchase_category
GROUP BY
1
13) UPDATE: Running this, an error had came up when comparing the corrected answer with our findings where a much larger proportion of the purchases are tracks as opposed to albums. Looking into the issue, it appears as though the logic set up in determining whether an invoice represent an album or multiple tracks is incorrect.
Ultimately with the subquery, I had failed to note the total number of tracks for a given album. This needs to be done through the use of an INNER JOIN as WHERE is not a viable option.
Thus, we create an additional table to join
INNER JOIN (SELECT album_id,
COUNT(*) as 'total_num_tracks_in_album'
FROM album
GROUP BY album_id) as c ON c.album_id = t.album_id
Thus creating a finalized query of:
WITH purchase_category AS
(
SELECT
il.invoice_id,
CASE
WHEN COUNT(DISTINCT t.album_id) = 1
AND
COUNT(DISTINCT t.track_id) = c.total_num_tracks_in_album THEN 'album'
ELSE 'track'
END AS 'album_or_track'
FROM
track AS t
INNER JOIN
invoice_line AS il ON il.track_id = t.track_id
INNER JOIN (SELECT album_id,
COUNT(*) as 'total_num_tracks_in_album'
FROM track
GROUP BY album_id) as c ON c.album_id = t.album_id
GROUP BY 1)
SELECT
album_or_track,
COUNT(*) AS 'num_invoices',
ROUND(CAST(COUNT(*) AS FLOAT) / (SELECT COUNT(*)
FROM invoice), 2) AS 'percentage'
FROM
purchase_category
GROUP BY
1
%%sql
WITH purchase_category AS
(
SELECT
il.invoice_id,
CASE
WHEN COUNT(DISTINCT t.album_id) = 1
AND
COUNT(DISTINCT t.track_id) = c.total_num_tracks_in_album THEN 'album'
ELSE 'track'
END AS 'album_or_track'
FROM
track AS t
INNER JOIN
invoice_line AS il ON il.track_id = t.track_id
INNER JOIN (SELECT album_id,
COUNT(*) as 'total_num_tracks_in_album'
FROM track
GROUP BY album_id) as c ON c.album_id = t.album_id
GROUP BY 1)
SELECT
album_or_track,
COUNT(*) AS 'num_invoices',
ROUND(CAST(COUNT(*) AS FLOAT) / (SELECT COUNT(*) FROM invoice), 2) AS 'percentage'
FROM
purchase_category
GROUP BY
1
Done.
album_or_track | num_invoices | percentage |
---|---|---|
album | 114 | 0.19 |
track | 500 | 0.81 |
STATEMENT
Looking at our findings, we see that a larger proportion (81%) of our sale orders stem from the purchases of select tracks compared to purchasing full albums (19%). Nevertheless, the purchase of only the most popular tracks may not be a wise decision as album sales still make up 1/5 of our revenue and it has yet to be proven if selecting only the most popular tracks would save enough expenditure to cover this lost in revenue. Further analysis may be warranted to find this answer.