DH-401: Digital Musicology semester project¶

Predicting music popularity using DNNs - Milestone 2¶

Dataset¶

We decided to use FMA: A Dataset for Music Analysis for our project. The dataset is publicly available on GitHub, and the files are hosted on a UNIL/SWITCH server.

General overview¶

The FMA dataset consists of "917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres. It provides full-length and high-quality audio, pre-computed features, together with track- and user-level metadata, tags, and free-form text such as biographies." (Defferrard, M., Benzi, K., Vandergheynst, P., & Bresson, X. (2017). FMA: A Dataset for Music Analysis. ArXiv, abs/1612.01840.).

To the best of our knowledge, it is the biggest publicly available music dataset that contains both raw audio files (MP3) and metadata and that can be analysed legally (under Creative Commons licenses).

This makes it a good and convenient source for music analysis, but the convenience comes at a price. Since our analysis focuses on music popularity, we should expect the most popular music to be missing from this dataset. The music business is vibrant and profitable, so publishing work under a Creative Commons license is uncommon and mostly limited to less popular artists.

On the other hand, using only Creative Commons music and skipping the top hits may make our project less prone to bias. The promotion of pop musicians is a large-scale, nuanced, sociotechnical project, so their popularity might reflect the actual musical content only to a very small extent. People tend to listen to a pop song because they hear it everywhere. When they listen to niche music, they must have better reasons. We will try to find and describe the mechanisms behind this popularity.

We accept that our dataset is only a small sample of the whole music world and cannot represent global trends well. Given the amount of music recorded throughout human history, a complete dataset will probably never exist. Presumably, only streaming platforms such as Spotify have music libraries that are close to complete.

Raw music files¶

The raw music dataset can be downloaded in four different sizes.

fma_small: 8,000 tracks of 30s, 8 balanced genres (7.2 GiB)

This small subset is very useful because it contains only 8 well-balanced genres, which will allow us to perform a follow-up, per-genre popularity analysis. We assume that a full, cross-genre analysis may reveal very general popularity trends but miss the nuanced rules that appear when only music of the same type is analysed. In addition, these tracks were trimmed to 30 seconds and selected as the ones with the most complete metadata, which makes them the best option for our exploratory analysis.

fma_medium: 25,000 tracks of 30s, 16 unbalanced genres (22 GiB)

The medium dataset consists only of music with well-defined top genres and the most complete metadata. We consider this sample too small for pre-training, and because its genres are unbalanced, it cannot easily be used for per-genre popularity analysis. Therefore, we decided not to use this size.

fma_large: 106,574 tracks of 30s, 161 unbalanced genres (93 GiB)

This dataset contains every music piece collected by the authors, trimmed to 30 seconds. This makes it a good fit for the unsupervised pre-training that we will perform on raw audio chunks using wav2vec 2.0.

fma_full: 106,574 untrimmed tracks, 161 unbalanced genres (879 GiB)

The full, untrimmed data acquired by the authors. Due to the data volume and the computational power required to process samples longer than 30s, we decided not to use this dataset.

The dataset is a zip archive of music files named after their track IDs. All of the music is encoded as MP3. Most files use a sampling rate of 44,100 Hz and a bit rate of 320 kbit/s (263 kbit/s on average), and are recorded in stereo.
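The naming scheme can be sketched with a small helper. The layout below is inferred only from the path used later in this notebook (fma_small/000/000002.mp3): the track ID appears to be zero-padded to six digits, with the first three digits used as a folder name. The function name audio_path is our own, hypothetical choice.

```python
def audio_path(audio_dir, track_id):
    """Build the expected path of a track inside an extracted FMA archive.

    Assumption: IDs are zero-padded to six digits and grouped into
    folders named after the first three digits (inferred from the
    single path 'fma_small/000/000002.mp3').
    """
    tid = f"{track_id:06d}"
    # Zip member names use forward slashes, so we join with "/" explicitly.
    return f"{audio_dir}/{tid[:3]}/{tid}.mp3"


example_path = audio_path("fma_small", 2)
```

With such a helper, a track ID taken from the metadata index can be turned directly into the file name to load.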

DISCLAIMER: The dataset authors have done a very good job with their own exploratory data analysis, so we do not go into much detail on many basic metadata exploration topics, since that would only duplicate their work. The original analysis is available in usage.ipynb and analysis.ipynb.

In [ ]:

from zipfile import ZipFile
from tqdm import tqdm

from IPython import display
import librosa

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import ast

import statsmodels.api as sm
import statsmodels.formula.api as smf
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

import numpy as np

from scipy.stats import chisquare

plt.rcParams['figure.figsize'] = (17, 5)

/usr/local/lib/python3.7/dist-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  import pandas.util.testing as tm

In [ ]:

# Download fma_small
!wget https://os.unil.cloud.switch.ch/fma/fma_small.zip

--2021-04-25 08:00:30--  https://os.unil.cloud.switch.ch/fma/fma_small.zip
Resolving os.unil.cloud.switch.ch (os.unil.cloud.switch.ch)... 86.119.28.16, 2001:620:5ca1:201::214
Connecting to os.unil.cloud.switch.ch (os.unil.cloud.switch.ch)|86.119.28.16|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7679594875 (7.2G) [application/zip]
Saving to: ‘fma_small.zip’

fma_small.zip       100%[===================>]   7.15G  27.3MB/s    in 4m 40s  

2021-04-25 08:05:11 (26.1 MB/s) - ‘fma_small.zip’ saved [7679594875/7679594875]

In [ ]:

# Extract the first 50 mp3 files
with ZipFile('fma_small.zip', 'r') as zip_file:
    for file in tqdm(iterable=zip_file.namelist()[:50], total=50):
        zip_file.extract(member=file)

100%|██████████| 50/50 [00:05<00:00,  9.14it/s]

In [ ]:

# Check the parameters and play one raw audio sample
x, sr = librosa.load("fma_small/000/000002.mp3", sr=None, mono=True)
print(f'Duration: {x.shape[-1] / sr}s, {x.size} samples, rate: {sr}')

display.Audio(data=x, rate=sr)

/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py:162: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")

Duration: 29.976575963718822s, 1321967 samples, rate: 44100

Out[ ]:

Raw music files for Wav2vec 2.0, and first deep learning efforts¶

The training efforts we are currently working on require further data processing:

Pre-training is currently only available through the latest version of Facebook's fairseq library. The pre-training pipeline expects 30s clips at a 16 kHz sample rate, in either WAV or FLAC format. Therefore, we used sox to convert the samples to FLAC while downsampling them from 44.1 kHz to 16 kHz. The pipeline is currently being tested on a small sample, and the final training will be performed on the large dataset. A Tesla V100 GPU on Google Colab is used for training.
Then, the fairseq model checkpoint is converted to the Hugging Face Transformers format using the conversion script.
A linear layer with one output is added on top of the embedding output of the raw model. This architecture will be fine-tuned for popularity prediction. The question of a good popularity metric will be explored in the following parts of our analysis.
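The downsampling step above was done with sox. For readers who prefer to stay in Python, the same rate conversion can be sketched with scipy.signal.resample_poly. This is an illustrative alternative, not the pipeline we ran; the helper name downsample and the synthetic test tone are our own assumptions, and writing the result to FLAC is left out.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def downsample(x, sr_in=44_100, sr_out=16_000):
    """Resample a mono signal from sr_in to sr_out with a polyphase filter."""
    g = gcd(sr_in, sr_out)
    # 44,100 Hz -> 16,000 Hz reduces to upsampling by 160 and downsampling by 441.
    return resample_poly(x, sr_out // g, sr_in // g)


# One second of a synthetic 440 Hz tone stands in for a real clip.
sr_in = 44_100
t = np.arange(sr_in) / sr_in
tone = np.sin(2 * np.pi * 440.0 * t)
tone_16k = downsample(tone)
```

One second of audio at 44.1 kHz becomes 16,000 samples, so a 30s clip ends up with roughly 480,000 samples.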

To keep our exploratory analysis clear, we maintain the wav2vec notebook separately. You are welcome to see the current efforts on our Google Colab.
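The regression head described in the steps above can be illustrated without any deep learning framework. The sketch below uses NumPy only and makes several assumptions that are not fixed by our text: the frame embeddings are mean-pooled over time before the linear layer, the hidden size is 768 (the size of the base wav2vec 2.0 configuration), and the embeddings are random placeholders rather than real model output.

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_size = 768  # assumption: base wav2vec 2.0 hidden size
n_frames = 50      # assumption: arbitrary number of time frames

# Placeholder for the per-frame embeddings produced by the pre-trained model.
embeddings = rng.normal(size=(n_frames, hidden_size))

# Parameters of the linear layer with a single output.
weights = rng.normal(scale=0.01, size=hidden_size)
bias = 0.0


def predict_popularity(frame_embeddings, weights, bias):
    """Mean-pool the frames, then apply a one-output linear layer."""
    pooled = frame_embeddings.mean(axis=0)
    return float(pooled @ weights + bias)


score = predict_popularity(embeddings, weights, bias)
```

During fine-tuning, the weights and bias (and possibly the underlying model) would be trained so that the score matches the chosen popularity metric.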

Metadata files¶

Following the description on the original dataset page, the metadata consists of the following files:

tracks.csv: per track metadata such as ID, title, artist, genres, tags and play counts, for all 106,574 tracks.
genres.csv: all 163 genres with name and parent (used to infer the genre hierarchy and top-level genres).
features.csv: common features extracted with librosa.
echonest.csv: audio features provided by Echonest (now Spotify) for a subset of 13,129 tracks.

We will explore these files in the following sections.

In [ ]:

!wget https://os.unil.cloud.switch.ch/fma/fma_metadata.zip

# Extract the metadata
with ZipFile('fma_metadata.zip', 'r') as zip_file:
    for file in zip_file.namelist():
        zip_file.extract(member=file)

--2021-04-25 08:06:10--  https://os.unil.cloud.switch.ch/fma/fma_metadata.zip
Resolving os.unil.cloud.switch.ch (os.unil.cloud.switch.ch)... 86.119.28.16, 2001:620:5ca1:201::214
Connecting to os.unil.cloud.switch.ch (os.unil.cloud.switch.ch)|86.119.28.16|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 358412441 (342M) [application/zip]
Saving to: ‘fma_metadata.zip’

fma_metadata.zip    100%[===================>] 341.81M  27.2MB/s    in 15s     

2021-04-25 08:06:25 (23.2 MB/s) - ‘fma_metadata.zip’ saved [358412441/358412441]

Tracks¶

This is the main file. It contains all the contextual metadata normally available to a human listener, such as artist, title, publisher and publication date. It also provides dataset-specific information, such as the suggested train/test split for use as an ML benchmark and the smallest subset (small/medium/large) of the dataset to which each track belongs.

The format of the tracks CSV is unusual: it has a multi-level header and holds multiple different types of data. Fortunately, the dataset creators provide an official loader script, which we use below to load the data.

In [ ]:

# Load the track metadata.
# Function based on: https://github.com/mdeff/fma/blob/master/utils.py
# The CSV has a two-level header: (group, field), e.g. ('track', 'listens').
tracks = pd.read_csv('fma_metadata/tracks.csv', index_col=0, header=[0, 1])

# Tag and genre lists are stored as strings such as "[21]"; parse them into lists.
COLUMNS = [('track', 'tags'), ('album', 'tags'), ('artist', 'tags'),
            ('track', 'genres'), ('track', 'genres_all')]
for column in COLUMNS:
    tracks[column] = tracks[column].map(ast.literal_eval)

# Parse the date columns into pandas datetimes.
COLUMNS = [('track', 'date_created'), ('track', 'date_recorded'),
            ('album', 'date_created'), ('album', 'date_released'),
            ('artist', 'date_created'), ('artist', 'active_year_begin'),
            ('artist', 'active_year_end')]
for column in COLUMNS:
    tracks[column] = pd.to_datetime(tracks[column])

# Make the subset an ordered category (small < medium < large).
SUBSETS = ('small', 'medium', 'large')
try:
    tracks['set', 'subset'] = tracks['set', 'subset'].astype(
            'category', categories=SUBSETS, ordered=True)
except (ValueError, TypeError):
    # the categories and ordered arguments were removed in pandas 0.25
    tracks['set', 'subset'] = tracks['set', 'subset'].astype(
              pd.CategoricalDtype(categories=SUBSETS, ordered=True))

# Store the remaining text columns as categoricals.
COLUMNS = [('track', 'genre_top'), ('track', 'license'),
            ('album', 'type'), ('album', 'information'),
            ('artist', 'bio')]
for column in COLUMNS:
    tracks[column] = tracks[column].astype('category')


tracks

Out[ ]:

	album													artist																	set		track
	comments	date_created	date_released	engineer	favorites	id	information	listens	producer	tags	title	tracks	type	active_year_begin	active_year_end	associated_labels	bio	comments	date_created	favorites	id	latitude	location	longitude	members	name	related_projects	tags	website	wikipedia_page	split	subset	bit_rate	comments	composer	date_created	date_recorded	duration	favorites	genre_top	genres	genres_all	information	interest	language_code	license	listens	lyricist	number	publisher	tags	title
track_id
2	0	2008-11-26 01:44:45	2009-01-05	NaN	4	1	<p></p>	6073	NaN	[]	AWOL - A Way Of Life	7	Album	2006-01-01	NaT	NaN	<p>A Way Of Life, A Collective of Hip-Hop from...	0	2008-11-26 01:42:32	9	1	40.058324	New Jersey	-74.405661	Sajje Morocco,Brownbum,ZawidaGod,Custodian of ...	AWOL	The list of past projects is 2 long but every1...	[awol]	http://www.AzillionRecords.blogspot.com	NaN	training	small	256000	0	NaN	2008-11-26 01:48:12	2008-11-26	168	2	Hip-Hop	[21]	[21]	NaN	4656	en	Attribution-NonCommercial-ShareAlike 3.0 Inter...	1293	NaN	3	NaN	[]	Food
3	0	2008-11-26 01:44:45	2009-01-05	NaN	4	1	<p></p>	6073	NaN	[]	AWOL - A Way Of Life	7	Album	2006-01-01	NaT	NaN	<p>A Way Of Life, A Collective of Hip-Hop from...	0	2008-11-26 01:42:32	9	1	40.058324	New Jersey	-74.405661	Sajje Morocco,Brownbum,ZawidaGod,Custodian of ...	AWOL	The list of past projects is 2 long but every1...	[awol]	http://www.AzillionRecords.blogspot.com	NaN	training	medium	256000	0	NaN	2008-11-26 01:48:14	2008-11-26	237	1	Hip-Hop	[21]	[21]	NaN	1470	en	Attribution-NonCommercial-ShareAlike 3.0 Inter...	514	NaN	4	NaN	[]	Electric Ave
5	0	2008-11-26 01:44:45	2009-01-05	NaN	4	1	<p></p>	6073	NaN	[]	AWOL - A Way Of Life	7	Album	2006-01-01	NaT	NaN	<p>A Way Of Life, A Collective of Hip-Hop from...	0	2008-11-26 01:42:32	9	1	40.058324	New Jersey	-74.405661	Sajje Morocco,Brownbum,ZawidaGod,Custodian of ...	AWOL	The list of past projects is 2 long but every1...	[awol]	http://www.AzillionRecords.blogspot.com	NaN	training	small	256000	0	NaN	2008-11-26 01:48:20	2008-11-26	206	6	Hip-Hop	[21]	[21]	NaN	1933	en	Attribution-NonCommercial-ShareAlike 3.0 Inter...	1151	NaN	6	NaN	[]	This World
10	0	2008-11-26 01:45:08	2008-02-06	NaN	4	6	NaN	47632	NaN	[]	Constant Hitmaker	2	Album	NaT	NaT	Mexican Summer, Richie Records, Woodsist, Skul...	<p><span style="font-family:Verdana, Geneva, A...	3	2008-11-26 01:42:55	74	6	NaN	NaN	NaN	Kurt Vile, the Violators	Kurt Vile	NaN	[philly, kurt vile]	http://kurtvile.com	NaN	training	small	192000	0	Kurt Vile	2008-11-25 17:49:06	2008-11-26	161	178	Pop	[10]	[10]	NaN	54881	en	Attribution-NonCommercial-NoDerivatives (aka M...	50135	NaN	1	NaN	[]	Freeway
20	0	2008-11-26 01:45:05	2009-01-06	NaN	2	4	<p> "spiritual songs" from Nicky Cook</p>	2710	NaN	[]	Niris	13	Album	1990-01-01	2011-01-01	NaN	<p>Songs written by: Nicky Cook</p>\n<p>VOCALS...	2	2008-11-26 01:42:52	10	4	51.895927	Colchester England	0.891874	Nicky Cook\n	Nicky Cook	NaN	[instrumentals, experimental pop, post punk, e...	NaN	NaN	training	large	256000	0	NaN	2008-11-26 01:48:56	2008-01-01	311	0	NaN	[76, 103]	[17, 10, 76, 103]	NaN	978	en	Attribution-NonCommercial-NoDerivatives (aka M...	361	NaN	3	NaN	[]	Spiritual Level
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
155316	0	2017-03-30 15:20:35	2017-02-17	NaN	0	22940	<p>A live performance at Monty Hall on Feb 17,...	1506	Monty Hall	[]	Live at Monty Hall, 2/17/2017	6	Live Performance	NaT	NaT	NaN	NaN	0	2017-03-30 15:18:28	0	24357	NaN	New Jersey	NaN	GILLIAN/JENNA/DECLAN/JAIME	Spowder	NaN	[spowder]	https://spowder.bandcamp.com/	NaN	training	large	320000	0	NaN	2017-03-30 15:23:34	NaT	162	1	Rock	[25]	[25, 12]	NaN	122	NaN	Creative Commons Attribution-NonCommercial-NoD...	102	NaN	3	NaN	[]	The Auger
155317	0	2017-03-30 15:20:35	2017-02-17	NaN	0	22940	<p>A live performance at Monty Hall on Feb 17,...	1506	Monty Hall	[]	Live at Monty Hall, 2/17/2017	6	Live Performance	NaT	NaT	NaN	NaN	0	2017-03-30 15:18:28	0	24357	NaN	New Jersey	NaN	GILLIAN/JENNA/DECLAN/JAIME	Spowder	NaN	[spowder]	https://spowder.bandcamp.com/	NaN	training	large	320000	0	NaN	2017-03-30 15:23:36	NaT	217	1	Rock	[25]	[25, 12]	NaN	194	NaN	Creative Commons Attribution-NonCommercial-NoD...	165	NaN	4	NaN	[]	Let's Skin Ruby
155318	0	2017-03-30 15:20:35	2017-02-17	NaN	0	22940	<p>A live performance at Monty Hall on Feb 17,...	1506	Monty Hall	[]	Live at Monty Hall, 2/17/2017	6	Live Performance	NaT	NaT	NaN	NaN	0	2017-03-30 15:18:28	0	24357	NaN	New Jersey	NaN	GILLIAN/JENNA/DECLAN/JAIME	Spowder	NaN	[spowder]	https://spowder.bandcamp.com/	NaN	training	large	320000	0	NaN	2017-03-30 15:23:37	NaT	404	2	Rock	[25]	[25, 12]	NaN	214	NaN	Creative Commons Attribution-NonCommercial-NoD...	168	NaN	6	NaN	[]	My House Smells Like Kim Deal/Pulp
155319	0	2017-03-30 15:20:35	2017-02-17	NaN	0	22940	<p>A live performance at Monty Hall on Feb 17,...	1506	Monty Hall	[]	Live at Monty Hall, 2/17/2017	6	Live Performance	NaT	NaT	NaN	NaN	0	2017-03-30 15:18:28	0	24357	NaN	New Jersey	NaN	GILLIAN/JENNA/DECLAN/JAIME	Spowder	NaN	[spowder]	https://spowder.bandcamp.com/	NaN	training	large	320000	0	NaN	2017-03-30 15:23:39	NaT	146	0	Rock	[25]	[25, 12]	NaN	336	NaN	Creative Commons Attribution-NonCommercial-NoD...	294	NaN	5	NaN	[]	The Man With Two Mouths
155320	0	2017-03-26 16:22:18	2017-03-26	NaN	1	22906	NaN	7481	NaN	[ballad, epic, rockabilly, curse, hex, hard ro...	What I Tell Myself Vol. 2	11	Album	NaT	NaT	NaN	<p>**NOTE FOR USING OUR MUSIC ON YOUTUBE**...	1	2016-02-04 17:26:24	12	21615	NaN	Jersey City, NJ 07302	NaN	Alishia Taiping (lead vocals, bass) \nDan Pier...	Forget the Whale	** PLEASE CONNECT WITH US THROUGH FACEBOOK! WE...	[forget the whale, witches, rockabilly, rb, rn...	NaN	NaN	validation	large	320000	0	NaN	2017-03-30 09:15:36	NaT	198	1	NaN	[10, 12, 169]	[169, 10, 12, 9]	NaN	972	NaN	Attribution-NonCommercial	705	NaN	7	NaN	[ballad, epic, rockabilly, curse, hex, hard ro...	Another Trick Up My Sleeve (Instrumental)

106574 rows × 52 columns
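To make the two-level header and the ordered subset column more concrete, here is a minimal sketch on a toy frame. The three rows reuse track IDs, subsets and listen counts visible in the output above, but the frame itself is built by hand for illustration and all variable names are our own.

```python
import pandas as pd

# Toy frame with the same (group, field) column layout as tracks.csv.
columns = pd.MultiIndex.from_tuples(
    [("set", "subset"), ("track", "listens"), ("track", "genre_top")]
)
toy = pd.DataFrame(
    [["small", 1293, "Hip-Hop"], ["medium", 514, "Hip-Hop"], ["large", 361, None]],
    index=pd.Index([2, 3, 20], name="track_id"),
    columns=columns,
)

# An ordered categorical lets us compare subsets: small < medium < large.
subset_dtype = pd.CategoricalDtype(
    categories=("small", "medium", "large"), ordered=True
)
toy[("set", "subset")] = toy[("set", "subset")].astype(subset_dtype)

# Keep every track that belongs to the medium subset or a smaller one.
up_to_medium = toy[toy[("set", "subset")] <= "medium"]

# Columns are addressed with a (group, field) tuple.
listens = up_to_medium[("track", "listens")]
```

The same comparison on the real tracks frame would select all tracks that are part of a given download size, because each track is labelled with the smallest subset that contains it.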

Genres¶

The file contains the complete genre forest, organized as a node list. For each genre, we get its title, the number of tracks that belong to it or to its children, the ID of the parent genre, and the ID of the top-level genre.

In [ ]:

genres = pd.read_csv('fma_metadata/genres.csv', index_col=0)
genres

Out[ ]:

	#tracks	parent	title	top_level
genre_id
1	8693	38	Avant-Garde	38
2	5271	0	International	2
3	1752	0	Blues	3
4	4126	0	Jazz	4
5	4106	0	Classical	5
...	...	...	...	...
1032	60	102	Turkish	2
1060	30	46	Tango	2
1156	26	130	Fado	2
1193	72	763	Christmas	38
1235	14938	0	Instrumental	1235

163 rows × 4 columns

In [ ]:

print(f"There are {len(genres.top_level.unique())} top-level genres, from which all the other genres inherit")

There are 16 top-level genres, from which all the other genres inherit
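The relation between the parent and top_level columns can be sketched by walking up the forest until a root is reached. The toy mapping below uses genre IDs from the table above; the entry for genre 38 (parent 0) is not visible in the truncated output and is assumed here, based on 38 being listed as a top-level genre. The function name top_level is our own.

```python
def top_level(genre_id, parent):
    """Follow parent links until a root genre (parent ID 0) is reached."""
    current = genre_id
    while parent[current] != 0:
        current = parent[current]
    return current


# genre_id -> parent ID; 0 marks a root of the forest.
parent = {
    1: 38,   # Avant-Garde
    2: 0,    # International
    3: 0,    # Blues
    38: 0,   # assumed root, see the note above
}

root_of_avant_garde = top_level(1, parent)
```

Applied to the full genres frame, the result of such a walk should agree with the precomputed top_level column.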

Features¶

This file contains a complete set of fundamental signal analysis features, extracted by the authors using librosa.

While these features are not easily interpretable by humans, they might be central to understanding music popularity using statistical methods.

In [ ]:

features = pd.read_csv('fma_metadata/features.csv', index_col=0, header=[0, 1, 2])
features

Out[ ]:

feature	chroma_cens																																								...	tonnetz																																	zcr
statistics	kurtosis												max												mean												median				...	max			mean						median						min						skew						std						kurtosis	max	mean	median	min	skew	std
number	01	02	03	04	05	06	07	08	09	10	11	12	01	02	03	04	05	06	07	08	09	10	11	12	01	02	03	04	05	06	07	08	09	10	11	12	01	02	03	04	...	04	05	06	01	02	03	04	05	06	01	02	03	04	05	06	01	02	03	04	05	06	01	02	03	04	05	06	01	02	03	04	05	06	01	01	01	01	01	01	01
track_id
2	7.180653	5.230309	0.249321	1.347620	1.482478	0.531371	1.481593	2.691455	0.866868	1.341231	1.347792	1.237658	0.692500	0.569344	0.597041	0.625864	0.567330	0.443949	0.487976	0.497327	0.574435	0.579241	0.620102	0.586945	0.474300	0.369816	0.236119	0.228068	0.222830	0.221415	0.229238	0.248795	0.196245	0.175809	0.200713	0.319972	0.482825	0.387652	0.249082	0.238187	...	0.318972	0.059690	0.069184	-0.002570	0.019296	0.010510	0.073464	0.009272	0.015765	-0.003789	0.017786	0.007311	0.067945	0.009488	0.016876	-0.059769	-0.091745	-0.185687	-0.140306	-0.048525	-0.089286	0.752462	0.262607	0.200944	0.593595	-0.177665	-1.424201	0.019809	0.029569	0.038974	0.054125	0.012226	0.012111	5.758890	0.459473	0.085629	0.071289	0.000000	2.089872	0.061448
3	1.888963	0.760539	0.345297	2.295201	1.654031	0.067592	1.366848	1.054094	0.108103	0.619185	1.038253	1.292235	0.677641	0.584248	0.581271	0.581182	0.454241	0.464841	0.542833	0.664720	0.511329	0.530998	0.603398	0.547428	0.232784	0.229469	0.225674	0.216713	0.220512	0.242744	0.369235	0.420716	0.312129	0.242748	0.264292	0.225683	0.230579	0.228059	0.209370	0.202267	...	0.214807	0.070261	0.070394	0.000183	0.006908	0.047025	-0.029942	0.017535	-0.001496	-0.000108	0.007161	0.046912	-0.021149	0.016299	-0.002657	-0.097199	-0.079651	-0.164613	-0.304375	-0.024958	-0.055667	0.265541	-0.131471	0.171930	-0.990710	0.574556	0.556494	0.026316	0.018708	0.051151	0.063831	0.014212	0.017740	2.824694	0.466309	0.084578	0.063965	0.000000	1.716724	0.069330
5	0.527563	-0.077654	-0.279610	0.685883	1.937570	0.880839	-0.923192	-0.927232	0.666617	1.038546	0.268932	1.125141	0.611014	0.651471	0.494528	0.448799	0.468624	0.454021	0.497172	0.559755	0.671287	0.610565	0.551663	0.603413	0.258420	0.303385	0.250737	0.218562	0.245743	0.236018	0.275766	0.293982	0.346324	0.289821	0.246368	0.220939	0.255472	0.293571	0.245253	0.222065	...	0.180027	0.072169	0.076847	-0.007501	-0.018525	-0.030318	0.024743	0.004771	-0.004536	-0.007385	-0.018953	-0.020358	0.024615	0.004868	-0.003899	-0.128391	-0.125289	-0.359463	-0.166667	-0.038546	-0.146136	1.212025	0.218381	-0.419971	-0.014541	-0.199314	-0.925733	0.025550	0.021106	0.084997	0.040730	0.012691	0.014759	6.808415	0.375000	0.053114	0.041504	0.000000	2.193303	0.044861
10	3.702245	-0.291193	2.196742	-0.234449	1.367364	0.998411	1.770694	1.604566	0.521217	1.982386	4.326824	1.300406	0.461840	0.540411	0.446708	0.647553	0.591908	0.513306	0.651501	0.516887	0.511479	0.478263	0.638766	0.638495	0.229882	0.286978	0.240096	0.226792	0.192443	0.288410	0.413348	0.349137	0.268424	0.243144	0.268941	0.236763	0.230555	0.280229	0.234060	0.226213	...	0.192640	0.117094	0.059757	-0.021650	-0.018369	-0.003282	-0.074165	0.008971	0.007101	-0.021108	-0.019117	-0.007409	-0.067350	0.007036	0.006788	-0.107889	-0.194957	-0.273549	-0.343055	-0.052284	-0.029836	-0.135219	-0.275780	0.015767	-1.094873	1.164041	0.246746	0.021413	0.031989	0.088197	0.074358	0.017952	0.013921	21.434212	0.452148	0.077515	0.071777	0.000000	3.542325	0.040800
20	-0.193837	-0.198527	0.201546	0.258556	0.775204	0.084794	-0.289294	-0.816410	0.043851	-0.804761	-0.990958	-0.430381	0.652864	0.676290	0.670288	0.598666	0.653607	0.697645	0.664929	0.686563	0.635117	0.667393	0.689589	0.683196	0.202806	0.245125	0.262997	0.187961	0.182397	0.238173	0.278600	0.292905	0.247150	0.292501	0.304655	0.235177	0.200830	0.224745	0.234279	0.187789	...	0.286794	0.097534	0.072202	0.012362	0.012246	-0.021837	-0.075866	0.006179	-0.007771	0.011057	0.012416	-0.025059	-0.072732	0.005057	-0.006812	-0.147339	-0.210110	-0.342446	-0.388083	-0.075566	-0.091831	0.192395	-0.215337	0.081732	0.040777	0.232350	-0.207831	0.033342	0.035174	0.105521	0.095003	0.022492	0.021355	16.669037	0.469727	0.047225	0.040039	0.000977	3.189831	0.030993
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
155316	-0.490129	0.463834	2.321970	-0.084352	1.662914	2.115189	-0.237794	5.695442	0.830353	1.951819	-0.190785	-0.186416	0.458839	0.521428	0.706573	0.654298	0.714273	0.697577	0.492030	0.635969	0.490951	0.673249	0.482165	0.575247	0.162204	0.165724	0.183395	0.370902	0.503333	0.402122	0.172934	0.117394	0.170457	0.183613	0.176377	0.224584	0.167897	0.151765	0.144263	0.409983	...	0.362495	0.059700	0.061864	-0.002646	-0.020130	-0.014871	0.108537	-0.002627	-0.014542	-0.001425	-0.020164	-0.003678	0.124491	0.001654	-0.015243	-0.146410	-0.097058	-0.351751	-0.429963	-0.079035	-0.083754	-0.453276	0.150489	-0.555654	-1.179946	-0.769824	0.186115	0.027962	0.024058	0.102859	0.128410	0.022547	0.019816	4.448255	0.172852	0.028773	0.028320	0.003906	0.955388	0.012385
155317	-0.461559	-0.229601	-0.496632	-0.422033	0.130612	-0.263825	-0.628103	-0.082687	-0.229483	-0.492753	-0.746905	-0.041635	0.535615	0.639617	0.685340	0.632173	0.717913	0.644235	0.536997	0.498459	0.604127	0.678619	0.554486	0.716705	0.143780	0.260813	0.394564	0.331805	0.199928	0.191021	0.189175	0.125561	0.217677	0.331529	0.294365	0.179570	0.108340	0.253941	0.438395	0.364152	...	0.422891	0.081929	0.106361	-0.002764	-0.006797	-0.052222	-0.114434	-0.016163	0.011280	0.000395	-0.003267	-0.068680	-0.126662	-0.014746	0.014104	-0.143499	-0.182120	-0.356863	-0.414251	-0.096268	-0.072447	-0.490728	-0.439654	0.490163	0.964098	-0.157360	-0.412955	0.037378	0.041271	0.135479	0.132964	0.023548	0.026527	3.270612	0.196289	0.031116	0.027832	0.002441	1.283060	0.019059
155318	0.552473	-0.110498	-0.532014	0.263131	-0.224011	-0.530972	1.713526	1.418444	1.325197	0.120333	1.307971	-0.284805	0.510003	0.608308	0.662439	0.662508	0.712947	0.620948	0.631632	0.569815	0.636054	0.678449	0.646084	0.601371	0.196570	0.205890	0.252149	0.351820	0.408357	0.328593	0.208854	0.204884	0.243831	0.239283	0.230858	0.252463	0.202658	0.199851	0.222772	0.367897	...	0.392702	0.068798	0.072355	0.002449	-0.010554	0.011718	0.052900	-0.003610	-0.011104	0.003275	-0.010283	0.023729	0.056378	-0.004724	-0.010906	-0.128721	-0.148665	-0.334573	-0.375282	-0.091255	-0.084922	-0.296828	-0.335870	-0.512710	0.032411	0.386611	0.081947	0.027789	0.024484	0.089910	0.108324	0.017540	0.020471	2.356727	0.212891	0.038450	0.037109	0.003418	0.828569	0.017904
155319	-0.176901	0.187208	-0.050664	0.368843	0.066005	-0.857354	-0.780860	0.626281	-0.630938	-0.787229	1.402555	-0.673350	0.599551	0.622361	0.571105	0.692349	0.641605	0.684136	0.660806	0.649285	0.625508	0.518667	0.643290	0.479706	0.243379	0.260273	0.252141	0.264242	0.308059	0.315641	0.298065	0.269567	0.272238	0.258127	0.206424	0.219717	0.217516	0.244000	0.242138	0.231664	...	0.291018	0.083252	0.074353	0.006948	-0.007725	-0.025543	0.020396	-0.002116	0.001965	0.006262	-0.007116	-0.015680	0.031222	-0.003402	0.001749	-0.081886	-0.096519	-0.366708	-0.248391	-0.057357	-0.059915	0.007862	-0.029525	-0.038584	-0.443134	0.501710	0.006415	0.023194	0.022953	0.092314	0.088311	0.018328	0.017936	6.188604	0.167480	0.041480	0.038086	0.004883	1.818740	0.020133
155320	0.489665	1.862421	0.854461	-0.103666	-0.249835	0.360283	-0.366701	0.033578	-0.834606	-1.154845	-0.834298	-0.235041	0.625346	0.692410	0.681736	0.614683	0.682144	0.618091	0.562875	0.661097	0.558848	0.674898	0.484816	0.592569	0.219860	0.226032	0.237571	0.292068	0.323929	0.258558	0.214975	0.237283	0.300346	0.313399	0.266092	0.223576	0.213141	0.216042	0.224989	0.266809	...	0.367158	0.075096	0.063012	0.001088	-0.005252	-0.016319	0.041925	0.004942	0.000728	0.000575	-0.005006	-0.023387	0.048306	0.004030	0.001267	-0.090776	-0.087376	-0.305717	-0.288380	-0.056747	-0.069693	0.258051	0.065349	0.009541	-0.408493	0.122494	-0.130529	0.024848	0.023589	0.099553	0.091421	0.020312	0.016794	21.756050	0.845215	0.075141	0.044434	0.004395	4.687204	0.137205

106574 rows × 518 columns
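The features frame has a three-level header (feature, statistics, number). A minimal sketch of how such columns can be selected is shown below on a toy frame. The values are copied from the first two rows of the output above, while the frame itself and the variable names are built by hand for illustration.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [
        ("chroma_cens", "mean", "01"),
        ("chroma_cens", "mean", "02"),
        ("zcr", "mean", "01"),
        ("zcr", "std", "01"),
    ],
    names=["feature", "statistics", "number"],
)
toy = pd.DataFrame(
    [
        [0.474300, 0.369816, 0.085629, 0.061448],
        [0.232784, 0.229469, 0.084578, 0.069330],
    ],
    index=pd.Index([2, 3], name="track_id"),
    columns=columns,
)

# All columns of one feature family.
chroma = toy["chroma_cens"]

# One statistic across every feature family.
all_means = toy.xs("mean", axis=1, level="statistics")

# A single fully specified column.
zcr_mean = toy[("zcr", "mean", "01")]
```

Selecting by level in this way makes it easy to restrict a statistical model to, for example, only the mean of each feature.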

Echonest¶

Features extracted by Echonest.
Echonest was a music analysis platform (acquired by Spotify) that provided high-level features for music. While these features are immensely useful because they are both human- and machine-interpretable, this set has serious flaws.
Most importantly, the Echonest features are available only for a subset of 13,129 tracks, which makes them usable only for fma_small analysis.
Equally important, the proprietary algorithms behind the calculated audio features make any analysis based on them obscure and very hard to interpret, as explained in the paper below.

Eriksson, M. (2016). Close reading big data: The Echo Nest and the production of (rotten) music metadata. First Monday, 21.
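The limited coverage can be checked by comparing the track IDs of the two tables. The sketch below uses toy indexes whose IDs are borrowed from the first rows displayed in this notebook; on the real data, the same calls would be made on tracks.index and echonest.index. All variable names are our own.

```python
import pandas as pd

# Toy stand-ins for tracks.index and echonest.index.
tracks_index = pd.Index([2, 3, 5, 10, 20], name="track_id")
echonest_index = pd.Index([2, 3, 5, 10, 134], name="track_id")

# Tracks that have Echonest features, and tracks that lack them.
shared = tracks_index.intersection(echonest_index)
missing = tracks_index.difference(echonest_index)

# Fraction of tracks covered by Echonest features.
coverage = len(shared) / len(tracks_index)
```

Any analysis that relies on Echonest features has to be restricted to the shared IDs.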

In [ ]:

echonest = pd.read_csv('fma_metadata/echonest.csv', index_col=0, header=[0, 1, 2])
echonest

Out[ ]:

	echonest
	audio_features								metadata							ranks					social_features					temporal_features
	acousticness	danceability	energy	instrumentalness	liveness	speechiness	tempo	valence	album_date	album_name	artist_latitude	artist_location	artist_longitude	artist_name	release	artist_discovery_rank	artist_familiarity_rank	artist_hotttnesss_rank	song_currency_rank	song_hotttnesss_rank	artist_discovery	artist_familiarity	artist_hotttnesss	song_currency	song_hotttnesss	000	001	002	003	004	005	006	007	008	009	010	011	012	013	014	...	184	185	186	187	188	189	190	191	192	193	194	195	196	197	198	199	200	201	202	203	204	205	206	207	208	209	210	211	212	213	214	215	216	217	218	219	220	221	222	223
track_id
2	0.416675	0.675894	0.634476	0.010628	0.177647	0.159310	165.922	0.576661	NaN	NaN	32.6783	Georgia, US	-83.22300	AWOL	AWOL - A Way Of Life	NaN	NaN	NaN	NaN	NaN	0.388990	0.386740	0.406370	0.000000	0.000000	0.877233	0.588911	0.354243	0.295090	0.298413	0.309430	0.304496	0.334579	0.249495	0.259656	0.318376	0.371974	1.0000	0.5710	0.278	...	0.097149	0.401260	0.006324	0.643486	0.012059	0.237947	0.655938	1.213864	-12.486146	-11.2695	46.031261	-60.000000	-3.933	56.067001	-2.587475	11.802585	0.047970	0.038275	0.000988	0.00000	0.20730	0.20730	1.603659	2.984276	-21.812077	-20.312000	49.157482	-60.000000	-9.691	50.308998	-1.992303	6.805694	0.233070	0.192880	0.027455	0.06408	3.676960	3.61288	13.316690	262.929749
3	0.374408	0.528643	0.817461	0.001851	0.105880	0.461818	126.957	0.269240	NaN	NaN	32.6783	Georgia, US	-83.22300	AWOL	AWOL - A Way Of Life	NaN	NaN	NaN	NaN	NaN	0.388990	0.386740	0.406370	0.000000	0.000000	0.534429	0.537414	0.443299	0.390879	0.344573	0.366448	0.419455	0.747766	0.460901	0.392379	0.474559	0.406729	0.5060	0.5145	0.387	...	1.015813	1.627731	0.032318	0.819126	-0.030998	0.734610	0.458883	0.999964	-12.502044	-11.4205	26.468552	-60.000000	-5.789	54.210999	-1.755855	7.895351	0.057707	0.045360	0.001397	0.00000	0.33950	0.33950	2.271021	9.186051	-20.185032	-19.868000	24.002327	-60.000000	-9.679	50.320999	-1.582331	8.889308	0.258464	0.220905	0.081368	0.06413	6.082770	6.01864	16.673548	325.581085
5	0.043567	0.745566	0.701470	0.000697	0.373143	0.124595	100.260	0.621661	NaN	NaN	32.6783	Georgia, US	-83.22300	AWOL	AWOL - A Way Of Life	NaN	NaN	NaN	NaN	NaN	0.388990	0.386740	0.406370	0.000000	0.000000	0.548093	0.720192	0.389257	0.344934	0.361300	0.402543	0.434044	0.388137	0.512487	0.525755	0.425371	0.446896	0.5110	0.7720	0.361	...	-0.250734	4.719755	-0.183342	0.340812	-0.295970	0.099103	0.098723	1.389372	-15.458095	-14.1050	35.955223	-60.000000	-7.248	52.751999	-2.505533	9.716598	0.058608	0.045700	0.001777	0.00000	0.29497	0.29497	1.827837	5.253727	-24.523119	-24.367001	31.804546	-60.000000	-12.582	47.417999	-2.288358	11.527109	0.256821	0.237820	0.060122	0.06014	5.926490	5.86635	16.013849	356.755737
10	0.951670	0.658179	0.924525	0.965427	0.115474	0.032985	111.562	0.963590	2008-03-11	Constant Hitmaker	39.9523	Philadelphia, PA, US	-75.16240	Kurt Vile	Constant Hitmaker	2635.0	2544.0	397.0	115691.0	67609.0	0.557339	0.614272	0.798387	0.005158	0.354516	0.311404	0.711402	0.321914	0.500601	0.250963	0.321316	0.734250	0.325188	0.373012	0.235840	0.368756	0.440775	0.2630	0.7360	0.273	...	7.889378	1.809147	2.219095	1.518430	0.654815	0.650727	12.656473	0.406731	-10.244890	-9.4640	20.304308	-60.000000	-5.027	54.973000	-5.365219	41.201279	0.048938	0.040800	0.002591	0.00000	0.89574	0.89574	10.539709	150.359985	-16.472773	-15.903000	27.539440	-60.000000	-9.025	50.974998	-3.662988	21.508228	0.283352	0.267070	0.125704	0.08082	8.414010	8.33319	21.317064	483.403809
134	0.452217	0.513238	0.560410	0.019443	0.096567	0.525519	114.290	0.894072	NaN	NaN	32.6783	Georgia, US	-83.22300	AWOL	AWOL - A Way Of Life	NaN	NaN	NaN	NaN	NaN	0.388990	0.386740	0.406370	0.000000	0.000000	0.610849	0.569169	0.428494	0.345796	0.376920	0.460590	0.401371	0.449900	0.428946	0.446736	0.479849	0.378221	0.6140	0.5450	0.363	...	-0.139364	2.251030	-0.224826	0.050703	0.188019	0.249750	0.931698	0.766069	-15.145472	-14.1510	19.988146	-40.209999	-7.351	32.859001	-1.632508	3.340982	0.059470	0.048560	0.001586	0.01079	0.42006	0.40927	2.763948	13.718324	-24.336575	-22.448999	52.783905	-60.000000	-13.128	46.872002	-1.452696	2.356398	0.234686	0.199550	0.149332	0.06440	11.267070	11.20267	26.454180	751.147705
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
124857	0.007592	0.790364	0.719288	0.853114	0.720715	0.082550	141.332	0.890461	NaN	NaN	52.1082	Netherlands	5.32986	Basic	Do You Know The Word	NaN	NaN	NaN	NaN	NaN	0.430808	0.456871	0.486749	0.000000	0.000000	0.717013	0.686557	0.411056	0.342718	0.341934	0.482926	0.419219	0.408946	0.393060	0.382778	0.450459	0.514263	0.7690	0.7260	0.360	...	0.311250	3.484239	1.163561	0.529112	0.603466	-0.044795	0.674263	0.532037	-10.715957	-9.6450	45.425846	-60.000000	-2.113	57.887001	-2.873086	11.131849	0.047152	0.042770	0.000877	0.00000	0.29170	0.29170	1.735818	7.347428	-20.573229	-18.757999	60.454914	-60.000000	-4.776	55.223999	-1.721207	4.686078	0.213789	0.208800	0.007911	0.06395	2.040730	1.97678	8.144532	147.040405
124862	0.041498	0.843077	0.536496	0.865151	0.547949	0.074001	101.975	0.476845	NaN	NaN	52.1082	Netherlands	5.32986	Basic	Do You Know The Word	NaN	NaN	NaN	NaN	NaN	0.430808	0.456871	0.486749	0.000000	0.000000	0.673395	0.846995	0.447772	0.425936	0.407817	0.405924	0.290565	0.314019	0.318129	0.310359	0.329973	0.344658	0.6560	1.0000	0.397	...	0.499671	2.396445	0.555582	1.468794	0.351559	0.639904	0.255333	0.229793	-13.910966	-13.4740	46.758633	-60.000000	-1.623	58.376999	-0.946041	2.599266	0.041717	0.034930	0.000976	0.00000	0.33370	0.33370	2.364584	10.817008	-26.106918	-25.502001	52.491909	-60.000000	-9.225	50.775002	-0.647897	1.282306	0.214586	0.181860	0.011247	0.06240	0.922360	0.85996	1.794739	6.321268
124863	0.000124	0.609686	0.895136	0.846624	0.632903	0.051517	129.996	0.496667	NaN	NaN	52.1082	Netherlands	5.32986	Basic	Do You Know The Word	NaN	NaN	NaN	NaN	NaN	0.430808	0.456871	0.486749	0.000000	0.000000	0.842368	0.719091	0.351503	0.354707	0.314619	0.276266	0.340571	0.342762	0.449963	0.456690	0.525160	0.379067	0.9410	0.7365	0.322	...	4.096390	1.093365	-0.206154	1.746544	0.559263	1.698776	2.477646	0.512601	-9.183863	-8.5280	22.383396	-57.069000	-2.212	54.857002	-1.961082	10.309643	0.027183	0.020315	0.000572	0.00160	0.30417	0.30257	3.510331	19.698879	-18.732578	-18.153000	38.714157	-58.957001	-6.727	52.230000	-0.771613	1.623510	0.180471	0.128185	0.010103	0.06222	2.251160	2.18894	5.578341	89.180328
124864	0.327576	0.574426	0.548327	0.452867	0.075928	0.033388	142.009	0.569274	NaN	NaN	52.1082	Netherlands	5.32986	Basic	Do You Know The Word	NaN	NaN	NaN	NaN	NaN	0.430808	0.456871	0.486749	0.000000	0.000000	0.346748	0.311817	0.220864	0.185269	0.333642	0.290699	0.558345	0.397021	0.217570	0.297939	0.282145	0.448469	0.2665	0.2060	0.148	...	1.019667	3.658061	0.368144	0.571845	0.264829	1.132334	1.301940	0.004652	-12.746772	-11.5220	36.483959	-60.000000	-3.782	56.217999	-2.340024	11.157758	0.052788	0.042720	0.001556	-0.00041	0.40494	0.40535	2.422421	10.651926	-19.304338	-17.783001	39.064819	-60.000000	-9.612	50.388000	-2.054143	7.927149	0.250178	0.219205	0.014851	0.06390	1.487440	1.42354	2.173092	12.503966
124911	0.993606	0.499339	0.050622	0.945677	0.095965	0.065189	119.965	0.204652	2009-10-23	Suicide Beauty Girl	35.7497	Higashiyamato-shi, Tokyo Prefecture, JP	139.42200	Yusuke Tsutsumi	Suicide Beauty Girl	48702.0	226201.0	61518.0	4882500.0	2811503.0	0.450229	0.274787	0.470345	0.000061	0.068649	0.319693	0.164967	0.071792	0.426253	0.068991	0.226611	0.076166	0.179372	0.446614	0.097417	0.330121	0.080717	0.1290	0.0560	0.043	...	0.014182	2.421063	0.052055	1.932200	0.242033	6.913671	1.326249	0.358084	-26.911585	-26.9160	78.375488	-60.000000	-3.710	56.290001	-0.051008	-0.300746	0.043151	0.035030	0.000772	0.00000	0.19257	0.19257	1.369908	2.912444	-36.282391	-36.123001	62.512142	-60.000000	-18.836	41.164001	-0.215639	-0.584081	0.603893	0.505940	0.608585	0.06830	16.559731	16.49143	15.169022	302.946350

13129 rows × 249 columns

3. Exploratory Analysis¶

Feature Exploration¶

Dates of the tracks¶

We will analyse tracks whose creation dates (the 'date_created' field) range from 2008 to 2018 and look fairly uniformly distributed over that period.

In [ ]:

small = tracks[tracks['set', 'subset'] <= 'small']
small['track']['date_created'].hist(bins=100)

Out[ ]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f70d60ff690>

About the chroma in each song¶

The 'chroma_cens' feature corresponds to “Chroma Energy Normalized Statistics” (CENS). It is a more robust variant of the chroma vector, which represents how much energy of each pitch class is present in the signal.

The main idea of CENS features is that taking statistics over large windows smooths local deviations in tempo, articulation, and musical ornaments such as trills and arpeggiated chords.

For each of the 12 pitch classes, we have summary statistics of the chroma over the track: max, min, mean, median, as well as skew, kurtosis and standard deviation.

We can look at the distribution of the pitch classes that carry, on average, the most energy in the signal; such a pitch class is potentially the tonic of the song.

In [ ]:

small_features = features[tracks['set', 'subset'] <= 'small']
small_features['chroma_cens']['mean'].head()

Out[ ]:

number	01	02	03	04	05	06	07	08	09	10	11	12
track_id
2	0.474300	0.369816	0.236119	0.228068	0.222830	0.221415	0.229238	0.248795	0.196245	0.175809	0.200713	0.319972
5	0.258420	0.303385	0.250737	0.218562	0.245743	0.236018	0.275766	0.293982	0.346324	0.289821	0.246368	0.220939
10	0.229882	0.286978	0.240096	0.226792	0.192443	0.288410	0.413348	0.349137	0.268424	0.243144	0.268941	0.236763
140	0.161163	0.272767	0.295905	0.255588	0.260886	0.252854	0.193282	0.191970	0.291551	0.319938	0.198516	0.120607
141	0.150417	0.155785	0.217253	0.224969	0.273518	0.295436	0.259958	0.181313	0.177233	0.296048	0.331963	0.218315

In [ ]:

small_features['mfcc']['mean'].hist(bins=50, figsize=(17, 16))
plt.plot()

Out[ ]:

[]

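All of the plotted coefficients seem to follow a roughly Gaussian distribution centered around zero, except for the first two, which are centered around -150 and +150 respectively. Note that this cell plots the 'mfcc' means rather than the 'chroma_cens' means, so these histograms describe MFCC coefficients and not pitch classes; the same plot is discussed in the MFCC section below.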

In [ ]:

small_features['chroma_cens']['mean'].idxmax(axis=1).sort_values().hist(bins=12)

Out[ ]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f70cdd78b50>

Interestingly, we observe that the pitch class with the highest average energy is more often a white key of the piano than a black one.
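
The white-key observation can be made explicit by mapping the column labels to pitch-class names. The following is a minimal sketch under one assumption that the notebook does not state: column '01' is taken to be C and the columns ascend by semitone. The names `PITCH_NAMES`, `WHITE_KEYS`, `dominant_pitch_class` and the toy DataFrame are illustrative only.

```python
import pandas as pd

# Assumption (not confirmed by the dataset documentation quoted here):
# column '01' is C and the following columns ascend by one semitone each.
PITCH_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
WHITE_KEYS = {'C', 'D', 'E', 'F', 'G', 'A', 'B'}

def dominant_pitch_class(chroma_mean: pd.DataFrame) -> pd.Series:
    """Return the name of the pitch class with the highest mean chroma per track."""
    labels = chroma_mean.idxmax(axis=1)  # column labels such as '01' or '09'
    return labels.map(lambda label: PITCH_NAMES[int(label) - 1])

# Toy data with the same column layout as small_features['chroma_cens']['mean']
columns = [f'{i:02d}' for i in range(1, 13)]
toy = pd.DataFrame(
    [[0.47, 0.37, 0.24, 0.23, 0.22, 0.22, 0.23, 0.25, 0.20, 0.18, 0.20, 0.32],
     [0.26, 0.30, 0.25, 0.22, 0.25, 0.24, 0.28, 0.29, 0.35, 0.29, 0.25, 0.22]],
    index=[2, 5], columns=columns)

dominant = dominant_pitch_class(toy)
white_share = dominant.isin(WHITE_KEYS).mean()  # share of tracks peaking on a white key
print(dominant.tolist(), white_share)
```

Since 7 of the 12 pitch classes are white keys, a white-key share clearly above 7/12 (about 0.58) is what would point to a real preference rather than chance.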

We also have other chroma features, such as 'chroma_cqt' and 'chroma_stft', that give us further insight into the chroma of each song.

Mel-frequency cepstral coefficients (MFCCs)¶

Another interesting feature is the MFCCs. These coefficients describe the short-term power spectrum of a sound, that is, how the power of the signal is distributed over frequency, on a perceptually motivated (mel) frequency scale.
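
To make the definition concrete, here is a textbook-style sketch of how MFCCs can be computed for a single frame: power spectrum, triangular mel filterbank, logarithm, then a DCT-II. It is not the code that produced the pre-computed features, and the parameter choices (`n_filters=26`, a Hann window, the mel formula) are assumptions made for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        left, centre, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc_frame(frame, sr, n_filters=26, n_coeffs=20):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    power = np.abs(np.fft.rfft(frame * np.hanning(frame.size))) ** 2
    mel_energy = mel_filterbank(n_filters, frame.size, sr) @ power
    log_energy = np.log(mel_energy + 1e-10)  # small offset avoids log(0)
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return dct_basis @ log_energy

# Usage on a synthetic 440 Hz tone
sr = 22050
t = np.arange(2048) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440.0 * t), sr)
print(coeffs.shape)
```

The 20 output coefficients mirror the 20 'mfcc' columns of the dataset; the per-track statistics would then be taken over all frames of a track.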

In [ ]:

small_features['mfcc']['mean']

Out[ ]:

number	01	02	03	04	05	06	07	08	09	10	11	12	13	14	15	16	17	18	19	20
track_id
2	-163.772964	116.696678	-41.753826	29.144329	-15.050158	18.879372	-8.918165	12.002118	-4.253151	1.359791	-2.683000	-0.794632	-6.920971	-3.655366	1.465213	0.201078	3.998204	-2.114676	0.116842	-5.785884
5	-205.440491	132.215073	-16.085823	41.514759	-7.642954	16.942802	-5.651261	9.569445	0.503157	8.673513	-8.271377	0.594473	-0.340203	2.377888	7.899487	1.947641	7.441950	-1.739911	0.278015	-5.489016
10	-135.864822	157.040085	-53.453247	17.198896	6.868035	13.934344	-11.749298	8.360711	-5.130381	0.233845	-5.421206	1.679479	-6.218249	1.844195	-4.099704	0.779950	-0.559577	-1.018324	-3.807545	-0.679533
140	-225.713318	139.332825	-13.097699	44.533356	2.468400	28.328743	-9.931481	10.810857	3.002879	-0.937692	7.138268	-6.625260	0.824269	-2.003132	4.293943	-7.935050	0.063948	-2.363509	-0.158602	0.594098
141	-253.143906	155.716324	-16.636627	23.683815	6.045957	11.692952	-9.947761	6.887814	-3.273322	-6.340906	7.602782	-5.851329	2.017422	-4.396296	-3.689521	-0.929987	0.783247	0.768126	2.809321	3.325740
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
154308	-288.879303	152.342087	-2.517532	8.582313	-0.384105	4.368904	3.391255	5.985993	9.245567	4.665963	4.699042	0.592779	2.949878	5.256505	1.607061	2.205407	-1.347569	2.550121	5.137365	6.354999
154309	-367.696625	104.314285	11.179615	-9.774891	-12.614079	1.173940	-10.825560	-10.481558	-12.770616	-8.729056	-13.194090	-6.167614	-5.817488	0.828269	-1.539551	-2.340352	-6.185808	-5.277195	-9.561085	-8.861394
154413	-229.868378	155.606247	4.354007	45.476540	-7.373891	22.758905	2.523344	9.925150	7.117429	1.199985	3.841901	4.540169	4.224662	1.915167	-1.782451	4.374582	6.465470	1.281081	4.305267	0.595658
154414	-225.491821	156.149582	-4.608137	42.489342	-8.574591	10.030898	-9.780822	-1.589058	3.073855	3.905075	6.453709	6.542689	-0.049502	-2.948072	-3.487443	-0.383391	-2.238332	-4.390033	-2.679048	-5.389973
155066	-291.700775	160.115982	93.731789	49.252903	-0.456035	-8.281082	-12.962958	-2.272800	1.398377	-0.398876	-4.343043	-5.039856	-4.664367	-2.756296	-0.987940	0.257712	1.739598	2.405217	2.127824	3.123198

8000 rows × 20 columns

In [ ]:

small_features['mfcc']['mean'].hist(bins=50, figsize=(17, 16))
plt.plot()

Out[ ]:

[]

Again, we observe roughly Gaussian distributions centered near zero, except for the first two coefficients.

Root-mean-square (RMS) energy¶

The RMS energy corresponds to how 'loud' the music is.
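
As an illustration of the quantity itself, the sketch below computes the RMS energy of successive frames of a signal with NumPy. The frame and hop lengths are assumptions, and the function `rms_per_frame` is hypothetical; the scale of the pre-computed 'rmse' values in the dataset may differ.

```python
import numpy as np

def rms_per_frame(signal, frame_length=2048, hop_length=512):
    """RMS energy of each full (non-padded) frame of a 1-D signal."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    frames = np.stack([signal[i * hop_length: i * hop_length + frame_length]
                       for i in range(n_frames)])
    return np.sqrt(np.mean(frames ** 2, axis=1))

# Two synthetic tones that differ only in amplitude
sr = 22050
t = np.arange(sr) / sr
quiet = 0.1 * np.sin(2 * np.pi * 220.0 * t)
loud = 0.8 * np.sin(2 * np.pi * 220.0 * t)

rms_quiet = rms_per_frame(quiet).mean()
rms_loud = rms_per_frame(loud).mean()
print(rms_quiet, rms_loud)  # the louder tone has the larger RMS energy
```

For a pure sine of amplitude A, the RMS energy is close to A divided by the square root of 2, so it scales linearly with amplitude.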

In [ ]:

small_features['rmse']['mean'].hist(bins=20)

Out[ ]:

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f70ce2c00d0>]],
      dtype=object)

In [ ]:

small_features['rmse']['mean'].describe()

Out[ ]:

number	01
count	8000.000000
mean	4.568130
std	2.484890
min	0.000246
25%	2.735237
50%	4.286309
75%	6.012164
max	21.608154

The RMS energy in the dataset has a mean of about 4.57 and a standard deviation of 2.48, ranging from almost zero to 21.6.

Spectral Bandwidth, centroid, contrast and rolloff¶

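The spectral bandwidth describes how widely the energy of the spectrum is spread around its centroid (in Hz); it is sometimes summarized as the width of the band at one-half of the peak maximum.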

In [ ]:

small_features['spectral_bandwidth']['mean'].hist(bins=50)

Out[ ]:

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f70d5f8e2d0>]],
      dtype=object)

In [ ]:

small_features['spectral_bandwidth']['mean'].describe()

Out[ ]:

number	01
count	8000.000000
mean	1420.529136
std	440.968569
min	241.754852
25%	1119.511719
50%	1410.169250
75%	1699.891907
max	3789.371582

It again seems to be approximately normally distributed in the dataset, centered around 1420 Hz with a standard deviation of about 441 Hz.

The spectral centroid is an acoustical descriptor of timbre. It estimates the center of mass of the spectrum (in Hz).
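
The two descriptors can be illustrated on a single magnitude spectrum. This is a sketch under assumptions: the centroid is the magnitude-weighted mean frequency and the bandwidth is the order-p weighted deviation around it, with p=2. The function name and the toy spectra are hypothetical, and the dataset's features may have been computed with different settings.

```python
import numpy as np

def spectral_centroid_bandwidth(magnitudes, freqs, p=2):
    """Centroid (Hz) and order-p bandwidth (Hz) of one magnitude spectrum."""
    weights = magnitudes / magnitudes.sum()
    centroid = np.sum(freqs * weights)
    bandwidth = np.sum(weights * np.abs(freqs - centroid) ** p) ** (1.0 / p)
    return centroid, bandwidth

freqs = np.array([100.0, 200.0, 300.0])
narrow = np.array([0.0, 1.0, 0.0])  # all energy at 200 Hz
wide = np.array([1.0, 0.0, 1.0])    # energy split between 100 Hz and 300 Hz

c_narrow, bw_narrow = spectral_centroid_bandwidth(narrow, freqs)
c_wide, bw_wide = spectral_centroid_bandwidth(wide, freqs)
print(c_narrow, bw_narrow, c_wide, bw_wide)
```

Both toy spectra share the same centroid (200 Hz), but only the second one has a non-zero bandwidth, which is why the two descriptors are reported separately.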

In [ ]:

small_features['spectral_centroid']['mean'].hist(bins=50)

Out[ ]:

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f70ce09d890>]],
      dtype=object)

In [ ]:

small_features['spectral_centroid']['mean'].describe()

Out[ ]:

number	01
count	8000.000000
mean	1174.792826
std	501.694070
min	140.754303
25%	818.932144
50%	1125.063171
75%	1460.376740
max	5575.107910

Spectral contrast differs from MFCC in the following respect: it works on octave-based sub-bands and considers the spectral peak, the spectral valley and their difference in each sub-band.

In [ ]:

small_features['spectral_contrast']['mean'].hist(bins=50, figsize=(17,16))
plt.plot()

Out[ ]:

[]

Tonnetz¶

Finally, let's study the tonnetz features of our tracks!

Tonnetz centroids :

0: Fifth x-axis

1: Fifth y-axis

2: Minor x-axis

3: Minor y-axis

4: Major x-axis

5: Major y-axis

In [ ]:

small_features['spectral_contrast']['mean'].head()

Out[ ]:

number	01	02	03	04	05	06	07
track_id
2	18.005175	15.363138	17.129013	17.157160	18.087046	17.616112	38.268646
5	17.097452	15.969444	18.646988	16.973648	17.292145	19.255819	36.413609
10	19.177481	14.281867	15.510051	16.541273	20.316816	18.967014	34.886196
140	18.151390	16.298210	18.562231	18.060900	21.355371	19.176929	36.932270
141	19.646608	18.583612	19.953085	18.620075	22.643318	23.950989	29.015615

In [ ]:

small_features['tonnetz']['mean'].hist(bins=50, figsize=(17,16))
plt.plot()

Out[ ]:

[]

The tonnetz features also follow Gaussian distributions.

Zero Crossing Rate¶

The zero crossing rate (ZCR) is the rate at which a signal changes sign, from positive to negative or from negative to positive.
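
A minimal NumPy sketch of this definition follows. It computes the fraction of consecutive sample pairs whose signs differ; the function name and the synthetic tones are illustrative, and framing details of the pre-computed 'zcr' feature are not reproduced.

```python
import numpy as np

def zero_crossing_rate(signal):
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.signbit(signal)
    return np.mean(signs[1:] != signs[:-1])

# One second of a low and of a high tone, sampled at 8 kHz
sr = 8000
t = np.arange(sr) / sr
zcr_low = zero_crossing_rate(np.sin(2 * np.pi * 100.0 * t))
zcr_high = zero_crossing_rate(np.sin(2 * np.pi * 1000.0 * t))
print(zcr_low, zcr_high)
```

A pure tone of frequency f crosses zero about 2f times per second, so the rate grows with frequency; noisy or percussive signals also tend to have a high ZCR.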

In [ ]:

small_features['zcr']['mean'].hist(bins=50)
plt.plot()

Out[ ]:

[]

The rate is relatively small for all the music in the dataset, with extreme values above 0.1.

Popularity¶

Popularity measures¶

The dataset contains a few measures that can be used to assess a song's popularity: the fields favorites, comments, listens and interest of the tracks DataFrame. All of them are positively correlated, more or less strongly (see the plot below), and the large amount of data makes these correlations highly significant. Nonetheless, we discarded interest as a measure of popularity because its calculation is opaque, which makes it less directly interpretable. Moreover, the comment and favorite options on the Free Music Archive website do not seem to be widely used. For these reasons, and to stay closer to how popularity is defined in the music industry, we chose listens to characterize popularity.

As a side note, it would have been interesting to have access to the download counts of the songs, to build a metric closer to the units used for certification purposes.

In [ ]:

from scipy.stats import pearsonr

def corrfunc(x, y, ax=None, **kws):
    """Plot the correlation coefficient in the top left hand corner of a plot."""
    r, p = pearsonr(x, y)
    ax = ax or plt.gca()
    ax.annotate(f'r = {r:.2f}', xy=(.1, .9), xycoords=ax.transAxes)
    
g = sns.pairplot(tracks["track"]
                 , vars=["listens", "interest", "favorites", "comments"]
                 , corner=True
                 , kind="reg"
                 , diag_kind="kde"
                )
g.map_lower(corrfunc)

Out[ ]:

<seaborn.axisgrid.PairGrid at 0x7f70d5331810>

Our dataset is built upon the Free Music Archive's content. Being made of royalty-free music is a strength, but it can also be seen as a limit when it comes to our research question. Indeed, judging solely by the number of listens, the songs in our dataset are far from what the industry would consider hit songs. American audio engineer and author Bobby Owsinski wrote for Forbes in 2017 that "a song with a million streams doesn't really get you on the industry's radar", adding that "very minor hit now comes in at around 50 million [streams]". As we can see on the plots (coarsely on the bottom row of the previous figure and more precisely on the CCDF plotted below), none of the songs in the FMA dataset come close to those numbers: the most popular one recorded a bit less than 550K streams at that time. It is therefore important to acknowledge that our analysis strongly depends on the source of our data and can hardly be generalized.

On the other hand, it can be assumed that external factors, such as commercial promotion and communication, have less influence on the streams on this platform, which makes it more likely that popularity reflects musical features.

Let's now look at the stream counts of the songs in the dataset by plotting their distribution:

In [ ]:

from scipy.optimize import curve_fit # import to ease the implementation of the fitting function

def linlaw(x, a, b) :
    return a+x*b

def powerlaw_fit(xdata, ydata) :
    """
    Role : fit the given data (x and y) to a power law: find the optimal a and k parameters in y = a*x^k
            to do so the equation is linearized: log(y)=k*log(x)+log(a)
            so that we can realise a linear fit
    
    # PARAMETERS
    xdata : array of the x-coordinates of the data to be fitted
    ydata : array of the y-coordinates of the data to be fitted
    
    # RETURN
    popt_log : array [optimal log(a), optimal k]
    perr : array one standard deviation errors on the parameters
    """
    # take the log of the arguments to linearise the equation
    xdata_log = np.log10(xdata)
    ydata_log = np.log10(ydata)
    
    # Fit linear using curve_fit and the affine function linlaw
    popt_log, pcov_log = curve_fit(linlaw, xdata_log, ydata_log)
    # compute one standard deviation errors on the parameters
    perr = np.sqrt(np.diag(pcov_log))
    
    return (popt_log, perr)

listens = tracks["track"]["listens"]
MIN = listens.min()
MAX = listens.max()

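# build the histogram as a CCDF: reverse the counts before accumulating so that
# y[i] is the share of songs falling in bin i or in any higher bin
y, x = np.histogram(listens, bins = 10 ** np.linspace(np.log10(MIN+1), np.log10(MAX), 101))
y = (y[::-1].cumsum()[::-1])/len(listens)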

# indices (chosen by inspection) delimiting the range where the CCDF is close to linear in log-log space
n_end_fit = 2
n_start_fit = 59

# find the optimal parameters and error on the fit of our data
popt, perr = powerlaw_fit(x[n_start_fit+1:-n_end_fit], y[n_start_fit:-n_end_fit])

# isolate the parameters and errors
log_a_err = perr[0]
k_err = perr[1]

log_a_fit = popt[0]
k_fit = popt[1]

# Represent this fit on a plot with the data to visualise its pertinence
fig = plt.figure()

plt.loglog(x[1:],y
          , c='k', lw=2
           , label = 'CCDF'
          )
plt.scatter(x[n_start_fit+1:-n_end_fit], y[n_start_fit:-n_end_fit]
            , alpha = 0.4, color='k', label='Fitted points'
           )

plt.loglog(x[n_start_fit-5:], 10**log_a_fit*(x[n_start_fit-5:])**k_fit
           , ls ='--', c='b', lw=2
           , label = r'Power law fit: $y= \alpha \cdot x^{k}$' + '\n' + r'$\log{\alpha} = %.1f \pm %.1f$, $k= %.2f \pm %.2f$' %(log_a_fit, log_a_err, k_fit, k_err)
          )


# add legend 
plt.legend()

# add axis labels
plt.ylabel('Proportion of songs (in log scale)')
plt.xlabel('Listens count (in log scale)')

# display plot
plt.show()

The Complementary Cumulative Distribution Function (CCDF), plotted on a logarithmic scale on both axes, decreases linearly past approximately 2000 to 3000 listens, as emphasized by the blue line fitted on top of the data. This indicates that the distribution of listen counts exhibits a heavy tail (as suggested by the previous plot on a linear scale).
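
The linearisation behind the fit can be checked on synthetic data: if y = a * x^k, then log10(y) = log10(a) + k * log10(x), which is a straight line of slope k in log-log space. The sketch below uses `np.polyfit` instead of `curve_fit`, and the values of `a_true` and `k_true` are arbitrary illustrative choices, not estimates from the dataset.

```python
import numpy as np

# Synthetic curve that follows y = a * x**k exactly (illustrative values)
a_true, k_true = 50.0, -1.2
x = np.logspace(3, 5, 30)
y = a_true * x ** k_true

# Degree-1 fit in log-log space; polyfit returns [slope, intercept]
k_fit, log_a_fit = np.polyfit(np.log10(x), np.log10(y), deg=1)
print(k_fit, 10 ** log_a_fit)
```

On real, binned data such a least-squares line gives a descriptive slope for the tail rather than a rigorous test of power-law behaviour.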

In other words, the majority of the songs lie in the same range of stream counts while a few have far larger ones. The plot shows, for example, that 90% of the songs have been streamed fewer than 4100 times, which is less than 1% of the stream count of the most streamed song.
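
Statements such as "90% of the songs have fewer than N listens" can also be read from an empirical CCDF computed without binning. The following is a minimal sketch on made-up listen counts; `empirical_ccdf` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def empirical_ccdf(values):
    """Return the sorted unique values x and P(X >= x) for each of them."""
    values = np.sort(np.asarray(values))
    x, first_idx = np.unique(values, return_index=True)
    ccdf = 1.0 - first_idx / values.size
    return x, ccdf

# Toy listen counts (invented for illustration)
listens_toy = np.array([10, 10, 50, 200, 200, 200, 1000, 5000, 5000, 100000])
x, ccdf = empirical_ccdf(listens_toy)
print(dict(zip(x.tolist(), ccdf.round(2).tolist())))
```

Here `ccdf` at 1000 is 0.4, meaning 40% of the toy tracks have at least 1000 listens and therefore 60% have fewer.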

Genres' specificities¶

It could be insightful to refine our analysis to the level of genres. Indeed, genre might have an important impact on the popularity of songs, and musical features could affect popularity differently from one genre to another. To see how the number of listens differs from one genre to another, let's represent the distribution for each genre as boxplots:

In [ ]:

import matplotlib.cm as cm
import matplotlib

# create color code reflecting the proportion of songs per genre in the dataset
sorted_genres = tracks["track"].groupby("genre_top").listens.count().sort_values(ascending=False)
cmap = plt.get_cmap('Spectral')  # plt.get_cmap works across matplotlib versions, unlike cm.get_cmap
norm = matplotlib.colors.Normalize(vmin=np.log10(sorted_genres.min())#/sorted_genres.sum()
                                   , vmax=np.log10(sorted_genres.max())#/sorted_genres.sum()
                                  )
#color_seq = np.log10(sorted_genres)/np.log10(sorted_genres.sum())
color_seq = np.log10(sorted_genres.values)#/sorted_genres.sum()

# plot the boxplot of listens for each genre
fig, ax = plt.subplots(figsize=(16,12)
                       , nrows=2 
                       , ncols=1
                       , sharex=True
                       , gridspec_kw={'height_ratios':[0.5,2]}
                      )

ax[0].boxplot(x=tracks["track"]["listens"], vert=False)
ax[0].set_xlim([0.9, np.max(tracks["track"]["listens"])])
ax[0].set_yticklabels(["All data"])
ax[0].semilogx()

sns.boxplot(x="listens"
            , y="genre_top"
            , data=tracks["track"]
            , order=sorted_genres.index
            , ax=ax[1]
            , palette=cmap(norm(color_seq))
            , boxprops=dict(alpha=.7)
           )
ax[1].semilogx()
ax[1].set_ylabel("Top genre", fontsize=12)

# add colorbar
cb = fig.colorbar(cm.ScalarMappable(cmap=cmap, norm=norm), alpha=0.7, ax=ax[1])
cb.set_label(r"$\log(N_g)$", fontsize=12)

pos = ax[1].get_position()
pos2 = ax[0].get_position()
ax[0].set_position([pos.x0,pos2.y0,pos.width,pos2.height])

plt.show()

The above boxplots per genre, plotted on a logarithmic scale, expose the different standards of popularity for the different top genres. A logarithmic color code is added to reflect the total number of songs per genre $N_g$ in the dataset.

It shows that a song with $n_{l}$ listens could be considered very popular in one genre but not in another. For example, a blues song with 1600 streams would be among the 25% least-listened songs of its genre, yet would have been streamed more than 75% of rock, experimental, hip-hop, folk or spoken songs.
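
One way to express this genre-relative view is a percentile rank computed within each genre. The sketch below uses invented listen counts and the column names of the tracks DataFrame; `global_pct` and `genre_pct` are hypothetical column names.

```python
import pandas as pd

# Invented listen counts for two genres
toy = pd.DataFrame({
    'genre_top': ['Blues', 'Blues', 'Blues', 'Blues', 'Rock', 'Rock', 'Rock', 'Rock'],
    'listens':   [1600, 4000, 9000, 20000, 200, 500, 900, 1600],
})

# Percentile rank over all tracks, and within each genre
toy['global_pct'] = toy['listens'].rank(pct=True)
toy['genre_pct'] = toy.groupby('genre_top')['listens'].rank(pct=True)
print(toy)
```

In this toy example the same count of 1600 listens sits at the bottom of the blues tracks (rank 0.25) and at the top of the rock tracks (rank 1.0), while its global rank is identical for both.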

Discretization¶

Finally, determining discrete categories of popularity (e.g. unpopular, moderately popular, popular, extremely popular) could be useful either for building classification models such as logistic regression or for visualization purposes (see PCA).

This can be done based on the percentiles of listens. Percentiles create intervals of unequal lengths, but have the advantage of letting us balance the categories according to chosen proportions. We first propose to split popularity at the 15th, 50th and 85th percentiles, as summarized in the table below.

Percentile	Popularity category
0%-15%	Unpopular
15%-50%	Moderately popular
50%-85%	Popular
85%-100%	Extremely popular

However, as seen in the exploration of genres, these thresholds will strongly depend on the selected set of tracks.

In [ ]:

df_set_listens = tracks["track"][["listens"]].copy()  # copy to avoid a SettingWithCopyWarning
df_set_listens["set"] = tracks["set"]["subset"]

sns.boxplot(x="listens", y="set", data=df_set_listens)
plt.ylabel("Set (exclusive)", fontsize=12)
plt.semilogx()
plt.show()

In [ ]:

def popularity_percentiles(listens_list, percentiles=(15, 50, 85)):
    """ Discretize popularity based on given percentiles """
    listens_perc = np.percentile(listens_list, percentiles)
    return ["unpopular" if listens <= listens_perc[0]
            else "moderatly popular" if listens <= listens_perc[1]
            else "popular" if listens <= listens_perc[2]
            else "extremely popular"
            for listens in listens_list
           ]
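
For large sets of tracks, the same discretization can be written without a Python loop. The sketch below is one possible vectorised equivalent based on `np.searchsorted`; the function name `popularity_percentiles_vectorised` is hypothetical, and the label spelling is kept identical to the function above so that both produce the same strings.

```python
import numpy as np

# Labels spelled exactly as in popularity_percentiles
LABELS = np.array(['unpopular', 'moderatly popular', 'popular', 'extremely popular'])

def popularity_percentiles_vectorised(listens, percentiles=(15, 50, 85)):
    """Discretize popularity based on given percentiles, without a Python loop."""
    listens = np.asarray(listens)
    thresholds = np.percentile(listens, percentiles)
    # side='left' counts the thresholds strictly below each value,
    # which reproduces the chain of '<=' comparisons
    return LABELS[np.searchsorted(thresholds, listens, side='left')]

labels = popularity_percentiles_vectorised(np.arange(1, 101))
counts = {name: int((labels == name).sum()) for name in LABELS}
print(counts)
```

On the integers 1 to 100 the four categories receive 15, 35, 35 and 15 values, matching the intended 15/35/35/15 split.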

First regression¶

To quickly get more insight into our data, we now perform a linear regression to look for good predictors of popularity.

In [ ]:

small = tracks[tracks['set', 'subset'] <= 'small']
small_features = features[tracks['set', 'subset'] <= 'small']

Let's try a regression with the following features : the tonic (assuming it is the note that has the highest chroma cens value), the loudness (which corresponds to the mean of RMSE), the mean spectral bandwidth and the mean ZCR.

In [ ]:

regression_df = pd.DataFrame()
regression_df['listens'] = small['track']['listens']
regression_df['tonic'] = small_features['chroma_cens']['mean'].idxmax(axis=1)
regression_df['loudness'] = small_features['rmse']['mean']
regression_df['spectral_bandwith'] = small_features['spectral_bandwidth']['mean']
regression_df['ZCR'] = small_features['zcr']['mean']
regression_df.head()

Out[ ]:

	listens	tonic	loudness	spectral_bandwith	ZCR
track_id
2	1293	01	3.188761	1607.474365	0.085629
5	1151	09	3.251386	1512.917358	0.053114
10	50135	07	3.893810	1420.259644	0.077515
140	1299	10	2.953848	1475.625366	0.052379
141	725	11	2.576761	1192.835571	0.040267

Let's create the regression model now.

In [ ]:

mod = smf.ols(formula='listens ~ C(tonic)+loudness+spectral_bandwith+ZCR', data=regression_df)
res = mod.fit()
res.summary()

Out[ ]:

OLS Regression Results
Dep. Variable:	listens	R-squared:	0.005
Model:	OLS	Adj. R-squared:	0.003
Method:	Least Squares	F-statistic:	2.783
Date:	Sun, 25 Apr 2021	Prob (F-statistic):	0.000379
Time:	08:13:27	Log-Likelihood:	-86619.
No. Observations:	8000	AIC:	1.733e+05
Df Residuals:	7985	BIC:	1.734e+05
Df Model:	14
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
Intercept	3894.3849	597.154	6.522	0.000	2723.806	5064.963
C(tonic)[T.02]	344.1589	681.509	0.505	0.614	-991.777	1680.095
C(tonic)[T.03]	-260.9056	549.930	-0.474	0.635	-1338.913	817.102
C(tonic)[T.04]	559.6154	686.244	0.815	0.415	-785.602	1904.832
C(tonic)[T.05]	-791.4610	545.483	-1.451	0.147	-1860.751	277.829
C(tonic)[T.06]	-355.4451	639.107	-0.556	0.578	-1608.262	897.372
C(tonic)[T.07]	-758.9719	676.598	-1.122	0.262	-2085.282	567.338
C(tonic)[T.08]	-813.6974	549.598	-1.481	0.139	-1891.053	263.658
C(tonic)[T.09]	1221.2014	659.657	1.851	0.064	-71.899	2514.302
C(tonic)[T.10]	-301.0269	566.432	-0.531	0.595	-1411.382	809.328
C(tonic)[T.11]	-916.1798	715.902	-1.280	0.201	-2319.535	487.175
C(tonic)[T.12]	-785.8983	683.208	-1.150	0.250	-2125.164	553.368
loudness	181.6048	57.076	3.182	0.001	69.720	293.489
spectral_bandwith	0.7408	0.404	1.833	0.067	-0.052	1.533
ZCR	-1.505e+04	6082.520	-2.474	0.013	-2.7e+04	-3123.225

Omnibus:	18665.805	Durbin-Watson:	1.745
Prob(Omnibus):	0.000	Jarque-Bera (JB):	208709528.773
Skew:	22.934	Prob(JB):	0.00
Kurtosis:	792.952	Cond. No.	6.63e+04

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.63e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

Loudness and ZCR are statistically significant predictors of popularity (p-values below 0.05), although the model as a whole explains very little of the variance in listens (R-squared of 0.005).

Popular tracks tend to be louder (positive coefficient) and to have a lower ZCR (negative coefficient).

This is an example of the analysis we can do to find associations between musical features and popularity, which can help us answer our research question.

This is also encouraging because it points in the direction of our hypothesis: that popular tracks share features, and that we can identify some of them.

First classification¶

Another first experiment for the exploratory data analysis takes a different approach: classification.

We would like to predict the popularity classes defined earlier.

We use the function discussed above, which creates four categories: 'unpopular', 'moderatly popular', 'popular', 'extremely popular'.

In [ ]:

regression_df['Popularity'] = popularity_percentiles(regression_df['listens'])
regression_df.groupby('Popularity').count()['listens']

Out[ ]:

Popularity
extremely popular    1200
moderatly popular    2797
popular              2800
unpopular            1203
Name: listens, dtype: int64

We will try to use Linear Discriminant Analysis (LDA) to separate the different levels of popularity.

We start with the 'mfcc' feature, to see what kind of insight about popularity it can give us.

In [ ]:

small_loc_classification = tracks['set', 'subset'] <= 'small'
X = features.loc[small_loc_classification, 'mfcc']
y = regression_df['Popularity']

lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X, y).transform(X)

plt.figure()
colors = ['navy', 'turquoise', 'yellow', 'darkorange']

for color, target_name in zip(colors, ['unpopular', 'moderatly popular', 'popular', 'extremely popular']):
    plt.scatter(X_r2[y == target_name, 0], X_r2[y == target_name, 1], alpha=.3, color=color,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('LDA')

plt.show()

The results look a bit messy and do not give us much information, so let's look only at the two extreme popularity classes.

In [ ]:

plt.figure()
colors = ['navy', 'darkorange']

for color, target_name in zip(colors, ['unpopular', 'extremely popular']):
    plt.scatter(X_r2[y == target_name, 0], X_r2[y == target_name, 1], alpha=.4, color=color,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('LDA')

plt.show()

Here, on the other hand, we see a blurry distinction between unpopular and extremely popular music. The MFCC feature therefore seems to carry some information about popularity. For example, a track that falls on the right side of the plane would be less likely to be popular.

What's next¶

To answer our research question, we need to perform a more complex analysis of our dataset.

We first want to analyze, on the largest dataset, the impact of the different features on popularity. For that we will use the same process as above, linear regressions and LDA, with the widest range of features at our disposal. This will give us an in-depth understanding of the impact of each feature on the popularity of a track.

A more fine-grained analysis of what makes music popular within genres will follow. For that we will apply the same methods as before, once for each of the eight genres of the fma-small dataset. This will let us be more precise about which features matter for a track to be popular, without genre bias.

Expectations¶

If we look into the existing literature, we might expect a track's pitch (Jakubowski et al., 2017; Yang et al., 2017; Lee & Lee, 2018), tempo (Ni et al., 2011; Jakubowski et al., 2017; Léveillé Gauvin, 2017) and intrinsic loudness (Ni et al., 2011; Serra et al., 2012; Gauvin, 2017; Lee & Lee, 2018) to be features that discriminate hit songs from non-hits.

For the between-genre analysis, we expect to identify only very general features of popular music: because all the genres are mixed, the corpus is very diverse, which makes it difficult to identify any precise feature.

Then, with the separate study of the eight genres, we would expect better and more significant results. Since that analysis is within genre, we are more likely to find genre-specific features of popular music that would have been hidden in the between-genre analysis.

Evaluation of Outcomes¶

To evaluate the results of the regression, we can look at the p-value of the coefficient of each feature. If the p-value is less than 0.05, we reject, at the 5% significance level, the null hypothesis that there is no linear relationship between the independent variable (the feature) and the dependent variable (the popularity). Only the features with such a statistically significant relationship will be considered to play a role in popularity.
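
This selection rule can be applied mechanically. The sketch below filters a handful of p-values copied from the OLS summary of the first regression; the Series is typed in by hand for illustration, and `alpha` and `significant` are hypothetical names.

```python
import pandas as pd

# p-values copied from the OLS summary above (a subset of the coefficients)
pvalues = pd.Series({
    'C(tonic)[T.09]': 0.064,
    'loudness': 0.001,
    'spectral_bandwith': 0.067,
    'ZCR': 0.013,
})

alpha = 0.05  # significance threshold used in this section
significant = pvalues[pvalues < alpha].index.tolist()
print(significant)
```

On a fitted statsmodels model, the same values are available as `res.pvalues`, so the filter can be applied to it directly.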

Then, to evaluate the Linear Discriminant Analysis, we will use leave-one-out cross-validation and look for the features that produce the best scores.
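
A minimal sketch of this evaluation with scikit-learn is shown below. It runs on synthetic, well-separated data rather than on the FMA features, so the resulting accuracy says nothing about the real task; the two class labels and the array shapes are illustrative assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for a feature matrix: two well-separated classes
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)),
               rng.normal(5.0, 1.0, size=(30, 2))])
y = np.array(['unpopular'] * 30 + ['extremely popular'] * 30)

# One fit per sample: each track is predicted by a model trained on all the others
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
loo_accuracy = scores.mean()
print(loo_accuracy)
```

Leave-one-out requires as many fits as there are tracks (8000 for fma-small), so a k-fold split may be a cheaper alternative if the run time becomes an issue.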