FMA: A Dataset For Music Analysis

Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2.

Free Music Archive web API

All the data in the raw_*.csv tables was collected from the Free Music Archive public API. With this notebook, you can:

  • reconstruct the original data,
  • update some fields, e.g. the track listens (play count),
  • augment the data with newer fields which may have been introduced in their API,
  • update the dataset with new songs added to the archive.

Notes:

  • You need a key to access the API, which you can request online and write into your .env file as a new line reading FMA_KEY=MYPERSONALKEY.
  • Requests take a few hundred milliseconds to complete.
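
The key lookup described in the first note can be sketched with the standard library alone. The helper name read_env_file below is hypothetical and not part of the notebook's utils module, which may load the file differently; the sketch only assumes the KEY=VALUE line format mentioned above.

```python
import os


def read_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict.

    Hypothetical helper for illustration; blank lines, lines starting
    with '#', and lines without '=' are skipped.
    """
    values = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#') or '=' not in line:
                continue
            key, _, value = line.partition('=')
            values[key.strip()] = value.strip()
    return values


if os.path.exists('.env'):
    # Do not overwrite a key that is already set in the environment.
    for key, value in read_env_file('.env').items():
        os.environ.setdefault(key, value)
```

After this runs, os.environ.get('FMA_KEY') returns the key if the file contained one, and None otherwise.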
In [1]:
import os
import IPython.display as ipd
import utils
In [2]:
fma = utils.FreeMusicArchive(os.environ.get('FMA_KEY'))

1 Get recently added tracks

  • track_id values are assigned in monotonically increasing order.
  • Tracks can be removed, so the largest track_id does not indicate the number of available tracks.
In [3]:
for track_id, artist_name, date_created in zip(*fma.get_recent_tracks()):
    print(track_id, date_created, artist_name)
156413 4/25/2017 11:28:59 AM Krestovsky
156409 4/25/2017 03:05:50 PM Parvus Decree
156408 4/25/2017 03:05:50 PM Parvus Decree
156407 4/25/2017 02:33:27 PM Parvus Decree
156406 4/25/2017 02:33:26 PM Parvus Decree
156405 4/25/2017 02:30:13 PM Surfer Blood
156404 4/25/2017 02:30:13 PM Surfer Blood
156403 4/25/2017 02:30:12 PM Surfer Blood
156402 4/25/2017 02:30:12 PM Surfer Blood
156401 4/25/2017 02:30:11 PM Surfer Blood
156400 4/25/2017 02:30:10 PM Surfer Blood
156399 4/25/2017 01:37:34 PM Jared C. Balogh
156398 4/25/2017 01:37:34 PM Jared C. Balogh
156397 4/25/2017 01:37:33 PM Jared C. Balogh
156396 4/25/2017 01:37:32 PM Jared C. Balogh
156395 4/25/2017 01:37:32 PM Jared C. Balogh
156394 4/25/2017 01:37:31 PM Jared C. Balogh
156393 4/25/2017 01:37:30 PM Jared C. Balogh
156392 4/25/2017 01:37:30 PM Jared C. Balogh
156391 4/25/2017 01:37:29 PM Jared C. Balogh
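
The creation dates come back as plain strings. A minimal sketch of turning them into datetime objects follows; the format string is inferred from the output above and is an assumption about the API, and parse_fma_date is a hypothetical helper name.

```python
from datetime import datetime

# Format assumed from the output above, e.g. '4/25/2017 03:05:50 PM'.
FMA_DATE_FORMAT = '%m/%d/%Y %I:%M:%S %p'


def parse_fma_date(text):
    """Convert a date string as printed above into a datetime object."""
    return datetime.strptime(text, FMA_DATE_FORMAT)


# A few (track_id, date_created) pairs copied from the output above.
rows = [
    (156413, '4/25/2017 11:28:59 AM'),
    (156409, '4/25/2017 03:05:50 PM'),
    (156405, '4/25/2017 02:30:13 PM'),
]

# Sort by creation time rather than by track_id.
by_date = sorted(rows, key=lambda row: parse_fma_date(row[1]))
for track_id, date_created in by_date:
    print(track_id, parse_fma_date(date_created).isoformat())
```

Parsed dates can be compared and sorted directly, which the raw strings cannot, since '11:28:59 AM' sorts after '03:05:50 PM' as text.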

2 Get metadata about tracks, albums and artists

Given IDs, we can get information about tracks, albums and artists. See the available fields in the API documentation.

In [4]:
fma.get_track(track_id=2, fields=['track_title', 'track_date_created',
                                  'track_duration', 'track_bit_rate',
                                  'track_listens', 'track_interest', 'track_comments', 'track_favorites',
                                  'artist_id', 'album_id'])
Out[4]:
{'album_id': '1',
 'artist_id': '1',
 'track_bit_rate': '256000',
 'track_comments': '0',
 'track_date_created': '11/26/2008 01:44:43 AM',
 'track_duration': '02:48',
 'track_favorites': '2',
 'track_interest': '4668',
 'track_listens': '1304',
 'track_title': 'Food'}
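
Every value in the response above is a string. The following sketch converts a few of them to numbers before further analysis. The list of integer fields is an assumption based only on the output shown here, support for an HH:MM:SS duration is also an assumption (only MM:SS appears above), and the helper names are hypothetical.

```python
def duration_to_seconds(text):
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in text.split(':'):
        seconds = seconds * 60 + int(part)
    return seconds


# Fields assumed to hold integers, based on the output above.
INT_FIELDS = ('album_id', 'artist_id', 'track_bit_rate', 'track_comments',
              'track_favorites', 'track_interest', 'track_listens')


def convert_track(track):
    """Return a copy of the track dict with typed values."""
    typed = dict(track)
    for field in INT_FIELDS:
        if typed.get(field) is not None:
            typed[field] = int(typed[field])
    if typed.get('track_duration') is not None:
        typed['track_duration'] = duration_to_seconds(typed['track_duration'])
    return typed


# Subset of the response shown above.
track = {'track_bit_rate': '256000', 'track_duration': '02:48',
         'track_listens': '1304', 'track_title': 'Food'}
print(convert_track(track))
```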
In [5]:
fma.get_track_genres(track_id=20)
Out[5]:
(['76', '103'], ['Experimental Pop', 'Singer-Songwriter'])
In [6]:
fma.get_album(album_id=1, fields=['album_title', 'album_tracks',
                                  'album_listens', 'album_comments', 'album_favorites',
                                  'album_date_created', 'album_date_released'])
Out[6]:
{'album_comments': '0',
 'album_date_created': '11/26/2008 01:44:41 AM',
 'album_date_released': '1/05/2009',
 'album_favorites': '4',
 'album_listens': '6101',
 'album_title': 'AWOL - A Way Of Life',
 'album_tracks': '7'}
In [7]:
fma.get_artist(artist_id=1, fields=['artist_name', 'artist_location',
                                    'artist_comments', 'artist_favorites'])
Out[7]:
{'artist_comments': '0',
 'artist_favorites': '9',
 'artist_location': 'New Jersey',
 'artist_name': 'AWOL'}

3 Get data, i.e. raw audio

We can download the original audio as well. The archive provides tracks as MP3 files with various bit rates and sample rates.

In [8]:
track_file = fma.get_track(2, 'track_file')
fma.download_track(track_file, path='track.mp3')
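
How download_track retrieves the file is not shown here. As a rough, generic sketch of streaming a remote file to disk with the standard library, consider the function below. The name download_file and its parameters are hypothetical, and url must be a complete URL, which the value of track_file may or may not be.

```python
import shutil
import urllib.request


def download_file(url, path, chunk_size=64 * 1024):
    """Stream the content at `url` to `path` in chunks.

    Hypothetical stand-in for illustration, not the notebook's
    fma.download_track. Copying in chunks avoids holding the whole
    file in memory.
    """
    with urllib.request.urlopen(url) as response, open(path, 'wb') as f:
        shutil.copyfileobj(response, f, chunk_size)
    return path
```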

4 Get genres

Instead of compiling the genres of each track, we can get all the genres present on the archive with some API calls.

In [9]:
genres = fma.get_all_genres()
print('{} genres'.format(genres.shape[0]))
genres[10:25]
164 genres
Out[9]:
genre_id  genre_parent_id  genre_title     genre_handle    genre_color
11        14               Disco           Disco           #E40089
12        None             Rock            Rock            #840000
13        126              Easy Listening  Easy_Listening  #5B747C
14        None             Soul-RnB        Soul-RB         #330033
15        None             Electronic      Electronic      #FF6600
16        6                Sound Effects   Sound_Effects   #003366
17        None             Folk            Folk            #5E6D3F
18        1235             Soundtrack      Soundtrack      #669933
19        14               Funk            Funk            #5E6D3F
20        None             Spoken          Spoken          #006699
21        None             Hip-Hop         Hip-Hop         #CC0000
22        38               Audio Collage   Audio_Collage   #dddd00
25        12               Punk            Punk            #840000
26        12               Post-Rock       Post-Rock       #840000
27        12               Lo-Fi           Lo-fi           #840000

We can then look for genres whose title contains "Rock", or whose parent is the Rock genre (genre_id 12).

In [10]:
genres[['Rock' in title for title in genres['genre_title']]]
Out[10]:
genre_id  genre_parent_id  genre_title     genre_handle    genre_color
12        None             Rock            Rock            #840000
26        12               Post-Rock       Post-Rock       #840000
45        12               Loud-Rock       Loud-Rock       #666666
53        45               Noise-Rock      Noise-Rock      #666666
58        12               Psych-Rock      Psych-Rock      #840000
66        12               Indie-Rock      Indie-Rock      #840000
113       26               Space-Rock      Space-Rock      #840000
169       9                Rockabilly      Rockabilly      #663366
440       12               Rock Opera      Rock_Opera      #840000
In [11]:
genres[genres['genre_parent_id'] == '12']
Out[11]:
genre_id  genre_parent_id  genre_title     genre_handle    genre_color
25        12               Punk            Punk            #840000
26        12               Post-Rock       Post-Rock       #840000
27        12               Lo-Fi           Lo-fi           #840000
31        12               Metal           Metal           #777777
36        12               Krautrock       Krautrock       #840000
45        12               Loud-Rock       Loud-Rock       #666666
58        12               Psych-Rock      Psych-Rock      #840000
66        12               Indie-Rock      Indie-Rock      #840000
70        12               Industrial      Industrial      #8400FF
85        12               Garage          Garage          #840000
88        12               New Wave        New_Wave        #840000
98        12               Progressive     Progressive     #840000
314       12               Goth            Goth            #840000
359       12               Shoegaze        Shoegaze        #840000
440       12               Rock Opera      Rock_Opera      #840000
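
Neither query above captures the whole Rock hierarchy: matching on the title includes Rockabilly, whose parent is genre 9, while filtering on the direct parent misses deeper genres such as Noise-Rock (parent 45) and Space-Rock (parent 26). A minimal sketch of walking the genre_parent_id links is given below. It uses a plain dict holding a subset of the rows shown above rather than the DataFrame, keeps IDs as strings as in the comparison with '12', and the names top_level and descendants are hypothetical.

```python
# genre_id -> genre_parent_id, copied from the tables above
# (None marks a top-level genre). Only a subset is listed.
PARENTS = {
    '12': None,   # Rock
    '26': '12',   # Post-Rock
    '45': '12',   # Loud-Rock
    '53': '45',   # Noise-Rock
    '113': '26',  # Space-Rock
    '169': '9',   # Rockabilly (its parent, 9, is not listed here)
}


def top_level(genre_id, parents):
    """Follow parent links until reaching a genre without a known parent."""
    seen = set()  # guards against accidental cycles in the data
    while parents.get(genre_id) is not None and genre_id not in seen:
        seen.add(genre_id)
        genre_id = parents[genre_id]
    return genre_id


def descendants(root, parents):
    """Return all listed genres whose top-level ancestor is `root`."""
    found = (g for g in parents
             if g != root and top_level(g, parents) == root)
    return sorted(found, key=int)


print(top_level('113', PARENTS))
print(descendants('12', PARENTS))
```

With the full genre table, the same mapping could be built from the genre_parent_id column, assuming the index and the parent IDs are first brought to the same type.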