Spotify is one of the digital services that helped me get through the gloom of the Covid-19 lockdown, so I chose to explore its data in my first attempt at writing Python code. After a fair amount of trial and error, I managed to extract songs and their audio features from Spotify with the code below, run in a JupyterLab notebook. References for this code are listed at the end.
Part 1- The first step in extracting data from Spotify is to set up client credentials: exchange the client ID and client secret of a Spotify developer app for an access token.
```python
# The client ID and client secret come from an app created on Spotify's developer dashboard:
# https://developer.spotify.com/dashboard/login
# PS- the access token issued below expires after one hour (3600 seconds); rerun this cell to get a new one
import requests

CLIENT_ID = 'add the client id from the website'
CLIENT_SECRET = 'add the client secret from the website'
AUTH_URL = 'https://accounts.spotify.com/api/token'

# request an access token with the client credentials flow
auth_response = requests.post(AUTH_URL, {
    'grant_type': 'client_credentials',
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
})
auth_response.raise_for_status()  # stop early if the credentials are rejected

# convert the response to JSON
auth_response_data = auth_response.json()

# save the access token
access_token = auth_response_data['access_token']

headers = {
    'Authorization': 'Bearer {token}'.format(token=access_token)
}

# base URL of all Spotify API endpoints
BASE_URL = 'https://api.spotify.com/v1/'
```
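A token from the client credentials flow lasts only an hour, so a long notebook session can start failing partway through. The sketch below caches the token and requests a new one shortly before it expires. It assumes the token response carries an `expires_in` field in seconds, as Spotify's does. `TokenCache` and its `fetch_token` callable are hypothetical names. In practice `fetch_token` would wrap the `requests.post(AUTH_URL, ...)` call above and return its `.json()`.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Hypothetical helper: caches an access token and refreshes it before it expires."""

    def __init__(self, fetch_token: Callable[[], dict],
                 clock: Callable[[], float] = time.monotonic,
                 margin_s: float = 60.0):
        self._fetch_token = fetch_token  # returns e.g. {'access_token': ..., 'expires_in': 3600}
        self._clock = clock              # injectable so the logic can be tested without waiting
        self._margin_s = margin_s        # refresh this many seconds before the real expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # fetch a new token if none is cached or the cached one is about to expire
        if self._token is None or self._clock() >= self._expires_at - self._margin_s:
            data = self._fetch_token()
            self._token = data['access_token']
            self._expires_at = self._clock() + data['expires_in']
        return self._token

    def headers(self) -> dict:
        # same header shape as the `headers` dict built above
        return {'Authorization': 'Bearer {token}'.format(token=self.get())}
```

For example, `cache = TokenCache(lambda: requests.post(AUTH_URL, {...}).json())`, then pass `headers=cache.headers()` to each request instead of the fixed `headers` dict.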
Part 2- The code below extracts the audio features of the song that topped the global charts.
```python
# Dakiti was the most-streamed song globally in the first week of January 2021
# Refer- https://spotifycharts.com/regional/global/weekly/2021-01-01--2021-01-08
# The track ID is the part of the share link before "?si=", which is only a sharing parameter
track_id = '4MzXwWMhyBbmu6hOcLVD49'

# actual GET request with the authorization header
r = requests.get(BASE_URL + 'audio-features/' + track_id, headers=headers)

# convert the response to a dictionary of audio features (danceability, energy, tempo, ...)
dakiti_features = r.json()
# description of the result- https://developer.spotify.com/documentation/web-api/reference/#endpoint-get-audio-features
```
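Track IDs copied from Spotify's share menu usually arrive as a link with a `?si=` suffix, for example `https://open.spotify.com/track/<id>?si=...`. The DataFrame below also stores them as `spotify:track:<id>` URIs. The sketch below pulls out the bare ID from any of these forms. `extract_track_id` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def extract_track_id(link_or_id: str) -> str:
    """Return the bare track ID from a share link, a spotify:track: URI, or an ID with a ?si= suffix."""
    if link_or_id.startswith('spotify:track:'):
        return link_or_id.split(':')[-1]
    # urlparse separates the ?si=... query string from the path
    path = urlparse(link_or_id).path
    # for a share link the path is /track/<id>; for a bare ID the path is the ID itself
    return path.rstrip('/').split('/')[-1]
```

Passing the result to `BASE_URL + 'audio-features/' + ...` keeps the share parameter out of the API request path.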
Part 3- The code below extracts the audio features of the songs on an artist's albums. I chose one of the artists of Dakiti.
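```python
# d holds the artist's albums; the artist is the first one credited on Dakiti (track_id from Part 2)
artist_id = requests.get(BASE_URL + 'tracks/' + track_id, headers=headers).json()['artists'][0]['id']
d = requests.get(BASE_URL + 'artists/' + artist_id + '/albums', headers=headers,
                 params={'include_groups': 'album', 'limit': 50}).json()

for album in d['items']:
    print(album['name'], ' --- ', album['release_date'])
```
```
EL ÚLTIMO TOUR DEL MUNDO --- 2020-11-27
LAS QUE NO IBAN A SALIR --- 2020-05-10
YHLQMDLG --- 2020-02-28
OASIS --- 2019-06-28
X 100PRE --- 2018-12-23
```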
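Spotify returns lists such as an artist's albums or an album's tracks as paging objects. Each response holds one page in `items` plus a `next` URL, which is `None` on the last page. These five albums fit in one page, but a larger discography, or an album with more tracks than the page size, would be cut short if only the first page is read. The sketch below follows the `next` links. `iter_paged_items` is a hypothetical helper, and `fetch_page` stands in for a function that GETs a URL with the auth headers and returns its JSON.

```python
from typing import Callable, Iterator, Optional


def iter_paged_items(first_page: dict,
                     fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item from a Spotify-style paging object by following its 'next' links."""
    page: Optional[dict] = first_page
    while page is not None:
        yield from page['items']
        next_url = page.get('next')
        # 'next' is None on the last page
        page = fetch_page(next_url) if next_url else None
```

Usage with the albums response above might look like `all_albums = list(iter_paged_items(d, lambda url: requests.get(url, headers=headers).json()))`.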
```python
data = []    # will hold all track info
albums = []  # to keep track of duplicates

# loop over albums and get all tracks
for album in d['items']:
    album_name = album['name']

    # here's a hacky way to skip over albums we've already grabbed
    # (e.g. "Album" and "Album (Deluxe)" share the same trimmed name)
    trim_name = album_name.split('(')[0].strip()
    if trim_name.upper() in albums:
        continue
    albums.append(trim_name.upper())

    # this takes a few seconds, so let's keep track of progress
    print(album_name)

    # pull all tracks from this album
    r = requests.get(BASE_URL + 'albums/' + album['id'] + '/tracks',
                     headers=headers)
    tracks = r.json()['items']

    for track in tracks:
        # get audio features (key, liveness, danceability, ...)
        f = requests.get(BASE_URL + 'audio-features/' + track['id'],
                         headers=headers)
        f = f.json()

        # combine with album info
        f.update({
            'track_name': track['name'],
            'album_name': album_name,
            'short_album_name': trim_name,
            'release_date': album['release_date'],
            'album_id': album['id']
        })

        data.append(f)
```
```
EL ÚLTIMO TOUR DEL MUNDO
LAS QUE NO IBAN A SALIR
YHLQMDLG
OASIS
X 100PRE
```
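The loop above makes one audio-features request per track. Spotify also offers a batch form, `audio-features?ids=...`, which accepts up to 100 comma-separated track IDs and returns them under an `audio_features` key. This can cut the number of requests sharply. The sketch below only builds the batched URLs. `chunked` and `audio_features_urls` are hypothetical helper names. Note that Spotify restricted the audio-features endpoints for newly created apps in late 2024, so with new credentials these calls may be refused.

```python
from typing import Iterator, List


def chunked(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Split a list of track IDs into consecutive chunks of at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def audio_features_urls(base_url: str, ids: List[str], size: int = 100) -> List[str]:
    """Build one batched audio-features URL per chunk (Spotify IDs are URL-safe base62)."""
    return [base_url + 'audio-features?ids=' + ','.join(chunk)
            for chunk in chunked(ids, size)]
```

Each URL would then be fetched with `requests.get(url, headers=headers).json()['audio_features']`, which returns one feature dict per ID.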
```python
# create a data frame of the songs on the artist's Spotify albums
import pandas as pd

df = pd.DataFrame(data)
df.head(5)
```
| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | ... | uri | track_href | analysis_url | duration_ms | time_signature | track_name | album_name | short_album_name | release_date | album_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.716 | 0.522 | 5 | -6.834 | 1 | 0.0582 | 0.1660 | 0.000065 | 0.1130 | 0.224 | ... | spotify:track:36DHxTW2xdr9GG15T9oK9L | https://api.spotify.com/v1/tracks/36DHxTW2xdr9... | https://api.spotify.com/v1/audio-analysis/36DH... | 165199 | 4 | EL MUNDO ES MÍO | EL ÚLTIMO TOUR DEL MUNDO | EL ÚLTIMO TOUR DEL MUNDO | 2020-11-27 | 2d9BCZeAAhiZWPpbX9aPCW |
1 | 0.811 | 0.637 | 10 | -4.835 | 0 | 0.0591 | 0.2340 | 0.000572 | 0.1180 | 0.471 | ... | spotify:track:5RubKOuDoPn5Kj5TLVxSxY | https://api.spotify.com/v1/tracks/5RubKOuDoPn5... | https://api.spotify.com/v1/audio-analysis/5Rub... | 130014 | 4 | TE MUDASTE | EL ÚLTIMO TOUR DEL MUNDO | EL ÚLTIMO TOUR DEL MUNDO | 2020-11-27 | 2d9BCZeAAhiZWPpbX9aPCW |
2 | 0.860 | 0.725 | 11 | -6.700 | 1 | 0.2490 | 0.0464 | 0.000091 | 0.0994 | 0.375 | ... | spotify:track:0tjZv2hChdHZCW1zFXpy1J | https://api.spotify.com/v1/tracks/0tjZv2hChdHZ... | https://api.spotify.com/v1/audio-analysis/0tjZ... | 162151 | 4 | HOY COBRÉ | EL ÚLTIMO TOUR DEL MUNDO | EL ÚLTIMO TOUR DEL MUNDO | 2020-11-27 | 2d9BCZeAAhiZWPpbX9aPCW |
3 | 0.762 | 0.861 | 4 | -4.075 | 0 | 0.0652 | 0.1390 | 0.000001 | 0.0956 | 0.588 | ... | spotify:track:0Lsis3LB0XAK6XlTHXaJk2 | https://api.spotify.com/v1/tracks/0Lsis3LB0XAK... | https://api.spotify.com/v1/audio-analysis/0Lsi... | 213609 | 4 | MALDITA POBREZA | EL ÚLTIMO TOUR DEL MUNDO | EL ÚLTIMO TOUR DEL MUNDO | 2020-11-27 | 2d9BCZeAAhiZWPpbX9aPCW |
4 | 0.856 | 0.618 | 7 | -4.892 | 1 | 0.2860 | 0.0303 | 0.000000 | 0.0866 | 0.391 | ... | spotify:track:2XIc1pqjXV3Cr2BQUGNBck | https://api.spotify.com/v1/tracks/2XIc1pqjXV3C... | https://api.spotify.com/v1/audio-analysis/2XIc... | 203201 | 4 | LA NOCHE DE ANOCHE | EL ÚLTIMO TOUR DEL MUNDO | EL ÚLTIMO TOUR DEL MUNDO | 2020-11-27 | 2d9BCZeAAhiZWPpbX9aPCW |
5 rows × 23 columns
```python
# df.head(-1) returns every row except the last one (68 of the 69 rows);
# Jupyter then displays only the first and last five of them
df.head(-1)
```
| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | ... | uri | track_href | analysis_url | duration_ms | time_signature | track_name | album_name | short_album_name | release_date | album_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.716 | 0.522 | 5 | -6.834 | 1 | 0.0582 | 0.1660 | 0.000065 | 0.1130 | 0.224 | ... | spotify:track:36DHxTW2xdr9GG15T9oK9L | https://api.spotify.com/v1/tracks/36DHxTW2xdr9... | https://api.spotify.com/v1/audio-analysis/36DH... | 165199 | 4 | EL MUNDO ES MÍO | EL ÚLTIMO TOUR DEL MUNDO | EL ÚLTIMO TOUR DEL MUNDO | 2020-11-27 | 2d9BCZeAAhiZWPpbX9aPCW |
1 | 0.811 | 0.637 | 10 | -4.835 | 0 | 0.0591 | 0.2340 | 0.000572 | 0.1180 | 0.471 | ... | spotify:track:5RubKOuDoPn5Kj5TLVxSxY | https://api.spotify.com/v1/tracks/5RubKOuDoPn5... | https://api.spotify.com/v1/audio-analysis/5Rub... | 130014 | 4 | TE MUDASTE | EL ÚLTIMO TOUR DEL MUNDO | EL ÚLTIMO TOUR DEL MUNDO | 2020-11-27 | 2d9BCZeAAhiZWPpbX9aPCW |
2 | 0.860 | 0.725 | 11 | -6.700 | 1 | 0.2490 | 0.0464 | 0.000091 | 0.0994 | 0.375 | ... | spotify:track:0tjZv2hChdHZCW1zFXpy1J | https://api.spotify.com/v1/tracks/0tjZv2hChdHZ... | https://api.spotify.com/v1/audio-analysis/0tjZ... | 162151 | 4 | HOY COBRÉ | EL ÚLTIMO TOUR DEL MUNDO | EL ÚLTIMO TOUR DEL MUNDO | 2020-11-27 | 2d9BCZeAAhiZWPpbX9aPCW |
3 | 0.762 | 0.861 | 4 | -4.075 | 0 | 0.0652 | 0.1390 | 0.000001 | 0.0956 | 0.588 | ... | spotify:track:0Lsis3LB0XAK6XlTHXaJk2 | https://api.spotify.com/v1/tracks/0Lsis3LB0XAK... | https://api.spotify.com/v1/audio-analysis/0Lsi... | 213609 | 4 | MALDITA POBREZA | EL ÚLTIMO TOUR DEL MUNDO | EL ÚLTIMO TOUR DEL MUNDO | 2020-11-27 | 2d9BCZeAAhiZWPpbX9aPCW |
4 | 0.856 | 0.618 | 7 | -4.892 | 1 | 0.2860 | 0.0303 | 0.000000 | 0.0866 | 0.391 | ... | spotify:track:2XIc1pqjXV3Cr2BQUGNBck | https://api.spotify.com/v1/tracks/2XIc1pqjXV3C... | https://api.spotify.com/v1/audio-analysis/2XIc... | 203201 | 4 | LA NOCHE DE ANOCHE | EL ÚLTIMO TOUR DEL MUNDO | EL ÚLTIMO TOUR DEL MUNDO | 2020-11-27 | 2d9BCZeAAhiZWPpbX9aPCW |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
63 | 0.787 | 0.705 | 0 | -7.582 | 1 | 0.0695 | 0.1370 | 0.000001 | 0.1080 | 0.499 | ... | spotify:track:5mj8WVFcKdGA8p9HWGTSLc | https://api.spotify.com/v1/tracks/5mj8WVFcKdGA... | https://api.spotify.com/v1/audio-analysis/5mj8... | 188654 | 4 | Cuando Perriabas | X 100PRE | X 100PRE | 2018-12-23 | 7CjJb2mikwAWA1V6kewFBF |
64 | 0.655 | 0.725 | 0 | -5.497 | 1 | 0.1880 | 0.0327 | 0.002640 | 0.0611 | 0.326 | ... | spotify:track:1khmgu0pveJbkbpbkyvcQv | https://api.spotify.com/v1/tracks/1khmgu0pveJb... | https://api.spotify.com/v1/audio-analysis/1khm... | 300579 | 4 | La Romana | X 100PRE | X 100PRE | 2018-12-23 | 7CjJb2mikwAWA1V6kewFBF |
65 | 0.767 | 0.379 | 0 | -10.348 | 1 | 0.0385 | 0.6680 | 0.000145 | 0.2170 | 0.252 | ... | spotify:track:69ZaPBHhRMRDjRpW1ivnOU | https://api.spotify.com/v1/tracks/69ZaPBHhRMRD... | https://api.spotify.com/v1/audio-analysis/69Za... | 230578 | 4 | Como Antes | X 100PRE | X 100PRE | 2018-12-23 | 7CjJb2mikwAWA1V6kewFBF |
66 | 0.600 | 0.528 | 0 | -6.554 | 1 | 0.0308 | 0.2630 | 0.000000 | 0.5880 | 0.142 | ... | spotify:track:6pZHZndlo57dPCYnvlYFOE | https://api.spotify.com/v1/tracks/6pZHZndlo57d... | https://api.spotify.com/v1/audio-analysis/6pZH... | 284853 | 4 | RLNDT | X 100PRE | X 100PRE | 2018-12-23 | 7CjJb2mikwAWA1V6kewFBF |
67 | 0.759 | 0.536 | 9 | -6.663 | 0 | 0.1730 | 0.8210 | 0.000005 | 0.1070 | 0.439 | ... | spotify:track:2OWVCFTolecLiGZPquvWvT | https://api.spotify.com/v1/tracks/2OWVCFTolecL... | https://api.spotify.com/v1/audio-analysis/2OWV... | 208080 | 4 | Estamos Bien | X 100PRE | X 100PRE | 2018-12-23 | 7CjJb2mikwAWA1V6kewFBF |
68 rows × 23 columns
```python
# data types of the columns
# (df.info() prints its report itself and returns None, so it is not wrapped in print())
df.info()
```
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69 entries, 0 to 68
Data columns (total 23 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   danceability      69 non-null     float64
 1   energy            69 non-null     float64
 2   key               69 non-null     int64
 3   loudness          69 non-null     float64
 4   mode              69 non-null     int64
 5   speechiness       69 non-null     float64
 6   acousticness      69 non-null     float64
 7   instrumentalness  69 non-null     float64
 8   liveness          69 non-null     float64
 9   valence           69 non-null     float64
 10  tempo             69 non-null     float64
 11  type              69 non-null     object
 12  id                69 non-null     object
 13  uri               69 non-null     object
 14  track_href        69 non-null     object
 15  analysis_url      69 non-null     object
 16  duration_ms       69 non-null     int64
 17  time_signature    69 non-null     int64
 18  track_name        69 non-null     object
 19  album_name        69 non-null     object
 20  short_album_name  69 non-null     object
 21  release_date      69 non-null     object
 22  album_id          69 non-null     object
dtypes: float64(9), int64(4), object(10)
memory usage: 12.5+ KB
```
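The `duration_ms` column is an integer track length in milliseconds, which is hard to read at a glance. The sketch below converts it to the familiar minutes:seconds form, truncating partial seconds. `format_duration` is a hypothetical helper name.

```python
def format_duration(duration_ms: int) -> str:
    """Convert a duration in milliseconds to an m:ss string (partial seconds are dropped)."""
    total_seconds = int(duration_ms) // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return '{}:{:02d}'.format(minutes, seconds)
```

Applied to the data frame, `df['duration'] = df['duration_ms'].apply(format_duration)` would add a readable column. For instance, the shortest track in the summary below (130014 ms) becomes `2:10`.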
```python
# summary statistics for every column (include='all' also covers the text columns)
df.describe(include='all')
```
| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | ... | uri | track_href | analysis_url | duration_ms | time_signature | track_name | album_name | short_album_name | release_date | album_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 69.000000 | 69.000000 | 69.000000 | 69.000000 | 69.000000 | 69.000000 | 69.000000 | 69.000000 | 69.000000 | 69.000000 | ... | 69 | 69 | 69 | 69.000000 | 69.0 | 69 | 69 | 69 | 69 | 69 |
unique | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 69 | 69 | 69 | NaN | NaN | 69 | 5 | 5 | 5 | 5 |
top | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | spotify:track:5RubKOuDoPn5Kj5TLVxSxY | https://api.spotify.com/v1/tracks/4UEuIEv9Wc3w... | https://api.spotify.com/v1/audio-analysis/53v2... | NaN | NaN | Ser Bichote | YHLQMDLG | YHLQMDLG | 2020-02-28 | 5lJqux7orBlA1QzyiBGti1 |
freq | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1 | 1 | 1 | NaN | NaN | 1 | 20 | 20 | 20 | 20 |
mean | 0.744130 | 0.663928 | 4.840580 | -6.068261 | 0.550725 | 0.116468 | 0.209574 | 0.000281 | 0.156283 | 0.514128 | ... | NaN | NaN | NaN | 198843.391304 | 4.0 | NaN | NaN | NaN | NaN | NaN |
std | 0.106835 | 0.122930 | 3.632465 | 1.713921 | 0.501065 | 0.093154 | 0.210900 | 0.001258 | 0.112964 | 0.240325 | ... | NaN | NaN | NaN | 39238.177695 | 0.0 | NaN | NaN | NaN | NaN | NaN |
min | 0.430000 | 0.379000 | 0.000000 | -10.805000 | 0.000000 | 0.028100 | 0.010300 | 0.000000 | 0.061100 | 0.050800 | ... | NaN | NaN | NaN | 130014.000000 | 4.0 | NaN | NaN | NaN | NaN | NaN |
25% | 0.683000 | 0.580000 | 1.000000 | -7.125000 | 0.000000 | 0.058200 | 0.058900 | 0.000000 | 0.098000 | 0.326000 | ... | NaN | NaN | NaN | 165199.000000 | 4.0 | NaN | NaN | NaN | NaN | NaN |
50% | 0.762000 | 0.656000 | 5.000000 | -5.749000 | 1.000000 | 0.077200 | 0.139000 | 0.000005 | 0.108000 | 0.514000 | ... | NaN | NaN | NaN | 196500.000000 | 4.0 | NaN | NaN | NaN | NaN | NaN |
75% | 0.826000 | 0.764000 | 7.000000 | -4.835000 | 1.000000 | 0.131000 | 0.287000 | 0.000065 | 0.153000 | 0.685000 | ... | NaN | NaN | NaN | 224512.000000 | 4.0 | NaN | NaN | NaN | NaN | NaN |
max | 0.900000 | 0.881000 | 11.000000 | -2.979000 | 1.000000 | 0.402000 | 0.869000 | 0.009910 | 0.659000 | 0.962000 | ... | NaN | NaN | NaN | 300579.000000 | 4.0 | NaN | NaN | NaN | NaN | NaN |
11 rows × 23 columns
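Some of these statistics are not meaningful. `key` is a pitch class (0 = C, 1 = C#/Db, ... 11 = B, and -1 when no key is detected), and `mode` is 1 for major and 0 for minor. A mean key of about 4.84 therefore says little. The sketch below translates the two columns into a readable key name so they can be counted instead. `key_name` and `PITCH_CLASSES` are hypothetical names.

```python
from typing import Optional

# pitch-class names indexed by Spotify's integer `key` value
PITCH_CLASSES = ['C', 'C#/Db', 'D', 'D#/Eb', 'E', 'F',
                 'F#/Gb', 'G', 'G#/Ab', 'A', 'A#/Bb', 'B']


def key_name(key: int, mode: int) -> Optional[str]:
    """Translate Spotify's key (0-11, -1 = not detected) and mode (1 = major, 0 = minor)."""
    key = int(key)
    if key == -1:
        return None
    return '{} {}'.format(PITCH_CLASSES[key], 'major' if int(mode) == 1 else 'minor')
```

For example, `df.apply(lambda row: key_name(row['key'], row['mode']), axis=1).value_counts()` would show how often each key appears across the albums.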
```python
# save the data frame to a CSV file
df.to_csv("spotify_music.csv")

# highest danceability score across all tracks
print(df['danceability'].max())
```
0.9
```python
# find the row with the highest danceability and keep it as df2
df2 = df.loc[df['danceability'].idxmax()]
print(df2)
```
```
danceability      0.9
energy            0.603
key               2
loudness          -5.313
mode              1
speechiness       0.0646
acousticness      0.402
instrumentalness  0.000005
liveness          0.134
valence           0.824
tempo             129.928
type              audio_features
id                41wtwzCZkXwpnakmwJ239F
uri               spotify:track:41wtwzCZkXwpnakmwJ239F
track_href        https://api.spotify.com/v1/tracks/41wtwzCZkXwp...
analysis_url      https://api.spotify.com/v1/audio-analysis/41wt...
duration_ms       170972
time_signature    4
track_name        Si Veo a Tu Mamá
album_name        YHLQMDLG
short_album_name  YHLQMDLG
release_date      2020-02-28
album_id          5lJqux7orBlA1QzyiBGti1
Name: 26, dtype: object
```
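The same lookup can be done on the raw list `data` before it becomes a DataFrame. Python's built-in `max` with a `key` function returns the first element with the largest value, which matches how `idxmax` resolves ties. `top_track` is a hypothetical helper name.

```python
from typing import Dict, List


def top_track(tracks: List[Dict], feature: str = 'danceability') -> Dict:
    """Return the track dict with the highest value of `feature` (first one wins on ties)."""
    return max(tracks, key=lambda track: track[feature])
```

For instance, `top_track(data)['track_name']` should name the same song as `df2['track_name']`, and `top_track(data, 'energy')` finds the most energetic track.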
Comparing the row directly with `r` (`df2 == r`) returns False for every column. This is because `r` is a `requests` Response object (the last request made in the album loop), not a set of audio features. If the aim is to check whether the most danceable track is Dakiti itself, compare the track IDs instead:

```python
# strip any "?si=" share suffix from track_id before comparing
print(df2['id'] == track_id.split('?')[0])
```
False
References for the code:
Steven Morse's blog-- https://stmorse.github.io/journal/spotify-api.html
Ujaval Gandhi's blog-- https://spatialthoughts.com/courses/python-foundation-for-spatial-analysis/