Spotify is one of the digital services that helped me get through the gloom of the Covid-19 lockdown, so I chose to explore its data in my first attempt at writing Python code. After a good deal of trial and error, I managed to extract songs and their audio features from Spotify with the code below, run in a JupyterLab notebook. The web references for this code are listed at the end.

Part 1- The first step in extracting data from Spotify is to set up client credentials using Spotify's API keys
In [29]:
# I got the client ID and client secret from Spotify's developer website
# https://developer.spotify.com/dashboard/login
# PS- The access token (not the client ID) expires one hour after it is issued

import requests

CLIENT_ID = 'add the client id from the website'
CLIENT_SECRET = 'add the client secret from the website'

AUTH_URL = 'https://accounts.spotify.com/api/token'

auth_response = requests.post(AUTH_URL, {
    'grant_type': 'client_credentials',
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
})

# convert the response to JSON
auth_response_data = auth_response.json()

# save the access token
access_token = auth_response_data['access_token']

headers = {
    'Authorization': 'Bearer {token}'.format(token=access_token)
}

# base URL of all Spotify API endpoints
BASE_URL = 'https://api.spotify.com/v1/'
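The token response also carries an `expires_in` field (in seconds), so a long session can check the token's age before sending more requests. A minimal sketch, assuming two hypothetical helpers, `build_headers` and `token_expired`, that are not part of the original notebook; the sample payload values are made up:

```python
import time

def build_headers(token_payload):
    """Build the Authorization header from the JSON returned by the token endpoint."""
    return {'Authorization': 'Bearer {token}'.format(token=token_payload['access_token'])}

def token_expired(issued_at, expires_in, now=None, margin=60):
    """Return True once the token is within `margin` seconds of its expiry time."""
    if now is None:
        now = time.time()
    return now >= issued_at + expires_in - margin

# Payload shaped like the token endpoint's response (values are made up)
sample_payload = {'access_token': 'abc123', 'token_type': 'Bearer', 'expires_in': 3600}
sample_headers = build_headers(sample_payload)
```

When `token_expired` returns True, re-running the `requests.post` call from this cell should yield a fresh token.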
Part 2- The code below extracts the audio features of the song that topped the global chart
In [33]:
# Dakiti was the most-streamed song globally in the first week of Jan 2021
# Refer- https://spotifycharts.com/regional/global/weekly/2021-01-01--2021-01-08
# use the bare track ID; the '?si=...' suffix copied from a share link is not part of it
track_id = '4MzXwWMhyBbmu6hOcLVD49'

# actual GET request with proper header; r.json() returns the features as a dictionary
r = requests.get(BASE_URL + 'audio-features/' + track_id, headers=headers)

# description of the result- https://developer.spotify.com/documentation/web-api/reference/#endpoint-get-audio-features
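Track IDs copied from the Spotify app's share links often arrive with a `?si=...` suffix or as a full `open.spotify.com/track/...` URL. A small sketch of stripping these down to the bare ID, using a hypothetical helper `extract_track_id` (it does not handle `spotify:track:` URIs):

```python
from urllib.parse import urlparse

def extract_track_id(value):
    """Return the bare track ID from an ID or share link, dropping any '?si=' suffix."""
    # urlparse separates the query string ('si=...') from the path
    path = urlparse(value).path.rstrip('/')
    # for a full URL the ID is the last path segment; for a bare ID it is the whole path
    return path.split('/')[-1]

clean_id = extract_track_id('4MzXwWMhyBbmu6hOcLVD49?si=7d86fb3ca8fe410a')
```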

Part 3- The code below extracts the features of the songs on an artist's albums. I chose one of the artists of Dakiti. In the cells that follow, d is the JSON listing of that artist's albums.
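The cell that defines `d` does not appear in this notebook. A minimal sketch of how it could be built, assuming the Web API's artist albums endpoint (`artists/{id}/albums`) and a hypothetical helper `fetch_artist_albums`; the `get` argument exists only so the helper can be tried without network access, and the artist ID is left as a placeholder:

```python
BASE_URL = 'https://api.spotify.com/v1/'

def fetch_artist_albums(artist_id, headers, get=None):
    """Return the JSON listing of an artist's albums (full albums only, up to 50)."""
    if get is None:
        import requests  # imported lazily so a stand-in `get` can be passed instead
        get = requests.get
    r = get(BASE_URL + 'artists/' + artist_id + '/albums',
            headers=headers,
            params={'include_groups': 'album', 'limit': 50})
    return r.json()

# Assumed usage, with the headers built in Part 1:
# d = fetch_artist_albums('add the artist id', headers)
```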

In [40]:
for album in d['items']:
    print(album['name'], ' --- ', album['release_date'])
EL ÚLTIMO TOUR DEL MUNDO  ---  2020-11-27
LAS QUE NO IBAN A SALIR  ---  2020-05-10
YHLQMDLG  ---  2020-02-28
OASIS  ---  2019-06-28
X 100PRE  ---  2018-12-23
In [50]:
data = []   # will hold all track info
albums = [] # to keep track of duplicates

# loop over albums and get all tracks
for album in d['items']:
    album_name = album['name']

    # here's a hacky way to skip over albums we've already grabbed
    trim_name = album_name.split('(')[0].strip()
    if trim_name.upper() in albums:
        continue
    albums.append(trim_name.upper())

    # this takes a few seconds so let's keep track of progress    
    print(album_name)
    
    # pull all tracks from this album
    r = requests.get(BASE_URL + 'albums/' + album['id'] + '/tracks', 
        headers=headers)
    tracks = r.json()['items']
    
    for track in tracks:
        # get audio features (key, liveness, danceability, ...)
        f = requests.get(BASE_URL + 'audio-features/' + track['id'], 
            headers=headers)
        f = f.json()
        
        # combine with album info
        f.update({
            'track_name': track['name'],
            'album_name': album_name,
            'short_album_name': trim_name,
            'release_date': album['release_date'],
            'album_id': album['id']
        })
        
        data.append(f)
EL ÚLTIMO TOUR DEL MUNDO
LAS QUE NO IBAN A SALIR
YHLQMDLG
OASIS
X 100PRE
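The loop above sends one audio-features request per track. The Web API also documents a batch form of that endpoint, which takes a comma-separated `ids` parameter with up to 100 IDs per call. A sketch of the chunking step only, using a hypothetical helper `id_batches`:

```python
def id_batches(track_ids, size=100):
    """Yield comma-separated ID strings, each holding at most `size` track IDs."""
    for start in range(0, len(track_ids), size):
        yield ','.join(track_ids[start:start + size])

# Each string could then be sent as params={'ids': batch} to BASE_URL + 'audio-features'
example_batches = list(id_batches(['a', 'b', 'c', 'd', 'e'], size=2))
```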
In [51]:
# create a data frame of the songs on the artist's Spotify albums

import pandas as pd
df = pd.DataFrame(data)
In [71]:
df.head(5)
Out[71]:
danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence ... uri track_href analysis_url duration_ms time_signature track_name album_name short_album_name release_date album_id
0 0.716 0.522 5 -6.834 1 0.0582 0.1660 0.000065 0.1130 0.224 ... spotify:track:36DHxTW2xdr9GG15T9oK9L https://api.spotify.com/v1/tracks/36DHxTW2xdr9... https://api.spotify.com/v1/audio-analysis/36DH... 165199 4 EL MUNDO ES MÍO EL ÚLTIMO TOUR DEL MUNDO EL ÚLTIMO TOUR DEL MUNDO 2020-11-27 2d9BCZeAAhiZWPpbX9aPCW
1 0.811 0.637 10 -4.835 0 0.0591 0.2340 0.000572 0.1180 0.471 ... spotify:track:5RubKOuDoPn5Kj5TLVxSxY https://api.spotify.com/v1/tracks/5RubKOuDoPn5... https://api.spotify.com/v1/audio-analysis/5Rub... 130014 4 TE MUDASTE EL ÚLTIMO TOUR DEL MUNDO EL ÚLTIMO TOUR DEL MUNDO 2020-11-27 2d9BCZeAAhiZWPpbX9aPCW
2 0.860 0.725 11 -6.700 1 0.2490 0.0464 0.000091 0.0994 0.375 ... spotify:track:0tjZv2hChdHZCW1zFXpy1J https://api.spotify.com/v1/tracks/0tjZv2hChdHZ... https://api.spotify.com/v1/audio-analysis/0tjZ... 162151 4 HOY COBRÉ EL ÚLTIMO TOUR DEL MUNDO EL ÚLTIMO TOUR DEL MUNDO 2020-11-27 2d9BCZeAAhiZWPpbX9aPCW
3 0.762 0.861 4 -4.075 0 0.0652 0.1390 0.000001 0.0956 0.588 ... spotify:track:0Lsis3LB0XAK6XlTHXaJk2 https://api.spotify.com/v1/tracks/0Lsis3LB0XAK... https://api.spotify.com/v1/audio-analysis/0Lsi... 213609 4 MALDITA POBREZA EL ÚLTIMO TOUR DEL MUNDO EL ÚLTIMO TOUR DEL MUNDO 2020-11-27 2d9BCZeAAhiZWPpbX9aPCW
4 0.856 0.618 7 -4.892 1 0.2860 0.0303 0.000000 0.0866 0.391 ... spotify:track:2XIc1pqjXV3Cr2BQUGNBck https://api.spotify.com/v1/tracks/2XIc1pqjXV3C... https://api.spotify.com/v1/audio-analysis/2XIc... 203201 4 LA NOCHE DE ANOCHE EL ÚLTIMO TOUR DEL MUNDO EL ÚLTIMO TOUR DEL MUNDO 2020-11-27 2d9BCZeAAhiZWPpbX9aPCW

5 rows × 23 columns

In [72]:
# Display every row except the last one (pandas shortens the printed output to the first and last five rows)

df.head(-1)
Out[72]:
danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence ... uri track_href analysis_url duration_ms time_signature track_name album_name short_album_name release_date album_id
0 0.716 0.522 5 -6.834 1 0.0582 0.1660 0.000065 0.1130 0.224 ... spotify:track:36DHxTW2xdr9GG15T9oK9L https://api.spotify.com/v1/tracks/36DHxTW2xdr9... https://api.spotify.com/v1/audio-analysis/36DH... 165199 4 EL MUNDO ES MÍO EL ÚLTIMO TOUR DEL MUNDO EL ÚLTIMO TOUR DEL MUNDO 2020-11-27 2d9BCZeAAhiZWPpbX9aPCW
1 0.811 0.637 10 -4.835 0 0.0591 0.2340 0.000572 0.1180 0.471 ... spotify:track:5RubKOuDoPn5Kj5TLVxSxY https://api.spotify.com/v1/tracks/5RubKOuDoPn5... https://api.spotify.com/v1/audio-analysis/5Rub... 130014 4 TE MUDASTE EL ÚLTIMO TOUR DEL MUNDO EL ÚLTIMO TOUR DEL MUNDO 2020-11-27 2d9BCZeAAhiZWPpbX9aPCW
2 0.860 0.725 11 -6.700 1 0.2490 0.0464 0.000091 0.0994 0.375 ... spotify:track:0tjZv2hChdHZCW1zFXpy1J https://api.spotify.com/v1/tracks/0tjZv2hChdHZ... https://api.spotify.com/v1/audio-analysis/0tjZ... 162151 4 HOY COBRÉ EL ÚLTIMO TOUR DEL MUNDO EL ÚLTIMO TOUR DEL MUNDO 2020-11-27 2d9BCZeAAhiZWPpbX9aPCW
3 0.762 0.861 4 -4.075 0 0.0652 0.1390 0.000001 0.0956 0.588 ... spotify:track:0Lsis3LB0XAK6XlTHXaJk2 https://api.spotify.com/v1/tracks/0Lsis3LB0XAK... https://api.spotify.com/v1/audio-analysis/0Lsi... 213609 4 MALDITA POBREZA EL ÚLTIMO TOUR DEL MUNDO EL ÚLTIMO TOUR DEL MUNDO 2020-11-27 2d9BCZeAAhiZWPpbX9aPCW
4 0.856 0.618 7 -4.892 1 0.2860 0.0303 0.000000 0.0866 0.391 ... spotify:track:2XIc1pqjXV3Cr2BQUGNBck https://api.spotify.com/v1/tracks/2XIc1pqjXV3C... https://api.spotify.com/v1/audio-analysis/2XIc... 203201 4 LA NOCHE DE ANOCHE EL ÚLTIMO TOUR DEL MUNDO EL ÚLTIMO TOUR DEL MUNDO 2020-11-27 2d9BCZeAAhiZWPpbX9aPCW
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
63 0.787 0.705 0 -7.582 1 0.0695 0.1370 0.000001 0.1080 0.499 ... spotify:track:5mj8WVFcKdGA8p9HWGTSLc https://api.spotify.com/v1/tracks/5mj8WVFcKdGA... https://api.spotify.com/v1/audio-analysis/5mj8... 188654 4 Cuando Perriabas X 100PRE X 100PRE 2018-12-23 7CjJb2mikwAWA1V6kewFBF
64 0.655 0.725 0 -5.497 1 0.1880 0.0327 0.002640 0.0611 0.326 ... spotify:track:1khmgu0pveJbkbpbkyvcQv https://api.spotify.com/v1/tracks/1khmgu0pveJb... https://api.spotify.com/v1/audio-analysis/1khm... 300579 4 La Romana X 100PRE X 100PRE 2018-12-23 7CjJb2mikwAWA1V6kewFBF
65 0.767 0.379 0 -10.348 1 0.0385 0.6680 0.000145 0.2170 0.252 ... spotify:track:69ZaPBHhRMRDjRpW1ivnOU https://api.spotify.com/v1/tracks/69ZaPBHhRMRD... https://api.spotify.com/v1/audio-analysis/69Za... 230578 4 Como Antes X 100PRE X 100PRE 2018-12-23 7CjJb2mikwAWA1V6kewFBF
66 0.600 0.528 0 -6.554 1 0.0308 0.2630 0.000000 0.5880 0.142 ... spotify:track:6pZHZndlo57dPCYnvlYFOE https://api.spotify.com/v1/tracks/6pZHZndlo57d... https://api.spotify.com/v1/audio-analysis/6pZH... 284853 4 RLNDT X 100PRE X 100PRE 2018-12-23 7CjJb2mikwAWA1V6kewFBF
67 0.759 0.536 9 -6.663 0 0.1730 0.8210 0.000005 0.1070 0.439 ... spotify:track:2OWVCFTolecLiGZPquvWvT https://api.spotify.com/v1/tracks/2OWVCFTolecL... https://api.spotify.com/v1/audio-analysis/2OWV... 208080 4 Estamos Bien X 100PRE X 100PRE 2018-12-23 7CjJb2mikwAWA1V6kewFBF

68 rows × 23 columns
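`df.head(-1)` returns every row except the last, which is why 68 of the 69 rows are counted here. To show only the first and last few rows, `head` and `tail` can be concatenated. A small sketch on a made-up frame (the `toy` data is an assumption for illustration):

```python
import pandas as pd

toy = pd.DataFrame({'track_name': ['a', 'b', 'c', 'd', 'e', 'f']})

all_but_last = toy.head(-1)                   # drops only the final row
ends = pd.concat([toy.head(2), toy.tail(2)])  # first two and last two rows
```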

In [65]:
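# Data types of columns
# (df.info() prints its report and returns None, so wrapping it in print() adds the trailing "None")
print(df.info())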
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69 entries, 0 to 68
Data columns (total 23 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   danceability      69 non-null     float64
 1   energy            69 non-null     float64
 2   key               69 non-null     int64  
 3   loudness          69 non-null     float64
 4   mode              69 non-null     int64  
 5   speechiness       69 non-null     float64
 6   acousticness      69 non-null     float64
 7   instrumentalness  69 non-null     float64
 8   liveness          69 non-null     float64
 9   valence           69 non-null     float64
 10  tempo             69 non-null     float64
 11  type              69 non-null     object 
 12  id                69 non-null     object 
 13  uri               69 non-null     object 
 14  track_href        69 non-null     object 
 15  analysis_url      69 non-null     object 
 16  duration_ms       69 non-null     int64  
 17  time_signature    69 non-null     int64  
 18  track_name        69 non-null     object 
 19  album_name        69 non-null     object 
 20  short_album_name  69 non-null     object 
 21  release_date      69 non-null     object 
 22  album_id          69 non-null     object 
dtypes: float64(9), int64(4), object(10)
memory usage: 12.5+ KB
None
In [85]:
# summary of the data frame
df.describe(include='all')  
Out[85]:
danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence ... uri track_href analysis_url duration_ms time_signature track_name album_name short_album_name release_date album_id
count 69.000000 69.000000 69.000000 69.000000 69.000000 69.000000 69.000000 69.000000 69.000000 69.000000 ... 69 69 69 69.000000 69.0 69 69 69 69 69
unique NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 69 69 69 NaN NaN 69 5 5 5 5
top NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... spotify:track:5RubKOuDoPn5Kj5TLVxSxY https://api.spotify.com/v1/tracks/4UEuIEv9Wc3w... https://api.spotify.com/v1/audio-analysis/53v2... NaN NaN Ser Bichote YHLQMDLG YHLQMDLG 2020-02-28 5lJqux7orBlA1QzyiBGti1
freq NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1 1 1 NaN NaN 1 20 20 20 20
mean 0.744130 0.663928 4.840580 -6.068261 0.550725 0.116468 0.209574 0.000281 0.156283 0.514128 ... NaN NaN NaN 198843.391304 4.0 NaN NaN NaN NaN NaN
std 0.106835 0.122930 3.632465 1.713921 0.501065 0.093154 0.210900 0.001258 0.112964 0.240325 ... NaN NaN NaN 39238.177695 0.0 NaN NaN NaN NaN NaN
min 0.430000 0.379000 0.000000 -10.805000 0.000000 0.028100 0.010300 0.000000 0.061100 0.050800 ... NaN NaN NaN 130014.000000 4.0 NaN NaN NaN NaN NaN
25% 0.683000 0.580000 1.000000 -7.125000 0.000000 0.058200 0.058900 0.000000 0.098000 0.326000 ... NaN NaN NaN 165199.000000 4.0 NaN NaN NaN NaN NaN
50% 0.762000 0.656000 5.000000 -5.749000 1.000000 0.077200 0.139000 0.000005 0.108000 0.514000 ... NaN NaN NaN 196500.000000 4.0 NaN NaN NaN NaN NaN
75% 0.826000 0.764000 7.000000 -4.835000 1.000000 0.131000 0.287000 0.000065 0.153000 0.685000 ... NaN NaN NaN 224512.000000 4.0 NaN NaN NaN NaN NaN
max 0.900000 0.881000 11.000000 -2.979000 1.000000 0.402000 0.869000 0.009910 0.659000 0.962000 ... NaN NaN NaN 300579.000000 4.0 NaN NaN NaN NaN NaN

11 rows × 23 columns

In [ ]:
# save the dataframe into csv
df.to_csv("spotify_music.csv")
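By default `to_csv` also writes the row index as an unnamed first column; passing `index=False` leaves it out. A sketch using an in-memory buffer instead of a file, with made-up rows:

```python
import io
import pandas as pd

toy = pd.DataFrame({'track_name': ['a', 'b'], 'danceability': [0.7, 0.9]})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)  # write without the index column
buffer.seek(0)
restored = pd.read_csv(buffer)   # read the CSV text back into a frame
```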
In [92]:
# maximum of the danceability column only
print(df['danceability'].max())
0.9
In [95]:
# find the row with the maximum danceability

df.loc[df['danceability'].idxmax()]
Out[95]:
danceability                                                      0.9
energy                                                          0.603
key                                                                 2
loudness                                                       -5.313
mode                                                                1
speechiness                                                    0.0646
acousticness                                                    0.402
instrumentalness                                             0.000005
liveness                                                        0.134
valence                                                         0.824
tempo                                                         129.928
type                                                   audio_features
id                                             41wtwzCZkXwpnakmwJ239F
uri                              spotify:track:41wtwzCZkXwpnakmwJ239F
track_href          https://api.spotify.com/v1/tracks/41wtwzCZkXwp...
analysis_url        https://api.spotify.com/v1/audio-analysis/41wt...
duration_ms                                                    170972
time_signature                                                      4
track_name                                           Si Veo a Tu Mamá
album_name                                                   YHLQMDLG
short_album_name                                             YHLQMDLG
release_date                                               2020-02-28
album_id                                       5lJqux7orBlA1QzyiBGti1
Name: 26, dtype: object
In [110]:
# store the row with maximum danceability (df.loc with a single label returns a Series, not a DataFrame)

df2 = df.loc[df['danceability'].idxmax()]
print(df2)
danceability                                                      0.9
energy                                                          0.603
key                                                                 2
loudness                                                       -5.313
mode                                                                1
speechiness                                                    0.0646
acousticness                                                    0.402
instrumentalness                                             0.000005
liveness                                                        0.134
valence                                                         0.824
tempo                                                         129.928
type                                                   audio_features
id                                             41wtwzCZkXwpnakmwJ239F
uri                              spotify:track:41wtwzCZkXwpnakmwJ239F
track_href          https://api.spotify.com/v1/tracks/41wtwzCZkXwp...
analysis_url        https://api.spotify.com/v1/audio-analysis/41wt...
duration_ms                                                    170972
time_signature                                                      4
track_name                                           Si Veo a Tu Mamá
album_name                                                   YHLQMDLG
short_album_name                                             YHLQMDLG
release_date                                               2020-02-28
album_id                                       5lJqux7orBlA1QzyiBGti1
Name: 26, dtype: object
In [122]:
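# r is a requests.Response object here (last reassigned in the album loop), not audio features,
# so every element-wise comparison comes out False
df2 == r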
Out[122]:
danceability        False
energy              False
key                 False
loudness            False
mode                False
speechiness         False
acousticness        False
instrumentalness    False
liveness            False
valence             False
tempo               False
type                False
id                  False
uri                 False
track_href          False
analysis_url        False
duration_ms         False
time_signature      False
track_name          False
album_name          False
short_album_name    False
release_date        False
album_id            False
Name: 26, dtype: bool
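In the comparison above, `r` holds a `requests.Response` object rather than audio features, which would explain the all-False result. A minimal sketch of a comparison between two tracks, assuming both sides are feature dictionaries such as those returned by `.json()` on an audio-features response; the helper `feature_diff` and the values in `track_b` are hypothetical:

```python
def feature_diff(a, b, keys=('danceability', 'energy', 'valence', 'tempo')):
    """Return b - a for each listed feature present in both dictionaries."""
    return {k: round(b[k] - a[k], 4) for k in keys if k in a and k in b}

# track_a uses the values of the most danceable row above; track_b is made up
track_a = {'danceability': 0.9, 'energy': 0.603, 'valence': 0.824, 'tempo': 129.928}
track_b = {'danceability': 0.7, 'energy': 0.5, 'valence': 0.1, 'tempo': 110.0}
diff = feature_diff(track_a, track_b)
```

With the notebook's objects, `track_a` could be `df2.to_dict()` and `track_b` the dictionary obtained from the Dakiti request in Part 2.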
References for the code-
Steven Morse's blog-- https://stmorse.github.io/journal/spotify-api.html
Ujaval Gandhi's blog-- https://spatialthoughts.com/courses/python-foundation-for-spatial-analysis/