Sometimes, an unsupervised learning technique is preferred. Perhaps you do not have access to adequate training data, or perhaps the training data's labels are not completely clear. Maybe you just want to quickly sort real-world, unseen data into groups based on feature similarity.
In such cases, clustering is a great option!
Let's try clustering with a familiar bunch of audio files and code.
Download simpleLoop.wav, and play it.
from urllib.request import urlretrieve
urlretrieve('https://ccrma.stanford.edu/workshops/mir2014/audio/simpleLoop.wav', filename='simpleLoop.wav')
from IPython.display import Audio
Audio('simpleLoop.wav')
Load the audio file into an array.
from essentia.standard import MonoLoader
simple_loop = MonoLoader(filename='simpleLoop.wav')()
Scale the MFCC features you extracted in the earlier steps (the mfccs matrix, with one row per frame) to the range -1 to 1, using MinMaxScaler. (See Lab 2 if you need a reminder.)
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler(feature_range=(-1, 1))
mfccs_scaled = min_max_scaler.fit_transform(mfccs)
print(mfccs.shape)
print(mfccs_scaled.shape)
(260, 13)
(260, 13)
It's cluster time! We're using scikit-learn's implementation of the k-means algorithm.
Use the KMeans estimator to create clusters from your features. k-means will output 2 things of interest to you:
(1) The center-points of the clusters. You can use the coordinates of a cluster's center to measure the distance of any point from that center. This not only gives you a distance metric for how "well" a point fits into a given cluster, but also lets you sort the points by how close they are to the center of a given cluster. Quite useful.
(2) Each point is assigned a label (a cluster number). You can then use this label to produce a transcription, do creative stuff, or train another downstream classifier.
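Putting those two outputs together, here's a minimal sketch of computing each point's distance to its assigned cluster center and sorting a cluster's members by that distance. The two-dimensional synthetic data stands in for your scaled feature matrix; all variable names here are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for a scaled feature matrix (e.g. frames x features).
rng = np.random.RandomState(0)
features = np.vstack([rng.randn(20, 2) + [3, 3],
                      rng.randn(20, 2) - [3, 3]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)   # (2) a cluster number per point
centers = kmeans.cluster_centers_       # (1) the cluster center-points

# Distance from each point to the center of its assigned cluster.
distances = np.linalg.norm(features - centers[labels], axis=1)

# Indices of the points in cluster 0, sorted best-fitting first.
cluster0 = np.where(labels == 0)[0]
best_first = cluster0[np.argsort(distances[cluster0])]
```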
Here's a quick reference for KMeans:
KMeans clusters data by separating the samples into k groups, minimizing the within-cluster sum of squared distances.
n_clusters is the number of clusters to form, which is also the number of centers to generate.
max_iter is the maximum number of iterations for a single run (default 300).
tol is the convergence tolerance: if the cluster centers move less than this between two successive iterations, the run stops.
After fitting, cluster_centers_ holds the coordinates of the cluster centers, labels_ holds the cluster number assigned to each data point, and inertia_ holds the sum of squared distances from each point to its closest center (the error value at the solution).
Now, simply put, here's an example of how you use it:
from sklearn.cluster import KMeans
# Initialize the number of clusters that you want to find.
n_clusters = 2
kmeans = KMeans(n_clusters=n_clusters)
# Train on your data; fit_predict returns the cluster number for each point.
labels = kmeans.fit_predict(your_feature_data_matrix)
centers = kmeans.cluster_centers_
# Output:
# centers contains the center coordinates of the clusters - we can use this to calculate each point's distance to its cluster center.
# labels contains the assigned cluster number for each point in your feature matrix (from 0 to k-1).
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2)
labels = kmeans.fit_predict(mfccs_scaled)
print(labels)
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Write a script to list which audio slices (or audio files) were categorized as Cluster #1. Do the same for Cluster #2. Do the clusters make sense? Now, modify the script to play the audio slices in each cluster - listening to the clusters will help us build intuition about what's in each one.
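A sketch of the listing step, assuming `labels` is the array returned by fit_predict (a short stand-in array is used here so the snippet runs on its own):

```python
import numpy as np

# Stand-in for the labels array produced by kmeans.fit_predict.
labels = np.array([0, 0, 1, 0, 1, 1])

for k in np.unique(labels):
    members = np.where(labels == k)[0]   # slice indices in cluster k
    print('Cluster %d: slices %s' % (k, members.tolist()))
```

From there, playing a cluster is a matter of looking up each member index in your list of audio slices and handing the samples to IPython's Audio display, as earlier in the lab.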
Repeat this clustering (steps 3-7), and listen to the contents of the clusters, with CongaGroove-mono.wav.
Repeat this clustering (steps 3-7) using the CongaGroove and 3 clusters. Listen to the results. Try again with 4 clusters. Listen to the results. (etc, etc…)
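As you vary the number of clusters, one rough numeric companion to your ears is the fitted model's inertia_ (the within-cluster sum of squares), which shrinks as k grows. A sketch, using random stand-in features:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
features = rng.rand(100, 13)  # stand-in for your scaled feature matrix

inertias = []
for k in (2, 3, 4, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias.append(km.inertia_)
    print('k=%d  inertia=%.2f' % (k, km.inertia_))
```

Lower inertia alone doesn't mean better clusters (it always drops as k grows) - listening remains the real judge.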
Once you complete this, try out some of the many, many other audio loops in our collection. (Located in audio\Miscellaneous Loops Samples and SFX)
Let's add MFCCs to the mix. Extract the means of 12 MFCCs (coefficients 1-12; do not use the "0th" coefficient) for each onset using the code that you wrote. Add those to the feature vectors, along with zero crossing rate and centroid. We should now have 14 features being extracted - this is starting to get "real world"! With this simple example (and limited collection of audio slices), you probably won't notice a difference - but at least it didn't break, right? Let's try it with some other audio to truly appreciate the power of timbral clustering.
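One way to assemble the 14-dimensional feature vectors is to stack the per-slice features column-wise before scaling and clustering. The arrays below are random stand-ins for the features you actually extracted:

```python
import numpy as np

# Hypothetical per-slice features: 12 MFCC means + zero crossing rate + centroid.
n_slices = 5
mfcc_means = np.random.rand(n_slices, 12)
zcr = np.random.rand(n_slices, 1)
centroid = np.random.rand(n_slices, 1)

# One row per slice, one column per feature.
features = np.hstack([mfcc_means, zcr, centroid])
print(features.shape)  # (5, 14)
```

Scale the stacked matrix exactly as before (MinMaxScaler on the whole thing), so that no single feature dominates the distance metric.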
BONUS (ONLY IF YOU HAVE EXTRA TIME…)
Now that we can take ANY LOOP, onset detect, feature extract, and cluster it, let's have some fun.
Choose any audio file from our collection and use the above techniques to break it up into clusters.
Listen to those clusters.
Some rules of thumb: since you need to pick the number of clusters ahead of time, listen to your audio files first.
You can break a drum kit or percussion loop into 3 - 6 clusters for it to segment well. More is OK too.
Musical loops: 3-6 clusters should work nicely.
Songs: they need lots of clusters to segment well. Try 'em out!
BONUS (ONLY IF YOU REALLY HAVE EXTRA TIME…)
Review your script that PLAYs all of the audio files that were categorized as Cluster # 1 or Cluster # 2.
Now, modify your script to play and plot the audio files which are closest to the center of your clusters.
This should show you which files are most representative of each cluster.
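A sketch of that last step: KMeans.transform() returns every point's distance to every cluster center, so an argmin along the point axis picks the most representative point per cluster. The two-dimensional data here is synthetic; in the lab, the rows would be your scaled feature vectors and the chosen indices would tell you which slices to play and plot.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
features = np.vstack([rng.randn(30, 2) + [4, 0],
                      rng.randn(30, 2) - [4, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(features)

# transform() gives the distance from every point to every cluster center:
# one row per point, one column per cluster.
dists = kmeans.transform(features)

# For each cluster, the index of the point closest to that cluster's center.
representative = dists.argmin(axis=0)
```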