Genre recognition: experiment

Goal: Test influence of number of neighbors $K$.

Conclusion: Range 10-20 seems good. Decrease of 2% for 5 and 50.

Observations:

  • Features extraction is 10 times slower for $K=50$.
  • Sparsity decreased with increasing $K$ due to increasing Dirichlet energy.

Hyper-parameters

Parameter under test

In [1]:
Pname = 'K'
Pvalues = [6, 11, 21, 51]

# Regenerate the graph or the features at each iteration.
regen_graph = True
regen_features = True
regen_baseline = False

Model parameters

In [2]:
p = {}

# Preprocessing.

# Graph.
p['data_scaling_graph'] = 'features'
p['K'] = 10 + 1  # 5 to 10 + 1 for self-reference
p['dm'] = 'euclidean'
p['Csigma'] = 1
p['diag'] = True
p['laplacian'] = 'normalized'

# Feature extraction.
p['m'] = 128  # 64, 128, 512
p['ls'] = 1
p['ld'] = 10
p['le'] = None
p['lg'] = 100

# Classification.
p['scale'] = None
p['Nvectors'] = 6
p['svm_type'] = 'C'
p['kernel'] = 'linear'
p['C'] = 1
p['nu'] = 0.5
p['majority_voting'] = False

Data parameters

In [3]:
# HDF5 data stores.
p['folder'] = 'data'
p['filename_gtzan'] = 'gtzan.hdf5'
p['filename_audio'] = 'audio.hdf5'
p['filename_graph'] = 'graph.hdf5'
p['filename_features'] = 'features.hdf5'

# Dataset (10,100,644 | 5,100,149 | 2,10,644).
p['Ngenres'] = 5
p['Nclips'] = 100
p['Nframes'] = 149

# Added white noise.
p['noise_std'] = 0

Numerical parameters

In [4]:
# Graph.
p['tol'] = 1e-5

# Feature extraction.
p['rtol'] = 1e-5  # 1e-3, 1e-5, 1e-7
p['N_inner'] = 500
p['N_outer'] = 50

# Classification.
p['test_size'] = 0.1
p['Ncv'] = 20
p['dataset_classification'] = 'Z'

Processing

In [5]:
import numpy as np
import time

texperiment = time.time()

# Result dictionary.
res = ['accuracy', 'accuracy_std']
res += ['sparsity', 'atoms_D']
res += ['objective_g', 'objective_h', 'objective_i', 'objective_j']
res += ['time_features', 'iterations_inner', 'iterations_outer']
res = dict.fromkeys(res)
for key in res.keys():
    res[key] = []

def separator(name, parameter=False):
    if parameter:
        name += ', {} = {}'.format(Pname, p[Pname])
    dashes = 20 * '-'
    print('\n {} {} {} \n'.format(dashes, name, dashes))
    # Fair comparison when tuning parameters.
    # Randomnesses: dictionary initialization, training and testing sets.
    np.random.seed(1)
In [6]:
#%run gtzan.ipynb
#%run audio_preprocessing.ipynb
if not regen_graph:
    separator('Graph')
    %run audio_graph.ipynb
if not regen_features:
    separator('Features')
    %run audio_features.ipynb

# Hyper-parameter under test.
for p[Pname] in Pvalues:

    if regen_graph:
        separator('Graph', True)
        %run audio_graph.ipynb
    if regen_features:
        separator('Features', True)
        p['filename_features'] = 'features_{}_{}.hdf5'.format(Pname, p[Pname])
        %run audio_features.ipynb
    separator('Classification', True)
    %run audio_classification.ipynb
    
    # Collect results.
    for key in res:
        res[key].append(globals()[key])

# Baseline, i.e. classification with spectrograms.
p['dataset_classification'] = 'X'
p['scale'] = 'minmax'  # Todo: should be done in pre-processing.
if regen_baseline:
    res['baseline'] = []
    res['baseline_std'] = []
    for p[Pname] in Pvalues:
        separator('Baseline', True)
        %run audio_classification.ipynb
        res['baseline'].append(accuracy)
        res['baseline_std'].append(accuracy_std)
else:
    separator('Baseline')
    %run audio_classification.ipynb
    res['baseline'] = len(Pvalues) * [accuracy]
    res['baseline_std'] = accuracy_std
 -------------------- Graph, K = 6 -------------------- 

Data: (149000, 96), float32
Elapsed time: 166.97 seconds
All self-referenced in the first column: True
dist in [0.0, 1.44633090496]
w in [0.00253254896961, 1.0]
Ones on the diagonal: 149000 (over 149000)
assert: True
W in [0.0, 1.0]
Datasets:
  L_data    : (1215822,), float32
  L_indices : (1215822,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (1215822,), float32
  W_indices : (1215822,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Attributes:
  K = 6
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Overall time: 175.57 seconds

 -------------------- Features, K = 6 -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 6
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (1215822,), float32
  L_indices : (1215822,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (1215822,), float32
  W_indices : (1215822,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 2736 seconds
Inner loop: 1410 iterations
g(Z) = ||X-DZ||_2^2 = 6.054626e+04
rdiff: 0.00214526373413
i(Z) = ||Z||_1 = 6.122535e+04
j(Z) = tr(Z^TLZ) = 6.740329e+03
Global objective: 1.285119e+05
Outer loop: 6 iterations

Z in [-0.363849133253, 1.04555726051]
Sparsity of Z: 2,705,168 non-zero entries out of 19,072,000 entries, i.e. 14.2%.
D in [-0.425295501947, 0.918826341629]
d in [0.999999701977, 1.00000023842]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 2743 seconds

 -------------------- Classification, K = 6 -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 74.4 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 60.5 %
Clips accuracy: 65.0 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  75 (+/- 2.0) <- [73 76 76 73 70 75 77 75 76 77]
  75 (+/- 1.9) <- [77 74 75 71 75 76 73 77 77 77]
  76 (+/- 1.4) <- [77 75 76 78 76 76 78 74 77 74]
  76 (+/- 1.5) <- [76 72 73 77 74 76 75 76 77 74]
  76 (+/- 0.7) <- [77 75 75 75 75 75 76 76 75 75]
  75 (+/- 1.1) <- [76 76 74 75 77 74 74 75 74 74]
  76 (+/- 1.5) <- [76 74 76 75 75 72 78 75 74 76]
  75 (+/- 1.2) <- [72 76 76 75 74 75 75 74 73 76]
  76 (+/- 1.2) <- [74 77 74 76 77 74 74 76 75 75]
  75 (+/- 1.3) <- [74 75 75 74 76 75 73 74 74 78]
  76 (+/- 1.9) <- [76 77 74 75 77 75 76 71 77 77]
  76 (+/- 1.4) <- [76 74 77 73 75 77 74 75 76 75]
  76 (+/- 1.2) <- [76 73 76 76 76 77 77 75 75 74]
  75 (+/- 1.8) <- [77 75 75 73 72 74 72 74 76 76]
  76 (+/- 1.0) <- [75 75 76 74 75 76 78 76 77 75]
  76 (+/- 1.7) <- [77 75 76 76 73 75 75 76 79 74]
  75 (+/- 1.8) <- [73 74 73 72 76 76 76 77 77 74]
  75 (+/- 1.3) <- [75 74 74 74 72 76 76 76 76 74]
  75 (+/- 1.2) <- [76 73 76 74 75 74 77 73 75 73]
  75 (+/- 1.1) <- [76 75 74 73 76 76 75 76 75 73]
Accuracy: 75.5 (+/- 1.51)
Mean time (20 cv): 25.08 seconds
Overall time: 505.95 seconds

 -------------------- Graph, K = 11 -------------------- 

Data: (149000, 96), float32
Elapsed time: 157.62 seconds
All self-referenced in the first column: True
dist in [0.0, 1.5599988699]
w in [0.00449881749228, 1.0]
Ones on the diagonal: 149000 (over 149000)
assert: True
W in [0.0, 1.0]
Datasets:
  L_data    : (2417250,), float32
  L_indices : (2417250,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2417250,), float32
  W_indices : (2417250,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Overall time: 167.49 seconds

 -------------------- Features, K = 11 -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (2417250,), float32
  L_indices : (2417250,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2417250,), float32
  W_indices : (2417250,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 2726 seconds
Inner loop: 1292 iterations
g(Z) = ||X-DZ||_2^2 = 7.209134e+04
rdiff: 0.000108572654307
i(Z) = ||Z||_1 = 5.659263e+04
j(Z) = tr(Z^TLZ) = 8.596201e+03
Global objective: 1.372802e+05
Outer loop: 8 iterations

Z in [-0.201653152704, 1.06266367435]
Sparsity of Z: 3,790,319 non-zero entries out of 19,072,000 entries, i.e. 19.9%.
D in [-0.0327109843493, 0.894884943962]
d in [0.999999642372, 1.00000047684]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 2734 seconds

 -------------------- Classification, K = 11 -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 76.8 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 63.6 %
Clips accuracy: 68.5 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  76 (+/- 2.4) <- [79 74 77 75 70 77 77 78 78 75]
  78 (+/- 1.3) <- [76 78 77 75 78 76 79 79 76 79]
  79 (+/- 1.4) <- [81 77 78 78 79 80 78 76 80 77]
  78 (+/- 2.2) <- [78 72 77 78 79 77 79 78 80 77]
  78 (+/- 1.3) <- [79 78 80 77 77 77 79 78 77 75]
  79 (+/- 1.4) <- [79 77 78 80 76 80 77 79 77 77]
  78 (+/- 1.3) <- [78 76 79 76 78 76 79 76 77 79]
  78 (+/- 1.2) <- [75 78 79 77 77 79 77 76 76 77]
  79 (+/- 1.1) <- [77 79 77 80 80 78 78 79 78 78]
  78 (+/- 1.7) <- [79 77 79 76 78 77 75 76 80 82]
  78 (+/- 1.9) <- [80 80 78 75 79 78 79 74 78 77]
  78 (+/- 2.0) <- [79 79 79 79 75 75 75 78 81 78]
  78 (+/- 1.5) <- [79 76 78 77 78 79 78 74 78 75]
  77 (+/- 1.7) <- [79 77 79 76 76 76 74 75 78 78]
  78 (+/- 1.5) <- [76 76 78 78 78 79 81 78 79 76]
  78 (+/- 1.8) <- [80 77 78 79 76 76 78 77 82 76]
  78 (+/- 1.3) <- [78 80 76 78 79 76 77 79 78 76]
  78 (+/- 1.0) <- [77 78 76 78 76 79 77 77 77 77]
  78 (+/- 1.2) <- [77 77 77 77 79 78 78 79 75 77]
  77 (+/- 1.6) <- [77 79 79 73 76 77 78 77 78 75]
Accuracy: 78.0 (+/- 1.67)
Mean time (20 cv): 20.05 seconds
Overall time: 405.18 seconds

 -------------------- Graph, K = 21 -------------------- 

Data: (149000, 96), float32
Elapsed time: 222.16 seconds
All self-referenced in the first column: True
dist in [0.0, 1.57787835598]
w in [0.00692198285833, 1.0]
Ones on the diagonal: 149000 (over 149000)
assert: True
W in [0.0, 1.0]
Datasets:
  L_data    : (4860042,), float32
  L_indices : (4860042,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (4860042,), float32
  W_indices : (4860042,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Attributes:
  K = 21
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Overall time: 233.53 seconds

 -------------------- Features, K = 21 -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 21
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (4860042,), float32
  L_indices : (4860042,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (4860042,), float32
  W_indices : (4860042,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 2740 seconds
Inner loop: 1111 iterations
g(Z) = ||X-DZ||_2^2 = 7.942663e+04
rdiff: 0.000274427809718
i(Z) = ||Z||_1 = 5.415536e+04
j(Z) = tr(Z^TLZ) = 1.026438e+04
Global objective: 1.438464e+05
Outer loop: 7 iterations

Z in [-0.0817069783807, 0.914247214794]
Sparsity of Z: 4,382,076 non-zero entries out of 19,072,000 entries, i.e. 23.0%.
D in [-0.0374212861061, 0.891595244408]
d in [0.999999463558, 1.00000035763]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 2750 seconds

 -------------------- Classification, K = 21 -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 75.9 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 65.0 %
Clips accuracy: 71.0 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  77 (+/- 1.4) <- [77 76 76 77 73 77 74 78 77 77]
  78 (+/- 1.7) <- [77 78 79 74 81 76 77 77 76 78]
  79 (+/- 1.4) <- [78 77 81 76 78 79 80 77 79 77]
  78 (+/- 2.5) <- [79 71 77 79 78 80 78 78 77 76]
  78 (+/- 0.7) <- [78 77 78 77 78 77 79 77 79 77]
  79 (+/- 1.7) <- [78 77 75 79 78 81 78 81 76 79]
  78 (+/- 1.1) <- [78 77 79 78 76 76 78 76 77 78]
  78 (+/- 1.3) <- [75 79 78 79 76 79 76 77 76 78]
  79 (+/- 1.2) <- [76 80 77 80 78 76 79 79 78 78]
  78 (+/- 1.4) <- [79 77 78 75 77 77 76 76 78 80]
  78 (+/- 1.9) <- [78 79 77 77 79 79 80 73 78 77]
  78 (+/- 1.6) <- [78 76 80 78 75 77 77 76 80 78]
  78 (+/- 0.8) <- [77 77 77 77 78 78 79 76 78 77]
  77 (+/- 1.4) <- [78 77 79 74 76 76 76 76 76 78]
  78 (+/- 1.4) <- [79 76 80 76 77 80 79 77 77 76]
  78 (+/- 1.9) <- [79 76 78 80 76 76 79 76 81 75]
  77 (+/- 1.0) <- [77 77 77 78 78 78 76 77 78 75]
  77 (+/- 1.1) <- [76 78 76 78 77 79 77 75 77 77]
  77 (+/- 1.3) <- [78 78 76 78 76 78 79 77 74 76]
  77 (+/- 1.8) <- [77 79 79 74 74 76 79 76 77 75]
Accuracy: 77.8 (+/- 1.57)
Mean time (20 cv): 19.80 seconds
Overall time: 400.37 seconds

 -------------------- Graph, K = 51 -------------------- 

Data: (149000, 96), float32
Elapsed time: 167.10 seconds
All self-referenced in the first column: True
dist in [0.0, 1.83867681026]
w in [0.0215746201575, 1.0]
Ones on the diagonal: 149000 (over 149000)
assert: True
W in [0.0, 1.0]
Datasets:
  L_data    : (13161952,), float32
  L_indices : (13161952,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (13161952,), float32
  W_indices : (13161952,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Attributes:
  K = 51
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Overall time: 182.70 seconds

 -------------------- Features, K = 51 -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 51
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (13161952,), float32
  L_indices : (13161952,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (13161952,), float32
  W_indices : (13161952,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 3170 seconds
Inner loop: 848 iterations
g(Z) = ||X-DZ||_2^2 = 8.904239e+04
rdiff: 0.000695222748206
i(Z) = ||Z||_1 = 5.141387e+04
j(Z) = tr(Z^TLZ) = 1.205900e+04
Global objective: 1.525153e+05
Outer loop: 6 iterations

Z in [-0.0230858866125, 0.52791005373]
Sparsity of Z: 5,014,500 non-zero entries out of 19,072,000 entries, i.e. 26.3%.
D in [-0.0224548019469, 0.910473465919]
d in [0.999999582767, 1.00000035763]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 3181 seconds

 -------------------- Classification, K = 51 -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 73.4 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 64.7 %
Clips accuracy: 70.0 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  75 (+/- 2.2) <- [75 74 77 77 69 76 75 74 74 72]
  76 (+/- 1.6) <- [76 76 74 72 78 77 75 76 74 75]
  77 (+/- 2.1) <- [75 78 80 74 76 78 78 73 77 74]
  76 (+/- 1.1) <- [77 73 74 76 75 77 75 75 76 76]
  76 (+/- 1.2) <- [76 74 77 76 76 77 77 74 73 77]
  77 (+/- 1.3) <- [77 77 75 78 78 78 75 77 78 76]
  76 (+/- 1.2) <- [75 75 77 77 74 73 75 75 77 77]
  76 (+/- 1.5) <- [76 77 75 73 75 79 75 74 75 76]
  76 (+/- 1.6) <- [75 76 74 77 75 71 75 75 75 77]
  76 (+/- 1.7) <- [75 73 77 73 75 75 76 75 75 79]
  75 (+/- 1.8) <- [77 75 77 71 76 74 77 73 75 75]
  76 (+/- 1.3) <- [76 77 77 74 74 75 77 75 78 76]
  75 (+/- 1.5) <- [74 75 76 73 77 77 76 73 75 73]
  75 (+/- 1.6) <- [76 75 77 72 74 73 74 73 75 77]
  76 (+/- 1.2) <- [76 74 77 76 75 76 76 74 75 78]
  76 (+/- 1.9) <- [76 77 76 78 72 76 76 78 77 73]
  76 (+/- 1.7) <- [74 75 75 75 76 75 76 77 79 72]
  76 (+/- 1.5) <- [76 75 78 74 74 77 74 73 78 76]
  76 (+/- 1.4) <- [75 76 77 73 78 76 76 74 74 74]
  76 (+/- 1.4) <- [75 76 76 74 76 73 78 76 75 73]
Accuracy: 75.8 (+/- 1.66)
Mean time (20 cv): 19.95 seconds
Overall time: 403.72 seconds

 -------------------- Baseline -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 298, 96)
Truncated and grouped:
  size: N=135,000 x n=96 -> 12,960,000 floats
  dim: 25,920 features per clip
  shape: (5, 100, 6, 45, 96)
Truncated and grouped:
  size: N=135,000 x n=96 -> 12,960,000 floats
  dim: 25,920 features per clip
  shape: (5, 100, 6, 45, 96)
Feature vectors:
  size: N=6,000 x n=96 -> 576,000 floats
  dim: 1,152 features per clip
  shape: (5, 100, 6, 2, 96)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 96), float64
Testing data: (2400, 96), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 68.8 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1152), float64
Testing data: (200, 1152), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 62.1 %
Clips accuracy: 68.5 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 96), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  69 (+/- 2.2) <- [70 68 71 67 64 67 70 68 72 66]
  69 (+/- 1.4) <- [71 68 67 67 72 69 67 70 69 70]
  70 (+/- 2.0) <- [71 69 73 68 68 71 71 68 71 66]
  69 (+/- 1.4) <- [70 67 66 67 67 70 68 70 70 68]
  70 (+/- 2.4) <- [71 69 72 68 70 75 71 68 68 66]
  71 (+/- 0.7) <- [71 71 70 70 71 71 70 69 70 70]
  70 (+/- 1.4) <- [70 68 72 69 69 67 71 69 70 69]
  70 (+/- 1.5) <- [69 70 68 71 65 70 70 69 70 69]
  69 (+/- 1.7) <- [68 69 68 69 68 65 70 70 69 72]
  69 (+/- 1.8) <- [72 69 66 68 69 66 69 69 70 71]
  70 (+/- 1.4) <- [71 70 69 69 69 69 73 70 71 68]
  70 (+/- 1.6) <- [71 70 71 68 69 69 73 68 69 71]
  69 (+/- 1.3) <- [69 70 66 68 69 70 68 71 68 68]
  69 (+/- 2.0) <- [71 66 68 66 67 69 66 69 69 72]
  70 (+/- 1.1) <- [69 70 72 68 69 70 69 71 71 69]
  70 (+/- 2.0) <- [72 71 69 71 67 68 71 72 69 67]
  70 (+/- 1.6) <- [67 71 69 69 68 70 71 71 72 69]
  70 (+/- 0.9) <- [70 68 71 70 68 70 69 69 71 69]
  70 (+/- 1.2) <- [72 71 69 69 71 69 72 70 68 69]
  69 (+/- 1.7) <- [67 70 71 67 67 72 67 70 70 69]
Accuracy: 69.7 (+/- 1.71)
Mean time (20 cv): 16.77 seconds
Overall time: 339.64 seconds

Results

In [7]:
print('{} = {}'.format(Pname, Pvalues))
for key, value in res.items():
    if key is not 'atoms_D':
        print('res[\'{}\'] = {}'.format(key, value))

def plot(*args, **kwargs):
    plt.figure(figsize=(8,5))
    x = range(len(Pvalues))
    log = 'log' in kwargs and kwargs['log'] is True
    pltfunc = plt.semilogy if log else plt.plot
    params = {}
    params['linestyle'] = '-'
    params['marker'] = '.'
    params['markersize'] = 10
    for i, var in enumerate(args):
        if 'err' in kwargs:
            pltfunc = plt.errorbar
            params['yerr'] = res[kwargs['err'][i]]
            params['capsize'] = 5
        pltfunc(x, res[var], label=var, **params)
        for i,j in zip(x, res[var]):
            plt.annotate('{:.2f}'.format(j), xy=(i,j), xytext=(5,5), textcoords='offset points')
    margin = 0.25
    params['markersize'] = 10
    plt.xlim(-margin, len(Pvalues)-1+margin)
    if 'ylim' in kwargs:
        plt.ylim(kwargs['ylim'])
    plt.title('{} vs {}'.format(', '.join(args), Pname))
    plt.xlabel(Pname)
    plt.ylabel(' ,'.join(args))
    plt.xticks(x, Pvalues)
    plt.grid(True); plt.legend(loc='best'); plt.show()

def div(l):
    div = Pvalues if Pname is l else [p[l]]
    return np.array([1 if v is None else v for v in div])

# Classification results.
res['chance'] = len(Pvalues) * [100./p['Ngenres']]
res['chance_std'] = 0
err=['accuracy_std', 'baseline_std', 'chance_std']
plot('accuracy', 'baseline', 'chance', err=err, ylim=[0,100])

# Features extraction results.
if regen_features:
    plot('objective_g', 'objective_h', 'objective_i', 'objective_j', log=True)
    # Unweighted objectives.
    print('g(Z) = ||X-DZ||_2^2, h(Z) = ||Z-EX||_2^2, i(Z) = ||Z||_1, j(Z) = tr(Z^TLZ)')
    res['objective_g_un'] = res['objective_g'] / div('ld')
    res['objective_h_un'] = res['objective_h'] / div('le')
    res['objective_i_un'] = res['objective_i'] / div('ls')
    res['objective_j_un'] = res['objective_j'] / div('lg')
    plot('objective_g_un', 'objective_h_un', 'objective_i_un', 'objective_j_un', log=True)
    plot('sparsity', ylim=[0,100])
    plot('time_features')
    plot('iterations_inner')
    plot('iterations_outer')

    for i, fig in enumerate(res['atoms_D']):
        print('Dictionary atoms for {} = {}'.format(Pname, Pvalues[i]))
        fig.show()

print('Experiment time: {:.0f} seconds'.format(time.time() - texperiment))
K = [6, 11, 21, 51]
res['accuracy_std'] = [1.5132709238232562, 1.6719813180109941, 1.5700707698133303, 1.6648748701729308]
res['objective_j'] = [6740.3289794921875, 8596.2005615234375, 10264.377593994141, 12059.004211425781]
res['objective_i'] = [61225.3515625, 56592.6328125, 54155.359375, 51413.87109375]
res['objective_h'] = [0, 0, 0, 0]
res['objective_g'] = [60546.259765625, 72091.3427734375, 79426.6259765625, 89042.392578125]
res['baseline'] = [69.729166666666742, 69.729166666666742, 69.729166666666742, 69.729166666666742]
res['time_features'] = [2735.5493268966675, 2725.535964012146, 2740.2448320388794, 3169.648545026779]
res['baseline_std'] = 1.71029866625
res['sparsity'] = [14.183976510067113, 19.873736367449663, 22.97648909395973, 26.292470637583893]
res['iterations_inner'] = [1410, 1292, 1111, 848]
res['iterations_outer'] = [6, 8, 7, 6]
res['accuracy'] = [75.496666666666655, 77.964166666666614, 77.776666666666628, 75.845000000000041]
/usr/lib/python2.7/dist-packages/numpy/ma/core.py:3847: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")
g(Z) = ||X-DZ||_2^2, h(Z) = ||Z-EX||_2^2, i(Z) = ||Z||_1, j(Z) = tr(Z^TLZ)
Dictionary atoms for K = 6
Dictionary atoms for K = 11
Dictionary atoms for K = 21
Dictionary atoms for K = 51
Experiment time: 14237 seconds
/usr/lib/pymodules/python2.7/matplotlib/figure.py:371: UserWarning: matplotlib is currently using a non-GUI backend, so cannot show the figure
  "matplotlib is currently using a non-GUI backend, "