Genre recognition: experiment

Goal: Assess which distance metric (dm) is the most stable.

Conclusion: euclidean is the least stable metric, but it can potentially lead to better results.

Observations:

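  • All distance metrics lead to approximately the same accuracy.
  • euclidean is the least stable: its accuracy changed by 1.5% between two runs, and even its graph differs from one run to the other (2,913,592 vs. 2,923,166 non-zero weights), whereas cosine_sim gave identical results in both runs (64.2%).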
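The stability criterion used in these observations can be made explicit: for each metric, take the difference between the best and the worst accuracy over its repeated runs. A minimal sketch, assuming `res['accuracy']` holds one accuracy per entry of `Pvalues`, in the same order (which is how the processing loop appends them); `accuracy_spread` is a hypothetical helper, not part of the original notebooks.

```python
def accuracy_spread(values, accuracies):
    """Return {value: max - min accuracy} over the repeated runs of each value.

    values: the tested parameter values, in run order (e.g. Pvalues).
    accuracies: one accuracy per run, in the same order (e.g. res['accuracy']).
    """
    if len(values) != len(accuracies):
        raise ValueError('expected one accuracy per run')
    runs = {}
    for value, accuracy in zip(values, accuracies):
        runs.setdefault(value, []).append(accuracy)
    # A spread of 0 means that all runs of a value gave the same accuracy.
    return {value: max(accs) - min(accs) for value, accs in runs.items()}
```

Applied to this experiment, cosine_sim gets a spread of 0, since both of its runs reached 64.2%.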

Hyper-parameters

Parameter under test

In [1]:
Pname = 'dm'
# Each value is tested twice to assess stability.
Pvalues = 2 * ['cosine_sim','euclidean','cosine_dist']

# Regenerate the graph, the features or the baseline at each iteration.
regen_graph = True
regen_features = True
regen_baseline = False
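The processing loop `for p[Pname] in Pvalues:` uses a dictionary item as its loop target, which is valid Python: every iteration assigns the current value to `p['dm']`, where the graph, feature and classification notebooks read it. A small standalone illustration with the same names:

```python
p = {'dm': 'euclidean'}
Pname = 'dm'
Pvalues = 2 * ['cosine_sim', 'euclidean', 'cosine_dist']

seen = []
for p[Pname] in Pvalues:  # each iteration performs p['dm'] = value
    seen.append(p['dm'])

print(seen)     # the six values, in the order of Pvalues
print(p['dm'])  # cosine_dist: the last value stays in p after the loop
```

A side effect worth keeping in mind: after the loop, `p['dm']` is left at the last tested value rather than at the default set in the model parameters.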

Model parameters

In [2]:
p = {}

# Preprocessing.

# Graph.
p['data_scaling_graph'] = 'features'
p['K'] = 10 + 1  # 5 to 10 + 1 for self-reference
p['dm'] = 'euclidean'
p['Csigma'] = 1
p['diag'] = True
p['laplacian'] = 'normalized'

# Feature extraction.
p['m'] = 128  # 64, 128, 512
p['ls'] = 1
p['ld'] = 10
p['le'] = None
p['lg'] = 100

# Classification.
p['scale'] = None
p['Nvectors'] = 6
p['svm_type'] = 'C'
p['kernel'] = 'linear'
p['C'] = 1
p['nu'] = 0.5
p['majority_voting'] = False
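`audio_graph.ipynb` is not part of this document, so the following is only a sketch of the kind of graph the graph parameters describe, under several assumptions: brute-force dense distances (fine for a few hundred points, not for 149,000 frames), Gaussian weights whose bandwidth is `Csigma` times the mean neighbour distance (a hypothetical rule), symmetrization by the element-wise maximum, and L = I - D^{-1/2} W D^{-1/2} for `laplacian = 'normalized'`. One detail is grounded in the logs: for cosine_sim, `dist` and `w` have the same range, which suggests that the similarity itself is used as the weight, as done below. `knn_graph` and `normalized_laplacian` are hypothetical names.

```python
import numpy as np


def knn_graph(X, K=11, dm='euclidean', Csigma=1.0, diag=True):
    """Dense symmetric K-NN weight matrix of the rows of X (small N only).

    K counts the point itself, as in p['K'] = 10 + 1.
    """
    N = X.shape[0]
    if dm == 'euclidean':
        sq = np.sum(X ** 2, axis=1)
        dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0))
    elif dm in ('cosine_dist', 'cosine_sim'):
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        sim = Xn @ Xn.T
        # Both cosine variants rank neighbours by decreasing similarity.
        dist = 1 - sim
    else:
        raise ValueError('unknown distance metric: {}'.format(dm))
    # Indices of the K nearest rows (normally the point itself comes first).
    cols = np.argsort(dist, axis=1)[:, :K].ravel()
    rows = np.repeat(np.arange(N), K)
    if dm == 'cosine_sim':
        # Similarity used directly as weight, clipped at 0 (an assumption).
        w = np.maximum(sim[rows, cols], 0)
    else:
        d = dist[rows, cols]
        # Hypothetical bandwidth rule: Csigma times the mean neighbour distance.
        sigma = Csigma * np.mean(d[d > 0])
        w = np.exp(-d ** 2 / sigma ** 2)
    W = np.zeros((N, N))
    W[rows, cols] = w
    W = np.maximum(W, W.T)  # symmetrize
    np.fill_diagonal(W, 1.0 if diag else 0.0)
    return W


def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2}, with D the diagonal degree matrix."""
    d_inv_sqrt = 1 / np.sqrt(W.sum(axis=1))
    return np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
```

With `diag = True`, every node keeps a self-loop of weight 1, which matches the "Ones on the diagonal" check in the logs. The eigenvalues of a normalized Laplacian lie in [0, 2], which gives a cheap sanity check on small data.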

Data parameters

In [3]:
# HDF5 data stores.
p['folder'] = 'data'
p['filename_gtzan'] = 'gtzan.hdf5'
p['filename_audio'] = 'audio.hdf5'
p['filename_graph'] = 'graph.hdf5'
p['filename_features'] = 'features.hdf5'

# Dataset (Ngenres, Nclips, Nframes): (10, 100, 644) | (5, 100, 149) | (2, 10, 644).
p['Ngenres'] = 5
p['Nclips'] = 100
p['Nframes'] = 149

# Added white noise.
p['noise_std'] = 0.1
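The data parameters determine the size of the reduced dataset reported in the logs. Each clip contributes `Nframes` frames along an extra axis of length 2 (the shape `(5, 100, 149, 2, 96)` in the logs), with 96 values per frame, so the numbers printed under "Reduced dataset" follow from simple arithmetic. The noise function is a sketch under an assumption: it adds zero-mean Gaussian white noise of standard deviation `noise_std` directly to a data array, which may not be where the original pipeline adds it; `add_white_noise` is a hypothetical helper.

```python
import numpy as np

Ngenres, Nclips, Nframes = 5, 100, 149
n = 96  # values per frame, as reported in the logs

# Number of frame vectors: genres x clips x frames x the axis of length 2.
N = Ngenres * Nclips * Nframes * 2
print(N, N * n, Nframes * 2 * n)  # 149000 14304000 28608


def add_white_noise(X, noise_std=0.1, seed=1):
    """Return X plus i.i.d. Gaussian noise of standard deviation noise_std."""
    rng = np.random.RandomState(seed)
    return X + noise_std * rng.standard_normal(X.shape).astype(X.dtype)
```

The three printed numbers match the logged "N=149,000", "14,304,000 floats" and "28,608 features per clip".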

Numerical parameters

In [4]:
# Graph.
p['tol'] = 1e-5

# Feature extraction.
p['rtol'] = 1e-5  # 1e-3, 1e-5, 1e-7
p['N_inner'] = 500
p['N_outer'] = 50

# Classification.
p['test_size'] = 0.1
p['Ncv'] = 20
p['dataset_classification'] = 'Z'
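The classification logs report "Ratio: 5400.0 training, 600.0 testing" and 20 rows of 10 accuracies, which is consistent with `Ncv = 20` repetitions of a 10-fold split, where 10 = 1 / `test_size`. The classification notebook itself is not shown (it relies on scikit-learn 0.14, whose `StratifiedKFold` provides this kind of split). A numpy-only sketch under that assumption, with a hypothetical function name:

```python
import numpy as np


def repeated_stratified_folds(labels, test_size=0.1, Ncv=20, seed=1):
    """Yield (train, test) index arrays for Ncv repetitions of stratified
    k-fold cross-validation, with k = round(1 / test_size)."""
    labels = np.asarray(labels)
    k = int(round(1 / test_size))
    rng = np.random.RandomState(seed)
    for _ in range(Ncv):
        # Shuffle each class separately, then deal its samples into k folds,
        # so that every fold keeps the class proportions.
        fold_of = np.empty(len(labels), dtype=int)
        for c in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == c))
            fold_of[idx] = np.arange(len(idx)) % k
        for f in range(k):
            yield np.flatnonzero(fold_of != f), np.flatnonzero(fold_of == f)
```

With 6,000 feature vectors evenly spread over 5 genres, each split holds 5,400 training and 600 testing vectors (120 per genre), as in the logs.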

Processing

In [5]:
import numpy as np
import time

texperiment = time.time()

# Result dictionary.
res = ['accuracy', 'accuracy_std']
res += ['sparsity', 'atoms']
res += ['objective_g', 'objective_h', 'objective_i', 'objective_j']
res += ['time_features', 'iterations_inner', 'iterations_outer']
res = dict.fromkeys(res)
for key in res.keys():
    res[key] = []

def separator(name, parameter=False):
    if parameter:
        name += ', {} = {}'.format(Pname, p[Pname])
    dashes = 20 * '-'
    print('\n {} {} {} \n'.format(dashes, name, dashes))
    # Fair comparison when tuning parameters.
    # Randomnesses: dictionary initialization, training and testing sets.
    np.random.seed(1)
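`separator` reseeds NumPy's global random generator before every stage, so the random draws that go through it (dictionary initialization, training and testing sets) are the same for every tested value. Only randomness drawn from NumPy's global generator is fixed this way. A small illustration; `draw_permutation` is a hypothetical helper:

```python
import numpy as np


def draw_permutation(seed=None):
    """Draw from NumPy's global generator, optionally reseeding it first."""
    if seed is not None:
        np.random.seed(seed)  # what separator() does before every stage
    return np.random.permutation(10)


a = draw_permutation(seed=1)
b = draw_permutation(seed=1)
print((a == b).all())  # True: same seed, same draws

# Recent NumPy (1.17+) favours explicit generator objects over global state.
rng1, rng2 = np.random.default_rng(1), np.random.default_rng(1)
print((rng1.permutation(10) == rng2.permutation(10)).all())  # True
```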

In [6]:
#%run gtzan.ipynb
#%run audio_preprocessing.ipynb
if not regen_graph:
    separator('Graph')
    %run audio_graph.ipynb
if not regen_features:
    separator('Features')
    %run audio_features.ipynb

# Hyper-parameter under test.
for p[Pname] in Pvalues:

    if regen_graph:
        separator('Graph', True)
        %run audio_graph.ipynb
    if regen_features:
        separator('Features', True)
        p['filename_features'] = 'features_{}_{}.hdf5'.format(Pname, p[Pname])
        %run audio_features.ipynb
    separator('Classification', True)
    %run audio_classification.ipynb
    
    # Collect results.
    for key in res:
        res[key].append(globals()[key])

# Baseline, i.e. classification with spectrograms.
p['dataset_classification'] = 'X'
p['scale'] = 'minmax'  # TODO: should be done in preprocessing.
if regen_baseline:
    res['baseline'] = []
    res['baseline_std'] = []
    for p[Pname] in Pvalues:
        separator('Baseline', True)
        %run audio_classification.ipynb
        res['baseline'].append(accuracy)
        res['baseline_std'].append(accuracy_std)
else:
    separator('Baseline')
    %run audio_classification.ipynb
    res['baseline'] = len(Pvalues) * [accuracy]
    res['baseline_std'] = len(Pvalues) * [accuracy_std]
 -------------------- Graph, dm = cosine_sim -------------------- 

Data: (149000, 96), float32
Elapsed time: 523.94 seconds
All self-referenced in the first column: True
dist in [0.66959297657, 1.0]
w in [0.66959297657, 1.0]
Ones on the diagonal: 149000 (over 149000)
assert: True
W in [0.0, 1.0]
Datasets:
  L_data    : (2430692,), float32
  L_indices : (2430692,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2430692,), float32
  W_indices : (2430692,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Attributes:
  K = 11
  dm = cosine_sim
  Csigma = 1
  diag = True
  laplacian = normalized
Overall time: 533.73 seconds

 -------------------- Features, dm = cosine_sim -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 11
  dm = cosine_sim
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (2430692,), float32
  L_indices : (2430692,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2430692,), float32
  W_indices : (2430692,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 779 seconds
Inner loop: 314 iterations
g(Z) = ||X-DZ||_2^2 = 7.391266e+05
rdiff: 0.000265621767921
i(Z) = ||Z||_1 = 7.507485e+04
j(Z) = tr(Z^TLZ) = 2.182541e+04
Global objective: 8.360269e+05
Outer loop: 5 iterations

Z in [-0.052473615855, 0.141105085611]
Sparsity of Z: 8,656,904 non-zero entries out of 19,072,000 entries, i.e. 45.4%.
D in [-0.149452388287, 0.568888425827]
d in [0.999999642372, 1.00000023842]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 787 seconds
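The terms printed by the feature extraction add up to the global objective: for cosine_sim, 7.391266e+05 + 7.507485e+04 + 2.182541e+04 = 8.360269e+05, and the same holds for the euclidean and cosine_dist runs. Whether these printed terms already include the weights `ls`, `ld` and `lg` is not visible in the log, so the sketch below computes the unweighted terms only. It assumes one frame per row, as in the stored shapes (Z is frames x 128 atoms, D is 128 x 96); `objective_terms` and `sparsity` are hypothetical helpers.

```python
import numpy as np


def objective_terms(X, D, Z, L):
    """Unweighted objective terms, with one frame per row.

    X: (N, n) data, D: (m, n) dictionary (one atom per row),
    Z: (N, m) codes, L: (N, N) graph Laplacian.
    """
    g = np.sum((X - Z @ D) ** 2)  # ||X - DZ||_2^2, written for row-stacked frames
    i = np.sum(np.abs(Z))         # ||Z||_1
    j = np.trace(Z.T @ L @ Z)     # tr(Z^T L Z)
    return g, i, j


def sparsity(Z):
    """Fraction of non-zero entries in Z."""
    return np.count_nonzero(Z) / Z.size


# Arithmetic check on the logged cosine_sim values.
g, i, j = 7.391266e+05, 7.507485e+04, 2.182541e+04
print('{:.6e}'.format(g + i + j))           # 8.360269e+05, the logged global objective
print('{:.1%}'.format(8656904 / 19072000))  # 45.4%, the logged sparsity of Z
```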

 -------------------- Classification, dm = cosine_sim -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 63.5 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 56.6 %
Clips accuracy: 58.5 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  63 (+/- 1.4) <- [63 62 64 63 61 63 63 66 64 61]
  64 (+/- 2.0) <- [64 65 63 61 65 65 61 59 63 65]
  66 (+/- 2.2) <- [65 66 67 64 68 67 67 61 66 62]
  64 (+/- 2.5) <- [65 59 65 63 60 66 66 62 62 67]
  64 (+/- 1.1) <- [63 64 63 66 65 65 63 64 63 62]
  64 (+/- 1.2) <- [63 64 63 65 65 65 66 63 62 63]
  64 (+/- 1.8) <- [64 63 65 65 65 59 63 62 62 63]
  63 (+/- 2.0) <- [61 62 64 66 63 67 62 61 62 62]
  64 (+/- 1.3) <- [64 63 62 65 61 62 62 65 64 65]
  66 (+/- 1.8) <- [66 63 65 63 67 64 65 65 65 69]
  66 (+/- 2.5) <- [69 66 66 59 65 65 68 63 65 65]
  65 (+/- 1.3) <- [65 65 66 65 62 62 65 64 63 65]
  64 (+/- 1.5) <- [65 65 63 66 63 63 61 64 63 62]
  63 (+/- 2.8) <- [65 62 64 57 62 63 62 63 61 69]
  64 (+/- 1.4) <- [63 63 65 65 62 64 63 65 67 63]
  64 (+/- 1.5) <- [66 63 61 63 62 62 63 64 65 63]
  64 (+/- 1.1) <- [64 64 65 64 64 63 64 63 67 63]
  64 (+/- 2.0) <- [68 66 64 64 62 63 65 60 63 63]
  64 (+/- 1.8) <- [62 66 64 63 67 63 66 64 61 64]
  64 (+/- 1.1) <- [64 65 62 63 65 63 65 66 63 64]
Accuracy: 64.2 (+/- 1.90)
Mean time (20 cv): 24.69 seconds
Overall time: 498.70 seconds
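Each row of the cross-validation log prints the mean and the standard deviation of its 10 fold accuracies. Recomputing them from the first row with NumPy's default population standard deviation (ddof=0) gives back the printed "63 (+/- 1.4)". The printed fold values are themselves rounded, so such a check is only approximate in general: the second row averages to 63.1 while its printed mean is 64. `summarize` is a hypothetical helper, not the classification notebook's code.

```python
import numpy as np


def summarize(accuracies):
    """Format fold accuracies as 'mean (+/- std)', with np.std's default ddof=0."""
    a = np.asarray(accuracies, dtype=float)
    return '{:.0f} (+/- {:.1f})'.format(a.mean(), a.std())


print(summarize([63, 62, 64, 63, 61, 63, 63, 66, 64, 61]))  # 63 (+/- 1.4)
```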

 -------------------- Graph, dm = euclidean -------------------- 

Data: (149000, 96), float32
Elapsed time: 642.64 seconds
All self-referenced in the first column: True
dist in [0.0, 2.01710319519]
w in [0.173404276371, 1.0]
Ones on the diagonal: 149000 (over 149000)
assert: True
W in [0.0, 1.0]
Datasets:
  L_data    : (2913592,), float32
  L_indices : (2913592,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2913592,), float32
  W_indices : (2913592,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Overall time: 653.03 seconds

 -------------------- Features, dm = euclidean -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (2913592,), float32
  L_indices : (2913592,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2913592,), float32
  W_indices : (2913592,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 769 seconds
Inner loop: 287 iterations
g(Z) = ||X-DZ||_2^2 = 7.469461e+05
rdiff: 0.000393581782487
i(Z) = ||Z||_1 = 7.001193e+04
j(Z) = tr(Z^TLZ) = 2.276549e+04
Global objective: 8.397235e+05
Outer loop: 5 iterations

Z in [-0.101718649268, 0.194122523069]
Sparsity of Z: 8,337,151 non-zero entries out of 19,072,000 entries, i.e. 43.7%.
D in [-0.142467901111, 0.522177219391]
d in [0.999999701977, 1.00000023842]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 779 seconds

 -------------------- Classification, dm = euclidean -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 63.0 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 57.2 %
Clips accuracy: 60.5 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  63 (+/- 1.9) <- [63 63 62 62 59 64 62 65 66 60]
  64 (+/- 2.1) <- [65 65 63 61 67 65 61 60 63 64]
  65 (+/- 2.0) <- [64 65 66 64 66 67 67 60 65 62]
  64 (+/- 2.3) <- [65 60 64 63 60 66 66 62 63 66]
  64 (+/- 1.6) <- [62 64 63 67 64 64 62 64 62 60]
  64 (+/- 1.0) <- [63 64 63 66 63 64 64 62 62 63]
  64 (+/- 1.6) <- [64 64 63 64 65 59 65 63 63 63]
  63 (+/- 1.6) <- [60 63 63 63 61 66 61 61 63 61]
  64 (+/- 1.6) <- [65 62 62 64 61 61 62 64 66 65]
  65 (+/- 1.4) <- [65 63 65 62 66 63 63 64 64 67]
  65 (+/- 2.5) <- [66 66 64 58 63 65 67 63 65 65]
  64 (+/- 1.5) <- [65 65 65 64 62 63 65 61 63 66]
  64 (+/- 1.8) <- [65 64 63 66 63 62 63 65 63 59]
  63 (+/- 2.7) <- [64 62 64 57 61 62 60 65 61 68]
  64 (+/- 1.2) <- [63 62 64 64 62 64 63 66 66 63]
  63 (+/- 1.6) <- [67 63 61 62 62 62 61 64 64 63]
  64 (+/- 1.4) <- [62 64 65 64 64 62 63 63 67 63]
  64 (+/- 1.9) <- [67 65 64 64 62 64 65 60 63 63]
  64 (+/- 1.3) <- [62 64 63 63 66 65 65 65 62 63]
  64 (+/- 1.1) <- [63 66 62 64 63 64 64 65 63 62]
Accuracy: 63.8 (+/- 1.88)
Mean time (20 cv): 24.30 seconds
Overall time: 491.24 seconds

 -------------------- Graph, dm = cosine_dist -------------------- 

Data: (149000, 96), float32
Elapsed time: 577.82 seconds
All self-referenced in the first column: True
dist in [0.0, 0.681089401245]
w in [0.283994346857, 1.0]
Ones on the diagonal: 149000 (over 149000)
assert: True
W in [0.0, 1.0]
Datasets:
  L_data    : (2789250,), float32
  L_indices : (2789250,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2789250,), float32
  W_indices : (2789250,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Attributes:
  K = 11
  dm = cosine_dist
  Csigma = 1
  diag = True
  laplacian = normalized
Overall time: 588.28 seconds

 -------------------- Features, dm = cosine_dist -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 11
  dm = cosine_dist
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (2789250,), float32
  L_indices : (2789250,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2789250,), float32
  W_indices : (2789250,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 657 seconds
Inner loop: 246 iterations
g(Z) = ||X-DZ||_2^2 = 7.462377e+05
rdiff: 0.000593498029647
i(Z) = ||Z||_1 = 7.127560e+04
j(Z) = tr(Z^TLZ) = 2.056625e+04
Global objective: 8.380796e+05
Outer loop: 4 iterations

Z in [-0.056636184454, 0.14264112711]
Sparsity of Z: 8,786,613 non-zero entries out of 19,072,000 entries, i.e. 46.1%.
D in [-0.145910441875, 0.507630586624]
d in [0.999999761581, 1.00000023842]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 667 seconds

 -------------------- Classification, dm = cosine_dist -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 63.2 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 57.2 %
Clips accuracy: 62.0 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  63 (+/- 2.0) <- [63 63 62 61 59 63 62 65 66 60]
  64 (+/- 1.8) <- [65 65 63 61 64 65 61 61 64 65]
  66 (+/- 1.9) <- [65 66 67 64 66 67 66 62 66 62]
  64 (+/- 2.1) <- [64 60 63 62 60 66 66 62 63 65]
  64 (+/- 1.5) <- [62 64 64 65 63 64 62 65 62 61]
  64 (+/- 1.1) <- [63 63 63 64 62 64 66 63 62 64]
  64 (+/- 1.8) <- [65 64 64 64 65 58 63 63 62 64]
  64 (+/- 1.5) <- [62 63 64 65 63 67 62 62 63 62]
  64 (+/- 1.6) <- [64 63 62 65 61 61 61 65 65 64]
  65 (+/- 2.2) <- [67 63 65 63 67 63 62 63 65 69]
  65 (+/- 2.3) <- [67 64 65 59 63 65 68 63 66 66]
  64 (+/- 1.1) <- [65 66 65 64 63 63 65 62 64 64]
  64 (+/- 1.5) <- [64 66 63 65 63 63 62 65 63 61]
  63 (+/- 2.7) <- [63 63 64 56 63 62 61 63 62 68]
  64 (+/- 1.3) <- [62 62 64 64 65 64 64 64 67 62]
  64 (+/- 1.4) <- [65 64 62 62 63 64 60 64 65 63]
  64 (+/- 1.5) <- [62 64 65 64 63 62 64 63 67 61]
  64 (+/- 2.2) <- [69 65 65 64 61 64 65 61 64 61]
  64 (+/- 1.6) <- [63 64 63 62 67 64 64 65 61 62]
  64 (+/- 1.3) <- [62 65 62 62 64 63 65 66 63 63]
Accuracy: 64.0 (+/- 1.88)
Mean time (20 cv): 24.58 seconds
Overall time: 496.99 seconds

 -------------------- Graph, dm = cosine_sim -------------------- 

Data: (149000, 96), float32
Elapsed time: 539.57 seconds
All self-referenced in the first column: True
dist in [0.66959297657, 1.0]
w in [0.66959297657, 1.0]
Ones on the diagonal: 149000 (over 149000)
assert: True
W in [0.0, 1.0]
Datasets:
  L_data    : (2430692,), float32
  L_indices : (2430692,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2430692,), float32
  W_indices : (2430692,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Attributes:
  K = 11
  dm = cosine_sim
  Csigma = 1
  diag = True
  laplacian = normalized
Overall time: 549.33 seconds

 -------------------- Features, dm = cosine_sim -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 11
  dm = cosine_sim
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (2430692,), float32
  L_indices : (2430692,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2430692,), float32
  W_indices : (2430692,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 805 seconds
Inner loop: 314 iterations
g(Z) = ||X-DZ||_2^2 = 7.391266e+05
rdiff: 0.000265621767921
i(Z) = ||Z||_1 = 7.507485e+04
j(Z) = tr(Z^TLZ) = 2.182541e+04
Global objective: 8.360269e+05
Outer loop: 5 iterations

Z in [-0.052473615855, 0.141105085611]
Sparsity of Z: 8,656,904 non-zero entries out of 19,072,000 entries, i.e. 45.4%.
D in [-0.149452388287, 0.568888425827]
d in [0.999999642372, 1.00000023842]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 817 seconds

 -------------------- Classification, dm = cosine_sim -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 63.5 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 56.6 %
Clips accuracy: 58.5 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  63 (+/- 1.4) <- [63 62 64 63 61 63 63 66 64 61]
  64 (+/- 2.0) <- [64 65 63 61 65 65 61 59 63 65]
  66 (+/- 2.2) <- [65 66 67 64 68 67 67 61 66 62]
  64 (+/- 2.5) <- [65 59 65 63 60 66 66 62 62 67]
  64 (+/- 1.1) <- [63 64 63 66 65 65 63 64 63 62]
  64 (+/- 1.2) <- [63 64 63 65 65 65 66 63 62 63]
  64 (+/- 1.8) <- [64 63 65 65 65 59 63 62 62 63]
  63 (+/- 2.0) <- [61 62 64 66 63 67 62 61 62 62]
  64 (+/- 1.3) <- [64 63 62 65 61 62 62 65 64 65]
  66 (+/- 1.8) <- [66 63 65 63 67 64 65 65 65 69]
  66 (+/- 2.5) <- [69 66 66 59 65 65 68 63 65 65]
  65 (+/- 1.3) <- [65 65 66 65 62 62 65 64 63 65]
  64 (+/- 1.5) <- [65 65 63 66 63 63 61 64 63 62]
  63 (+/- 2.8) <- [65 62 64 57 62 63 62 63 61 69]
  64 (+/- 1.4) <- [63 63 65 65 62 64 63 65 67 63]
  64 (+/- 1.5) <- [66 63 61 63 62 62 63 64 65 63]
  64 (+/- 1.1) <- [64 64 65 64 64 63 64 63 67 63]
  64 (+/- 2.0) <- [68 66 64 64 62 63 65 60 63 63]
  64 (+/- 1.8) <- [62 66 64 63 67 63 66 64 61 64]
  64 (+/- 1.1) <- [64 65 62 63 65 63 65 66 63 64]
Accuracy: 64.2 (+/- 1.90)
Mean time (20 cv): 24.42 seconds
Overall time: 493.99 seconds

 -------------------- Graph, dm = euclidean -------------------- 

Data: (149000, 96), float32
Elapsed time: 1489.04 seconds
All self-referenced in the first column: True
dist in [0.0, 2.08512282372]
w in [0.199098289013, 1.0]
Ones on the diagonal: 149000 (over 149000)
assert: True
W in [0.0, 1.0]
Datasets:
  L_data    : (2923166,), float32
  L_indices : (2923166,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2923166,), float32
  W_indices : (2923166,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Overall time: 1498.97 seconds

 -------------------- Features, dm = euclidean -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (2923166,), float32
  L_indices : (2923166,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2923166,), float32
  W_indices : (2923166,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 944 seconds
Inner loop: 357 iterations
g(Z) = ||X-DZ||_2^2 = 7.395080e+05
rdiff: 0.000191850515487
i(Z) = ||Z||_1 = 7.341332e+04
j(Z) = tr(Z^TLZ) = 2.210159e+04
Global objective: 8.350230e+05
<