Genre recognition: experiment

Goal: Explore the influence of $\lambda_g$ in a noiseless setting.

Conclusion: $\lambda_g$ has not much influence, accuracy varies due to graph construction.

Observations:

  • There is again a difference of 1.3% in accuracy compared with 13f_noiseless and 0.5% with 13g_ratio. Again, the sole difference I see is the kNN step of the graph construction.
  • With an accuracy of 78.4%, we are 2.4% better than without the graph, in a noiseless setting.
  • $\lambda_g$ does not seem to have a great impact in a noiseless setting.
  • There is no trend in accuracy. Ok for sub-objectives.
  • This experiment is consistent with 12j_lg.

Hyper-parameters

Parameter under test

In [1]:
Pname = 'lg'
Pvalues = [50, 70, 100, 160, 280, 500]

# Regenerate the graph or the features at each iteration.
regen_graph = False
regen_features = True
regen_baseline = False

Model parameters

In [2]:
p = {}

# Preprocessing.

# Graph.
p['data_scaling_graph'] = 'features'
p['K'] = 10 + 1  # 5 to 10 + 1 for self-reference
p['dm'] = 'euclidean'
p['Csigma'] = 1
p['diag'] = True
p['laplacian'] = 'normalized'

# Feature extraction.
p['m'] = 128  # 64, 128, 512
p['ls'] = 1
p['ld'] = 10
p['le'] = None
p['lg'] = 100

# Classification.
p['scale'] = None
p['Nvectors'] = 6
p['svm_type'] = 'C'
p['kernel'] = 'linear'
p['C'] = 1
p['nu'] = 0.5
p['majority_voting'] = False

Data parameters

In [3]:
# HDF5 data stores.
p['folder'] = 'data'
p['filename_gtzan'] = 'gtzan.hdf5'
p['filename_audio'] = 'audio.hdf5'
p['filename_graph'] = 'graph.hdf5'
p['filename_features'] = 'features.hdf5'

# Dataset (10,100,644 | 5,100,149 | 2,10,644).
p['Ngenres'] = 5
p['Nclips'] = 100
p['Nframes'] = 149

# Added white noise.
p['noise_std'] = 0

Numerical parameters

In [4]:
# Graph.
p['tol'] = 1e-5

# Feature extraction.
p['rtol'] = 1e-5  # 1e-3, 1e-5, 1e-7
p['N_inner'] = 500
p['N_outer'] = 50

# Classification.
p['test_size'] = 0.1
p['Ncv'] = 20
p['dataset_classification'] = 'Z'

Processing

In [5]:
import numpy as np
import time

texperiment = time.time()

# Result dictionary.
res = ['accuracy', 'accuracy_std']
res += ['sparsity', 'atoms']
res += ['objective_g', 'objective_h', 'objective_i', 'objective_j']
res += ['time_features', 'iterations_inner', 'iterations_outer']
res = dict.fromkeys(res)
for key in res.keys():
    res[key] = []

def separator(name, parameter=False):
    if parameter:
        name += ', {} = {}'.format(Pname, p[Pname])
    dashes = 20 * '-'
    print('\n {} {} {} \n'.format(dashes, name, dashes))
    # Fair comparison when tuning parameters.
    # Randomnesses: dictionary initialization, training and testing sets.
    np.random.seed(1)
In [6]:
#%run gtzan.ipynb
#%run audio_preprocessing.ipynb
if not regen_graph:
    separator('Graph')
    %run audio_graph.ipynb
if not regen_features:
    separator('Features')
    %run audio_features.ipynb

# Hyper-parameter under test.
for p[Pname] in Pvalues:

    if regen_graph:
        separator('Graph', True)
        %run audio_graph.ipynb
    if regen_features:
        separator('Features', True)
        p['filename_features'] = 'features_{}_{}.hdf5'.format(Pname, p[Pname])
        %run audio_features.ipynb
    separator('Classification', True)
    %run audio_classification.ipynb
    
    # Collect results.
    for key in res:
        res[key].append(globals()[key])

# Baseline, i.e. classification with spectrograms.
p['dataset_classification'] = 'X'
p['scale'] = 'minmax'  # Todo: should be done in pre-processing.
if regen_baseline:
    res['baseline'] = []
    res['baseline_std'] = []
    for p[Pname] in Pvalues:
        separator('Baseline', True)
        %run audio_classification.ipynb
        res['baseline'].append(accuracy)
        res['baseline_std'].append(accuracy_std)
else:
    separator('Baseline')
    %run audio_classification.ipynb
    res['baseline'] = len(Pvalues) * [accuracy]
    res['baseline_std'] = accuracy_std
 -------------------- Graph -------------------- 

Data: (149000, 96), float32
Elapsed time: 168.01 seconds
All self-referenced in the first column: True
dist in [0.0, 1.46401679516]
w in [0.00534866191447, 1.0]
Ones on the diagonal: 149000 (over 149000)
assert: True
W in [0.0, 1.0]
Datasets:
  L_data    : (2389134,), float32
  L_indices : (2389134,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2389134,), float32
  W_indices : (2389134,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Overall time: 176.89 seconds

 -------------------- Features, lg = 50 -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (2389134,), float32
  L_indices : (2389134,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2389134,), float32
  W_indices : (2389134,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 2568 seconds
Inner loop: 1220 iterations
g(Z) = ||X-DZ||_2^2 = 6.037088e+04
rdiff: 0.00258323447744
i(Z) = ||Z||_1 = 6.107589e+04
j(Z) = tr(Z^TLZ) = 7.681577e+03
Global objective: 1.291283e+05
Outer loop: 8 iterations

Z in [-0.299206078053, 1.01999557018]
Sparsity of Z: 2,560,188 non-zero entries out of 19,072,000 entries, i.e. 13.4%.
D in [-0.37070453167, 0.918632388115]
d in [0.999999701977, 1.00000047684]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 2575 seconds

 -------------------- Classification, lg = 50 -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 75.4 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 63.0 %
Clips accuracy: 69.5 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  77 (+/- 1.7) <- [76 75 77 76 73 75 79 78 76 78]
  77 (+/- 1.6) <- [79 77 76 73 76 78 75 77 77 77]
  78 (+/- 0.8) <- [78 77 79 78 78 78 79 77 78 76]
  77 (+/- 1.6) <- [79 73 75 78 77 74 76 77 77 75]
  76 (+/- 1.2) <- [77 76 76 74 76 76 78 77 74 75]
  77 (+/- 1.4) <- [80 77 77 77 77 75 76 78 75 77]
  77 (+/- 1.3) <- [77 76 77 77 76 74 78 76 76 79]
  77 (+/- 1.2) <- [75 77 78 76 77 77 76 76 74 77]
  77 (+/- 1.3) <- [76 79 75 77 76 75 75 78 76 78]
  77 (+/- 1.9) <- [79 77 76 76 75 76 74 77 75 81]
  77 (+/- 1.4) <- [75 80 77 76 79 77 76 75 77 76]
  77 (+/- 1.3) <- [77 76 79 75 76 78 76 76 79 76]
  77 (+/- 1.2) <- [77 75 77 75 77 78 77 75 78 74]
  77 (+/- 1.4) <- [79 77 77 76 76 77 74 75 77 78]
  78 (+/- 0.9) <- [77 76 78 76 77 79 78 76 78 76]
  77 (+/- 2.0) <- [79 76 78 79 73 75 76 78 77 74]
  77 (+/- 1.3) <- [75 76 76 75 76 77 77 78 79 75]
  77 (+/- 1.3) <- [77 78 74 77 75 76 76 77 78 76]
  76 (+/- 1.3) <- [77 76 76 76 76 77 79 75 74 74]
  77 (+/- 1.2) <- [77 79 76 74 77 75 77 76 76 75]
Accuracy: 76.9 (+/- 1.46)
Mean time (20 cv): 23.65 seconds
Overall time: 477.13 seconds

 -------------------- Features, lg = 70 -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (2389134,), float32
  L_indices : (2389134,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2389134,), float32
  W_indices : (2389134,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 2499 seconds
Inner loop: 1161 iterations
g(Z) = ||X-DZ||_2^2 = 6.483655e+04
rdiff: 0.0021986628786
i(Z) = ||Z||_1 = 5.946181e+04
j(Z) = tr(Z^TLZ) = 8.108680e+03
Global objective: 1.324070e+05
Outer loop: 7 iterations

Z in [-0.19576600194, 1.00336170197]
Sparsity of Z: 2,974,690 non-zero entries out of 19,072,000 entries, i.e. 15.6%.
D in [-0.191132932901, 0.923759222031]
d in [0.999999701977, 1.00000035763]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 2508 seconds

 -------------------- Classification, lg = 70 -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 75.0 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 62.0 %
Clips accuracy: 69.5 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  75 (+/- 1.5) <- [75 76 76 74 71 76 76 75 76 75]
  77 (+/- 2.1) <- [78 76 75 72 77 80 75 75 78 77]
  77 (+/- 1.1) <- [77 77 77 78 78 78 78 76 78 75]
  76 (+/- 1.3) <- [77 73 74 76 76 75 73 76 76 76]
  76 (+/- 0.8) <- [75 75 75 75 76 75 76 77 74 75]
  77 (+/- 0.9) <- [78 77 76 77 76 76 77 78 75 75]
  77 (+/- 1.2) <- [77 77 78 78 76 74 76 75 76 76]
  76 (+/- 1.1) <- [74 75 77 74 74 75 76 74 74 77]
  77 (+/- 1.5) <- [74 79 75 77 77 75 75 77 76 77]
  76 (+/- 1.3) <- [78 76 77 75 77 75 75 75 75 78]
  77 (+/- 2.0) <- [75 79 76 74 79 76 75 73 78 75]
  77 (+/- 1.5) <- [75 76 78 75 74 78 77 74 78 75]
  76 (+/- 2.0) <- [77 73 76 75 79 77 77 75 78 72]
  76 (+/- 2.2) <- [79 75 78 75 73 76 73 73 77 78]
  77 (+/- 1.5) <- [77 75 75 77 75 77 79 74 78 76]
  77 (+/- 1.7) <- [78 75 77 78 73 76 76 77 79 75]
  76 (+/- 1.4) <- [75 76 75 75 76 75 76 78 79 74]
  76 (+/- 1.3) <- [77 76 74 76 76 74 77 76 79 75]
  76 (+/- 1.5) <- [77 77 78 76 77 76 77 75 73 73]
  76 (+/- 1.4) <- [76 78 75 74 76 76 77 78 76 73]
Accuracy: 76.4 (+/- 1.58)
Mean time (20 cv): 22.85 seconds
Overall time: 461.44 seconds

 -------------------- Features, lg = 100 -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (2389134,), float32
  L_indices : (2389134,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2389134,), float32
  W_indices : (2389134,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 2891 seconds
Inner loop: 1346 iterations
g(Z) = ||X-DZ||_2^2 = 7.120840e+04
rdiff: 0.000110467452867
i(Z) = ||Z||_1 = 5.688165e+04
j(Z) = tr(Z^TLZ) = 8.255097e+03
Global objective: 1.363451e+05
Outer loop: 8 iterations

Z in [-0.194608673453, 1.11607587337]
Sparsity of Z: 3,676,435 non-zero entries out of 19,072,000 entries, i.e. 19.3%.
D in [-0.0401657149196, 0.884639799595]
d in [0.999999642372, 1.00000035763]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 2901 seconds

 -------------------- Classification, lg = 100 -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 76.7 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 63.2 %
Clips accuracy: 71.5 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  77 (+/- 2.3) <- [76 75 76 77 71 77 76 79 79 77]
  78 (+/- 1.5) <- [79 76 77 75 81 78 79 78 77 79]
  79 (+/- 1.2) <- [79 79 80 77 80 81 79 78 79 78]
  78 (+/- 2.0) <- [78 72 77 80 78 78 79 78 79 79]
  78 (+/- 1.1) <- [79 78 78 77 77 79 78 77 78 75]
  80 (+/- 1.4) <- [81 78 79 80 79 82 79 79 77 78]
  79 (+/- 1.4) <- [80 76 80 79 79 76 80 78 77 79]
  78 (+/- 1.4) <- [76 77 79 77 78 80 76 76 75 79]
  79 (+/- 0.9) <- [77 78 79 79 78 78 78 78 79 77]
  78 (+/- 1.8) <- [81 78 76 77 78 76 76 78 79 81]
  79 (+/- 1.3) <- [78 80 78 76 80 78 79 76 80 78]
  79 (+/- 1.0) <- [80 78 79 78 77 77 78 78 81 78]
  78 (+/- 1.5) <- [79 77 79 78 78 78 80 74 78 77]
  78 (+/- 1.6) <- [79 76 80 76 77 77 77 75 79 79]
  79 (+/- 1.2) <- [79 78 78 79 79 82 80 78 78 78]
  78 (+/- 1.9) <- [80 77 79 80 75 75 78 77 81 78]
  78 (+/- 1.7) <- [78 79 78 77 79 75 77 79 81 77]
  78 (+/- 1.5) <- [76 80 79 80 75 79 78 77 79 78]
  78 (+/- 1.2) <- [79 79 77 77 79 79 79 78 76 76]
  78 (+/- 2.1) <- [78 80 81 75 77 76 80 77 76 76]
Accuracy: 78.4 (+/- 1.66)
Mean time (20 cv): 20.38 seconds
Overall time: 411.99 seconds

 -------------------- Features, lg = 160 -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (2389134,), float32
  L_indices : (2389134,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2389134,), float32
  W_indices : (2389134,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 2453 seconds
Inner loop: 1168 iterations
g(Z) = ||X-DZ||_2^2 = 7.613062e+04
rdiff: 0.001185257247
i(Z) = ||Z||_1 = 5.564121e+04
j(Z) = tr(Z^TLZ) = 8.949708e+03
Global objective: 1.407215e+05
Outer loop: 5 iterations

Z in [-0.195195972919, 0.834997117519]
Sparsity of Z: 4,178,494 non-zero entries out of 19,072,000 entries, i.e. 21.9%.
D in [-0.0923499763012, 0.911262571812]
d in [0.999999463558, 1.00000035763]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 2464 seconds

 -------------------- Classification, lg = 160 -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 76.1 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 62.7 %
Clips accuracy: 69.5 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  76 (+/- 1.9) <- [77 76 74 77 71 78 75 76 78 76]
  77 (+/- 1.5) <- [78 76 78 73 77 76 76 76 77 77]
  78 (+/- 1.1) <- [77 78 78 77 78 79 80 77 77 76]
  76 (+/- 1.3) <- [76 73 76 78 77 76 76 76 78 75]
  77 (+/- 1.3) <- [80 79 77 76 77 75 78 77 77 75]
  77 (+/- 1.0) <- [77 75 75 77 77 77 76 78 78 76]
  77 (+/- 1.6) <- [78 75 78 77 75 74 79 75 76 76]
  76 (+/- 1.3) <- [75 75 77 75 77 79 76 75 75 75]
  78 (+/- 1.2) <- [77 79 76 79 76 75 77 78 78 78]
  77 (+/- 1.2) <- [79 76 77 76 78 75 76 76 76 79]
  77 (+/- 1.6) <- [79 78 78 75 79 76 77 74 79 76]
  78 (+/- 0.9) <- [78 78 78 76 76 78 77 76 79 77]
  77 (+/- 1.8) <- [77 76 75 76 78 79 78 76 77 73]
  77 (+/- 2.2) <- [79 78 80 75 73 76 74 74 78 78]
  77 (+/- 1.3) <- [77 76 76 78 76 76 80 75 78 77]
  78 (+/- 2.2) <- [80 77 79 80 76 75 76 78 80 73]
  77 (+/- 1.0) <- [77 76 76 75 77 78 77 77 78 75]
  78 (+/- 0.7) <- [78 78 76 78 77 78 77 77 78 77]
  77 (+/- 0.9) <- [77 76 77 76 77 77 77 77 75 75]
  77 (+/- 1.1) <- [75 78 78 76 77 78 77 77 77 75]
Accuracy: 77.1 (+/- 1.50)
Mean time (20 cv): 20.08 seconds
Overall time: 406.15 seconds

 -------------------- Features, lg = 280 -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (2389134,), float32
  L_indices : (2389134,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2389134,), float32
  W_indices : (2389134,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 2765 seconds
Inner loop: 1317 iterations
g(Z) = ||X-DZ||_2^2 = 8.593313e+04
rdiff: 0.000702308401347
i(Z) = ||Z||_1 = 5.210842e+04
j(Z) = tr(Z^TLZ) = 9.010239e+03
Global objective: 1.470518e+05
Outer loop: 6 iterations

Z in [-0.100338108838, 0.809743523598]
Sparsity of Z: 5,424,104 non-zero entries out of 19,072,000 entries, i.e. 28.4%.
D in [-0.0333576500416, 0.860081493855]
d in [0.999999642372, 1.00000047684]
Constraints on D: True
Datasets:
  D : (128, 96)             , float32
  X : (5, 100, 149, 2, 96)  , float32
  Z : (5, 100, 149, 2, 128) , float32
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Overall time: 2776 seconds

 -------------------- Classification, lg = 280 -------------------- 

Software versions:
  numpy: 1.8.2
  sklearn: 0.14.1
Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  D : (128, 96)               , float32
  X : (5, 100, 149, 2, 96)    , float32
  Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
  size: N=149,000 x n=128 -> 19,072,000 floats
  dim: 38,144 features per clip
  shape: (5, 100, 298, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Truncated and grouped:
  size: N=135,000 x n=128 -> 17,280,000 floats
  dim: 34,560 features per clip
  shape: (5, 100, 6, 45, 128)
Feature vectors:
  size: N=6,000 x n=128 -> 768,000 floats
  dim: 1,536 features per clip
  shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 76.0 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 66.0 %
Clips accuracy: 73.0 %
5 genres: blues, classical, country, disco, hiphop
Data: (6000, 128), float64
Labels: (6000,), uint8
Ratio: 5400.0 training, 600.0 testing
  76 (+/- 1.7) <- [76 75 77 74 71 78 76 76 76 75]
  77 (+/- 1.9) <- [78 76 76 72 79 77 76 78 78 76]
  78 (+/- 1.3) <- [77 78 79 77 77 79 78 74 78 78]
  77 (+/- 1.7) <- [75 71 76 77 77 77 76 78 76 77]
  77 (+/- 1.1) <- [78 78 78 76 78 77 78 75 77 77]
  78 (+/- 1.2) <- [76 77 76 78 79 79 77 79 78 79]
  77 (+/- 1.3) <- [78 76 78 78 76 75 76 75 77 78]
  77 (+/- 1.1) <- [75 77 79 77 76 77 75 76 76 77]
  78 (+/- 1.1) <- [77 79 76 79 79 76 78 77 78 78]
  77 (+/- 1.6) <- [78 75 77 76 76 75 76 78 78 80]
  77 (+/- 1.6) <- [79 79 78 75 77 76 78 74 76 78]
  77 (+/- 1.4) <- [78 77 80 75 75 76 76 77 79 77]
  78 (+/- 1.0) <- [77 77 77 77 78 79 79 76 78 76]
  77 (+/- 1.4) <- [78 76 78 75 76 77 74 75 76 78]
  78 (+/- 1.1) <- [78 76 76 77 77 79 79 77 77 76]
  77 (+/- 1.5) <- [79 78 76 78 75 76 76 79 78 75]
  76 (+/- 1.4) <- [75 77 75 76 77 76 75 78 78 73]
  77 (+/- 1.2) <- [76 79 76 77 77 79 78 75 77 77]
  77 (+/- 0.7) <- [75 77 76 77 77 77 76 77 76 77]
  77 (+/- 1.7) <- [76 79 78 74 75 75 79 77 77 76]
Accuracy: 77.3 (+/- 1.51)
Mean time (20 cv): 19.79 seconds
Overall time: 400.58 seconds

 -------------------- Features, lg = 500 -------------------- 

Attributes:
  sr = 22050
  labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
 'reggae' 'rock']
Datasets:
  Xa: (10, 100, 644, 2, 1024) , float32
  Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
  size: N=1,288,000 x n=96 -> 123,648,000 floats
  dim: 123,648 features per clip
  shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
  size: N=149,000 x n=96 -> 14,304,000 floats
  dim: 28,608 features per clip
  shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
  K = 11
  dm = euclidean
  Csigma = 1
  diag = True
  laplacian = normalized
Datasets:
  L_data    : (2389134,), float32
  L_indices : (2389134,), int32
  L_indptr  : (149001,) , int32
  L_shape   : (2,)      , int64
  W_data    : (2389134,), float32
  W_indices : (2389134,), int32
  W_indptr  : (149001,) , int32
  W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 2484 seconds