# Genre recognition: experiment

Goal: Explore the effect of $\lambda_d$.

Conclusion: a value of $\lambda_d$ between 10 and 100 seems reasonable (with $\lambda_g=100$ and $\lambda_s=1$). This matches the earlier observation that $\lambda_s$ between 1 and 10 works best when $\lambda_d=\lambda_g=100$: what matters is the ratio $\frac{\lambda_d}{\lambda_s}$, which we want between 10 and 100. This ratio controls the sparsity of $Z$ and, indirectly, the extraction speed.

Observations:

• In the previous experiment with $\lambda_d = \lambda_g = 100$, $\lambda_s = 1$ was found to be the best option, so we fix $\lambda_g = 100$ and $\lambda_s = 1$ here.
• The run with $\lambda_d = \lambda_g = 100$ and $\lambda_s = 1$ is 1.5 times faster at rtol = 1e-5 than the previous experiment, while accuracy dropped from 73.06% to 72.49%.
• The time needed to extract the features increases with $\lambda_d$: it weights the data-fidelity term, the one that couples the two variables ($D$ and $Z$) we optimize over.
• The whole experiment ran for 16h30.
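For reference, the three objective terms printed in the logs below (g, i, j) can be sketched as follows. This is a hedged reconstruction, not the implementation in audio_features.ipynb; in particular, whether the reported numbers already include the $\lambda$ weights is an assumption (the global objective does equal the sum of the three printed terms).

```python
import numpy as np

# Hedged sketch of the objective reported in the logs below. Shapes follow
# the datasets printed there: X (N, n) data, D (m, n) dictionary, Z (N, m)
# sparse codes, L (N, N) graph Laplacian. Whether the printed g, i, j values
# include the lambda weights is an assumption.
def objective_terms(X, D, Z, L, ld=100.0, ls=1.0, lg=100.0):
    g = ld * np.linalg.norm(X - Z @ D) ** 2  # data fidelity ||X - DZ||_2^2
    i = ls * np.abs(Z).sum()                 # sparsity ||Z||_1
    j = lg * np.trace(Z.T @ (L @ Z))         # graph smoothness tr(Z^T L Z)
    return g, i, j, g + i + j                # last entry: global objective
```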

## Hyper-parameters

### Parameter under test

In [1]:
Pname = 'ld'
Pvalues = [1, 10, 100, 1e3, 1e4]

# Regenerate the graph or the features at each iteration.
regen_graph = False
regen_features = True


### Model parameters

In [2]:
p = {}

# Preprocessing.

# Graph.
p['K'] = 10 + 1  # 5 to 10 + 1 for self-reference
p['dm'] = 'cosine'
p['Csigma'] = 1
p['diag'] = True
p['laplacian'] = 'normalized'

# Feature extraction.
p['m'] = 128  # 64, 128, 512
p['ls'] = 1
p['ld'] = 100
p['le'] = None
p['lg'] = 100

# Classification.
p['scale'] = None
p['Nvectors'] = 6
p['svm_type'] = 'C'
p['kernel'] = 'linear'
p['C'] = 1
p['nu'] = 0.5


### Numerical parameters

In [3]:
# HDF5 data stores.
p['folder'] = 'data'
p['filename_gtzan'] = 'gtzan.hdf5'
p['filename_audio'] = 'audio.hdf5'
p['filename_graph'] = 'graph.hdf5'
p['filename_features'] = 'features.hdf5'

# Dataset (10,100,644 | 5,100,149 | 2,10,644).
p['Ngenres'] = 5
p['Nclips'] = 100
p['Nframes'] = 149

# Graph.
p['tol'] = 1e-5

# Feature extraction.
p['rtol'] = 1e-5  # 1e-3, 1e-5, 1e-7
p['N_inner'] = 500
p['N_outer'] = 50

# Classification.
p['Nfolds'] = 10
p['Ncv'] = 40
p['dataset_classification'] = 'Z'
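The role of rtol can be read as a relative stopping criterion: iterations presumably stop once the objective changes by less than rtol between passes. The "rdiff" values printed below are consistent with this reading, though the exact rule in audio_features.ipynb is an assumption.

```python
# Hedged sketch of a relative-tolerance stopping rule, as suggested by
# p['rtol'] and the "rdiff" values printed below; the actual criterion
# in audio_features.ipynb may differ.
def has_converged(obj_prev, obj, rtol=1e-5):
    rdiff = abs(obj - obj_prev) / abs(obj_prev)
    return rdiff < rtol
```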


## Processing

In [4]:
import numpy as np
import time

texperiment = time.time()

# Result dictionary.
res = ['accuracy', 'accuracy_std']
res += ['sparsity', 'atoms']
res += ['objective_g', 'objective_h', 'objective_i', 'objective_j']
res += ['time_features', 'iterations_inner', 'iterations_outer']
res = dict.fromkeys(res)
for key in res.keys():
    res[key] = []

def separator(name, parameter=False):
    if parameter:
        name += ', {} = {}'.format(Pname, p[Pname])
    dashes = 20 * '-'
    print('\n {} {} {} \n'.format(dashes, name, dashes))
    # Fair comparison when tuning parameters.
    # Randomnesses: dictionary initialization, training and testing sets.
    np.random.seed(1)

In [5]:
#%run gtzan.ipynb
#%run audio_preprocessing.ipynb
if not regen_graph:
    separator('Graph')
    %run audio_graph.ipynb
if not regen_features:
    separator('Features')
    %run audio_features.ipynb

# Hyper-parameter under test.
for p[Pname] in Pvalues:

    if regen_graph:
        separator('Graph', True)
        %run audio_graph.ipynb
    if regen_features:
        separator('Features', True)
        p['filename_features'] = 'features_{}_{}.hdf5'.format(Pname, p[Pname])
        %run audio_features.ipynb
    separator('Classification', True)
    %run audio_classification.ipynb

    # Collect results.
    for key in res:
        res[key].append(globals()[key])

# Baseline, i.e. classification with spectrograms.
p['dataset_classification'] = 'X'
p['scale'] = 'minmax'  # Todo: should be done in pre-processing.
if not regen_graph and not regen_features:
    # Classifier parameters are being tested.
    for p[Pname] in Pvalues:
        separator('Baseline', True)
        %run audio_classification.ipynb
else:
    separator('Baseline')
    %run audio_classification.ipynb
res['baseline'] = len(Pvalues) * [accuracy]
res['baseline_std'] = len(Pvalues) * [accuracy_std]
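Each cross-validation line printed below reports the mean accuracy, its standard deviation, and the ten per-fold accuracies. As a hedged illustration (the notebook's own aggregation lives in audio_classification.ipynb), the summary could be computed as:

```python
import numpy as np

# Hedged sketch: aggregate per-fold accuracies into the "mean (+/- std)"
# summaries printed below (population standard deviation, ddof=0).
def summarize(fold_accuracies):
    a = np.asarray(fold_accuracies, dtype=float)
    return a.mean(), a.std()
```

For instance, the first line of the ld = 1 run, `16 (+/- 4.8) <- [18 24 10 10 14 20 12 10 18 20]`, corresponds to a mean of 15.6 and a population standard deviation of 4.8.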

 -------------------- Graph --------------------

Data: (149000, 96), float32
Elapsed time: 221.14 seconds
All self-referenced in the first column: True
dist in [0.0, 0.432167828083]
w in [0.0362241715193, 1.0]
Ones on the diagonal: 149000 (over 149000)
assert: True
W in [0.0, 1.0]
Datasets:
L_data    : (2279908,), float32
L_indices : (2279908,), int32
L_indptr  : (149001,) , int32
L_shape   : (2,)      , int64
W_data    : (2279908,), float32
W_indices : (2279908,), int32
W_indptr  : (149001,) , int32
W_shape   : (2,)      , int64
Attributes:
K = 11
dm = cosine
Csigma = 1
diag = True
laplacian = normalized
Overall time: 230.34 seconds

-------------------- Features, ld = 1 --------------------

Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Datasets:
Xa: (10, 100, 644, 2, 1024) , float32
Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
size: N=1,288,000 x n=96 -> 123,648,000 floats
dim: 123,648 features per clip
shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
size: N=149,000 x n=96 -> 14,304,000 floats
dim: 28,608 features per clip
shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
K = 11
dm = cosine
Csigma = 1
diag = True
laplacian = normalized
Datasets:
L_data    : (2279908,), float32
L_indices : (2279908,), int32
L_indptr  : (149001,) , int32
L_shape   : (2,)      , int64
W_data    : (2279908,), float32
W_indices : (2279908,), int32
W_indptr  : (149001,) , int32
W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 132 seconds

Inner loop: 63 iterations
g(Z) = ||X-DZ||_2^2 = 2.619447e+04
rdiff: 0.00844531997372
i(Z) = ||Z||_1 = 4.153597e+02
j(Z) = tr(Z^TLZ) = 8.017119e+00

Global objective: 2.661785e+04

Outer loop: 3 iterations

Z in [0.0, 0.0081849405542]
Sparsity of Z: 412,527 non-zero entries out of 19,072,000 entries, i.e. 2.2%.

D in [0.00812118500471, 0.281162023544]
d in [0.999999582767, 1.00000023842]
Constraints on D: True

Datasets:
D : (128, 96)             , float32
X : (5, 100, 149, 2, 96)  , float32
Z : (5, 100, 149, 2, 128) , float32
Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Overall time: 140 seconds

-------------------- Classification, ld = 1 --------------------

Software versions:
numpy: 1.8.2
sklearn: 0.14.1
Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Datasets:
D : (128, 96)               , float32
X : (5, 100, 149, 2, 96)    , float32
Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 298, 128)
Truncated and grouped:
size: N=135,000 x n=128 -> 17,280,000 floats
dim: 34,560 features per clip
shape: (5, 100, 6, 45, 128)
Truncated and grouped:
size: N=135,000 x n=128 -> 17,280,000 floats
dim: 34,560 features per clip
shape: (5, 100, 6, 45, 128)
Feature vectors:
size: N=6,000 x n=128 -> 768,000 floats
dim: 1,536 features per clip
shape: (5, 100, 6, 2, 128)

5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 21.5 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 22.1 %
Clips accuracy: 22.0 %
5 genres: blues, classical, country, disco, hiphop
Data: (500, 1536), float64
Labels: (500,), uint8
16 (+/- 4.8) <- [18 24 10 10 14 20 12 10 18 20]
16 (+/- 3.3) <- [12 12 22 18 18 14 14 20 14 14]
17 (+/- 2.5) <- [22 16 16 14 16 18 18 14 20 20]
16 (+/- 3.6) <- [16 14 16 16 22 12 12 12 18 22]
16 (+/- 3.3) <- [10 18 18 12 16 18 14 18 22 16]
16 (+/- 2.4) <- [14 16 18 16 18 12 12 20 16 16]
17 (+/- 4.8) <- [ 6 18 20 16 24 18 16 22 12 16]
17 (+/- 4.1) <- [18 22 18 16 22 18 12 20 18  8]
16 (+/- 3.3) <- [16 12 20 14 22 12 16 18 14 20]
18 (+/- 3.7) <- [16 24 24 14 14 18 16 16 20 14]
16 (+/- 3.5) <- [22 12 20 12 14 14 20 18 14 14]
16 (+/- 3.8) <- [20 16 18 18 10 20 20 14 12 10]
17 (+/- 2.7) <- [20 16 20 16 16 22 14 14 14 16]
17 (+/- 4.4) <- [16 18 14 20 26 14 22 12 12 14]
17 (+/- 5.3) <- [24 22 22 14 22 14 14  8 12 22]
16 (+/- 2.9) <- [16 22 18 10 18 14 16 16 16 18]
16 (+/- 2.7) <- [16 16 14 14 16 20 14 14 14 22]
17 (+/- 2.5) <- [18 22 20 14 20 14 16 16 18 16]
16 (+/- 4.1) <- [14 24 12 14 18  8 18 18 18 18]
16 (+/- 4.6) <- [20 18  6 20 18 18 18  8 16 18]
18 (+/- 2.5) <- [16 22 20 16 14 16 16 16 20 20]
17 (+/- 4.1) <- [14 12 24 18 16 20 18 22 16 10]
17 (+/- 3.0) <- [18 22 12 12 18 18 14 18 16 18]
16 (+/- 3.0) <- [10 18 16 18 18 14 22 16 16 14]
18 (+/- 1.9) <- [18 16 18 18 20 16 16 20 22 18]
16 (+/- 2.7) <- [16 12 20 18 12 18 18 18 14 14]
17 (+/- 2.6) <- [14 20 16 16 14 22 18 16 16 20]
16 (+/- 4.6) <- [20 18 16 18  6 16 24 14 12 16]
17 (+/- 3.1) <- [16 20 18 18 16 22 12 18 20 12]
19 (+/- 1.6) <- [18 18 18 18 20 18 22 20 18 16]
18 (+/- 1.9) <- [20 18 18 14 18 16 20 16 18 20]
17 (+/- 2.5) <- [14 14 18 12 18 20 16 20 18 16]
14 (+/- 3.5) <- [12 20  8 14 16 16 14 14 16  8]
17 (+/- 2.7) <- [18 16 14 16 20 18 12 16 22 18]
16 (+/- 2.4) <- [14 16 18 18 18 16 18 14 16 10]
17 (+/- 4.6) <- [18 16 12 18 20  8 18 14 16 26]
15 (+/- 3.4) <- [16 10 16 14 10 16 20 18 14 20]
15 (+/- 3.9) <- [12 20 20 20 14 18 14 12  8 16]
17 (+/- 2.5) <- [20 22 16 16 16 18 14 18 14 20]
17 (+/- 3.1) <- [20 14 18 12 14 14 20 18 16 22]
Accuracy: 16.6 (+/- 3.53)
Mean time (40 cv): 43.54 seconds
Overall time: 1749.48 seconds

-------------------- Features, ld = 10 --------------------

Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Datasets:
Xa: (10, 100, 644, 2, 1024) , float32
Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
size: N=1,288,000 x n=96 -> 123,648,000 floats
dim: 123,648 features per clip
shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
size: N=149,000 x n=96 -> 14,304,000 floats
dim: 28,608 features per clip
shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
K = 11
dm = cosine
Csigma = 1
diag = True
laplacian = normalized
Datasets:
L_data    : (2279908,), float32
L_indices : (2279908,), int32
L_indptr  : (149001,) , int32
L_shape   : (2,)      , int64
W_data    : (2279908,), float32
W_indices : (2279908,), int32
W_indptr  : (149001,) , int32
W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 2309 seconds

Inner loop: 1079 iterations
g(Z) = ||X-DZ||_2^2 = 7.238450e+04
rdiff: 0.000892518293893
i(Z) = ||Z||_1 = 5.829241e+04
j(Z) = tr(Z^TLZ) = 1.074934e+04

Global objective: 1.414263e+05

Outer loop: 6 iterations

Z in [-0.0834140777588, 0.889378726482]
Sparsity of Z: 3,604,776 non-zero entries out of 19,072,000 entries, i.e. 18.9%.

D in [-0.0359300412238, 0.915727615356]
d in [0.999999642372, 1.00000035763]
Constraints on D: True

Datasets:
D : (128, 96)             , float32
X : (5, 100, 149, 2, 96)  , float32
Z : (5, 100, 149, 2, 128) , float32
Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Overall time: 2318 seconds

-------------------- Classification, ld = 10 --------------------

Software versions:
numpy: 1.8.2
sklearn: 0.14.1
Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Datasets:
D : (128, 96)               , float32
X : (5, 100, 149, 2, 96)    , float32
Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 298, 128)
Truncated and grouped:
size: N=135,000 x n=128 -> 17,280,000 floats
dim: 34,560 features per clip
shape: (5, 100, 6, 45, 128)
Truncated and grouped:
size: N=135,000 x n=128 -> 17,280,000 floats
dim: 34,560 features per clip
shape: (5, 100, 6, 45, 128)
Feature vectors:
size: N=6,000 x n=128 -> 768,000 floats
dim: 1,536 features per clip
shape: (5, 100, 6, 2, 128)

5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 76.5 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 63.0 %
Clips accuracy: 71.0 %
5 genres: blues, classical, country, disco, hiphop
Data: (500, 1536), float64
Labels: (500,), uint8
71 (+/- 6.0) <- [72 76 68 70 68 86 68 62 72 72]
73 (+/- 4.4) <- [68 74 80 68 78 78 74 70 68 70]
72 (+/- 6.0) <- [68 82 82 68 70 72 66 76 66 66]
69 (+/- 3.8) <- [64 68 68 70 66 70 74 72 76 64]
71 (+/- 3.4) <- [66 70 66 78 70 72 74 70 70 72]
71 (+/- 5.2) <- [66 82 70 76 76 70 70 70 64 66]
71 (+/- 6.0) <- [70 74 57 78 72 66 76 66 68 78]
72 (+/- 5.9) <- [66 68 72 68 86 64 76 74 72 72]
71 (+/- 7.6) <- [78 68 72 76 80 84 64 68 57 66]
72 (+/- 5.9) <- [78 70 80 68 76 68 70 68 60 78]
72 (+/- 5.9) <- [76 80 64 70 78 74 78 68 62 70]
69 (+/- 3.6) <- [68 62 74 74 70 70 64 68 70 68]
72 (+/- 8.2) <- [78 66 70 86 80 60 68 76 60 76]
72 (+/- 4.7) <- [76 68 76 74 80 70 74 68 68 64]
72 (+/- 5.5) <- [70 64 80 62 68 78 72 76 72 74]
74 (+/- 3.8) <- [76 80 68 72 68 74 72 76 72 78]
70 (+/- 4.1) <- [66 70 68 68 66 76 78 74 66 70]
73 (+/- 4.3) <- [76 80 72 66 72 68 78 70 76 70]
71 (+/- 6.2) <- [62 70 70 70 84 68 74 66 80 68]
70 (+/- 8.4) <- [64 84 60 82 68 68 60 66 80 70]
72 (+/- 7.3) <- [74 74 80 76 76 66 62 80 78 57]
70 (+/- 5.6) <- [66 70 80 76 60 72 70 66 76 68]
70 (+/- 4.8) <- [68 72 66 66 76 74 64 70 68 80]
71 (+/- 5.6) <- [68 70 66 76 80 64 80 66 68 74]
70 (+/- 5.7) <- [68 76 70 66 62 70 64 68 80 78]
72 (+/- 6.4) <- [70 68 62 70 68 78 78 70 70 86]
73 (+/- 5.5) <- [68 68 72 76 64 84 76 78 72 70]
71 (+/- 9.5) <- [64 70 74 86 54 72 62 68 82 82]
72 (+/- 6.7) <- [68 66 70 80 72 60 68 80 80 78]
69 (+/- 5.3) <- [72 62 68 57 72 66 72 70 74 76]
74 (+/- 6.1) <- [76 72 68 70 72 84 74 84 64 72]
70 (+/- 4.4) <- [68 74 66 80 64 68 70 70 70 74]
71 (+/- 2.6) <- [70 66 74 70 70 74 74 70 72 68]
72 (+/- 8.0) <- [86 76 78 60 70 72 57 76 68 76]
69 (+/- 8.8) <- [82 72 70 48 76 70 72 72 72 60]
73 (+/- 5.6) <- [66 70 70 78 70 74 72 64 80 82]
71 (+/- 5.9) <- [74 72 62 68 64 64 80 72 76 78]
71 (+/- 4.7) <- [66 80 70 76 68 68 74 72 64 74]
70 (+/- 4.5) <- [68 74 66 78 74 70 72 62 68 66]
71 (+/- 4.0) <- [66 72 66 78 72 68 70 76 70 76]
Accuracy: 71.3 (+/- 5.96)
Mean time (40 cv): 18.66 seconds
Overall time: 750.68 seconds

-------------------- Features, ld = 100 --------------------

Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Datasets:
Xa: (10, 100, 644, 2, 1024) , float32
Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
size: N=1,288,000 x n=96 -> 123,648,000 floats
dim: 123,648 features per clip
shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
size: N=149,000 x n=96 -> 14,304,000 floats
dim: 28,608 features per clip
shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
K = 11
dm = cosine
Csigma = 1
diag = True
laplacian = normalized
Datasets:
L_data    : (2279908,), float32
L_indices : (2279908,), int32
L_indptr  : (149001,) , int32
L_shape   : (2,)      , int64
W_data    : (2279908,), float32
W_indices : (2279908,), int32
W_indptr  : (149001,) , int32
W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 8804 seconds

Inner loop: 4632 iterations
g(Z) = ||X-DZ||_2^2 = 7.864703e+04
rdiff: 0.00310425725601
i(Z) = ||Z||_1 = 1.668191e+05
j(Z) = tr(Z^TLZ) = 9.046062e+04

Global objective: 3.359268e+05

Outer loop: 35 iterations

Z in [-0.906480252743, 1.10867500305]
Sparsity of Z: 7,934,225 non-zero entries out of 19,072,000 entries, i.e. 41.6%.

D in [-0.948690414429, 0.983948886395]
d in [0.999999642372, 1.00000023842]
Constraints on D: True

Datasets:
D : (128, 96)             , float32
X : (5, 100, 149, 2, 96)  , float32
Z : (5, 100, 149, 2, 128) , float32
Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Overall time: 8814 seconds

-------------------- Classification, ld = 100 --------------------

Software versions:
numpy: 1.8.2
sklearn: 0.14.1
Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Datasets:
D : (128, 96)               , float32
X : (5, 100, 149, 2, 96)    , float32
Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 298, 128)
Truncated and grouped:
size: N=135,000 x n=128 -> 17,280,000 floats
dim: 34,560 features per clip
shape: (5, 100, 6, 45, 128)
Truncated and grouped:
size: N=135,000 x n=128 -> 17,280,000 floats
dim: 34,560 features per clip
shape: (5, 100, 6, 45, 128)
Feature vectors:
size: N=6,000 x n=128 -> 768,000 floats
dim: 1,536 features per clip
shape: (5, 100, 6, 2, 128)

5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 76.6 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 64.7 %
Clips accuracy: 74.0 %
5 genres: blues, classical, country, disco, hiphop
Data: (500, 1536), float64
Labels: (500,), uint8
74 (+/- 4.6) <- [74 76 76 70 66 82 72 70 80 74]
71 (+/- 5.6) <- [62 76 62 64 72 78 72 76 72 72]
74 (+/- 4.3) <- [78 78 80 70 74 72 66 78 74 70]
73 (+/- 6.0) <- [64 82 72 68 72 76 74 74 82 64]
73 (+/- 4.5) <- [64 78 68 78 78 74 74 70 74 70]
74 (+/- 7.0) <- [74 92 72 66 78 76 68 68 72 72]
73 (+/- 6.6) <- [62 66 66 76 84 76 68 78 74 78]
74 (+/- 3.8) <- [72 74 80 78 78 66 74 72 72 72]
73 (+/- 4.1) <- [74 72 78 74 78 64 70 78 72 74]
72 (+/- 6.4) <- [74 76 82 72 74 70 76 72 56 70]
71 (+/- 4.9) <- [70 66 66 76 78 68 76 66 68 78]
73 (+/- 3.2) <- [74 68 74 78 70 76 68 74 76 74]
73 (+/- 3.9) <- [80 70 74 78 74 74 68 70 68 70]
73 (+/- 6.1) <- [78 76 78 74 76 74 56 74 74 70]
72 (+/- 7.4) <- [76 62 80 56 72 78 72 80 72 76]
73 (+/- 6.9) <- [70 86 70 80 66 82 72 68 74 64]
71 (+/- 6.3) <- [62 76 68 70 66 84 78 68 66 74]
74 (+/- 5.9) <- [76 68 80 78 72 78 82 62 70 70]
74 (+/- 6.7) <- [68 76 80 76 84 72 74 74 76 57]
72 (+/- 3.9) <- [74 78 70 76 76 68 68 66 70 70]
72 (+/- 4.6) <- [72 72 74 74 80 66 64 76 74 68]
71 (+/- 5.9) <- [76 72 80 72 57 70 74 68 64 74]
71 (+/- 5.2) <- [70 64 78 68 74 78 76 66 64 74]
74 (+/- 9.3) <- [66 80 74 76 78 72 86 60 57 86]
71 (+/- 4.9) <- [76 70 80 70 68 64 66 72 70 78]
72 (+/- 6.1) <- [74 70 64 76 70 70 70 70 68 88]
73 (+/- 7.2) <- [66 74 76 78 56 82 68 78 76 72]
72 (+/- 5.2) <- [76 76 72 74 64 64 70 66 78 78]
72 (+/- 6.9) <- [70 70 66 76 66 60 74 76 86 76]
73 (+/- 3.5) <- [66 70 70 74 74 74 78 70 76 76]
71 (+/- 4.7) <- [66 62 68 70 70 76 76 72 68 78]
71 (+/- 6.7) <- [66 72 72 78 82 62 76 64 64 78]
71 (+/- 7.5) <- [70 78 80 60 80 78 68 57 68 70]
75 (+/- 9.9) <- [94 80 86 62 66 70 68 76 64 80]
72 (+/- 8.4) <- [88 70 76 56 78 72 72 68 76 62]
73 (+/- 3.1) <- [74 70 72 76 68 76 76 70 78 74]
72 (+/- 5.2) <- [72 68 66 78 78 64 72 74 68 80]
74 (+/- 6.0) <- [72 80 78 84 68 68 78 70 64 74]
72 (+/- 5.9) <- [70 72 84 76 66 78 74 62 70 72]
73 (+/- 7.1) <- [62 74 66 68 78 80 64 84 74 78]
Accuracy: 72.5 (+/- 6.08)
Mean time (40 cv): 37.65 seconds
Overall time: 1511.87 seconds

-------------------- Features, ld = 1000.0 --------------------

Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Datasets:
Xa: (10, 100, 644, 2, 1024) , float32
Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
size: N=1,288,000 x n=96 -> 123,648,000 floats
dim: 123,648 features per clip
shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
size: N=149,000 x n=96 -> 14,304,000 floats
dim: 28,608 features per clip
shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
K = 11
dm = cosine
Csigma = 1
diag = True
laplacian = normalized
Datasets:
L_data    : (2279908,), float32
L_indices : (2279908,), int32
L_indptr  : (149001,) , int32
L_shape   : (2,)      , int64
W_data    : (2279908,), float32
W_indices : (2279908,), int32
W_indptr  : (149001,) , int32
W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 11851 seconds

Inner loop: 7359 iterations
g(Z) = ||X-DZ||_2^2 = 1.860465e+04
rdiff: 0.000243997582579
i(Z) = ||Z||_1 = 3.820568e+05
j(Z) = tr(Z^TLZ) = 1.803437e+05

Global objective: 5.810052e+05

Outer loop: 50 iterations

Z in [-0.423380792141, 0.545486330986]
Sparsity of Z: 16,502,271 non-zero entries out of 19,072,000 entries, i.e. 86.5%.

D in [-0.345819354057, 0.441300600767]
d in [0.999999761581, 1.00000023842]
Constraints on D: True

Datasets:
D : (128, 96)             , float32
X : (5, 100, 149, 2, 96)  , float32
Z : (5, 100, 149, 2, 128) , float32
Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Overall time: 11862 seconds

-------------------- Classification, ld = 1000.0 --------------------

Software versions:
numpy: 1.8.2
sklearn: 0.14.1
Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Datasets:
D : (128, 96)               , float32
X : (5, 100, 149, 2, 96)    , float32
Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 298, 128)
Truncated and grouped:
size: N=135,000 x n=128 -> 17,280,000 floats
dim: 34,560 features per clip
shape: (5, 100, 6, 45, 128)
Truncated and grouped:
size: N=135,000 x n=128 -> 17,280,000 floats
dim: 34,560 features per clip
shape: (5, 100, 6, 45, 128)
Feature vectors:
size: N=6,000 x n=128 -> 768,000 floats
dim: 1,536 features per clip
shape: (5, 100, 6, 2, 128)

5 genres: blues, classical, country, disco, hiphop
Training data: (3600, 128), float64
Testing data: (2400, 128), float64
Training labels: (3600,), uint8
Testing labels: (2400,), uint8
Accuracy: 69.2 %
5 genres: blues, classical, country, disco, hiphop
Training data: (300, 1536), float64
Testing data: (200, 1536), float64
Training labels: (300,), uint8
Testing labels: (200,), uint8
Feature vectors accuracy: 57.2 %
Clips accuracy: 67.0 %
5 genres: blues, classical, country, disco, hiphop
Data: (500, 1536), float64
Labels: (500,), uint8
67 (+/- 5.9) <- [60 74 64 62 70 78 57 68 70 68]
68 (+/- 5.8) <- [74 68 64 54 70 74 68 68 64 74]
69 (+/- 5.9) <- [60 78 72 68 57 70 66 74 72 72]
68 (+/- 5.3) <- [64 74 74 66 56 74 68 66 70 68]
68 (+/- 6.4) <- [57 70 68 74 68 80 70 57 66 64]
69 (+/- 3.6) <- [68 70 76 68 74 68 64 66 66 66]
69 (+/- 4.9) <- [66 68 57 72 70 70 64 76 74 70]
69 (+/- 3.6) <- [72 68 64 70 74 62 70 68 72 72]
67 (+/- 4.1) <- [70 64 66 74 60 72 64 68 64 70]
69 (+/- 6.2) <- [76 62 78 60 70 70 74 72 60 66]
68 (+/- 5.0) <- [78 60 66 72 68 64 72 64 70 64]
68 (+/- 7.8) <- [70 57 74 80 72 64 54 72 76 64]
68 (+/- 6.2) <- [74 70 60 78 62 70 62 74 60 66]
68 (+/- 7.0) <- [60 68 78 68 78 72 57 70 66 57]
69 (+/- 6.6) <- [72 57 74 64 76 74 64 64 66 80]
70 (+/- 5.3) <- [64 80 66 70 72 72 62 64 74 72]
68 (+/- 4.0) <- [62 64 70 64 64 76 68 70 70 70]
68 (+/- 6.6) <- [70 66 76 64 72 68 82 57 64 64]
69 (+/- 5.7) <- [60 68 70 72 78 70 70 60 76 64]
67 (+/- 5.7) <- [64 68 54 76 74 64 66 66 68 68]
67 (+/- 4.7) <- [66 72 66 70 66 68 64 74 68 56]
68 (+/- 7.4) <- [78 74 68 62 66 54 72 68 76 57]
66 (+/- 4.5) <- [60 66 60 66 72 68 70 60 64 72]
69 (+/- 8.6) <- [62 74 72 80 74 68 82 56 56 68]
68 (+/- 3.4) <- [66 68 68 66 64 64 74 68 66 74]
67 (+/- 6.5) <- [60 66 64 66 70 70 74 60 62 82]
70 (+/- 4.9) <- [74 68 64 66 64 78 68 72 78 70]
68 (+/- 6.2) <- [62 76 76 72 62 66 57 68 68 76]
69 (+/- 6.9) <- [60 72 56 68 70 72 64 72 80 76]
67 (+/- 5.4) <- [68 66 66 57 60 68 78 66 64 72]
67 (+/- 5.3) <- [76 72 64 57 60 72 64 68 66 66]
69 (+/- 5.2) <- [66 74 72 78 64 66 66 66 62 76]
67 (+/- 4.5) <- [68 72 70 60 66 72 68 57 68 70]
71 (+/- 8.0) <- [82 74 76 72 64 68 54 72 64 80]
69 (+/- 6.3) <- [86 70 62 64 68 64 70 68 70 70]
70 (+/- 3.4) <- [72 68 70 68 70 78 74 66 68 68]
69 (+/- 6.3) <- [74 57 68 68 76 60 74 66 68 78]
69 (+/- 6.5) <- [74 76 72 74 57 68 76 66 60 62]
70 (+/- 7.5) <- [78 68 64 88 74 66 66 66 64 64]
68 (+/- 6.0) <- [66 70 56 74 66 66 76 76 62 66]
Accuracy: 68.3 (+/- 5.96)
Mean time (40 cv): 43.01 seconds
Overall time: 1727.59 seconds

-------------------- Features, ld = 10000.0 --------------------

Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Datasets:
Xa: (10, 100, 644, 2, 1024) , float32
Xs: (10, 100, 644, 2, 96)   , float32
Full dataset:
size: N=1,288,000 x n=96 -> 123,648,000 floats
dim: 123,648 features per clip
shape: (10, 100, 644, 2, 96)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
size: N=149,000 x n=96 -> 14,304,000 floats
dim: 28,608 features per clip
shape: (5, 100, 149, 2, 96)
<type 'numpy.ndarray'>
Data: (149000, 96), float32
Attributes:
K = 11
dm = cosine
Csigma = 1
diag = True
laplacian = normalized
Datasets:
L_data    : (2279908,), float32
L_indices : (2279908,), int32
L_indptr  : (149001,) , int32
L_shape   : (2,)      , int64
W_data    : (2279908,), float32
W_indices : (2279908,), int32
W_indptr  : (149001,) , int32
W_shape   : (2,)      , int64
Size X: 13.6 M --> 54.6 MiB
Size Z: 18.2 M --> 72.8 MiB
Size D: 12.0 k --> 48.0 kiB
Size E: 12.0 k --> 48.0 kiB
Elapsed time: 25790 seconds

Inner loop: 15951 iterations
g(Z) = ||X-DZ||_2^2 = 3.124855e+03
rdiff: 0.0119490451035
i(Z) = ||Z||_1 = 6.699093e+05
j(Z) = tr(Z^TLZ) = 4.560384e+05

Global objective: 1.129073e+06

Outer loop: 50 iterations

Z in [-0.83953756094, 0.733539998531]
Sparsity of Z: 17,965,510 non-zero entries out of 19,072,000 entries, i.e. 94.2%.

D in [-0.133339911699, 0.277615457773]
d in [0.911229729652, 1.00000023842]
Constraints on D: True

Datasets:
D : (128, 96)             , float32
X : (5, 100, 149, 2, 96)  , float32
Z : (5, 100, 149, 2, 128) , float32
Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Overall time: 25802 seconds

-------------------- Classification, ld = 10000.0 --------------------

Software versions:
numpy: 1.8.2
sklearn: 0.14.1
Attributes:
sr = 22050
labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop'
'reggae' 'rock']
Datasets:
D : (128, 96)               , float32
X : (5, 100, 149, 2, 96)    , float32
Z : (5, 100, 149, 2, 128)   , float32
Full dataset:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 149, 2, 128)
<class 'h5py._hl.dataset.Dataset'>
Reduced dataset:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 149, 2, 128)
<type 'numpy.ndarray'>
Flattened frames:
size: N=149,000 x n=128 -> 19,072,000 floats
dim: 38,144 features per clip
shape: (5, 100, 298, 128)
Truncated and grouped:
size: N=135,000 x n=128 -> 17,280,000 floats
dim: 34,560 features per clip
shape: (5, 100, 6, 45, 128)
Truncated and grouped:
size: N=135,000 x n=128 -> 17,280,000 floats
dim: 34,560 features per clip
shape: (5, 100, 6, 45, 128)
Feature vectors:
size: N=6,000 x n=128 -> 768,000 floats
dim: 1,536 features per clip
shape: (5, 100, 6, 2, 128)