Goal: observe the effect of $\lambda_g$ on a small dataset.
Conclusion: no ideal value for $\lambda_g$ has been found. Gave the idea that we want to trade smoothness with sparsity.
Observations:
time_features = [1915, 3067, 3095, 3485, 3213, 916]
inner_iterations = [1555, 1711, 1610, 1683, 1509, 425]
outer_iterations = [26, 29, 14, 10, 12, 3]
sparsity = [4.6, 5.1, 7.7, 18.4, 46.2, 88.8]
objective_g = [4.156377e+04, 4.197443e+04, 4.792657e+04, 7.133035e+04, 1.145442e+05, 1.600268e+05]
objective_i = [6.914378e+04, 6.879492e+04, 6.621795e+04, 5.868650e+04, 4.573782e+04, 3.421723e+04]
objective_j = [0, 1.349397e+03, 6.038886e+03, 1.143229e+04, 1.177662e+04, 6.200758e+03]
accuracy = [73, 74, 72, 73, 69, 53]
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
def plot(*args, **kwargs):
plt.figure(figsize=(8,5))
x = range(len(Pvalues))
log = 'log' in kwargs and kwargs['log'] is True
pltfunc = plt.semilogy if log else plt.plot
for var in args:
pltfunc(x, globals()[var], '.-', label=var)
plt.xlim(0, len(Pvalues)-1)
plt.title('{} vs {}'.format(', '.join(args), Pname))
plt.xlabel(Pname)
plt.ylabel(' ,'.join(args))
plt.xticks(x, Pvalues)
plt.grid(True); plt.legend(loc='best'); plt.show()
plot('accuracy')
plot('objective_g', 'objective_i', 'objective_j', log=True)
plot('sparsity')
plot('time_features')
plot('inner_iterations')
plot('outer_iterations')
Pname = 'lg'
Pvalues = [None, 1, 10, 1e2, 1e3, 1e4]
# Regenerate the graph or the features at each iteration.
regen_graph = False
regen_features = True
p = {}
# Preprocessing.
# Graph.
p['K'] = 10 + 1 # 5 to 10 + 1 for self-reference
p['dm'] = 'cosine'
p['Csigma'] = 1
p['diag'] = True
p['laplacian'] = 'normalized'
# Feature extraction.
p['m'] = 128 # 64, 128, 512
p['ld'] = 10
p['le'] = None
p['lg'] = 100
# Classification.
p['scale'] = None
p['Nvectors'] = 6
p['svm_type'] = 'C'
p['kernel'] = 'linear'
p['C'] = 1
p['nu'] = 0.5
# Dataset (10,100,644 | 5,100,149 | 2,10,644).
p['Ngenres'] = 5
p['Nclips'] = 100
p['Nframes'] = 149
# Graph.
p['tol'] = 1e-5
# Feature extraction.
p['rtol'] = 1e-6 # 1e-3, 1e-5, 1e-7
p['N_outer'] = 40 # 10, 15, 20
# Classification.
p['Ncv'] = 10
p['dataset_classification'] = 'Z'
import numpy as np
import time
texperiment = time.time()
def separator():
print('\n' + 50 * '-' + '\n')
# Fair comparison when tuning parameters.
np.random.seed(1)
#%run gtzan.ipynb
#%run audio_preprocessing.ipynb
if not regen_graph:
%run audio_graph.ipynb
separator()
if not regen_features:
%run audio_features.ipynb
separator()
# Hyper-parameter under test.
for p[Pname] in Pvalues:
if regen_graph:
%run audio_graph.ipynb
separator()
if regen_features:
%run audio_features.ipynb
separator()
%run audio_classification.ipynb
separator()
# Baseline, i.e. classification with spectrograms.
p['dataset_classification'] = 'X'
p['scale'] = 'minmax' # Todo: should be done in pre-processing.
if not regen_graph and not regen_features:
# Classifier parameters are being tested.
for p[Pname] in Pvalues:
%run audio_classification.ipynb
separator()
else:
%run audio_classification.ipynb
separator()
print('Experiment time: {:.0f} seconds'.format(time.time() - texperiment))
Data: (149000, 96), float32 Elapsed time: 174.13 seconds All self-referenced in the first column: True dist in [0.0, 0.418610453606] w in [0.0562636144459, 1.0] Ones on the diagonal: 149000 (over 149000) assert: True W in [0.0, 1.0] Datasets: L_data : (2367336,), float32 L_indices : (2367336,), int32 L_indptr : (149001,) , int32 L_shape : (2,) , int64 W_data : (2367336,), float32 W_indices : (2367336,), int32 W_indptr : (149001,) , int32 W_shape : (2,) , int64 Attributes: K = 11 dm = cosine Csigma = 1 diag = True laplacian = normalized Overall time: 183.07 seconds -------------------------------------------------- Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: Xa: (10, 100, 644, 2, 1024) , float32 Xs: (10, 100, 644, 2, 96) , float32 Full dataset: size: N=1,288,000 x n=96 -> 123,648,000 floats dim: 123,648 features per clip shape: (10, 100, 644, 2, 96) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=96 -> 14,304,000 floats dim: 28,608 features per clip shape: (5, 100, 149, 2, 96) <type 'numpy.ndarray'> Data: (149000, 96), float32 Attributes: K = 11 dm = cosine Csigma = 1 diag = True laplacian = normalized Datasets: L_data : (2367336,), float32 L_indices : (2367336,), int32 L_indptr : (149001,) , int32 L_shape : (2,) , int64 W_data : (2367336,), float32 W_indices : (2367336,), int32 W_indptr : (149001,) , int32 W_shape : (2,) , int64 Size X: 13.6 M --> 54.6 MiB Size Z: 18.2 M --> 72.8 MiB Size D: 12.0 k --> 48.0 kiB Size E: 12.0 k --> 48.0 kiB Elapsed time: 1915 seconds
Inner loop: 1555 iterations g(Z) = ||X-DZ||_2^2 = 4.156377e+04 rdiff: 0.00281476331612 i(Z) = ||Z||_1 = 6.914378e+04
Global objective: 1.107076e+05
Outer loop: 26 iterations Z in [-1.25696384907, 1.62027561665] Sparsity of Z: 875,165 non-zero entries out of 19,072,000 entries, i.e. 4.6%.
D in [-0.809242725372, 0.939565658569] d in [0.999999642372, 1.00000035763] Constraints on D: True
Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Overall time: 1922 seconds -------------------------------------------------- Software versions: numpy: 1.8.2 sklearn: 0.14.1 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Full dataset: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 149, 2, 128) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 149, 2, 128) <type 'numpy.ndarray'> Flattened frames: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 298, 128) Truncated and grouped: size: N=135,000 x n=128 -> 17,280,000 floats dim: 34,560 features per clip shape: (5, 100, 6, 45, 128) Truncated and grouped: size: N=135,000 x n=128 -> 17,280,000 floats dim: 34,560 features per clip shape: (5, 100, 6, 45, 128) Feature vectors: size: N=6,000 x n=128 -> 768,000 floats dim: 1,536 features per clip shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop Training data: (3600, 128), float64 Testing data: (2400, 128), float64 Training labels: (3600,), uint8 Testing labels: (2400,), uint8 Accuracy: 74.1 % 5 genres: blues, classical, country, disco, hiphop Training data: (300, 1536), float64 Testing data: (200, 1536), float64 Training labels: (300,), uint8 Testing labels: (200,), uint8 Feature vectors accuracy: 62.3 % Clips accuracy: 70.5 % 5 genres: blues, classical, country, disco, hiphop Data: (500, 1536), float64 Labels: (500,), uint8 71 (+/- 7.5) <- [ 0.7 0.8 0.66 0.58 0.68 0.76 0.72 0.6 0.78 0.8 ] 70 (+/- 6.5) <- [ 0.66 0.68 0.74 0.56 0.76 0.72 0.76 0.74 0.62 0.76] 71 (+/- 5.5) <- [ 0.76 0.72 0.8 0.74 0.66 0.66 0.62 0.76 0.74 0.66] 73 (+/- 4.8) <- [ 0.7 0.7 0.72 0.7 0.68 0.72 0.7 0.82 0.82 0.7 ] 71 (+/- 4.8) <- [ 0.64 0.7 0.78 0.8 0.68 0.7 0.72 0.68 0.76 0.68] 71 (+/- 4.7) <- [ 0.66 0.8 0.72 0.62 0.7 0.76 0.7 0.74 0.7 0.72] 70 (+/- 5.4) <- [ 0.64 0.7 0.62 0.72 0.7 0.76 0.64 0.66 0.72 0.8 ] 72 (+/- 5.0) <- [ 0.76 0.68 0.64 0.7 0.82 0.7 0.74 0.72 0.74 0.66] 71 (+/- 5.1) <- [ 0.76 0.6 0.64 0.68 0.76 0.72 0.7 0.74 0.7 0.76] 70 (+/- 5.3) <- [ 0.74 0.66 0.74 0.64 0.76 0.74 0.68 0.6 0.68 0.76] Mean time (10 cv): 29.66 seconds Overall time: 301.45 seconds -------------------------------------------------- The memory_profiler extension is already loaded. To reload it, use: %reload_ext memory_profiler The line_profiler extension is already loaded. To reload it, use: %reload_ext line_profiler Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: Xa: (10, 100, 644, 2, 1024) , float32 Xs: (10, 100, 644, 2, 96) , float32 Full dataset: size: N=1,288,000 x n=96 -> 123,648,000 floats dim: 123,648 features per clip shape: (10, 100, 644, 2, 96) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=96 -> 14,304,000 floats dim: 28,608 features per clip shape: (5, 100, 149, 2, 96) <type 'numpy.ndarray'> Data: (149000, 96), float32 Attributes: K = 11 dm = cosine Csigma = 1 diag = True laplacian = normalized Datasets: L_data : (2367336,), float32 L_indices : (2367336,), int32 L_indptr : (149001,) , int32 L_shape : (2,) , int64 W_data : (2367336,), float32 W_indices : (2367336,), int32 W_indptr : (149001,) , int32 W_shape : (2,) , int64 Size X: 13.6 M --> 54.6 MiB Size Z: 18.2 M --> 72.8 MiB Size D: 12.0 k --> 48.0 kiB Size E: 12.0 k --> 48.0 kiB Elapsed time: 3067 seconds
Inner loop: 1711 iterations g(Z) = ||X-DZ||_2^2 = 4.197443e+04 rdiff: 0.00267776003901 i(Z) = ||Z||_1 = 6.879492e+04 j(Z) = tr(Z^TLZ) = 1.349397e+03
Global objective: 1.121188e+05
Outer loop: 29 iterations Z in [-1.05637359619, 1.47232508659] Sparsity of Z: 970,772 non-zero entries out of 19,072,000 entries, i.e. 5.1%.
D in [-0.818693161011, 0.937262773514] d in [0.999999582767, 1.00000047684] Constraints on D: True
Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Overall time: 3074 seconds -------------------------------------------------- Software versions: numpy: 1.8.2 sklearn: 0.14.1 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Full dataset: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 149, 2, 128) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 149, 2, 128) <type 'numpy.ndarray'> Flattened frames: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 298, 128) Truncated and grouped: size: N=135,000 x n=128 -> 17,280,000 floats dim: 34,560 features per clip shape: (5, 100, 6, 45, 128) Truncated and grouped: size: N=135,000 x n=128 -> 17,280,000 floats dim: 34,560 features per clip shape: (5, 100, 6, 45, 128) Feature vectors: size: N=6,000 x n=128 -> 768,000 floats dim: 1,536 features per clip shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop Training data: (3600, 128), float64 Testing data: (2400, 128), float64 Training labels: (3600,), uint8 Testing labels: (2400,), uint8 Accuracy: 74.1 % 5 genres: blues, classical, country, disco, hiphop Training data: (300, 1536), float64 Testing data: (200, 1536), float64 Training labels: (300,), uint8 Testing labels: (200,), uint8 Feature vectors accuracy: 61.7 % Clips accuracy: 69.0 % 5 genres: blues, classical, country, disco, hiphop Data: (500, 1536), float64 Labels: (500,), uint8 72 (+/- 6.9) <- [ 0.64 0.82 0.68 0.6 0.72 0.8 0.68 0.68 0.76 0.78] 71 (+/- 5.9) <- [ 0.66 0.72 0.7 0.64 0.78 0.84 0.72 0.68 0.66 0.66] 71 (+/- 4.7) <- [ 0.74 0.72 0.76 0.74 0.7 0.68 0.62 0.74 0.78 0.66] 74 (+/- 5.5) <- [ 0.72 0.72 0.72 0.72 0.66 0.76 0.72 0.8 0.86 0.68] 72 (+/- 7.6) <- [ 0.56 0.66 0.76 0.78 0.66 0.72 0.78 0.68 0.84 0.72] 72 (+/- 4.6) <- [ 0.68 0.82 0.72 0.64 0.76 0.72 0.76 0.7 0.72 0.72] 71 (+/- 6.9) <- [ 0.6 0.8 0.62 0.76 0.72 0.74 0.6 0.74 0.74 0.76] 73 (+/- 4.2) <- [ 0.78 0.74 0.7 0.72 0.82 0.72 0.76 0.68 0.72 0.68] 70 (+/- 5.4) <- [ 0.7 0.64 0.68 0.68 0.76 0.74 0.58 0.7 0.74 0.76] 70 (+/- 4.3) <- [ 0.74 0.7 0.74 0.66 0.72 0.74 0.66 0.62 0.68 0.76] Mean time (10 cv): 28.96 seconds Overall time: 294.16 seconds -------------------------------------------------- The memory_profiler extension is already loaded. To reload it, use: %reload_ext memory_profiler The line_profiler extension is already loaded. To reload it, use: %reload_ext line_profiler Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: Xa: (10, 100, 644, 2, 1024) , float32 Xs: (10, 100, 644, 2, 96) , float32 Full dataset: size: N=1,288,000 x n=96 -> 123,648,000 floats dim: 123,648 features per clip shape: (10, 100, 644, 2, 96) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=96 -> 14,304,000 floats dim: 28,608 features per clip shape: (5, 100, 149, 2, 96) <type 'numpy.ndarray'> Data: (149000, 96), float32 Attributes: K = 11 dm = cosine Csigma = 1 diag = True laplacian = normalized Datasets: L_data : (2367336,), float32 L_indices : (2367336,), int32 L_indptr : (149001,) , int32 L_shape : (2,) , int64 W_data : (2367336,), float32 W_indices : (2367336,), int32 W_indptr : (149001,) , int32 W_shape : (2,) , int64 Size X: 13.6 M --> 54.6 MiB Size Z: 18.2 M --> 72.8 MiB Size D: 12.0 k --> 48.0 kiB Size E: 12.0 k --> 48.0 kiB Elapsed time: 3095 seconds
Inner loop: 1610 iterations g(Z) = ||X-DZ||_2^2 = 4.792657e+04 rdiff: 0.00346171679269 i(Z) = ||Z||_1 = 6.621795e+04 j(Z) = tr(Z^TLZ) = 6.038886e+03
Global objective: 1.201834e+05
Outer loop: 14 iterations Z in [-0.477136224508, 1.17480039597] Sparsity of Z: 1,475,033 non-zero entries out of 19,072,000 entries, i.e. 7.7%.
D in [-0.740027308464, 0.912485957146] d in [0.999999642372, 1.00000023842] Constraints on D: True
Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Overall time: 3102 seconds -------------------------------------------------- Software versions: numpy: 1.8.2 sklearn: 0.14.1 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Full dataset: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 149, 2, 128) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 149, 2, 128) <type 'numpy.ndarray'> Flattened frames: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 298, 128) Truncated and grouped: size: N=135,000 x n=128 -> 17,280,000 floats dim: 34,560 features per clip shape: (5, 100, 6, 45, 128) Truncated and grouped: size: N=135,000 x n=128 -> 17,280,000 floats dim: 34,560 features per clip shape: (5, 100, 6, 45, 128) Feature vectors: size: N=6,000 x n=128 -> 768,000 floats dim: 1,536 features per clip shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop Training data: (3600, 128), float64 Testing data: (2400, 128), float64 Training labels: (3600,), uint8 Testing labels: (2400,), uint8 Accuracy: 75.0 % 5 genres: blues, classical, country, disco, hiphop Training data: (300, 1536), float64 Testing data: (200, 1536), float64 Training labels: (300,), uint8 Testing labels: (200,), uint8 Feature vectors accuracy: 61.1 % Clips accuracy: 70.0 % 5 genres: blues, classical, country, disco, hiphop Data: (500, 1536), float64 Labels: (500,), uint8 70 (+/- 4.2) <- [ 0.68 0.76 0.7 0.66 0.64 0.74 0.68 0.64 0.74 0.74] 69 (+/- 8.4) <- [ 0.62 0.7 0.76 0.54 0.82 0.8 0.72 0.68 0.6 0.68] 71 (+/- 4.3) <- [ 0.74 0.68 0.74 0.66 0.7 0.7 0.64 0.78 0.76 0.74] 71 (+/- 7.5) <- [ 0.56 0.68 0.7 0.7 0.62 0.7 0.74 0.82 0.82 0.72] 71 (+/- 4.2) <- [ 0.62 0.68 0.7 0.76 0.74 0.7 0.76 0.7 0.76 0.7 ] 72 (+/- 5.0) <- [ 0.66 0.74 0.72 0.66 0.82 0.72 0.74 0.76 0.68 0.66] 70 (+/- 5.4) <- [ 0.64 0.74 0.68 0.74 0.66 0.68 0.62 0.72 0.78 0.78] 70 (+/- 4.6) <- [ 0.66 0.68 0.68 0.74 0.78 0.64 0.72 0.74 0.64 0.74] 71 (+/- 4.7) <- [ 0.78 0.66 0.66 0.76 0.7 0.72 0.62 0.72 0.74 0.7 ] 70 (+/- 5.0) <- [ 0.8 0.68 0.74 0.7 0.68 0.68 0.66 0.6 0.72 0.72] Mean time (10 cv): 24.68 seconds Overall time: 251.12 seconds -------------------------------------------------- The memory_profiler extension is already loaded. To reload it, use: %reload_ext memory_profiler The line_profiler extension is already loaded. To reload it, use: %reload_ext line_profiler Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: Xa: (10, 100, 644, 2, 1024) , float32 Xs: (10, 100, 644, 2, 96) , float32 Full dataset: size: N=1,288,000 x n=96 -> 123,648,000 floats dim: 123,648 features per clip shape: (10, 100, 644, 2, 96) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=96 -> 14,304,000 floats dim: 28,608 features per clip shape: (5, 100, 149, 2, 96) <type 'numpy.ndarray'> Data: (149000, 96), float32 Attributes: K = 11 dm = cosine Csigma = 1 diag = True laplacian = normalized Datasets: L_data : (2367336,), float32 L_indices : (2367336,), int32 L_indptr : (149001,) , int32 L_shape : (2,) , int64 W_data : (2367336,), float32 W_indices : (2367336,), int32 W_indptr : (149001,) , int32 W_shape : (2,) , int64 Size X: 13.6 M --> 54.6 MiB Size Z: 18.2 M --> 72.8 MiB Size D: 12.0 k --> 48.0 kiB Size E: 12.0 k --> 48.0 kiB Elapsed time: 3485 seconds
Inner loop: 1683 iterations g(Z) = ||X-DZ||_2^2 = 7.133035e+04 rdiff: 0.00123024645659 i(Z) = ||Z||_1 = 5.868650e+04 j(Z) = tr(Z^TLZ) = 1.143229e+04
Global objective: 1.414491e+05
Outer loop: 10 iterations Z in [-0.0622419342399, 1.01885557175] Sparsity of Z: 3,506,138 non-zero entries out of 19,072,000 entries, i.e. 18.4%.
D in [-0.0309991780668, 0.920943379402] d in [0.999999582767, 1.00000035763] Constraints on D: True
Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Overall time: 3492 seconds -------------------------------------------------- Software versions: numpy: 1.8.2 sklearn: 0.14.1 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Full dataset: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 149, 2, 128) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 149, 2, 128) <type 'numpy.ndarray'> Flattened frames: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 298, 128) Truncated and grouped: size: N=135,000 x n=128 -> 17,280,000 floats dim: 34,560 features per clip shape: (5, 100, 6, 45, 128) Truncated and grouped: size: N=135,000 x n=128 -> 17,280,000 floats dim: 34,560 features per clip shape: (5, 100, 6, 45, 128) Feature vectors: size: N=6,000 x n=128 -> 768,000 floats dim: 1,536 features per clip shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop Training data: (3600, 128), float64 Testing data: (2400, 128), float64 Training labels: (3600,), uint8 Testing labels: (2400,), uint8 Accuracy: 76.0 % 5 genres: blues, classical, country, disco, hiphop Training data: (300, 1536), float64 Testing data: (200, 1536), float64 Training labels: (300,), uint8 Testing labels: (200,), uint8 Feature vectors accuracy: 63.7 % Clips accuracy: 70.0 % 5 genres: blues, classical, country, disco, hiphop Data: (500, 1536), float64 Labels: (500,), uint8 73 (+/- 4.4) <- [ 0.76 0.72 0.68 0.7 0.7 0.82 0.7 0.68 0.78 0.72] 72 (+/- 3.5) <- [ 0.72 0.76 0.76 0.64 0.74 0.72 0.72 0.74 0.72 0.68] 73 (+/- 5.1) <- [ 0.76 0.82 0.78 0.74 0.78 0.66 0.66 0.74 0.7 0.7 ] 72 (+/- 4.6) <- [ 0.64 0.78 0.68 0.74 0.68 0.74 0.74 0.74 0.8 0.7 ] 71 (+/- 4.9) <- [ 0.74 0.78 0.64 0.78 0.7 0.7 0.76 0.64 0.72 0.68] 73 (+/- 3.4) <- [ 0.7 0.8 0.74 0.72 0.74 0.72 0.74 0.68 0.74 0.68] 73 (+/- 5.0) <- [ 0.7 0.76 0.64 0.76 0.7 0.68 0.72 0.78 0.74 0.82] 73 (+/- 5.7) <- [ 0.68 0.68 0.68 0.8 0.82 0.68 0.8 0.78 0.72 0.68] 71 (+/- 5.0) <- [ 0.76 0.62 0.68 0.78 0.72 0.76 0.68 0.76 0.66 0.7 ] 73 (+/- 3.7) <- [ 0.78 0.74 0.78 0.72 0.78 0.7 0.72 0.72 0.66 0.74] Mean time (10 cv): 18.83 seconds Overall time: 192.12 seconds -------------------------------------------------- The memory_profiler extension is already loaded. To reload it, use: %reload_ext memory_profiler The line_profiler extension is already loaded. To reload it, use: %reload_ext line_profiler Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: Xa: (10, 100, 644, 2, 1024) , float32 Xs: (10, 100, 644, 2, 96) , float32 Full dataset: size: N=1,288,000 x n=96 -> 123,648,000 floats dim: 123,648 features per clip shape: (10, 100, 644, 2, 96) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=96 -> 14,304,000 floats dim: 28,608 features per clip shape: (5, 100, 149, 2, 96) <type 'numpy.ndarray'> Data: (149000, 96), float32 Attributes: K = 11 dm = cosine Csigma = 1 diag = True laplacian = normalized Datasets: L_data : (2367336,), float32 L_indices : (2367336,), int32 L_indptr : (149001,) , int32 L_shape : (2,) , int64 W_data : (2367336,), float32 W_indices : (2367336,), int32 W_indptr : (149001,) , int32 W_shape : (2,) , int64 Size X: 13.6 M --> 54.6 MiB Size Z: 18.2 M --> 72.8 MiB Size D: 12.0 k --> 48.0 kiB Size E: 12.0 k --> 48.0 kiB Elapsed time: 3213 seconds
Inner loop: 1509 iterations g(Z) = ||X-DZ||_2^2 = 1.145442e+05 rdiff: 0.000884876305308 i(Z) = ||Z||_1 = 4.573782e+04 j(Z) = tr(Z^TLZ) = 1.177662e+04
Global objective: 1.720586e+05
Outer loop: 12 iterations Z in [-0.0212081614882, 0.280105501413] Sparsity of Z: 8,819,964 non-zero entries out of 19,072,000 entries, i.e. 46.2%.
D in [-0.0482563264668, 0.844439685345] d in [0.999999642372, 1.00000035763] Constraints on D: True
Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Overall time: 3220 seconds -------------------------------------------------- Software versions: numpy: 1.8.2 sklearn: 0.14.1 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Full dataset: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 149, 2, 128) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 149, 2, 128) <type 'numpy.ndarray'> Flattened frames: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 298, 128) Truncated and grouped: size: N=135,000 x n=128 -> 17,280,000 floats dim: 34,560 features per clip shape: (5, 100, 6, 45, 128) Truncated and grouped: size: N=135,000 x n=128 -> 17,280,000 floats dim: 34,560 features per clip shape: (5, 100, 6, 45, 128) Feature vectors: size: N=6,000 x n=128 -> 768,000 floats dim: 1,536 features per clip shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop Training data: (3600, 128), float64 Testing data: (2400, 128), float64 Training labels: (3600,), uint8 Testing labels: (2400,), uint8 Accuracy: 70.0 % 5 genres: blues, classical, country, disco, hiphop Training data: (300, 1536), float64 Testing data: (200, 1536), float64 Training labels: (300,), uint8 Testing labels: (200,), uint8 Feature vectors accuracy: 62.4 % Clips accuracy: 67.0 % 5 genres: blues, classical, country, disco, hiphop Data: (500, 1536), float64 Labels: (500,), uint8 68 (+/- 8.2) <- [ 0.64 0.82 0.7 0.68 0.66 0.82 0.58 0.56 0.66 0.7 ] 68 (+/- 3.6) <- [ 0.6 0.68 0.68 0.66 0.72 0.7 0.7 0.74 0.66 0.68] 69 (+/- 5.7) <- [ 0.68 0.74 0.82 0.62 0.66 0.64 0.72 0.64 0.72 0.68] 68 (+/- 4.6) <- [ 0.6 0.64 0.76 0.68 0.7 0.68 0.74 0.66 0.64 0.7 ] 67 (+/- 4.2) <- [ 0.72 0.68 0.7 0.62 0.66 0.66 0.68 0.62 0.76 0.64] 67 (+/- 6.1) <- [ 0.66 0.68 0.76 0.72 0.64 0.54 0.74 0.68 0.62 0.7 ] 68 (+/- 4.0) <- [ 0.68 0.7 0.6 0.7 0.6 0.7 0.68 0.68 0.72 0.7 ] 68 (+/- 7.1) <- [ 0.6 0.64 0.72 0.64 0.84 0.6 0.74 0.68 0.62 0.7 ] 69 (+/- 4.9) <- [ 0.72 0.68 0.64 0.7 0.74 0.78 0.6 0.66 0.7 0.72] 67 (+/- 5.9) <- [ 0.76 0.64 0.8 0.68 0.66 0.64 0.64 0.62 0.6 0.68] Mean time (10 cv): 18.98 seconds Overall time: 194.10 seconds -------------------------------------------------- The memory_profiler extension is already loaded. To reload it, use: %reload_ext memory_profiler The line_profiler extension is already loaded. To reload it, use: %reload_ext line_profiler Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: Xa: (10, 100, 644, 2, 1024) , float32 Xs: (10, 100, 644, 2, 96) , float32 Full dataset: size: N=1,288,000 x n=96 -> 123,648,000 floats dim: 123,648 features per clip shape: (10, 100, 644, 2, 96) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=96 -> 14,304,000 floats dim: 28,608 features per clip shape: (5, 100, 149, 2, 96) <type 'numpy.ndarray'> Data: (149000, 96), float32 Attributes: K = 11 dm = cosine Csigma = 1 diag = True laplacian = normalized Datasets: L_data : (2367336,), float32 L_indices : (2367336,), int32 L_indptr : (149001,) , int32 L_shape : (2,) , int64 W_data : (2367336,), float32 W_indices : (2367336,), int32 W_indptr : (149001,) , int32 W_shape : (2,) , int64 Size X: 13.6 M --> 54.6 MiB Size Z: 18.2 M --> 72.8 MiB Size D: 12.0 k --> 48.0 kiB Size E: 12.0 k --> 48.0 kiB Elapsed time: 916 seconds
Inner loop: 425 iterations g(Z) = ||X-DZ||_2^2 = 1.600268e+05 rdiff: 0.000294506304058 i(Z) = ||Z||_1 = 3.421723e+04 j(Z) = tr(Z^TLZ) = 6.200758e+03
Global objective: 2.004448e+05
Outer loop: 3 iterations Z in [-1.27479197545e-05, 0.0141088813543] Sparsity of Z: 16,930,816 non-zero entries out of 19,072,000 entries, i.e. 88.8%.
D in [0.0195936579257, 0.24978607893] d in [0.999999642372, 1.00000023842] Constraints on D: True
Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Overall time: 923 seconds -------------------------------------------------- Software versions: numpy: 1.8.2 sklearn: 0.14.1 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Full dataset: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 149, 2, 128) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 149, 2, 128) <type 'numpy.ndarray'> Flattened frames: size: N=149,000 x n=128 -> 19,072,000 floats dim: 38,144 features per clip shape: (5, 100, 298, 128) Truncated and grouped: size: N=135,000 x n=128 -> 17,280,000 floats dim: 34,560 features per clip shape: (5, 100, 6, 45, 128) Truncated and grouped: size: N=135,000 x n=128 -> 17,280,000 floats dim: 34,560 features per clip shape: (5, 100, 6, 45, 128) Feature vectors: size: N=6,000 x n=128 -> 768,000 floats dim: 1,536 features per clip shape: (5, 100, 6, 2, 128)
5 genres: blues, classical, country, disco, hiphop Training data: (3600, 128), float64 Testing data: (2400, 128), float64 Training labels: (3600,), uint8 Testing labels: (2400,), uint8 Accuracy: 52.4 % 5 genres: blues, classical, country, disco, hiphop Training data: (300, 1536), float64 Testing data: (200, 1536), float64 Training labels: (300,), uint8 Testing labels: (200,), uint8 Feature vectors accuracy: 50.7 % Clips accuracy: 51.0 % 5 genres: blues, classical, country, disco, hiphop Data: (500, 1536), float64 Labels: (500,), uint8 51 (+/- 7.1) <- [ 0.6 0.52 0.44 0.4 0.56 0.54 0.6 0.48 0.58 0.42] 52 (+/- 4.6) <- [ 0.44 0.54 0.58 0.46 0.56 0.5 0.58 0.56 0.52 0.5 ] 52 (+/- 7.4) <- [ 0.52 0.46 0.56 0.54 0.4 0.48 0.46 0.66 0.62 0.5 ] 52 (+/- 5.9) <- [ 0.48 0.4 0.56 0.5 0.54 0.58 0.54 0.5 0.62 0.48] 51 (+/- 6.9) <- [ 0.5 0.54 0.54 0.56 0.46 0.42 0.6 0.4 0.62 0.48] 51 (+/- 8.6) <- [ 0.4 0.6 0.58 0.54 0.46 0.4 0.48 0.66 0.42 0.56] 52 (+/- 6.5) <- [ 0.42 0.6 0.46 0.46 0.58 0.54 0.56 0.52 0.44 0.6 ] 53 (+/- 5.1) <- [ 0.5 0.5 0.62 0.48 0.6 0.46 0.48 0.56 0.54 0.54] 52 (+/- 8.3) <- [ 0.66 0.34 0.48 0.54 0.6 0.52 0.44 0.56 0.52 0.5 ] 52 (+/- 7.2) <- [ 0.46 0.48 0.56 0.42 0.6 0.6 0.6 0.4 0.56 0.5 ] Mean time (10 cv): 26.70 seconds Overall time: 272.53 seconds -------------------------------------------------- Software versions: numpy: 1.8.2 sklearn: 0.14.1 Attributes: sr = 22050 labels = ['blues' 'classical' 'country' 'disco' 'hiphop' 'jazz' 'metal' 'pop' 'reggae' 'rock'] Datasets: D : (128, 96) , float32 X : (5, 100, 149, 2, 96) , float32 Z : (5, 100, 149, 2, 128) , float32 Full dataset: size: N=149,000 x n=96 -> 14,304,000 floats dim: 28,608 features per clip shape: (5, 100, 149, 2, 96) <class 'h5py._hl.dataset.Dataset'> Reduced dataset: size: N=149,000 x n=96 -> 14,304,000 floats dim: 28,608 features per clip shape: (5, 100, 149, 2, 96) <type 'numpy.ndarray'> Flattened frames: size: N=149,000 x n=96 -> 14,304,000 floats dim: 28,608 features per clip shape: (5, 100, 298, 96) Truncated and grouped: size: N=135,000 x n=96 -> 12,960,000 floats dim: 25,920 features per clip shape: (5, 100, 6, 45, 96) Truncated and grouped: size: N=135,000 x n=96 -> 12,960,000 floats dim: 25,920 features per clip shape: (5, 100, 6, 45, 96) Feature vectors: size: N=6,000 x n=96 -> 576,000 floats dim: 1,152 features per clip shape: (5, 100, 6, 2, 96)
5 genres: blues, classical, country, disco, hiphop Training data: (3600, 96), float64 Testing data: (2400, 96), float64 Training labels: (3600,), uint8 Testing labels: (2400,), uint8 Accuracy: 68.8 % 5 genres: blues, classical, country, disco, hiphop Training data: (300, 1152), float64 Testing data: (200, 1152), float64 Training labels: (300,), uint8 Testing labels: (200,), uint8 Feature vectors accuracy: 62.1 % Clips accuracy: 68.5 % 5 genres: blues, classical, country, disco, hiphop Data: (500, 1152), float64 Labels: (500,), uint8 67 (+/- 8.0) <- [ 0.72 0.74 0.6 0.48 0.62 0.74 0.72 0.66 0.66 0.74] 66 (+/- 6.5) <- [ 0.7 0.6 0.62 0.52 0.64 0.7 0.72 0.68 0.66 0.76] 68 (+/- 6.7) <- [ 0.74 0.68 0.84 0.62 0.68 0.7 0.62 0.62 0.7 0.62] 68 (+/- 5.5) <- [ 0.6 0.72 0.62 0.66 0.74 0.74 0.72 0.64 0.76 0.64] 67 (+/- 5.1) <- [ 0.58 0.74 0.66 0.7 0.68 0.64 0.72 0.62 0.74 0.64] 68 (+/- 6.1) <- [ 0.68 0.78 0.74 0.64 0.72 0.7 0.68 0.58 0.58 0.66] 66 (+/- 3.7) <- [ 0.62 0.7 0.6 0.68 0.68 0.66 0.6 0.68 0.7 0.68] 67 (+/- 5.9) <- [ 0.68 0.6 0.58 0.68 0.8 0.64 0.64 0.68 0.64 0.72] 65 (+/- 6.0) <- [ 0.68 0.54 0.68 0.66 0.76 0.62 0.68 0.7 0.62 0.58] 66 (+/- 6.3) <- [ 0.72 0.62 0.78 0.64 0.74 0.64 0.62 0.58 0.6 0.7 ] Mean time (10 cv): 14.97 seconds Overall time: 153.30 seconds -------------------------------------------------- Experiment time: 17581 seconds