MiniRocket

MiniRocket transforms input time series using a small, fixed set of convolutional kernels. It applies PPV pooling (the proportion of positive values) to each resulting feature map, producing a single feature per map. The transformed features are then used to train a linear classifier.
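To make PPV pooling concrete, here is a minimal NumPy sketch of one feature: a single dilated convolution followed by the proportion of outputs greater than a bias. It is a simplification, not aeon's implementation. It uses a "valid" convolution without padding, and the bias is passed in directly, whereas MiniRocket draws its biases from quantiles of the convolution output on training examples. The kernel follows the MiniRocket scheme of length-9 kernels with weights -1 and 2; the names `ppv_feature` and `weights` are illustrative.

```python
import numpy as np


def ppv_feature(x, weights, dilation, bias):
    """PPV of one dilated convolution of a univariate series x.

    Simplified sketch: 'valid' convolution only (no padding).
    """
    span = (len(weights) - 1) * dilation  # receptive field length minus one
    n_out = len(x) - span  # number of valid output positions
    conv = np.array(
        [np.dot(weights, x[i : i + span + 1 : dilation]) for i in range(n_out)]
    )
    # PPV pooling: fraction of convolution outputs above the bias
    return np.mean(conv > bias)


# Kernel of length 9: six weights of -1 and three of 2 (weights sum to zero)
weights = np.full(9, -1.0)
weights[[0, 4, 8]] = 2.0

x = np.sin(np.linspace(0, 4 * np.pi, 50))
f = ppv_feature(x, weights, dilation=2, bias=0.0)
print(f)  # a single feature in [0, 1]
```

Because the weights sum to zero, a constant series gives a convolution output of exactly zero everywhere, so its PPV depends only on the sign of the bias.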

Dempster A, Schmidt DF, Webb GI (2020) MiniRocket: A Very Fast (Almost) Deterministic Transform for Time Series Classification. arXiv:2012.08791

1 Univariate Time Series

1.1 Imports

Import the example datasets, MiniRocket and its multivariate variants, RidgeClassifierCV, StandardScaler and make_pipeline (scikit-learn), and NumPy.

Note: MiniRocket and MiniRocketMultivariate are compiled by Numba on import. The compiled functions are cached, so this should only happen once (i.e., the first time you import MiniRocket or MiniRocketMultivariate).

In [1]:

# !pip install --upgrade numba

In [2]:

import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

from aeon.datasets import load_arrow_head  # univariate dataset
from aeon.datasets import load_basic_motions  # multivariate dataset
from aeon.datasets import (
    load_japanese_vowels,  # multivariate dataset with unequal length
)
from aeon.transformations.collection.convolution_based import (
    MiniRocket,
    MiniRocketMultivariate,
    MiniRocketMultivariateVariable,
)

1.2 Load the Training Data

For more details on the dataset, see the univariate time series classification notebook.

Note: Input time series must have a length of at least 9. Pad shorter time series using, e.g., PaddingTransformer (aeon.transformers.panel.padder).
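If PaddingTransformer is not at hand, a plain NumPy alternative for an equal-length collection stored as a 3D array of shape (n_cases, n_channels, n_timepoints) could look like the sketch below. The helper `pad_to_min_length` is hypothetical, and right-padding with zeros is an assumption; it does not reproduce the transformer's exact behaviour.

```python
import numpy as np


def pad_to_min_length(X, min_length=9, fill_value=0.0):
    """Right-pad a 3D collection (n_cases, n_channels, n_timepoints) to min_length."""
    n_timepoints = X.shape[-1]
    if n_timepoints >= min_length:
        return X  # already long enough, nothing to do
    # pad only the time axis, on the right
    pad_width = ((0, 0), (0, 0), (0, min_length - n_timepoints))
    return np.pad(X, pad_width, mode="constant", constant_values=fill_value)


X_short = np.random.default_rng(0).normal(size=(5, 1, 6))  # series of length 6
X_padded = pad_to_min_length(X_short)
print(X_padded.shape)  # (5, 1, 9)
```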

1.3 Initialise MiniRocket and Transform the Training Data

In [3]:

X_train, y_train = load_arrow_head(split="train")
minirocket = MiniRocket()  # by default, MiniRocket produces ~10,000 features
minirocket.fit(X_train)
X_train_transform = minirocket.transform(X_train)
# check the shape of the transformed training data -> (n_cases, 9_996)
X_train_transform.shape

Out[3]:

(36, 9996)
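The 9,996 columns (rather than exactly 10,000) follow from how MiniRocket builds its features: it uses 84 base kernels of length 9, each with three weights of 2 and six of -1, so there are "9 choose 3" = 84 of them. The requested number of features is divided evenly across these kernels, rounding down. The arithmetic, assuming the default of 10,000 requested features:

```python
from math import comb

# Base kernels: positions of the three weights equal to 2, out of 9
n_base_kernels = comb(9, 3)

# Assumption: 10,000 features requested, spread evenly over the base kernels
n_requested = 10_000
features_per_kernel = n_requested // n_base_kernels  # integer division rounds down
n_features = features_per_kernel * n_base_kernels
print(n_base_kernels, features_per_kernel, n_features)  # 84 119 9996
```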

1.4 Fit a Classifier

We suggest using RidgeClassifierCV (scikit-learn) for smaller datasets (fewer than ~10,000 training examples) and logistic regression trained with stochastic gradient descent for larger datasets.

Note: For larger datasets, this means integrating MiniRocket with stochastic gradient descent so that the transform is performed per minibatch, not simply replacing RidgeClassifierCV with, e.g., LogisticRegression.

Note: The input time series are passed to MiniRocket unscaled, but the output features may need scaling for downstream models. For RidgeClassifierCV, we scale the features with scikit-learn's StandardScaler, using with_mean=False so that each feature is divided by its standard deviation without being centred.
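For intuition, StandardScaler(with_mean=False) divides each feature by its training-set standard deviation without subtracting the mean. The NumPy sketch below mimics that behaviour; `scale_without_centering` is a hypothetical helper, and treating exactly-zero standard deviations as 1 is a simplification of scikit-learn's handling of constant features.

```python
import numpy as np


def scale_without_centering(X_train, X_test):
    """Mimic StandardScaler(with_mean=False): divide by the training std only."""
    std = X_train.std(axis=0)  # population std (ddof=0), fitted on training data
    std[std == 0.0] = 1.0  # leave constant features unscaled
    return X_train / std, X_test / std


rng = np.random.default_rng(0)
F_train = rng.random((20, 5))  # PPV-like features lie in [0, 1]
F_train[:, 2] = 0.0  # a feature that is constant on the training set
F_test = rng.random((7, 5))
F_train_scaled, F_test_scaled = scale_without_centering(F_train, F_test)
```

The test features are divided by the standard deviations estimated on the training features, which is why the scaler is fitted once and then only applied with `transform` to the test data.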

In [4]:

scaler = StandardScaler(with_mean=False)
classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))

X_train_scaled_transform = scaler.fit_transform(X_train_transform)
classifier.fit(X_train_scaled_transform, y_train)

Out[4]:

RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,
       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,
       2.15443469e+02, 1.00000000e+03]))


1.5 Load and Transform the Test Data

In [5]:

X_test, y_test = load_arrow_head(split="test")
X_test_transform = minirocket.transform(X_test)

1.6 Classify the Test Data

In [6]:

X_test_scaled_transform = scaler.transform(X_test_transform)
classifier.score(X_test_scaled_transform, y_test)

Out[6]:

0.8742857142857143

2 Multivariate Time Series

We can use the multivariate version of MiniRocket for multivariate time series input.
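In the multivariate version, each kernel is applied to a random subset of channels, and the per-channel convolution outputs are summed before PPV pooling. The NumPy sketch below illustrates one such feature. `multivariate_ppv` is a hypothetical helper; it omits padding, and the bias and channel subset are chosen arbitrarily rather than fitted as in MiniRocketMultivariate.

```python
import numpy as np


def multivariate_ppv(X, weights, dilation, bias, channels):
    """PPV of one kernel applied to selected channels of X (n_channels, n_timepoints).

    The per-channel 'valid' convolution outputs are summed before PPV pooling.
    """
    span = (len(weights) - 1) * dilation
    n_out = X.shape[1] - span
    summed = np.zeros(n_out)
    for c in channels:
        summed += np.array(
            [np.dot(weights, X[c, i : i + span + 1 : dilation]) for i in range(n_out)]
        )
    return np.mean(summed > bias)


weights = np.full(9, -1.0)
weights[[1, 3, 6]] = 2.0  # another of the 84 base kernels

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 100))  # 6 channels, 100 time points (arbitrary)
channels = rng.choice(X.shape[0], size=2, replace=False)  # random channel subset
f_multi = multivariate_ppv(X, weights, dilation=4, bias=0.0, channels=channels)
print(channels, f_multi)
```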

2.1 Imports

MiniRocketMultivariate was already imported in Section 1.1.

Note: MiniRocketMultivariate compiles via Numba on import.

2.2 Load the Training Data

Note: Input time series must have a length of at least 9. Pad shorter time series using, e.g., PaddingTransformer (aeon.transformers.panel.padder).

In [7]:

X_train, y_train = load_basic_motions(split="train")

2.3 Initialise MiniRocketMultivariate and Transform the Training Data

In [8]:

minirocket_multi = MiniRocketMultivariate()
minirocket_multi.fit(X_train)
X_train_transform = minirocket_multi.transform(X_train)

2.4 Fit a Classifier

In [9]:

scaler = StandardScaler(with_mean=False)
X_train_scaled_transform = scaler.fit_transform(X_train_transform)

classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))
classifier.fit(X_train_scaled_transform, y_train)

Out[9]:

RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,
       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,
       2.15443469e+02, 1.00000000e+03]))


2.5 Load and Transform the Test Data

In [10]:

X_test, y_test = load_basic_motions(split="test")
X_test_transform = minirocket_multi.transform(X_test)

2.6 Classify the Test Data

In [11]:

X_test_scaled_transform = scaler.transform(X_test_transform)
classifier.score(X_test_scaled_transform, y_test)

Out[11]:

1.0

3 Pipeline Example

We can use MiniRocket together with RidgeClassifierCV (or another classifier) in a pipeline. We can then use the pipeline like a self-contained classifier, with a single call to fit, and without having to separately transform the data, etc.
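To see what the pipeline does internally, the sketch below checks that a pipeline gives the same predictions as running the steps by hand (fit_transform on each transformer, then fit on the classifier). It swaps MiniRocket for a hypothetical stand-in feature extractor, `toy_features`, wrapped in scikit-learn's FunctionTransformer, so the example runs without aeon.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def toy_features(X):
    """Hypothetical stand-in for MiniRocket: per-series mean and std."""
    return np.stack([X.mean(axis=(1, 2)), X.std(axis=(1, 2))], axis=1)


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 1, 30)) + 1.0 * y[:, None, None]
alphas = np.logspace(-3, 3, 10)

# Pipeline: one call to fit, one call to predict
pipeline = make_pipeline(
    FunctionTransformer(toy_features),
    StandardScaler(with_mean=False),
    RidgeClassifierCV(alphas=alphas),
)
pipeline.fit(X, y)
pipeline_pred = pipeline.predict(X)

# The same steps by hand
scaler = StandardScaler(with_mean=False)
features_scaled = scaler.fit_transform(toy_features(X))
classifier = RidgeClassifierCV(alphas=alphas).fit(features_scaled, y)
manual_pred = classifier.predict(scaler.transform(toy_features(X)))

print(np.array_equal(pipeline_pred, manual_pred))  # True
```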

3.1 Imports

In [12]:

# (imported in Section 1.1)

3.2 Initialise the Pipeline

In [13]:

minirocket_pipeline = make_pipeline(
    MiniRocket(),
    StandardScaler(with_mean=False),
    RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)),
)

3.3 Load and Fit the Training Data

Note: Input time series must have a length of at least 9. Pad shorter time series using, e.g., PaddingTransformer (aeon.transformers.panel.padder).

In [14]:

X_train, y_train = load_arrow_head(split="train")

# it is necessary to pass y_train to the pipeline
# y_train is not used for the transform, but it is used by the classifier
minirocket_pipeline.fit(X_train, y_train)

Out[14]:

Pipeline(steps=[('minirocket', MiniRocket()),
                ('standardscaler', StandardScaler(with_mean=False)),
                ('ridgeclassifiercv',
                 RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,
       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,
       2.15443469e+02, 1.00000000e+03])))])


3.4 Load and Classify the Test Data

In [15]:

X_test, y_test = load_arrow_head(split="test")

minirocket_pipeline.score(X_test, y_test)

Out[15]:

0.8457142857142858

4 Pipeline Example with MiniRocketMultivariateVariable and Unequal-Length Time Series Data

For a further pipeline, we use MiniRocketMultivariateVariable, an extension of MiniRocket for variable-length (unequal-length) time series data. Following the code accompanying the original MiniRocket paper, we combine it with RidgeClassifierCV in a scikit-learn pipeline. As before, the pipeline can be used like a self-contained classifier, with a single call to fit and no separate transform step.

4.1 Load japanese_vowels, an Unequal-Length Dataset

Japanese Vowels is a dataset from the UCI archive. Nine male Japanese speakers were recorded saying the vowels ‘a’ and ‘e’. The raw recordings are preprocessed into a 12-dimensional (multivariate) classification problem in which the task is to identify the speaker. Series lengths range from 7 to 29.

In [16]:

X_train_jv, y_train_jv = load_japanese_vowels(split="train")
# inspect the number of training cases and the lengths of two recordings

print("number of training samples: ", len(X_train_jv))
print("series length of recording 0, dimension 5: ", X_train_jv[0][5].shape)
print("series length of recording 1, dimension 0: ", X_train_jv[1][0].shape)

number of training samples:  270
series length of recording 0, dimension 5:  (20,)
series length of recording 1, dimension 0:  (26,)
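The indexing above suggests that an unequal-length collection is held as a list of 2D arrays, one per case, each with shape (n_channels, n_timepoints) and its own number of time points. Below is a small synthetic collection in that layout (lengths chosen arbitrarily) and a simple way of inspecting its series lengths:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical collection: 12 channels, a different length per case
lengths = [20, 26, 7, 29]
X_unequal = [rng.normal(size=(12, n)) for n in lengths]

series_lengths = [x.shape[1] for x in X_unequal]
print(len(X_unequal), min(series_lengths), max(series_lengths))  # 4 7 29
```

With the real data, the same list comprehension over X_train_jv gives the per-case lengths.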

4.2 Create and Train the Pipeline

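As before, we create a scikit-learn pipeline. MiniRocketMultivariateVariable requires a minimum series length of 9; shorter series are padded up to length 9 with the value given by pad_value_short_series (here -10.0). The transform is followed by a StandardScaler and a RidgeClassifierCV. Setting random_state=42 makes the transform reproducible, and max_dilations_per_kernel=16 limits the number of dilations used per kernel.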
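A sketch of that padding step for a list of (n_channels, n_timepoints) arrays, assuming short cases are padded on the right. `pad_short_series` is a hypothetical helper, and MiniRocketMultivariateVariable may position the padding differently internally.

```python
import numpy as np


def pad_short_series(X, min_length=9, pad_value=-10.0):
    """Right-pad every case shorter than min_length; longer cases are left as is.

    X is a list of 2D arrays with shape (n_channels, n_timepoints_i).
    """
    padded = []
    for x in X:
        n_missing = min_length - x.shape[1]
        if n_missing > 0:
            # pad only the time axis, filling with pad_value
            x = np.pad(x, ((0, 0), (0, n_missing)), constant_values=pad_value)
        padded.append(x)
    return padded


rng = np.random.default_rng(0)
X_var = [rng.normal(size=(12, 7)), rng.normal(size=(12, 20))]
X_var_padded = pad_short_series(X_var)
print([x.shape for x in X_var_padded])  # [(12, 9), (12, 20)]
```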

In [17]:

minirocket_mv_var_pipeline = make_pipeline(
    MiniRocketMultivariateVariable(
        pad_value_short_series=-10.0, random_state=42, max_dilations_per_kernel=16
    ),
    StandardScaler(with_mean=False),
    RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)),
)
print(minirocket_mv_var_pipeline)

minirocket_mv_var_pipeline.fit(X_train_jv, y_train_jv)

Pipeline(steps=[('minirocketmultivariatevariable',
                 MiniRocketMultivariateVariable(max_dilations_per_kernel=16,
                                                pad_value_short_series=-10.0,
                                                random_state=42)),
                ('standardscaler', StandardScaler(with_mean=False)),
                ('ridgeclassifiercv',
                 RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,
       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,
       2.15443469e+02, 1.00000000e+03])))])


4.3 Score the Pipeline on japanese_vowels

With MiniRocketMultivariateVariable, we can also process test series that are slightly longer than those seen during training (maximum series length: 27 in the training set, 29 in the test set).
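Longer test series do not change the size of the feature vector because PPV is a proportion: however many convolution outputs a series produces, each kernel/dilation pair yields one scalar. The NumPy sketch below uses a simplified "valid" convolution, a hypothetical helper `ppv_features`, and an arbitrary bias of 0; it shows series of lengths 27 and 29 mapping to feature vectors of the same size.

```python
import numpy as np


def ppv_features(x, weights, dilations, bias=0.0):
    """One PPV feature per dilation; the output size does not depend on len(x)."""
    feats = []
    for d in dilations:
        span = (len(weights) - 1) * d
        conv = [np.dot(weights, x[i : i + span + 1 : d]) for i in range(len(x) - span)]
        feats.append(np.mean(np.array(conv) > bias))  # pool to one scalar
    return np.array(feats)


weights = np.full(9, -1.0)
weights[[0, 4, 8]] = 2.0

rng = np.random.default_rng(0)
f27 = ppv_features(rng.normal(size=27), weights, dilations=[1, 2, 3])
f29 = ppv_features(rng.normal(size=29), weights, dilations=[1, 2, 3])
print(f27.shape, f29.shape)  # (3,) (3,)
```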

In [18]:

X_test_jv, y_test_jv = load_japanese_vowels(split="test")

minirocket_mv_var_pipeline.score(X_test_jv, y_test_jv)

Out[18]:

0.9945945945945946