Topology in time series forecasting

This notebook shows how giotto-tda can be used to create topological features for time series forecasting tasks, and how to integrate them into scikit-learn–compatible pipelines.

In particular, we will concentrate on topological features which are created from consecutive sliding windows over the data. In sliding window models, a single time series array X of shape (n_timestamps, n_features) is turned into a time series of windows over the data, with a new shape (n_windows, n_samples_per_window, n_features). There are two main issues that arise when building forecasting models with sliding windows:

  1. n_windows is smaller than n_timestamps. This is because we cannot have more windows than there are timestamps without padding X, and this is not done by giotto-tda. n_timestamps - n_windows is even larger if we decide to pick a large stride between consecutive windows.
  2. The target variable y needs to be properly "aligned" with each window so that the forecasting problem is meaningful and e.g. we don't "leak" information from the future. In particular, y needs to be "resampled" so that it too has length n_windows.

To deal with these issues, giotto-tda provides a selection of transformers with resample, transform_resample and fit_transform_resample methods. These are inherited from a TransformerResamplerMixin base class. Furthermore, giotto-tda provides a drop-in replacement for scikit-learn's Pipeline which extends it to allow chaining TransformerResamplerMixins with regular scikit-learn estimators.

If you are looking at a static version of this notebook and would like to run its contents, head over to GitHub and download the source.

See also

License: AGPLv3

SlidingWindow

Let us start with a simple example of a "time series" X with a corresponding target y of the same length.

In [ ]:
import numpy as np

n_timestamps = 10
X, y = np.arange(n_timestamps), np.arange(n_timestamps) - n_timestamps
X, y

We can instantiate our sliding window transformer-resampler and run it on the pair (X, y):

In [ ]:
from gtda.time_series import SlidingWindow

window_size = 3
stride = 2

SW = SlidingWindow(size=window_size, stride=stride)
X_sw, yr = SW.fit_transform_resample(X, y)
X_sw, yr

We note a couple of things:

  • fit_transform_resample returns a pair: the window-transformed X and the resampled and aligned y.
  • SlidingWindow has made a choice for us on how to resample y and line it up with the windows from X: a window on X corresponds to the last value in a corresponding window over y. This is common in time series forecasting where, for example, y could be a shift of X by one timestamp.
  • Some of the initial values of X may not be found in X_sw. This is because SlidingWindow only ensures the last value is represented in the last window, regardless of the stride.

Multivariate time series example: Sliding window + topology Pipeline

giotto-tda's topology transformers expect 3D input. But our X_sw above is 2D. How do we capture interesting properties of the topology of input time series then? For univariate time series, it turns out that a good way is to use the "time delay embedding" or "Takens embedding" technique explained in the first part of Topology of time series. But as this involves an extra layer of complexity, we leave it for later and concentrate for now on an example with a simpler API which also demonstrates the use of a giotto-tda Pipeline.

Surprisingly, this involves multivariate time series input!

In [ ]:
rng = np.random.default_rng(42)

n_features = 2

X = rng.integers(0, high=20, size=(n_timestamps, n_features), dtype=int)
X

We are interpreting this input as a time series in two variables, of length n_timestamps. The target variable is the same y as before.

In [ ]:
SW = SlidingWindow(size=window_size, stride=stride)
X_sw, yr = SW.fit_transform_resample(X, y)
X_sw, yr

X_sw is now a complicated-looking array, but it has a simple interpretation. Again, X_sw[i] is the i-th window on X, and it contains window_size samples from the original time series. This time, the samples are not scalars but 1D arrays.

What if we suspect that the way in which the correlations between the variables evolve over time can help forecast the target y? This is a common situation in neuroscience, where each variable could be data from a single EEG sensor, for instance.

giotto-tda exposes a PearsonDissimilarity transformer which creates a 2D dissimilarity matrix from each window in X_sw, and stacks them together into a single 3D object. This is the correct format (and information content!) for a typical topological transformer in gtda.homology. See also Topological feature extraction from graphs for an in-depth look. Finally, we can extract simple scalar features using a selection of transformers in gtda.diagrams.

In [ ]:
from gtda.time_series import PearsonDissimilarity
from gtda.homology import VietorisRipsPersistence
from gtda.diagrams import Amplitude

PD = PearsonDissimilarity()
X_pd = PD.fit_transform(X_sw)
VR = VietorisRipsPersistence(metric="precomputed")
X_vr = VR.fit_transform(X_pd)  # "precomputed" required on dissimilarity data
Ampl = Amplitude()
X_a = Ampl.fit_transform(X_vr)
X_a

Notice that we are not acting on y above. We are simply creating features from each window using topology! Note: it's two features per window because we used the default value for homology_dimensions in VietorisRipsPersistence, not because we had two variables in the time series initially!

We can now put this all together into a giotto-tda Pipeline which combines both the sliding window transformation on X and resampling of y with the feature extraction from the windows on X.

Note: while we could import the Pipeline class and use its constructor, we use the convenience function make_pipeline instead, which is a drop-in replacement for scikit-learn's.

In [ ]:
from sklearn import set_config
set_config(display='diagram')  # For HTML representations of pipelines

from gtda.pipeline import make_pipeline

pipe = make_pipeline(SW, PD, VR, Ampl)
pipe

Finally, if we have a regression task on y we can add a final estimator such as scikit-learn's RandomForestRegressor as a final step in the previous pipeline, and fit it!

In [ ]:
from sklearn.ensemble import RandomForestRegressor

RFR = RandomForestRegressor()

pipe = make_pipeline(SW, PD, VR, Ampl, RFR)
pipe
In [ ]:
pipe.fit(X, y)
y_pred = pipe.predict(X)
score = pipe.score(X, y)
y_pred, score

Univariate time series – TakensEmbedding and SingleTakensEmbedding

The first part of Topology of time series explains a commonly used technique for converting a univariate time series into a single point cloud. Since topological features can be extracted from any point cloud, this is a gateway to time series analysis using topology. The second part of that notebook shows how to transform a batch of time series into a batch of point clouds, and how to extract topological descriptors from each of them independently. While in that notebook this is applied to a time series classification task, in this notebook we are concerned with topology-powered forecasting from a single time series.

Reasoning by analogy with the multivariate case above, we can look at sliding windows over X as small time series in their own right and track the evolution of their topology against the variable of interest (or against itself, if we are interested in unsupervised tasks such as anomaly detection).

There are two ways in which we can implement this idea in giotto-tda:

  1. We can first apply a SlidingWindow, and then an instance of TakensEmbedding.
  2. We can first compute a global Takens embedding of the time series via SingleTakensEmbedding, which takes us from 1D/column data to 2D data, and then partition the 2D data of vectors into sliding windows via SlidingWindow.

The first route ensures that we can run our "topological feature extraction track" in parallel with other feature-generation pipelines from sliding windows, without experiencing shape mismatches. The second route seems a little upside-down and it is not generally recommended, but it has the advantange that globally "optimal" parameters for the "time delay" and "embedding dimension" parameters can be computed automatically by SingleTakensEmbedding.

Below is what each route would look like.

Remark: In the presence of noise, a small sliding window size is likely to reduce the reliability of the estimate of the time series' local topology.

Option 1: SlidingWindow + TakensEmbedding

TakensEmbedding is not a TransformerResamplerMixin, but this is not a problem in the context of a Pipeline when we order things in this way.

In [ ]:
from gtda.time_series import TakensEmbedding

X = np.arange(n_timestamps)

window_size = 5
stride = 2

SW = SlidingWindow(size=window_size, stride=stride)
X_sw, yr = SW.fit_transform_resample(X, y)
X_sw, yr
In [ ]:
time_delay = 1
dimension = 2

TE = TakensEmbedding(time_delay=time_delay, dimension=dimension)
X_te = TE.fit_transform(X_sw)
X_te
In [ ]:
VR = VietorisRipsPersistence()  # No "precomputed" for point clouds
Ampl = Amplitude()
RFR = RandomForestRegressor()

pipe = make_pipeline(SW, TE, VR, Ampl, RFR)
pipe
In [ ]:
pipe.fit(X, y)
y_pred = pipe.predict(X)
score = pipe.score(X, y)
y_pred, score

Option 2: SingleTakensEmbeding + SlidingWindow

Note that SingleTakensEmbedding is also a TransformerResamplerMixin, and that the logic for resampling/aligning y is the same as in SlidingWindow.

In [ ]:
from gtda.time_series import SingleTakensEmbedding

X = np.arange(n_timestamps)

STE = SingleTakensEmbedding(parameters_type="search", time_delay=2, dimension=3)
X_ste, yr = STE.fit_transform_resample(X, y)
X_ste, yr
In [ ]:
window_size = 5
stride = 2

SW = SlidingWindow(size=window_size, stride=stride)
X_sw, yr = SW.fit_transform_resample(X_ste, yr)
X_sw, yr

From here on, it is easy to push a very similar pipeline through as in the multivariate case:

In [ ]:
VR = VietorisRipsPersistence()  # No "precomputed" for point clouds
Ampl = Amplitude()
RFR = RandomForestRegressor()

pipe = make_pipeline(STE, SW, VR, Ampl, RFR)
pipe
In [ ]:
pipe.fit(X, y)
y_pred = pipe.predict(X)
score = pipe.score(X, y)
y_pred, score

Integrating non-topological features

The best results are obtained when topological methods are used not in isolation but in combination with other methods. Here's an example where, in parallel with the topological feature extraction from local sliding windows using Option 2 above, we also compute the mean and variance in each sliding window. A scikit-learn FeatureUnion is used to combine these very different sets of features into a single pipeline object.

In [ ]:
from functools import partial
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import FeatureUnion
from sklearn.base import clone

mean = FunctionTransformer(partial(np.mean, axis=1, keepdims=True))
var = FunctionTransformer(partial(np.var, axis=1, keepdims=True))

pipe_topology = make_pipeline(TE, VR, Ampl)

feature_union = FeatureUnion([("window_mean", mean),
                              ("window_variance", var),
                              ("window_topology", pipe_topology)])
    
pipe = make_pipeline(SW, feature_union, RFR)
pipe
In [ ]:
pipe.fit(X, y)
y_pred = pipe.predict(X)
score = pipe.score(X, y)
y_pred, score

Endogeneous target preparation with Labeller

Let us say that we simply wish to predict the future of a time series from itself. This is very common in the study of financial markets for example. giotto-tda provides convenience classes for target preparation from a time series. This notebook only shows a very simple example: many more options are described in Labeller's documentation.

If we wished to create a target y from X such that y[i] is equal to X[i + 1], while also modifying X and y so that they still have the same length, we could proceed as follows:

In [ ]:
from gtda.time_series import Labeller

X = np.arange(10)

Lab = Labeller(size=1, func=np.max)
Xl, yl = Lab.fit_transform_resample(X, X)
Xl, yl

Notice that we are feeding two copies of X to fit_transform_resample in this case!

This is what fitting an end-to-end pipeline for future prediction using topology could look like. Again, you are encouraged to include your own non-topological features in the mix!

In [ ]:
SW = SlidingWindow(size=5)
TE = TakensEmbedding(time_delay=1, dimension=2)
VR = VietorisRipsPersistence()
Ampl = Amplitude()
RFR = RandomForestRegressor()

# Full pipeline including the regressor
pipe = make_pipeline(Lab, SW, TE, VR, Ampl, RFR)
pipe
In [ ]:
pipe.fit(X, X)
y_pred = pipe.predict(X)
y_pred

Where to next?

  1. There are two additional simple TransformerResamplerMixins in gtda.time_series: Resampler and Stationarizer.
  2. The sort of pipeline for topological feature extraction using Takens embedding is a bit crude. More sophisticated methods exist for extracting robust topological summaries from (windows on) time series. A good source of inspiration is the following paper:

    Persistent Homology of Complex Networks for Dynamic State Detection, by A. Myers, E. Munch, and F. A. Khasawneh.

    The module gtda.graphs contains several transformers implementing the main algorithms proposed there.

  3. Advanced users may be interested in ConsecutiveRescaling, which can be found in gtda.point_clouds.
  4. The notebook Lorenz attractor is an advanced use-case for TakensEmbedding and other time series forecasting techniques inspired by topology.