Welcome to giotto-time, our new library for time series forecasting!
Let's start with an example.
These are the main ingredients of giotto-time:
%load_ext autoreload
%autoreload 2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from gtime.preprocessing import TimeSeriesPreparation
from gtime.compose import FeatureCreation
from gtime.feature_extraction import Shift, MovingAverage
from gtime.feature_generation import PeriodicSeasonal, Constant, Calendar
from gtime.model_selection import horizon_shift, FeatureSplitter
from gtime.forecasting import GAR
- TimeSeriesPreparation: checks the input format of the time series and converts it to the expected format.
- DataFrameTransformer: scikit-learn's ColumnTransformer wrapper that returns a DataFrame.
- Shift, MovingAverage: create the desired features on the time series for forecasting.
- FeatureSplitter: prepares the custom giotto-time train-test matrices that are used in the model.
- GAR: Generalized Auto Regressive model, scikit-learn's MultiOutputRegressor wrapper. This is the only time series forecasting model available in the first release.

We also need a scikit-learn regression model. We go for a standard LinearRegression for this example.
from sklearn.linear_model import LinearRegression
We use the pandas.util.testing module to create a test time series.
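def test_time_series():
    # pandas.util.testing was deprecated in pandas 1.0 and removed in 2.0,
    # so this helper needs an older pandas release.
    from pandas.util import testing
    testing.N, testing.K = 500, 1  # 500 rows, 1 column
    df = testing.makeTimeDataFrame(freq="D")
    return df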
time_series = test_time_series()
print(f'Time series shape: {time_series.shape}')
print(f'Time series index type: {time_series.index.__class__}')
Time series shape: (500, 1)
Time series index type: <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
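Since `pandas.util.testing` is not available in recent pandas releases, a comparable test frame can be built by hand. The following is a minimal sketch, not the helper used above: the function name `make_test_time_series`, the column name `"A"`, the seed and the start date are all assumptions made for illustration, and the random values will differ from the ones shown in the tables of this tutorial.

```python
import numpy as np
import pandas as pd


def make_test_time_series(n_rows=500, freq="D", seed=0):
    """Hypothetical stand-in for the removed makeTimeDataFrame helper."""
    rng = np.random.default_rng(seed)
    index = pd.date_range(start="2000-01-01", periods=n_rows, freq=freq)
    # One column of standard-normal noise; the column name "A" is an arbitrary choice.
    return pd.DataFrame({"A": rng.standard_normal(n_rows)}, index=index)


time_series = make_test_time_series()
print(f"Time series shape: {time_series.shape}")
print(f"Time series index type: {time_series.index.__class__}")
```

The resulting frame has the same shape and index type as the one printed above, so it should be usable as a drop-in input for the rest of the walkthrough.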
The input time series has to be a DataFrame with a PeriodIndex. Use the provided class TimeSeriesPreparation to convert the time series into this format.
time_series_preparation = TimeSeriesPreparation()
period_index_time_series = time_series_preparation.transform(time_series)
print(f'Time series index type after the preprocessing: {period_index_time_series.index.__class__}')
Time series index type after the preprocessing: <class 'pandas.core.indexes.period.PeriodIndex'>
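For readers who want to see what this index conversion looks like outside giotto-time, here is a small pandas-only sketch. It is not the implementation of TimeSeriesPreparation, whose internals are not shown in this tutorial; it only illustrates the DatetimeIndex to PeriodIndex step with `DataFrame.to_period`. The toy data and the column name are assumptions.

```python
import numpy as np
import pandas as pd

index = pd.date_range(start="2000-01-01", periods=5, freq="D")
df = pd.DataFrame({"time_series": np.arange(5.0)}, index=index)

# DataFrame.to_period converts a DatetimeIndex into a PeriodIndex
# with the requested frequency (daily periods here).
period_df = df.to_period(freq="D")
print(type(period_df.index))
```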
period_index_time_series.plot(figsize=(20, 5))
plt.show()
The feature extraction step follows the scikit-learn transformer paradigm, adapted to time series forecasting.
Our DataFrameTransformer inherits from scikit-learn's ColumnTransformer; it creates a feature DataFrame from the provided transformers.
For simplicity we will create only Shift and MovingAverage features.
Shift provides a temporal shift of the time series. Feeding two Shift features (lags 1 and 2) to a linear regressor is equivalent to fitting an AR(2) model.
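The AR(2) equivalence can be checked numerically. The sketch below does not use giotto-time: it simulates an AR(2) process with coefficients chosen for illustration (0.6 and -0.3), builds the two lag columns with plain pandas `shift`, and fits a LinearRegression on them. All variable names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Simulate an AR(2) process: x_t = 0.6 * x_{t-1} - 0.3 * x_{t-2} + noise.
rng = np.random.default_rng(0)
n = 2000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

series = pd.Series(x)
# Lag features built with plain pandas, analogous to Shift(1) and Shift(2).
lags = pd.DataFrame({"lag_1": series.shift(1), "lag_2": series.shift(2)}).dropna()
target = series.loc[lags.index]

ar2 = LinearRegression().fit(lags, target)
print(ar2.coef_)  # should be close to the simulated coefficients 0.6 and -0.3
```

With enough samples, the fitted coefficients approach the ones used in the simulation, which is what an AR(2) estimation would be expected to return.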
Since the DataFrameTransformer is a ColumnTransformer wrapper, you can easily include features from scikit-learn, tsfresh, topological features from giotto-tda (\o/) or your own custom features.
cal = Calendar(
    start_date="ignored",
    end_date="ignored",
    region="america",
    country="Brazil",
    kernel=np.array([0, 1]),
)
# New API
dft = FeatureCreation(
    [('s1', Shift(1), ['time_series']),
     ('s2', Shift(2), ['time_series']),
     ('ma3', MovingAverage(window_size=3), ['time_series']),
     # ('cal', cal, ['time_series']),
     # ('ct', Constant(2), ['time_series']),
    ])
X = dft.fit_transform(period_index_time_series)
X.head(6)
| | s1__time_series__Shift | s2__time_series__Shift | ma3__time_series__MovingAverage |
|---|---|---|---|
| 2000-01-01 | NaN | NaN | NaN |
| 2000-01-02 | -1.366772 | NaN | NaN |
| 2000-01-03 | -0.439480 | -1.366772 | -0.410613 |
| 2000-01-04 | 0.574412 | -0.439480 | -0.171874 |
| 2000-01-05 | -0.650554 | 0.574412 | 0.097240 |
| 2000-01-06 | 0.367861 | -0.650554 | 0.634610 |
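To make the meaning of the three columns concrete, the same numbers can be reproduced with plain pandas. This is a sketch rather than the giotto-time implementation: the first eight values of the series are read off the tables printed in this tutorial, and the short column names `s1`, `s2` and `ma3` are used for brevity. It suggests that the moving average at a given date covers that date and the two preceding ones.

```python
import pandas as pd

# First values of the series, read off the tables printed in this tutorial.
index = pd.period_range(start="2000-01-01", periods=8, freq="D")
series = pd.Series(
    [-1.366772, -0.439480, 0.574412, -0.650554,
     0.367861, 2.186524, 0.001702, -0.173121],
    index=index,
    name="time_series",
)

features = pd.DataFrame({
    "s1": series.shift(1),                   # value one step earlier
    "s2": series.shift(2),                   # value two steps earlier
    "ma3": series.rolling(window=3).mean(),  # mean of the current and two previous values
})
print(features.head(6).round(6))
```

The printed frame agrees with the feature table up to rounding of the inputs.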
y = horizon_shift(period_index_time_series, horizon=3)
y.head()
| | y_1 | y_2 | y_3 |
|---|---|---|---|
| 2000-01-01 | -0.439480 | 0.574412 | -0.650554 |
| 2000-01-02 | 0.574412 | -0.650554 | 0.367861 |
| 2000-01-03 | -0.650554 | 0.367861 | 2.186524 |
| 2000-01-04 | 0.367861 | 2.186524 | 0.001702 |
| 2000-01-05 | 2.186524 | 0.001702 | -0.173121 |
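The target matrix above can be read as "column y_k holds the value k steps ahead of the row date". The sketch below reproduces it with pandas only; `horizon_shift_sketch` is a hypothetical helper written for illustration, not the library function, and the input values are again read off the tables in this tutorial.

```python
import pandas as pd


def horizon_shift_sketch(series, horizon):
    """Hypothetical pandas-only stand-in: column y_k holds the value k steps ahead."""
    return pd.DataFrame(
        {f"y_{k}": series.shift(-k) for k in range(1, horizon + 1)}
    )


index = pd.period_range(start="2000-01-01", periods=8, freq="D")
series = pd.Series(
    [-1.366772, -0.439480, 0.574412, -0.650554,
     0.367861, 2.186524, 0.001702, -0.173121],
    index=index,
)
y_sketch = horizon_shift_sketch(series, horizon=3)
print(y_sketch.head())
```

Note that the last `horizon` rows of such a matrix contain NaN, because the future values they would hold are not yet known.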
We use FeatureSplitter to split the matrices X and y into train and test sets.
feature_splitter = FeatureSplitter()
X_train, y_train, X_test, y_test = feature_splitter.transform(X, y)
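The exact splitting rule is not spelled out here. Judging from the prediction index printed at the end of this tutorial (the last three dates of the series), one plausible reading is that rows with incomplete features are dropped and rows whose targets are not fully known form the test set. The sketch below implements that assumption with pandas; `split_sketch` is a hypothetical function, not FeatureSplitter itself.

```python
import numpy as np
import pandas as pd


def split_sketch(X, y):
    """Hypothetical split: assumes test rows are those whose targets are not fully known."""
    # Drop rows whose features contain NaN (the first rows, before lags are available).
    valid = X.notna().all(axis=1)
    X, y = X[valid], y[valid]
    # Rows with at least one missing target cannot be used for training.
    is_test = y.isna().any(axis=1)
    return X[~is_test], y[~is_test], X[is_test], y[is_test]


index = pd.period_range(start="2000-01-01", periods=10, freq="D")
series = pd.Series(np.arange(10.0), index=index)
X_demo = pd.DataFrame({"s1": series.shift(1), "s2": series.shift(2)})
y_demo = pd.DataFrame({f"y_{k}": series.shift(-k) for k in range(1, 4)})

X_tr, y_tr, X_te, y_te = split_sketch(X_demo, y_demo)
print(len(X_tr), len(X_te))
```

Under this reading, the test set always has as many rows as the forecasting horizon, and its features are complete even though its targets are not.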
We rewrapped scikit-learn's MultiOutputRegressor as the GAR (Generalized Auto Regressive) model to better fit time series forecasting frameworks.
The traditional AR model is equivalent to a GAR model that uses only Shift columns in the X matrix.
GAR supports all the features compatible with the feature extraction step.
lr = LinearRegression()
model = GAR(lr)
model = model.fit(X_train, y_train)
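Since GAR is described as a MultiOutputRegressor wrapper, it may help to see what the underlying scikit-learn class does by itself. The sketch below uses random data and hypothetical variable names; it only shows that one copy of the base regressor is fitted per target column, which here would correspond to y_1, y_2 and y_3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X_rand = rng.standard_normal((100, 3))  # 100 samples, 3 features
Y_rand = rng.standard_normal((100, 3))  # 3 targets, playing the role of y_1, y_2, y_3

# MultiOutputRegressor clones the base estimator and fits one copy per target column.
multi = MultiOutputRegressor(LinearRegression()).fit(X_rand, Y_rand)
print(len(multi.estimators_))
print(multi.predict(X_rand[:2]).shape)
```

Each horizon step therefore gets its own independent regressor, so the forecast for y_3 does not depend on the forecasts for y_1 or y_2.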
We forecast 3 time steps of the time series (we set this parameter in the horizon_shift function).
The output has the following format: y_1 is the prediction one time step ahead of the row date, and likewise y_2 and y_3 are the predictions two and three time steps ahead.
predictions = model.predict(X_test)
predictions
| | y_1 | y_2 | y_3 |
|---|---|---|---|
| 2001-05-12 | 0.016174 | -0.099421 | -0.119078 |
| 2001-05-13 | -0.146184 | -0.009442 | -0.095851 |
| 2001-05-14 | 0.007748 | -0.052065 | -0.107488 |
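To turn a row of this table into a dated forecast, each y_k can be relabelled with the period it refers to (row date plus k steps). The following pandas sketch does this for the last row; the values are copied from the table above, and the variable names are hypothetical.

```python
import pandas as pd

index = pd.period_range(start="2001-05-12", periods=3, freq="D")
predictions_demo = pd.DataFrame(
    {
        "y_1": [0.016174, -0.146184, 0.007748],
        "y_2": [-0.099421, -0.009442, -0.052065],
        "y_3": [-0.119078, -0.095851, -0.107488],
    },
    index=index,
)

# Take the forecasts issued from the last available row and
# label each one with the period it refers to (row date + k steps).
last = predictions_demo.iloc[-1]
steps = [int(name.split("_")[1]) for name in last.index]
forecast = pd.Series(last.to_numpy(), index=[last.name + k for k in steps])
print(forecast)
```

Under the y_k convention described above, the last row yields forecasts for the three days following the end of the series.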