How to use this notebook

This Jupyter notebook and the other files in this directory contain the code for the Medium blog post "What is Giotto and Why James Bond Should Use It to Extract Secret Messages". The idea is to use topological data analysis (TDA) to predict the change from a chaotic to a non-chaotic regime in time series with different levels of noise. See the blog post for further details.

Because feature creation takes a long time (around 45 minutes on a MacBook Pro with 16 cores), a precomputed dataset is loaded instead. In the 'Plot Features' section, the features are computed directly for a smaller time series for presentation purposes. To create the features for all the time series yourself, run the bash script 'create_features.sh'.

Library Imports and Some Utility Functions

In [2]:
# Imports from scikit-learn and XGBoost
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import balanced_accuracy_score, make_scorer
from sklearn.linear_model import SGDClassifier
from xgboost import XGBClassifier

# For bootstrap confidence intervals
from numpy.random import seed
from numpy.random import rand
from numpy.random import randint

# Others
import pandas as pd
import numpy as np
from datetime import datetime
from itertools import product
import os
from pandarallel import pandarallel
from joblib import Parallel, delayed
from functools import reduce
from scipy.fftpack import rfft
import openml
from openml.datasets.functions import get_dataset

# Giotto
import giotto as gt
import giotto.diagrams as diag
import giotto.homology as hl

# Plotting functions
from plotting import plot_diagram, plot_landscapes
from plotting import plot_betti_surfaces, plot_betti_curves
from plotting import plot_point_cloud
import matplotlib.pyplot as plt
import plotly.express as px

# Our own feature creation and plotting functions
from chaos_detection import *
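
The `numpy.random` imports in the cell above are marked as being for bootstrap confidence intervals. As a minimal sketch of the idea, and not necessarily the procedure this notebook uses, a percentile bootstrap for the mean of a set of scores could look like the following. The helper `bootstrap_ci` and the score values are hypothetical.

```python
import numpy as np


def bootstrap_ci(values, n_resamples=1000, alpha=0.05, random_state=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Hypothetical helper for illustration; not part of chaos_detection.
    """
    rng = np.random.RandomState(random_state)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Draw indices with replacement: one row per bootstrap replicate
    idx = rng.randint(0, n, size=(n_resamples, n))
    means = values[idx].mean(axis=1)
    # The interval is read off the empirical distribution of replicate means
    lower = np.percentile(means, 100 * alpha / 2)
    upper = np.percentile(means, 100 * (1 - alpha / 2))
    return lower, upper


# Made-up scores, e.g. one balanced accuracy per cross-validation fold
scores = np.array([0.81, 0.79, 0.85, 0.80, 0.83, 0.78, 0.84, 0.82])
low, high = bootstrap_ci(scores)
print(f"mean={scores.mean():.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Fixing `random_state` keeps the interval reproducible between runs.
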
In [3]:
# Make balanced accuracy scorer
bal_acc_score = make_scorer(balanced_accuracy_score)
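
Balanced accuracy is the unweighted mean of the per-class recalls, so it does not reward a classifier for simply predicting the majority class. The sketch below recomputes that quantity by hand in NumPy on made-up labels (not taken from the notebook's data). `balanced_accuracy_manual` is a hypothetical helper that mirrors the definition used by scikit-learn's `balanced_accuracy_score` with default arguments.

```python
import numpy as np


def balanced_accuracy_manual(y_true, y_pred):
    """Mean of per-class recall (hypothetical helper for illustration)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for label in np.unique(y_true):
        mask = y_true == label
        # Recall for this class: fraction of its samples predicted correctly
        recalls.append((y_pred[mask] == label).mean())
    return float(np.mean(recalls))


# Toy imbalanced labels: eight samples of class 0, two of class 1
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.zeros_like(y_true)  # always predict the majority class

plain_acc = float((y_true == y_pred).mean())
bal_acc = balanced_accuracy_manual(y_true, y_pred)
print(plain_acc, bal_acc)
```

On these labels the plain accuracy is 0.8 while the balanced accuracy is 0.5, which is why the latter is the safer metric when the classes are unevenly represented.
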

Plot Features

Here we create the features for a small time series in order to present the TDA features.
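
The cell output below reports an embedding time delay and an embedding dimension, which suggests that the time series is turned into a point cloud by a time-delay (Takens) embedding before the topological features are computed. The following is a minimal NumPy sketch of such an embedding under that assumption. It is not the implementation inside `create_all_features`; `takens_embedding` is a hypothetical helper and the sine wave is a stand-in signal.

```python
import numpy as np


def takens_embedding(series, time_delay, dimension):
    """Return rows (x[t], x[t + d], ..., x[t + (m - 1) * d]).

    Hypothetical helper for illustration, with d = time_delay, m = dimension.
    """
    series = np.asarray(series, dtype=float)
    n_points = len(series) - (dimension - 1) * time_delay
    if n_points <= 0:
        raise ValueError("series too short for this delay and dimension")
    # Column j holds the series shifted by j * time_delay samples
    return np.column_stack([
        series[j * time_delay: j * time_delay + n_points]
        for j in range(dimension)
    ])


# Stand-in signal; the delay and dimension match the values printed below
signal = np.sin(np.linspace(0, 20 * np.pi, 1000))
cloud = takens_embedding(signal, time_delay=5, dimension=14)
print(cloud.shape)
```

Each row of `cloud` is one point in 14-dimensional space, and it is on point clouds of this kind that persistent homology is computed.
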

In [4]:
df_res, X_betti_surfaces = create_all_features(42203, noise_level=0.0, return_betti_surface=True)
df_res.head()
New pandarallel memory created - Size: 2000 MB
Pandarallel will run on 16 workers
Optimal embedding time delay based on mutual information:  5
Optimal embedding dimension based on false nearest neighbors:  14
Out[4]:
         time  y         x     x_dot    max_10    max_20    max_50   mean_10   mean_20   mean_50  ...  fourier_w_1  fourier_w_2  num_holes  avg_lifetime  betti_0  betti_1  betti_2  betti_argmax_1  betti_argmax_2  amplitude
133  13.30133  0  0.919056  1.520114  0.919056  0.919056  0.919056  0.895947  0.871955  0.801546  ...    -0.086702     1.198089        100      0.011424      0.0      0.0      0.0               0               0   0.080962
134  13.40134  0  0.924506  1.521954  0.924506  0.924506  0.924506  0.900792  0.877009  0.806352  ...    -0.085955     1.199410        100      0.011424      0.0      0.0      0.0               0               0   0.080962
135  13.50135  0  0.922732  1.517234  0.924506  0.924506  0.924506  0.904971  0.881477  0.810933  ...    -0.091852     1.199812        100      0.011424      0.0      0.0      0.0               0               0   0.080962
136  13.60136  0  0.933383  1.522403  0.933383  0.933383  0.933383  0.910297  0.886202  0.815624  ...    -0.086803     1.201020        100      0.011424      0.0      0.0      0.0               0               0   0.080962
137  13.70137  0  0.938219  1.523874  0.938219  0.938219  0.938219  0.915080  0.891244  0.820432  ...    -0.086749     1.202251        100      0.011424      0.0      0.0      0.0               0               0   0.080962

5 rows × 23 columns

In [5]:
# Betti surfaces
plot_betti_surfaces(X_betti_surfaces)
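
A Betti curve counts, for each filtration value, how many topological features of a given homology dimension are alive, that is, already born and not yet dead. A Betti surface is, roughly, a sequence of such curves stacked over successive time windows. The sketch below computes a single curve from a hand-written list of (birth, death) pairs. `betti_curve` is a hypothetical helper and the diagram is made up; neither comes from the notebook's own code.

```python
import numpy as np


def betti_curve(diagram, filtration_values):
    """Count features alive at each filtration value.

    `diagram` is a sequence of (birth, death) pairs for one homology
    dimension. Hypothetical helper for illustration.
    """
    diagram = np.asarray(diagram, dtype=float)
    births = diagram[:, 0][:, None]   # shape (n_features, 1)
    deaths = diagram[:, 1][:, None]
    values = np.asarray(filtration_values, dtype=float)[None, :]
    # A feature is alive on the half-open interval [birth, death)
    alive = (births <= values) & (values < deaths)
    return alive.sum(axis=0)


# Made-up persistence diagram with three features
diagram = [(0.0, 0.5), (0.1, 0.3), (0.2, 0.9)]
grid = np.array([0.0, 0.15, 0.25, 0.4, 0.6, 1.0])
curve = betti_curve(diagram, grid)
print(curve.tolist())  # [1, 2, 3, 2, 1, 0]
```
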