Topological feature extraction using `VietorisRipsPersistence` and `PersistenceEntropy`¶

In this notebook, we showcase the ease of use of one of the core components of giotto-tda: VietorisRipsPersistence, along with vectorisation methods. We first list steps in a typical, topological-feature extraction routine and then show to encapsulate them with a standard scikit-learn–like pipeline.

If you are looking at a static version of this notebook and would like to run its contents, head over to github.

License: AGPLv3

Import libraries¶

In [ ]:

from gtda.diagrams import PersistenceEntropy
from gtda.homology import VietorisRipsPersistence
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

Generate data¶

Let's begin by generating 3D point clouds of spheres and tori, along with a label of 0 (1) for each sphere (torus). We also add noise to each point cloud, whose effect is to displace the points sampling the surfaces by a random amount in a random direction. Note: You will need the auxiliary module datasets.py to run this cell.

In [ ]:

from datasets import generate_point_clouds
point_clouds, labels = generate_point_clouds(100, 10, 0.1)

Calculate persistent homology¶

Instantiate a VietorisRipsPersistence transformer and calculate persistence diagrams for this collection of point clouds.

In [ ]:

vietorisrips_tr = VietorisRipsPersistence()
diagrams = vietorisrips_tr.fit_transform(point_clouds)

Extract features¶

Instantiate a PersistenceEntropy transformer and extract features from the persistence diagrams.

In [ ]:

entropy_tr = PersistenceEntropy()
features = entropy_tr.fit_transform(diagrams)

Use the new features in a standard classifier¶

Leverage the compatibility with scikit-learn to perform a train-test split and score the features.

In [ ]:

X_train, X_valid, y_train, y_valid = train_test_split(features, labels)
model = RandomForestClassifier()
model.fit(X_train, y_train)
model.score(X_valid, y_valid)

Encapsulates the steps above in a pipeline¶

Subdivide into train-validation first, and use the pipeline.

In [ ]:

from gtda.pipeline import make_pipeline

Define the pipeline¶

Chain transformers from giotto-tda with scikit-learn ones.

In [ ]:

steps = [VietorisRipsPersistence(),
         PersistenceEntropy(),
         RandomForestClassifier()]
pipeline = make_pipeline(*steps)

Prepare the data¶

Train-test split on the point-cloud data

In [ ]:

pcs_train, pcs_valid, labels_train, labels_valid = train_test_split(
    point_clouds, labels)

Train and score¶

In [ ]:

pipeline.fit(pcs_train, labels_train)
pipeline.score(pcs_valid, labels_valid)

In [ ]:

Topological feature extraction using VietorisRipsPersistence and PersistenceEntropy¶

Import libraries¶

Generate data¶

Calculate persistent homology¶

Extract features¶

Use the new features in a standard classifier¶

Encapsulates the steps above in a pipeline¶

Define the pipeline¶

Prepare the data¶

Train and score¶

Topological feature extraction using `VietorisRipsPersistence` and `PersistenceEntropy`¶