VietorisRipsPersistence
and PersistenceEntropy
¶In this notebook, we showcase the ease of use of one of the core components of giotto-tda
: VietorisRipsPersistence
, along with vectorization methods. We first list steps in a typical, topological-feature extraction routine and then show to encapsulate them with a standard scikit-learn
–like pipeline.
If you are looking at a static version of this notebook and would like to run its contents, head over to GitHub and download the source.
License: AGPLv3
Let's begin by generating 3D point clouds of spheres and tori, along with a label of 0 (1) for each sphere (torus). We also add noise to each point cloud, whose effect is to displace the points sampling the surfaces by a random amount in a random direction. Note: You will need the auxiliary module generate_datasets.py to run this cell. You can change the second argument of generate_point_clouds
to obtain a finer or coarser sampling, or tune the level of noise via the third.
from data.generate_datasets import generate_point_clouds
n_samples_per_class = 10
point_clouds, labels = generate_point_clouds(n_samples_per_class, 10, 0.1)
point_clouds.shape
print(f"There are {point_clouds.shape[0]} point clouds in {point_clouds.shape[2]} dimensions, "
f"each with {point_clouds.shape[1]} points.")
Instantiate a VietorisRipsPersistence
transformer and calculate so-called persistence diagrams for this collection of point clouds.
from gtda.homology import VietorisRipsPersistence
VR = VietorisRipsPersistence(homology_dimensions=[0, 1, 2]) # Parameter explained in the text
diagrams = VR.fit_transform(point_clouds)
diagrams.shape
Important note: VietorisRipsPersistence
, and all other "persistent homology" transformers in gtda.homology
, expect input in the form of a 3D array or, in some cases, a list of 2D arrays. For each entry in the input (here, for each point cloud in point_clouds
) they compute a topological summary which is also a 2D array, and then stack all these summaries into a single output 3D array. So, in our case, diagrams[i]
represents the topology of point_clouds[i]
. diagrams[i]
is interpreted as follows:
point_clouds[i]
.VietorisRipsPersistence
on point clouds it has the interpretation of a length scale.VietorisRipsPersistence
to look for these by using the homology_dimensions
parameter!If we make one scatter plot per available homology dimension, and plot births and deaths as x- and y-coordinates of points in 2D, we end up with a 2D representation of diagrams[i]
, and the reason why it is called a persistence diagram:
from gtda.plotting import plot_diagram
i = 0
plot_diagram(diagrams[i])
The notebook Plotting in giotto-tda delves deeper into the plotting functions and class methods available in giotto-tda
.
Instantiate a PersistenceEntropy
transformer and extract scalar features from the persistence diagrams.
from gtda.diagrams import PersistenceEntropy
PE = PersistenceEntropy()
features = PE.fit_transform(diagrams)
Leverage the compatibility with scikit-learn
to perform a train-test split and score the features.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(features, labels)
model = RandomForestClassifier()
model.fit(X_train, y_train)
model.score(X_valid, y_valid)
giotto-tda
with scikit-learn
onesfrom sklearn.pipeline import make_pipeline
steps = [VietorisRipsPersistence(homology_dimensions=[0, 1, 2]),
PersistenceEntropy(),
RandomForestClassifier()]
pipeline = make_pipeline(*steps)
pcs_train, pcs_valid, labels_train, labels_valid = train_test_split(point_clouds, labels)
pipeline.fit(pcs_train, labels_train)
pipeline.score(pcs_valid, labels_valid)