`VietorisRipsPersistence`

and `PersistenceEntropy`

¶In this notebook, we showcase the ease of use of one of the core components of `giotto-tda`

: `VietorisRipsPersistence`

, along with vectorization methods. We first list steps in a typical, topological-feature extraction routine and then show to encapsulate them with a standard `scikit-learn`

â€“like pipeline.

If you are looking at a static version of this notebook and would like to run its contents, head over to GitHub and download the source.

**License: AGPLv3**

Let's begin by generating 3D point clouds of spheres and tori, along with a label of 0 (1) for each sphere (torus). We also add noise to each point cloud, whose effect is to displace the points sampling the surfaces by a random amount in a random direction. **Note**: You will need the auxiliary module generate_datasets.py to run this cell. You can change the second argument of `generate_point_clouds`

to obtain a finer or coarser sampling, or tune the level of noise via the third.

In [ ]:

```
from data.generate_datasets import make_point_clouds
n_samples_per_class = 10
point_clouds, labels = make_point_clouds(n_samples_per_class, 10, 0.1)
point_clouds.shape
print(f"There are {point_clouds.shape[0]} point clouds in {point_clouds.shape[2]} dimensions, "
f"each with {point_clouds.shape[1]} points.")
```

Instantiate a `VietorisRipsPersistence`

transformer and calculate so-called **persistence diagrams** for this collection of point clouds.

In [ ]:

```
from gtda.homology import VietorisRipsPersistence
VR = VietorisRipsPersistence(homology_dimensions=[0, 1, 2]) # Parameter explained in the text
diagrams = VR.fit_transform(point_clouds)
diagrams.shape
```

**Important note**: `VietorisRipsPersistence`

, and all other "persistent homology" transformers in `gtda.homology`

, expect input in the form of a 3D array or, in some cases, a list of 2D arrays. For each entry in the input (here, for each point cloud in `point_clouds`

) they compute a topological summary which is also a 2D array, and then stack all these summaries into a single output 3D array. So, in our case, `diagrams[i]`

represents the topology of `point_clouds[i]`

. `diagrams[i]`

is interpreted as follows:

- Each row is a triplet describing a single topological feature found in
`point_clouds[i]`

. - The first and second entries (respectively) in the triplet denote the values of the "filtration parameter" at which the feature appears or disappears respectively. They are referred to as the "birth" and "death" values of the feature (respectively). The meaning of "filtration parameter" depends on the specific transformer, but in the case of
`VietorisRipsPersistence`

on point clouds it has the interpretation of a length scale. - A topological feature can be a connected component, 1D hole/loop, 2D cavity, or more generally $d$-dimensional "void" which exists in the data at scales between its birth and death values. The integer $d$ is the
*homology dimension*(or degree) of the feature and is stored as the third entry in the triplet. In this example, the shapes should have 2D cavities so we explicitly tell`VietorisRipsPersistence`

to look for these by using the`homology_dimensions`

parameter!

If we make one scatter plot per available homology dimension, and plot births and deaths as x- and y-coordinates of points in 2D, we end up with a 2D representation of `diagrams[i]`

, and the reason why it is called a persistence *diagram*:

In [ ]:

```
from gtda.plotting import plot_diagram
i = 0
plot_diagram(diagrams[i])
```

The notebook Plotting in giotto-tda delves deeper into the plotting functions and class methods available in `giotto-tda`

.

Instantiate a `PersistenceEntropy`

transformer and extract scalar features from the persistence diagrams.

In [ ]:

```
from gtda.diagrams import PersistenceEntropy
PE = PersistenceEntropy()
features = PE.fit_transform(diagrams)
```

Leverage the compatibility with `scikit-learn`

to perform a train-test split and score the features.

In [ ]:

```
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(features, labels)
model = RandomForestClassifier()
model.fit(X_train, y_train)
model.score(X_valid, y_valid)
```

- Define an end-to-end pipeline by chaining transformers from
`giotto-tda`

with`scikit-learn`

ones - Train-test split the input point cloud data and labels.
- Fir the pipeline on the training data.
- Score the fitted pipeline on the test data.

In [ ]:

```
from sklearn.pipeline import make_pipeline
steps = [VietorisRipsPersistence(homology_dimensions=[0, 1, 2]),
PersistenceEntropy(),
RandomForestClassifier()]
pipeline = make_pipeline(*steps)
pcs_train, pcs_valid, labels_train, labels_valid = train_test_split(point_clouds, labels)
pipeline.fit(pcs_train, labels_train)
pipeline.score(pcs_valid, labels_valid)
```