This notebook presents a case study of the TriScale framework. It revisits the analysis of Blink, an algorithm that detects failures and reroutes traffic directly in the data plane. Parts of this case study are described in the TriScale paper.
import os
import copy
from pathlib import Path
import zipfile
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import triscale
import triplots
The dataset for this case study is available on Zenodo (https://doi.org/10.5281/zenodo.3666724). The wget commands below download the required files to reproduce this case study. Downloading and unzipping should be quick: the .zip file is only ~100 kB.
# Set `download = True` to download (and extract) the data from this case study.
# If needed, adjust the record_id for the file version you are interested in.
# For reproducing the results of the TriScale paper, set `record_id = 3666724`.
download = True
record_id = 3666724  # v3.0.1 (https://doi.org/10.5281/zenodo.3666724)
files = ['UseCase_FailureDetection.zip']

if download:
    for file in files:
        print(file)
        url = 'https://zenodo.org/record/' + str(record_id) + '/files/' + file
        os.system('wget %s' % url)
        if file[-4:] == '.zip':
            with zipfile.ZipFile(file, 'r') as zip_file:
                zip_file.extractall()
    print('Done.')
else:
    print('Nothing to download')
Nothing to download
We now import the custom module for the case study.
import UseCase_FailureDetection.failuredetection as fd
In this case study, 30 prefixes have been selected from each of 15 different Internet traces. For each of these prefixes, 5 artificial traces have been generated, all of which include a failure. We are interested in evaluating how reliably and how quickly Blink detects these failures. The experiment was designed and performed by the authors of the Blink paper; in this case study, we only perform the data analysis, using the TriScale approach to generalize the results.
For each prefix, we compute two metrics: the true positive rate (TPR) of the failure detection, and the speed of the rerouting, in seconds. The computation of metric values is performed by the compute_metrics() function below.
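To make these definitions concrete, here is a minimal sketch of how such per-prefix metrics could be derived from raw detection results. The data layout assumed here (one (detected, detection_time_s) pair per artificial failure trace, with the speed aggregated as a median) is hypothetical; the actual compute_metrics() implementation in the case-study module may differ.
# Hypothetical sketch of the per-prefix metric computation.
# Assumes each prefix comes with one (detected, detection_time_s) pair
# per artificial failure trace; compute_metrics() may differ.
def prefix_metrics(runs):
    detection_times = [t for ok, t in runs if ok]
    tpr = len(detection_times) / len(runs)  # fraction of failures detected
    speed = float(np.median(detection_times)) if detection_times else float('nan')
    return tpr, speed

# Example: 5 artificial traces, 4 detected failures
tpr, speed = prefix_metrics([(True, 1.2), (True, 0.9), (False, None),
                             (True, 1.5), (True, 1.1)])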
# Construct the path to the different test results
result_dir = Path('UseCase_FailureDetection')
config_file = Path('UseCase_FailureDetection/config.yml')
out_file = result_dir / 'metrics.csv'
df = fd.compute_metrics(config_file, result_dir, out_name=out_file)
display(df)
Output retrieved from file. Skipping computation.
|      | Protocol         | Trace | Prefix | TPR | Speed_s  |
|------|------------------|-------|--------|-----|----------|
| 0    | blink            | 1     | 0      | 0.6 | 1.998730 |
| 1    | blink            | 1     | 1      | 1.0 | 1.579861 |
| 2    | blink            | 1     | 2      | 0.0 | NaN      |
| 3    | blink            | 1     | 3      | 1.0 | 1.707236 |
| 4    | blink            | 1     | 4      | 0.8 | 1.419164 |
| ...  | ...              | ...   | ...    | ... | ...      |
| 1345 | infinite_timeout | 15    | 25     | 0.8 | 1.681014 |
| 1346 | infinite_timeout | 15    | 26     | 0.0 | NaN      |
| 1347 | infinite_timeout | 15    | 27     | 0.4 | 2.107471 |
| 1348 | infinite_timeout | 15    | 28     | 1.0 | 0.717849 |
| 1349 | infinite_timeout | 15    | 29     | 1.0 | 0.743098 |

1350 rows × 5 columns
For each set of prefixes, we compute one KPI per metric (TPR and recovery time): a one-sided 95% confidence interval on the median of the metric values.
KPI = { 'percentile' : 50,      # median
        'confidence' : 95,      # confidence level, in %
        'bounds'     : [0,1],   # expected range of the metric values
        'bound'      : 'lower', # one-sided lower bound
        }
out_file = result_dir / 'kpis.csv'
kpis = fd.compute_kpis(df, KPI, config_file, out_file)
Output retrieved from file. Skipping computation.
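Such KPIs can be computed non-parametrically from order statistics: after sorting the n metric values, the k-th smallest value is a valid lower bound for the p-th percentile as soon as the binomial probability of at least k values falling below that percentile reaches the confidence level. The function below is a minimal sketch of this idea, not TriScale's actual code (which lives in the triscale module imported above).
from scipy.stats import binom

# Minimal sketch of a one-sided, non-parametric CI on a percentile;
# see the triscale module for the actual implementation.
def percentile_lower_bound(samples, percentile=50, confidence=95):
    x = np.sort(np.asarray(samples))
    n = len(x)
    p = percentile / 100
    c = confidence / 100
    # P( x_(k) <= true p-th percentile ) = 1 - BinomCDF(k-1; n, p).
    # Keep the largest (1-indexed) k for which this probability >= c.
    k = 0
    for i in range(1, n + 1):
        if 1 - binom.cdf(i - 1, n, p) >= c:
            k = i
    return x[k - 1] if k > 0 else None  # None: too few samples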
We can then plot these KPIs for each of the Internet traces.
figure = fd.plot_TPR(kpis,config_file)
figure.show()
figure = fd.plot_speed(kpis,config_file)
figure.show()
Using TriScale, we can generalize the results. For each trace, the evaluation of Blink on one prefix can be seen as a TriScale run. Since the prefixes are randomly selected from a fixed set, runs are i.i.d. and we can use TriScale’s KPI to derive the expected performance of Blink for any set of prefixes.
We can claim with 95% confidence that, for at least 50% of the prefixes, Blink always detects link failures (TPR = 1) and reroutes traffic within 1 s or less.
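As a sanity check, this claim can be read directly off the KPI table. The snippet below is a sketch: the column names (Protocol, TPR, Speed_s) are assumed to match those of the metrics table, which may not be exactly how fd.compute_kpis() labels its output.
# Hypothetical check of the final claim; the column names are assumptions.
blink_kpis = kpis[kpis['Protocol'] == 'blink']
print('TPR KPI = 1 for all traces:     ', (blink_kpis['TPR'] == 1).all())
print('Speed KPI <= 1 s for all traces:', (blink_kpis['Speed_s'] <= 1).all())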