This notebook is intended for self-study of TriScale. Here is the version for live sessions.
This notebook contains tutorial materials for TriScale. More specifically, it presents TriScale's experiment_sizing function, which implements a methodology to determine the minimal number of runs required to estimate a given performance objective with a given level of confidence.
If you don't know about Jupyter Notebooks and how to interact with them,
fear not! We compiled everything that you need to know here: Notebook Basics :-)
For more details about TriScale, you may refer to the paper.
To get started, we need to import a few Python modules.
All the TriScale-specific functions are part of one module called triscale.
import os
from pathlib import Path
import pandas as pd
import numpy as np
import triscale
Alright, we are ready to size some experiments!
Throughout the tutorial presentation, we used the Banana communication protocol as an example.
Before having a closer look at what TriScale offers, let us simply use the tool to see how many runs we need.
Evaluation objective
Let us say we want to measure the overall energy consumption achieved by the protocol.
For this purpose, we can use a simple metric: the sum of the energy consumed by all nodes in the network.
Note: We could pick any metric; the choice of the metric is independent of TriScale's methodology.
Performance indicator
TriScale uses percentiles of the metric values as performance indicators.
The goal of the experiments is to obtain an estimate of such a percentile for a given level of confidence.
These estimates are referred to as KPIs, or Key Performance Indicators.
For our performance metric (energy consumption), the lower the value, the better.
Thus, we want to derive an upper bound for our chosen percentile and (hopefully) show that this bound is small.
So let us go ahead and define the percentile we want to estimate and the confidence level for that estimation:
# Definition of Banana's KPI
percentile = 50 # the median
confidence = 95 # the confidence level, in %
These two values are sufficient to define the minimal number of runs required to compute this KPI.
The computation is implemented in TriScale's experiment_sizing() function:
triscale.experiment_sizing(
percentile,
confidence,
verbose=True);
We need a minimum of 5 runs.
We can now do the same thing to estimate the long-term variability with the variability score.
# Definition of Banana's variability score
percentile = 25 # defines the middle 50% of the distribution (25th to 75th percentile)
confidence = 95 # the confidence level, in %
triscale.experiment_sizing(
percentile,
confidence,
verbose=True);
We need a minimum of 11 series.
Hence, with only these four parameters, we can connect the total number of runs one needs
to perform (a minimum of 11 series of 5 runs) with the corresponding performance claims that one can make:
KPI: In a series of runs, the median of the runs' metric values is less than or equal
to the KPI, with a confidence of 95%.
Variability score: The range of KPI values of the middle 50% of series is less than or equal
to the variability score, with a confidence of 95%.
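To make the KPI claim concrete, here is a minimal sketch with hypothetical metric values for 5 runs (the numbers are made up for illustration; in practice, TriScale's own analysis functions compute the KPI). With 5 samples, the largest value is an upper bound for the median with probability $1 - 0.5^5 \approx 0.97$, which exceeds our 95% confidence target:

```python
# Hypothetical total-energy metric values (in J) from 5 runs of Banana
metrics = [12.4, 11.8, 13.1, 12.0, 12.7]

# With N = 5 samples, the largest order statistic x_5 is an upper
# bound for the median with probability 1 - 0.5**5 ≈ 0.97 >= 0.95
kpi = sorted(metrics)[-1]
print(kpi)  # 13.1
```

In other words, the KPI is simply an order statistic of the measured metric values; which order statistic to take follows from the sizing computation.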
In the previous section, we saw through an example the basic usage of TriScale's experiment_sizing() function.
Let us now open the box a bit and explain how things work underneath.
TriScale implements a statistical method that makes it possible to estimate, based on a data sample,
any percentile of the underlying distribution with any level of confidence. Importantly,
the estimation does not rely on any assumption about the nature of the underlying distribution
(e.g., normal or Poisson). The estimate is valid as long as the samples are independent and
identically distributed (iid).
Intuitively, it is "easier" to estimate the median (50th percentile) than the 99th percentile;
the more extreme the percentile, the more samples are required to provide an estimate for a
given level of confidence.
Let us consider the samples $x$ ordered such that $x_1 \leq x_2 \leq \ldots \leq x_N$. The probability
that all $N$ samples are larger than the $p$-th percentile is $(1-p)^N$, so $x_1$ is a lower bound for
that percentile with probability $1-(1-p)^N$. One can therefore derive the minimal number of samples $N$
such that $x_1$ is a lower bound for any percentile $0<p<1$ with a level of confidence $0<C<1$ using
the following equation:

$$1-(1-p)^N \;\geq\; C \quad\Longleftrightarrow\quad N \;\geq\; \frac{\log(1-C)}{\log(1-p)}$$
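This sizing rule can be sketched in a few lines: $N$ is the smallest integer such that $1-(1-p)^N \geq C$. The helper below (`min_samples` is a hypothetical name, not part of the triscale module) implements it; the actual experiment_sizing() may differ in details:

```python
from math import ceil, log

def min_samples(percentile, confidence):
    """Minimal N such that x_1 is a one-sided bound for the given
    percentile (in %) at the given confidence level (in %).
    Hypothetical helper illustrating the closed-form sizing rule."""
    p = min(percentile, 100 - percentile) / 100  # one-sided, symmetric
    C = confidence / 100
    return ceil(log(1 - C) / log(1 - p))

print(min_samples(10, 99))  # 44
print(min_samples(50, 95))  # 5  -- matches the Banana KPI sizing
```

Note the `min(percentile, 100 - percentile)` trick: estimating a lower bound for the $p$-th percentile and an upper bound for the $(1-p)$-th percentile cost the same number of samples.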
TriScale's experiment_sizing() function implements this computation and returns the minimal number of samples $N$, as illustrated below.
# Select the percentile we want to estimate
percentile = 10
# Select the desired level of confidence for the estimation
confidence = 99 # in %
# Compute the minimal number of samples N required
triscale.experiment_sizing(
percentile,
confidence,
verbose=True);
The previous result indicates that for $N = 44$ samples and above, $x_1$ is a lower bound
for the 10th percentile with probability larger than 99%.
The problem is symmetric: it takes the same number of samples to compute
a lower bound for the $p$-th percentile as to compute an upper bound for the $(1-p)$-th percentile.
Hence, the following cell returns the same required number of samples as the previous one:
percentile = 90
confidence = 99 # in %
triscale.experiment_sizing(
percentile,
confidence,
verbose=True);
To get a better sense of how this minimal number of samples evolves with increasing confidence levels
and more extreme percentiles, let us compute a range of minimal sample sizes and display
the results in a table (where the columns are the percentiles to estimate).
You don't need to understand the code in the following cell. It simply computes the required
number of samples for a list of percentiles and confidence levels, and stores everything in a
Pandas DataFrame for a nicer display in tabular format.
# Sets of percentiles and confidence levels to try
percentiles = [0.1, 1, 5, 10, 25, 50, 75, 90, 95, 99, 99.9]
confidences = [75, 90, 95, 99, 99.9, 99.99]
# Computing the minimum number of runs for each (perc., conf.) pair
min_number_samples = []
for c in confidences:
    tmp = []
    for p in percentiles:
        N = triscale.experiment_sizing(p, c)
        tmp.append(N[0])
    min_number_samples.append(tmp)
# Put the results in a DataFrame for a convenient display of the results
df = pd.DataFrame(columns=percentiles, data=min_number_samples)
df['Confidence level'] = confidences
df.set_index('Confidence level', inplace=True)
display(df)
So far, we have seen how to compute the minimal number of samples such that $x_1$ is a valid lower bound.
This implies that the estimate is then equal to the smallest value obtained in your series of runs.
If you work in a domain where outliers are common, you will want to get better bounds, which should be
less affected by outliers. Good news, this is simple: you just need to run more experiments!
The experiment_sizing() function takes an optional robustness argument that defines how many
outliers you want your bound to exclude. In other words, for a robustness of $r$, the function returns
the minimal number of samples required such that $x_{r+1}$ is a valid lower bound.
This is illustrated below.
percentile = 10
confidence = 99
triscale.experiment_sizing(
percentile,
confidence,
robustness=3,
verbose=True);
We obtain that a minimum of $N = 97$ samples is required for $x_4$ to be a lower bound for the
10th percentile with a confidence level of 99%.
Naturally, this is (much) more than the 44 samples we got before, where $x_1$ was the
lower bound. There is no free lunch! Better bounds demand more experiments.
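The sizing with a robustness parameter follows the same binomial argument as before: $x_{r+1}$ is a valid lower bound once the probability that at most $r$ samples fall below the target percentile drops below $1-C$. A minimal sketch (`min_samples_robust` is a hypothetical helper, not TriScale's actual implementation):

```python
from math import comb

def min_samples_robust(percentile, confidence, robustness=0):
    """Smallest N such that x_{robustness+1} is a one-sided bound
    for the given percentile (in %) at the given confidence (in %).
    Sketch of the underlying binomial argument."""
    p = min(percentile, 100 - percentile) / 100
    C = confidence / 100
    N = robustness + 1
    while True:
        # P(at most `robustness` samples fall beyond the percentile),
        # i.e. the binomial CDF at `robustness` for Binom(N, p)
        tail = sum(comb(N, k) * p**k * (1 - p)**(N - k)
                   for k in range(robustness + 1))
        if 1 - tail >= C:
            return N
        N += 1

print(min_samples_robust(10, 99, robustness=0))  # 44
print(min_samples_robust(10, 99, robustness=3))  # 97
```

With robustness=0, the binomial tail reduces to $(1-p)^N$ and we recover the closed-form result of 44 samples from earlier.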
But at least, now you know how many you need :-)
Based on the explanations above, use TriScale's experiment_sizing
function to answer
the following questions:
What "costs" more runs: to increase the confidence level, or to estimate a more extreme percentile?
Optional (and harder) question:
With a budget of $N=50$ samples, how many outliers can the bound exclude
for the 25th percentile, estimated with a 95% confidence level?
########## YOUR CODE HERE ###########
# ...
#####################################
>>> print(triscale.experiment_sizing(90,90)[0])
22
>>> print(triscale.experiment_sizing(90,95)[0])
29
>>> print(triscale.experiment_sizing(95,90)[0])
45
We observe that it "costs" many more runs to estimate a more extreme percentile
(95th instead of 90th) than to increase the confidence level (90% to 95%).
This observation holds true in general. The number of runs required increases
exponentially when the percentiles get more extreme (close to $0$ or to $1$).
For the last question, we must play with the robustness
parameter. We can
write a simple loop that increases its value until the required number of runs
reaches 50.
>>> r = 0
>>> while triscale.experiment_sizing(25, 95, r)[0] < 50:
...     r += 1
>>> print(r)
7
Hence, we can exclude the 7 "worst" samples from the confidence interval.
With $N=50$ samples, the best lower bound for the 25th percentile with 95% confidence
is $x_8$ (assuming the first sample is $x_1$).
Next step: Data Analysis
Back to repo