# Aggregating Percentiles¶

There's a lot of articles explaning how percentiles can't actually be aggregated without loss of accuracy, so I won't rehash that here.

But, there is a good question around what do when you have no choice but to aggregate them. In that case, what type of aggregation is best?

To begin, let's provide some basic functionality that will help us explore various scenarios.

In [3]:
# for templating and display
from IPython.display import display
from tabulate import tabulate

from math import ceil

def percentile(percentage, data_points):
""" return the point that is at the desired percentage """
data_points = sorted(data_points)
point_at_percentage = ceil(len(data_points) * percentage) - 1
return data_points[point_at_percentage]


And we can test our percentile with:

In [4]:
import random

# we seed with a constant, to reproduce results. but this notebook should work
# with any value or random generator
random.seed(0)
SAMPLE_SIZE = 1000

random_points = [random.random() for _ in range(SAMPLE_SIZE)]

# should end up being around 0.99, since the points are random across 0 - 100
percentile(0.99, random_points)

Out[4]:
0.9867958186960467

So we've verified this function works. Now comes the fun part. Let's provide a way to randomly distribute these values across multiple bins.

In [6]:
def exponential_random():
"""
return a number on the exponential scale,
to emulate web service traffic.

this will produce a sharp exponent from
1 - 100
"""
return 10 ** (random.random() * 2)

def create_bins(count, points):
""" return <count> bins, with <points> random points per bin """
all_points = []
bins = []
for _ in range(count):
b = []
for _ in range(points):
p = exponential_random()
all_points.append(p)
b.append(p)
bins.append(b)
return bins, all_points

create_bins(10, 1)

Out[6]:
([[20.78373073698856],
[2.0644462457050095],
[1.6599971633412063],
[10.182311868152327],
[39.20323829703609],
[16.221512415891407],
[32.322723393505704],
[3.400298651748122],
[3.7147164248552023],
[7.201239971993294]],
[20.78373073698856,
2.0644462457050095,
1.6599971633412063,
10.182311868152327,
39.20323829703609,
16.221512415891407,
32.322723393505704,
3.400298651748122,
3.7147164248552023,
7.201239971993294])

Finally, we need a way to test these aggregations under various conditions. There's a few real-world scenarios that our functions should handle:

### Low Data Per Bin¶

It may happen that your percentile bins are not that accurate themselves, due to a low volume of data. For example, if each bin only recieves 10 data points, a p99 calculation cannot be accurate, as the last data point represents the worst 10 percent of the points.

This scenario occurs when tracking latencies for web services. If your service is spread across multiple hosts and percentiles are calculated on a per-host basis, and you have a low-volume endpoint, often the number of requests in a short time frame (one second, even one minute) will not capture enough data.

### Single Bin¶

A single bin effectively reduces the scenario to the accuracy of the percentile itself. We won't test this scenario since no aggregation is really performed in this case.

So let's write some test code that modulates:

• the percentile we're trying to aggregate
• the number of bins we're aggregating
• the points per bin
In [7]:
def test_percentile(aggregation_func, percentage, bin_count, point_count):
""" test our aggregation functionality, under different circumstances """
bins, points = create_bins(bin_count, point_count)
percentiles = [
percentile(percentage, bin_points) for bin_points in bins
]
actual_value = percentile(percentage, points)
return aggregation_func(percentage, percentiles), actual_value

def benchmark_accuracy(strategies, percentage, bin_count, point_count, trials=100000):
"""
Try each strategy <trials> times, and take the average of the accuracy
with the expected value. return the winner, with the accurracy per each.
"""
result_by_strategy = {}
for strategy in strategies:
total_deviation = 0
for _ in range(trials):
result, expected = test_percentile(strategy, percentage, bin_count, point_count)
total_deviation += abs(result - expected)
result_by_strategy[strategy.__name__] = total_deviation / trials
# note in python3.6 iteration via insertion order is now
# a language specification
winner = None
min_deviation = trials * percentage
for name, result in result_by_strategy.items():
if result < min_deviation:
min_deviation = result
winner = name
return [winner] + list(result_by_strategy.values())

In [8]:
BIN_COUNTS = [10, 100]
POINT_COUNTS = [1, 10, 100]
PERCENTAGES = [0.5, 0.95, 0.99]
# PERCENTAGES = [0.5, 0.95, 0.99]
# BIN_COUNTS = [1]
# POINT_COUNTS = [1]

def benchmark_all():
data = [["bin_count", "points_per_bin", "percentage", "winner"] + [s.__name__ for s in STRATEGIES]]
for bin_count in BIN_COUNTS:
for point_count in POINT_COUNTS:
for percentage in PERCENTAGES:
print((bin_count, point_count, percentage))
result = benchmark_accuracy(
STRATEGIES, percentage, bin_count, point_count
)
data.append([
bin_count,
point_count,
percentage,
] + result)

def benchmark_strategy(aggregation_func):
""" benchmark a single strategy against various scenarios """
data = [["function", "bin", "points_per_bin", "percentage / expected", "result"]]
for bin_count in BIN_COUNTS:
for point_count in POINT_COUNTS:
for percentage in PERCENTAGES:
result = test_percentile(aggregation_func, percentage,
bin_count, point_count)
data.append([
aggregation_func.__name__,
percentage,
bin_count,
point_count,
percentage,
result
])


Now we outline the strategies:

In [9]:
def median(percentage, percentiles):
""" return the median point """
sorted(percentiles)
return percentiles[len(percentiles) // 2]

def average(_, percentiles):
return sum(percentiles) / len(percentiles)

def max_(_, percentile):
return max(percentile)

def min_(_, percentile):
return min(percentile)

STRATEGIES = [median, average, percentile, max_, min_]


## Strategy Prediction¶

So which one will be the best? To help evaluate that, I think there's two limits to consider.

At low data points, the problem reduces down to each bin containing only one, or no, data points. In this case, percentiles are the most accurate, as median and average would be taking the median and average of the bins, which would effectively take the median and average of the traffic.

At a high number of data points per bin, the bins themselves are a more accurate representation of the percentiles they are capturing. As the number of data points reach infinity..

In [10]:
# this will take a while. This is slow in Native Python,
# at some point it would be valuable to switch to Numpy,
# and use the functions inside of it.
benchmark_all()

(10, 1, 0.5)
(10, 1, 0.95)
(10, 1, 0.99)
(10, 10, 0.5)
(10, 10, 0.95)
(10, 10, 0.99)
(10, 100, 0.5)
(10, 100, 0.95)
(10, 100, 0.99)
(100, 1, 0.5)
(100, 1, 0.95)
(100, 1, 0.99)
(100, 10, 0.5)
(100, 10, 0.95)
(100, 10, 0.99)
(100, 100, 0.5)
(100, 100, 0.95)
(100, 100, 0.99)
bin_count    points_per_bin    percentage  winner        median     average    percentile      max_      min_
-----------  ----------------  ------------  ----------  --------  ----------  ------------  --------  --------
10                 1          0.5   percentile  16.4145   11.4482         0         59.8732    8.46032
10                 1          0.95  percentile  48.2928   48.4826         0          0        68.2175
10                 1          0.99  percentile  48.4232   48.4967         0          0        68.3683
10                10          0.5   average      5.05107   1.20435        2.43379   14.1471    6.85546
10                10          0.95  average     16.4724    7.05708       19.1464    19.1643   42.1451
10                10          0.99  percentile  22.3421   21.5031         4.17321    4.19587  57.2037
10               100          0.5   average      1.73056   0.226398       0.556829   3.99453   3.05509
10               100          0.95  average      6.24118   2.65317        8.51557    8.49106  15.766
10               100          0.99  percentile   4.57526   3.61064        2.80232    2.80631  13.732
100                 1          0.5   percentile  17.478    11.4759         0         85.6289    8.98846
100                 1          0.95  percentile  56.1186   54.9946         0         19.1492   75.4618
100                 1          0.99  percentile  70.006    69.988          0          4.17005  90.4284
100                10          0.5   average      5.28758   0.408567       2.03797   30.4184    8.09651
100                10          0.95  average     17.8629    9.18991       18.0849    20.4303   62.4184
100                10          0.99  percentile  25.7039   25.1357         4.00724    4.45979  78.4014
100               100          0.5   average      1.81351   0.0785362      0.260815   7.30585   4.42301
100               100          0.95  average      6.61081   2.90541        8.79043   13.2074   25.0559
100               100          0.99  percentile   4.91507   3.99404        3.61271    3.93704  23.0564