This notebook is best viewed rendered as slides. You can convert it to slides and view them by running:
$ ipython nbconvert --to slides --post serve <this-notebook-name.ipynb>
This and other related IPython notebooks can be found at the course github repository:
Even choosing a fruit implies dealing with conflicting objectives.
Note: in case you are *still* wondering, a maximization problem can be posed as a minimization one: $\min\ -\vec{F}(\vec{x})$.
Usually, there is not a unique solution that minimizes all objective functions simultaneously, but, instead, a set of equally good trade-off solutions.
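For reference, the standard definition that we will implement later on: for minimization, $\vec{x}$ *dominates* $\vec{y}$ iff $f_i(\vec{x}) \le f_i(\vec{y})$ for every objective $i$ and $f_j(\vec{x}) < f_j(\vec{y})$ for at least one objective $j$. The solutions that are not dominated by any other form the Pareto-optimal set.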
As usual, we need some initialization and configuration.
import time, array, random, copy, math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.rc('text', usetex=True)
plt.rc('font', family='serif')
plt.rcParams['text.latex.preamble'] ='\\usepackage{libertine}\n\\usepackage[utf8]{inputenc}'
import seaborn
seaborn.set(style='whitegrid')
seaborn.set_context('notebook')
from deap import algorithms, base, benchmarks, tools, creator
Setting a constant seed so we always get the same results (and avoid surprises in class). You should not do this in a real-world case!
random.seed(a=42)
creator.create("FitnessMin", base.Fitness, weights=(-1.0,-1.0))
creator.create("Individual", array.array, typecode='d',
fitness=creator.FitnessMin)
Implementing the Dent problem
def dent(individual, lbda = 0.85):
"""
Implements the test problem Dent
    Num. variables = 2; bounds in [-1.5, 1.5]; num. objectives = 2.
@author Cesar Revelo
"""
d = lbda * math.exp(-(individual[0] - individual[1]) ** 2)
f1 = 0.5 * (math.sqrt(1 + (individual[0] + individual[1]) ** 2) + \
math.sqrt(1 + (individual[0] - individual[1]) ** 2) + \
individual[0] - individual[1]) + d
f2 = 0.5 * (math.sqrt(1 + (individual[0] + individual[1]) ** 2) + \
math.sqrt(1 + (individual[0] - individual[1]) ** 2) - \
individual[0] + individual[1]) + d
return f1, f2
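As a quick sanity check: at the point $(0,0)$ both square roots equal $1$ and $d = \lambda = 0.85$, so both objectives evaluate to $0.5 \cdot 2 + 0.85 = 1.85$.

dent([0.0, 0.0])  # -> (1.85, 1.85)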
Preparing a DEAP `toolbox` with Dent.
toolbox = base.Toolbox()
BOUND_LOW, BOUND_UP = -1.5, 1.5
NDIM = 2
# toolbox.register("evaluate", lambda ind: benchmarks.dtlz2(ind, 2))
toolbox.register("evaluate", dent)
Defining attributes, individuals and population.
def uniform(low, up, size=None):
try:
return [random.uniform(a, b) for a, b in zip(low, up)]
except TypeError:
return [random.uniform(a, b) for a, b in zip([low] * size, [up] * size)]
toolbox.register("attr_float", uniform, BOUND_LOW, BOUND_UP, NDIM)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.attr_float)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
Creating an example population distributed as a mesh.
num_samples = 50
limits = [np.arange(BOUND_LOW, BOUND_UP, (BOUND_UP - BOUND_LOW)/num_samples)] * NDIM
sample_x = np.meshgrid(*limits)
flat = []
for i in range(len(sample_x)):
x_i = sample_x[i]
flat.append(x_i.reshape(num_samples**NDIM))
example_pop = toolbox.population(n=num_samples**NDIM)
for i, ind in enumerate(example_pop):
for j in range(len(flat)):
ind[j] = flat[j][i]
fitnesses = toolbox.map(toolbox.evaluate, example_pop)
for ind, fit in zip(example_pop, fitnesses):
ind.fitness.values = fit
We also need `a_given_individual`.
a_given_individual = toolbox.population(n=1)[0]
a_given_individual[0] = 0.5
a_given_individual[1] = 0.5
a_given_individual.fitness.values = toolbox.evaluate(a_given_individual)
Implementing the Pareto dominance relation between two individuals.
def pareto_dominance(ind1, ind2):
    'Returns `True` if `ind1` dominates `ind2` (assuming minimization).'
    strictly_better = False
    # compare the objectives pairwise, not all combinations
    for item1, item2 in zip(ind1.fitness.values, ind2.fitness.values):
        if item1 > item2:
            return False
        if item1 < item2:
            strictly_better = True
    return strictly_better
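A quick check with made-up fitness values (purely illustrative):

ind_a = creator.Individual([0.0, 0.0])
ind_a.fitness.values = (1.0, 1.0)
ind_b = creator.Individual([0.5, 0.5])
ind_b.fitness.values = (2.0, 2.0)
assert pareto_dominance(ind_a, ind_b)      # better in both objectives
assert not pareto_dominance(ind_b, ind_a)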
Note: Bear in mind that DEAP comes with a Pareto dominance relation that probably is more efficient than this implementation.
def pareto_dominance(x, y):
    # `isDominated` expects maximization-oriented weighted values; passing the
    # raw objective values of a minimization problem makes this call return
    # `True` when `x` dominates `y`.
    return tools.emo.isDominated(x.fitness.values, y.fitness.values)
Let's compute the set of individuals that are dominated by `a_given_individual`, the ones that dominate it (its *dominators*) and the remaining ones.
dominated = [ind for ind in example_pop if pareto_dominance(a_given_individual, ind)]
dominators = [ind for ind in example_pop if pareto_dominance(ind, a_given_individual)]
others = [ind for ind in example_pop if ind not in dominated and ind not in dominators]
def plot_dent():
'Plots the points in decision and objective spaces.'
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
for ind in dominators: plt.plot(ind[0], ind[1], 'r.')
for ind in dominated: plt.plot(ind[0], ind[1], 'g.')
for ind in others: plt.plot(ind[0], ind[1], 'k.', ms=3)
plt.plot(a_given_individual[0], a_given_individual[1], 'bo', ms=6);
plt.xlabel('$x_1$');plt.ylabel('$x_2$');
plt.title('Decision space');
plt.subplot(1,2,2)
for ind in dominators: plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'r.', alpha=0.7)
for ind in dominated: plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'g.', alpha=0.7)
for ind in others: plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'k.', alpha=0.7, ms=3)
plt.plot(a_given_individual.fitness.values[0], a_given_individual.fitness.values[1], 'bo', ms=6);
plt.xlabel('$f_1(\mathbf{x})$');plt.ylabel('$f_2(\mathbf{x})$');
plt.xlim((0.5,3.6));plt.ylim((0.5,3.6));
plt.title('Objective space');
plt.tight_layout()
Having `a_given_individual` (blue dot) we can now plot those that are dominated by it (in green), those that dominate it (in red) and those that are incomparable.
plot_dent()
Obtaining the nondominated front.
non_dom = tools.sortNondominated(example_pop, k=len(example_pop), first_front_only=True)[0]
plt.figure(figsize=(5,5))
for ind in example_pop:
plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'k.', ms=3, alpha=0.5)
for ind in non_dom:
plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'bo', alpha=0.74, ms=5)
plt.title('Pareto-optimal front')
<matplotlib.text.Text at 0x11a000e48>
Non-dominated sorting splits the population into a sequence of fronts $\mathcal{F}_1, \mathcal{F}_2, \ldots$, with $\mathcal{F}_1$ equal to $\mathcal{P}_t^\ast$, the set of non-dominated individuals of $\mathcal{P}_t$.
The crowding distance assignment process goes as follows: for every front $\mathcal{F}_l$ and every objective function $m$, the extreme solutions are assigned an infinite distance, while every interior solution accumulates
$$d_{I_j} = d_{I_j} + \frac{f_m\left(\vec{x}_{I_{j+1}}\right)-f_m\left(\vec{x}_{I_{j-1}}\right)}{f_m\left(\vec{x}_{I_{1}}\right)-f_m\left(\vec{x}_{I_{f_l}}\right)}\,.$$
Here the $\mathrm{sort}\left(\mathcal{F}_l,m\right)$ function produces an ordered index vector $\vec{I}$ with respect to objective function $m$.
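To make the formula concrete, here is a didactic sketch of the computation (DEAP ships its own `tools.emo.assignCrowdingDist`, which is what its NSGA-II selection actually uses); `front` is assumed to be a list of objective-value tuples:

def crowding_distance(front):
    'Didactic sketch: crowding distances for a front of objective tuples.'
    n = len(front)
    distances = [0.0] * n
    for m in range(len(front[0])):
        I = sorted(range(n), key=lambda i: front[i][m])    # I = sort(F_l, m)
        distances[I[0]] = distances[I[-1]] = float('inf')  # boundary solutions
        f_min, f_max = front[I[0]][m], front[I[-1]][m]
        if f_max == f_min:
            continue
        for j in range(1, n - 1):
            distances[I[j]] += (front[I[j+1]][m] - front[I[j-1]][m]) / (f_max - f_min)
    return distances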
Sorting the population by rank and distance.
Now we have the key elements of the non-dominated sorting GA.
We will deal with DTLZ3, which is a more difficult test problem.
A new `toolbox` instance with the necessary components.
toolbox = base.Toolbox()
Define problem domain as $\vec{x}\in\left[0,1\right]^{30}$ and a two-objective DTLZ3 instance.
NDIM = 30
BOUND_LOW, BOUND_UP = 0.0, 1.0
toolbox.register("evaluate", lambda ind: benchmarks.dtlz3(ind, 2))
Describing attributes, individuals and population and defining the selection, mating and mutation operators.
toolbox.register("attr_float", uniform, BOUND_LOW, BOUND_UP, NDIM)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.attr_float)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mate", tools.cxSimulatedBinaryBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0)
toolbox.register("mutate", tools.mutPolynomialBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0, indpb=1.0/NDIM)
toolbox.register("select", tools.selNSGA2)
Let's also use the `toolbox` to store other configuration parameters of the algorithm. This will prove useful when performing massive experiments.
toolbox.pop_size = 50
toolbox.max_gen = 1000
toolbox.mut_prob = 0.2
Storing all the required information in the `toolbox` and using DEAP's `algorithms.eaMuPlusLambda` function allows us to create a very compact (albeit not a 100% exact copy of the original) implementation of NSGA-II.
def run_ea(toolbox, stats=None, verbose=False):
pop = toolbox.population(n=toolbox.pop_size)
pop = toolbox.select(pop, len(pop))
return algorithms.eaMuPlusLambda(pop, toolbox, mu=toolbox.pop_size,
lambda_=toolbox.pop_size,
cxpb=1-toolbox.mut_prob,
mutpb=toolbox.mut_prob,
stats=stats,
ngen=toolbox.max_gen,
verbose=verbose)
We are now ready to run our NSGA-II.
%time res,_ = run_ea(toolbox)
CPU times: user 20.5 s, sys: 234 ms, total: 20.7 s Wall time: 22 s
We can now get the Pareto fronts in the results (`res`).
fronts = tools.emo.sortLogNondominated(res, len(res))
plot_colors = seaborn.color_palette("Set1", n_colors=10)
fig, ax = plt.subplots(1, figsize=(4,4))
for i,inds in enumerate(fronts):
par = [toolbox.evaluate(ind) for ind in inds]
df = pd.DataFrame(par)
df.plot(ax=ax, kind='scatter', label='Front ' + str(i+1),
x=df.columns[0], y=df.columns[1],
color=plot_colors[i])
plt.xlabel('$f_1(\mathbf{x})$');plt.ylabel('$f_2(\mathbf{x})$');
We create a `stats` object to store the whole individuals, not only their objective function values.
stats = tools.Statistics()
stats.register("pop", copy.deepcopy)
toolbox.max_gen = 1000 # we need more generations!
Re-run the algorithm to get the data necessary for plotting.
%time res, logbook = run_ea(toolbox, stats=stats)
CPU times: user 20.9 s, sys: 266 ms, total: 21.2 s Wall time: 21.6 s
from matplotlib import animation
from IPython.display import HTML
def animate(frame_index, logbook):
'Updates all plots to match frame _i_ of the animation.'
ax.clear()
fronts = tools.emo.sortLogNondominated(logbook.select('pop')[frame_index],
len(logbook.select('pop')[frame_index]))
for i,inds in enumerate(fronts):
par = [toolbox.evaluate(ind) for ind in inds]
df = pd.DataFrame(par)
df.plot(ax=ax, kind='scatter', label='Front ' + str(i+1),
x=df.columns[0], y=df.columns[1], alpha=0.47,
color=plot_colors[i % len(plot_colors)])
ax.set_title('$t=$' + str(frame_index))
ax.set_xlabel('$f_1(\mathbf{x})$');ax.set_ylabel('$f_2(\mathbf{x})$')
return []
fig = plt.figure(figsize=(4,4))
ax = fig.gca()
anim = animation.FuncAnimation(fig, lambda i: animate(i, logbook),
frames=len(logbook), interval=60,
blit=True)
plt.close()
HTML(anim.to_html5_video())
Here it is clearly visible how the algorithm "jumps" from one local optimum to a better one as evolution takes place.
Each problem instance is meant to test the algorithm with regard to a given feature: local optima, convexity, discontinuity, bias, or a combination of them.
How does our NSGA-II behave when faced with different benchmark problems?
problem_instances = {'ZDT1': benchmarks.zdt1, 'ZDT2': benchmarks.zdt2,
'ZDT3': benchmarks.zdt3, 'ZDT4': benchmarks.zdt4,
'DTLZ1': lambda ind: benchmarks.dtlz1(ind,2),
'DTLZ2': lambda ind: benchmarks.dtlz2(ind,2),
'DTLZ3': lambda ind: benchmarks.dtlz3(ind,2),
'DTLZ4': lambda ind: benchmarks.dtlz4(ind,2, 100),
'DTLZ5': lambda ind: benchmarks.dtlz5(ind,2),
'DTLZ6': lambda ind: benchmarks.dtlz6(ind,2),
'DTLZ7': lambda ind: benchmarks.dtlz7(ind,2)}
toolbox.max_gen = 1000
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("obj_vals", np.copy)
def run_problem(toolbox, problem):
toolbox.register('evaluate', problem)
return run_ea(toolbox, stats=stats)
Running NSGA-II on all the problems. Now it takes longer.
%time results = {problem: run_problem(toolbox, problem_instances[problem]) for problem in problem_instances}
CPU times: user 3min 34s, sys: 1.95 s, total: 3min 36s Wall time: 3min 41s
Creating this animation takes more programming effort.
class MultiProblemAnimation:
def init(self, fig, results):
self.results = results
self.axs = [fig.add_subplot(3,4,i+1) for i in range(len(results))]
self.plots =[]
for i, problem in enumerate(sorted(results)):
(res, logbook) = self.results[problem]
pop = pd.DataFrame(data=logbook.select('obj_vals')[0])
plot = self.axs[i].plot(pop[0], pop[1], 'b.', alpha=0.47)[0]
self.plots.append(plot)
fig.tight_layout()
def animate(self, t):
'Updates all plots to match frame _i_ of the animation.'
        for i, problem in enumerate(sorted(self.results)):  # use the stored results, not the global
#self.axs[i].clear()
(res, logbook) = self.results[problem]
pop = pd.DataFrame(data=logbook.select('obj_vals')[t])
self.plots[i].set_data(pop[0], pop[1])
self.axs[i].set_title(problem + '; $t=' + str(t)+'$')
self.axs[i].set_xlim((0, max(1,pop.max()[0])))
self.axs[i].set_ylim((0, max(1,pop.max()[1])))
return self.axs
mpa = MultiProblemAnimation()
fig = plt.figure(figsize=(14,6))
anim = animation.FuncAnimation(fig, mpa.animate, init_func=mpa.init(fig,results),
frames=toolbox.max_gen, interval=60, blit=True)
plt.close()
HTML(anim.to_html5_video())
Read more about this in the class materials.
Evolutionary algorithms are stochastic algorithms; therefore, their results must be assessed by repeating experiments until you reach a statistically valid conclusion.
Let's make a relatively simple but very important experiment:
As usual we need to establish some notation:
We will use `toolbox` instances to define experiments. We start by creating a `toolbox` that will contain the configuration that will be shared across all experiments.
toolbox = base.Toolbox()
BOUND_LOW, BOUND_UP = 0.0, 1.0
NDIM = 30
# the explanation of this... a few lines below
def eval_helper(ind):
return benchmarks.dtlz3(ind, 2)
toolbox.register("evaluate", eval_helper)
def uniform(low, up, size=None):
try:
return [random.uniform(a, b) for a, b in zip(low, up)]
except TypeError:
return [random.uniform(a, b) for a, b in zip([low] * size, [up] * size)]
toolbox.register("attr_float", uniform, BOUND_LOW, BOUND_UP, NDIM)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.attr_float)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mate", tools.cxSimulatedBinaryBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0)
toolbox.register("mutate", tools.mutPolynomialBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0, indpb=1.0/NDIM)
toolbox.register("select", tools.selNSGA2)
toolbox.mut_prob = 0.15
We add an `experiment_name` to the `toolbox` that we will fill in later on. Here $n_\mathrm{pop}$ denotes the population size and $t_\mathrm{max}$ the maximum number of iterations.
experiment_name = "$n_\mathrm{{pop}}={0};\ t_\mathrm{{max}}={1}$"
total_evals = 500
We can now replicate this toolbox instance and then modify the population sizes.
pop_sizes = (10,50,100)
toolboxes = [copy.deepcopy(toolbox) for _ in range(len(pop_sizes))]
Now `toolboxes` is a list of copies of the same toolbox, one for each experiment configuration (population size).
...but we still have to set the population sizes in the elements of `toolboxes`.
for pop_size, toolbox in zip(pop_sizes, toolboxes):
toolbox.pop_size = pop_size
toolbox.max_gen = total_evals // pop_size
toolbox.experiment_name = experiment_name.format(toolbox.pop_size, toolbox.max_gen)
for toolbox in toolboxes:
print(toolbox.experiment_name, toolbox.pop_size, toolbox.max_gen)
$n_\mathrm{pop}=10;\ t_\mathrm{max}=50$ 10 50
$n_\mathrm{pop}=50;\ t_\mathrm{max}=10$ 50 10
$n_\mathrm{pop}=100;\ t_\mathrm{max}=5$ 100 5
As we are dealing with stochastic methods, their results should be reported relying on a statistical analysis. Consequently, each experiment (a `toolbox` instance in our case) should be repeated a sufficient number of times.
Note: I personally like the number 42, as it is the answer to The Ultimate Question of Life, the Universe, and Everything.
number_of_runs = 42
As we are now solving more demanding problems it would be nice to make our algorithms run in parallel and profit from modern multi-core CPUs. In DEAP this amounts to substituting the `map()` function via the `toolbox`, relying, for example, on the `multiprocessing` or `concurrent.futures` modules.

from ipywidgets import IntProgress
from IPython.display import display
Process-based parallelization based on `multiprocessing` requires that the parameters passed to `map()` be pickleable; in particular, `lambda` functions can not be directly used. This will certainly ruin the party for all the `lambda` fans out there (me included!), and it explains why we registered the plain function `eval_helper` above instead of a `lambda`.

def run_algo_wrapper(toolbox):
result, _ = run_ea(toolbox)
local_pareto_set = tools.emo.sortLogNondominated(result, len(result), first_front_only=True)
return local_pareto_set
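As a side note, the `toolbox`-level route mentioned above would look roughly like this; a minimal sketch, assuming all registered functions are pickleable (below we use `concurrent.futures` directly instead):

import multiprocessing

pool = multiprocessing.Pool()
toolbox.register("map", pool.map)  # fitness evaluations now run in worker processes
# ... run the EA as usual, then release the workers:
# pool.close(); pool.join()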
%%time
import concurrent.futures
progress_bar = IntProgress(description="000/000", max=len(toolboxes)*number_of_runs)
display(progress_bar)
results = {toolbox.experiment_name:[] for toolbox in toolboxes}
with concurrent.futures.ProcessPoolExecutor() as executor:
# Submit all the tasks...
futures = {executor.submit(run_algo_wrapper, toolbox): toolbox
for _ in range(number_of_runs)
for toolbox in toolboxes}
# ...and wait for them to finish.
for future in concurrent.futures.as_completed(futures):
tb = futures[future]
results[tb.experiment_name].append(future.result())
progress_bar.value +=1
progress_bar.description = "%03d/%03d:" % (progress_bar.value, progress_bar.max)
CPU times: user 331 ms, sys: 97.2 ms, total: 429 ms Wall time: 15.1 s
This is not a perfect implementation but... works!
As running the experiments sometimes takes a long time, it is good practice to store the results.
import pickle
pickle.dump(results, open('nsga_ii_dtlz3-results.pickle', 'wb'))
In case you need it, this file is included in the github repository.
To load the results we would just have to:
# loaded_results = pickle.load(open('nsga_ii_dtlz3-results.pickle', 'rb'))
# results = loaded_results # <-- uncomment if needed
`results` is a dictionary, but a pandas `DataFrame` is a more handy container for the results.
res = pd.DataFrame(results)
Headers may come out unsorted. Let's fix that first.
res.head()
$n_\mathrm{pop}=100;\ t_\mathrm{max}=5$ | $n_\mathrm{pop}=10;\ t_\mathrm{max}=50$ | $n_\mathrm{pop}=50;\ t_\mathrm{max}=10$ | |
---|---|---|---|
0 | [[0.9977275932042845, 0.9965136379259723, 0.51... | [[0.9984083629151556, 0.09353025978685643, 0.7... | [[0.9998822922333848, 0.3606206000614993, 0.86... |
1 | [[0.9994523656808363, 0.15096009667708038, 0.7... | [[0.9999296140942003, 0.6362939526923219, 0.00... | [[0.9990418574002103, 0.8049558879536657, 0.48... |
2 | [[0.9933887753816163, 0.11289114848347837, 0.4... | [[0.9997638765656464, 0.2829054976583878, 0.89... | [[0.9999169320345067, 0.10912800384098932, 0.7... |
3 | [[0.9995015319036422, 0.5255243150173985, 0.53... | [[0.998903716395094, 0.6948982307664169, 0.190... | [[0.9939804258490037, 0.5537945580214072, 0.49... |
4 | [[0.9999071795487058, 0.597551943491093, 0.738... | [[0.99990936520471, 0.09148086217380487, 0.188... | [[0.9987082847513086, 0.6251687696069047, 0.17... |
res = res.reindex_axis([toolbox.experiment_name for toolbox in toolboxes], axis=1)
res.head()
$n_\mathrm{pop}=10;\ t_\mathrm{max}=50$ | $n_\mathrm{pop}=50;\ t_\mathrm{max}=10$ | $n_\mathrm{pop}=100;\ t_\mathrm{max}=5$ | |
---|---|---|---|
0 | [[0.9984083629151556, 0.09353025978685643, 0.7... | [[0.9998822922333848, 0.3606206000614993, 0.86... | [[0.9977275932042845, 0.9965136379259723, 0.51... |
1 | [[0.9999296140942003, 0.6362939526923219, 0.00... | [[0.9990418574002103, 0.8049558879536657, 0.48... | [[0.9994523656808363, 0.15096009667708038, 0.7... |
2 | [[0.9997638765656464, 0.2829054976583878, 0.89... | [[0.9999169320345067, 0.10912800384098932, 0.7... | [[0.9933887753816163, 0.11289114848347837, 0.4... |
3 | [[0.998903716395094, 0.6948982307664169, 0.190... | [[0.9939804258490037, 0.5537945580214072, 0.49... | [[0.9995015319036422, 0.5255243150173985, 0.53... |
4 | [[0.99990936520471, 0.09148086217380487, 0.188... | [[0.9987082847513086, 0.6251687696069047, 0.17... | [[0.9999071795487058, 0.597551943491093, 0.738... |
a = res.applymap(lambda pop: [toolbox.evaluate(ind) for ind in pop])
plt.figure(figsize=(11,3))
for i, col in enumerate(a.columns):
plt.subplot(1, len(a.columns), i+1)
for pop in a[col]:
x = pd.DataFrame(data=pop)
plt.scatter(x[0], x[1], marker='.', alpha=0.5)
plt.title(col)
The local Pareto-optimal fronts are clearly visible!
Calculating the reference point: a point that is "worse" than any other individual in every objective.
def calculate_reference(results, epsilon=0.1):
alldata = np.concatenate(np.concatenate(results.values))
obj_vals = [toolbox.evaluate(ind) for ind in alldata]
return np.max(obj_vals, axis=0) + epsilon
reference = calculate_reference(res)
reference
array([ 3475.29022518, 3547.26733352])
We can now compute the hypervolume of the Pareto-optimal fronts yielded by each algorithm run.
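For intuition, a tiny worked example with made-up points: for the minimization front $\{(1,2),(2,1)\}$ and reference point $(3,3)$, the dominated region is the union of the rectangles $[1,3]\times[2,3]$ and $[2,3]\times[1,3]$, so the hypervolume is $2 + 2 - 1 = 3$ (the overlap $[2,3]\times[2,3]$ is counted once).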
import deap.benchmarks.tools as bt
hypervols = res.applymap(lambda pop: bt.hypervolume(pop, reference))
hypervols.head()
$n_\mathrm{pop}=10;\ t_\mathrm{max}=50$ | $n_\mathrm{pop}=50;\ t_\mathrm{max}=10$ | $n_\mathrm{pop}=100;\ t_\mathrm{max}=5$ | |
---|---|---|---|
0 | 1.154701e+07 | 9.976071e+06 | 9.555584e+06 |
1 | 1.122888e+07 | 1.022963e+07 | 9.082437e+06 |
2 | 1.173466e+07 | 9.776245e+06 | 9.107024e+06 |
3 | 1.143605e+07 | 9.906633e+06 | 9.150103e+06 |
4 | 1.134042e+07 | 1.029052e+07 | 8.665202e+06 |
hypervols.describe()
$n_\mathrm{pop}=10;\ t_\mathrm{max}=50$ | $n_\mathrm{pop}=50;\ t_\mathrm{max}=10$ | $n_\mathrm{pop}=100;\ t_\mathrm{max}=5$ | |
---|---|---|---|
count | 4.200000e+01 | 4.200000e+01 | 4.200000e+01 |
mean | 1.140611e+07 | 1.000509e+07 | 8.969749e+06 |
std | 3.290728e+05 | 2.324033e+05 | 2.982720e+05 |
min | 1.040199e+07 | 9.481858e+06 | 8.436061e+06 |
25% | 1.119742e+07 | 9.868847e+06 | 8.815870e+06 |
50% | 1.144434e+07 | 1.000275e+07 | 8.900598e+06 |
75% | 1.165122e+07 | 1.014667e+07 | 9.131239e+06 |
max | 1.195610e+07 | 1.047124e+07 | 9.626931e+06 |
fig = plt.figure(figsize=(11,4))
plt.subplot(121, title='Violin plots of NSGA-II with different $n_{\mathrm{pop}}$')
seaborn.violinplot(data=hypervols, palette='Set2')
plt.ylabel('Hypervolume'); plt.xlabel('Configuration')
plt.subplot(122, title='Box plots of NSGA-II with different $n_{\mathrm{pop}}$')
seaborn.boxplot(data=hypervols, palette='Set2')
plt.ylabel('Hypervolume'); plt.xlabel('Configuration');
plt.tight_layout()
We start by writing a function that helps us tabulate the results of the application of a statistical hypothesis test.
import itertools
import scipy.stats as stats
def compute_stat_matrix(data, stat_func, alpha=0.05):
'''A function that applies `stat_func` to all combinations of columns in `data`.
Returns a squared matrix with the p-values'''
p_values = pd.DataFrame(columns=data.columns, index=data.columns)
for a,b in itertools.combinations(data.columns,2):
s,p = stat_func(data[a], data[b])
p_values[a].ix[b] = p
p_values[b].ix[a] = p
return p_values
The Kruskal-Wallis H-test tests the null hypothesis that the population median of all of the groups are equal.
stats.kruskal(*[hypervols[col] for col in hypervols.columns])
KruskalResult(statistic=110.49079579338292, pvalue=1.0167836238281895e-24)
We can now assert that the results are not the same, but which ones are different from or similar to the others?
In case that the null hypothesis of the Kruskal-Wallis is rejected the Conover–Inman procedure (Conover, 1999, pp. 288-290) can be applied in a pairwise manner in order to determine if the results of one algorithm were significantly better than those of the other.
Note: If you want to get an extended summary of this method check out my PhD thesis.
def conover_inman_procedure(data, alpha=0.05):
    num_runs = len(data)
    num_algos = len(data.columns)
    N = num_runs*num_algos

    # the procedure uses the Kruskal-Wallis H statistic, not its p-value
    H,_ = stats.kruskal(*[data[col] for col in data.columns])

    ranked =  stats.rankdata(np.concatenate([data[col] for col in data.columns]))

    ranksums = []
    for i in range(num_algos):
        ranksums.append(np.sum(ranked[num_runs*i:num_runs*(i+1)]))

    S_sq = (np.sum(ranked**2) - N*((N+1)**2)/4)/(N-1)

    # critical value: a Student's t quantile (ppf), not the CDF
    right_side = stats.t.ppf(1-(alpha/2), N-num_algos) * \
                math.sqrt((S_sq*((N-1-H)/(N-num_algos)))*2/num_runs)

    res = pd.DataFrame(columns=data.columns, index=data.columns)

    for i,j in itertools.combinations(np.arange(num_algos),2):
        # compare the difference of *mean* ranks with the critical value
        diff = abs(ranksums[i] - ranksums[j])/num_runs
        res[res.columns[i]].ix[j] = diff > right_side
        res[res.columns[j]].ix[i] = diff > right_side
    return res
conover_inman_procedure(hypervols)
$n_\mathrm{pop}=10;\ t_\mathrm{max}=50$ | $n_\mathrm{pop}=50;\ t_\mathrm{max}=10$ | $n_\mathrm{pop}=100;\ t_\mathrm{max}=5$ | |
---|---|---|---|
$n_\mathrm{pop}=10;\ t_\mathrm{max}=50$ | NaN | True | True |
$n_\mathrm{pop}=50;\ t_\mathrm{max}=10$ | True | NaN | True |
$n_\mathrm{pop}=100;\ t_\mathrm{max}=5$ | True | True | NaN |
We now know in which cases the difference is sufficient to say that one result is better than the other.
Another alternative is the Friedman test.
measurements = [list(hypervols[col]) for col in hypervols.columns]
stats.friedmanchisquare(*measurements)
FriedmanchisquareResult(statistic=82.047619047619037, pvalue=1.5261102074586963e-18)
The Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-sum test (WRS), or Wilcoxon–Mann–Whitney test) is a nonparametric test of the null hypothesis that two populations are the same, against the alternative that one population tends to have larger values than the other.
It has greater efficiency than the $t$-test on non-normal distributions, such as a mixture of normal distributions, and it is nearly as efficient as the $t$-test on normal distributions.
raw_p_values=compute_stat_matrix(hypervols, stats.mannwhitneyu)
raw_p_values
$n_\mathrm{pop}=10;\ t_\mathrm{max}=50$ | $n_\mathrm{pop}=50;\ t_\mathrm{max}=10$ | $n_\mathrm{pop}=100;\ t_\mathrm{max}=5$ | |
---|---|---|---|
$n_\mathrm{pop}=10;\ t_\mathrm{max}=50$ | NaN | 1.67658e-15 | 1.56072e-15 |
$n_\mathrm{pop}=50;\ t_\mathrm{max}=10$ | 1.67658e-15 | NaN | 2.96475e-15 |
$n_\mathrm{pop}=100;\ t_\mathrm{max}=5$ | 1.56072e-15 | 2.96475e-15 | NaN |
The familywise error rate (FWER) is the probability of making one or more false discoveries (type I errors) among all the hypotheses when performing multiple hypothesis tests.
Example: each individual test makes a type I error with probability $\alpha$, so after $m$ tests the probability of making at least one type I error can approach $m\alpha$. Therefore, if $\alpha=0.05$ is used and 5 pairwise comparisons are made, we can have up to a $5\times0.05 = 0.25$ chance of making a type I error.
One of these corrections is the Šidák correction as it is less conservative than the Bonferroni correction: $$\alpha_{SID} = 1-(1-\alpha)^\frac{1}{m},$$ where $m$ is the number of tests.
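In our case $m=\binom{3}{2}=3$ pairwise comparisons, hence $\alpha_{SID} = 1-(1-0.05)^{1/3} \approx 0.01695$, as computed below.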
from scipy.misc import comb
alpha=0.05
alpha_sid = 1 - (1-alpha)**(1/comb(len(hypervols.columns), 2))
alpha_sid
0.016952427508441503
Let's apply the corrected alpha to `raw_p_values`. A cell with a `True` value means that the null hypothesis is rejected for that pair, i.e., those two results are significantly different.
raw_p_values.applymap(lambda value: value <= alpha_sid)
$n_\mathrm{pop}=10;\ t_\mathrm{max}=50$ | $n_\mathrm{pop}=50;\ t_\mathrm{max}=10$ | $n_\mathrm{pop}=100;\ t_\mathrm{max}=5$ | |
---|---|---|---|
$n_\mathrm{pop}=10;\ t_\mathrm{max}=50$ | False | True | True |
$n_\mathrm{pop}=50;\ t_\mathrm{max}=10$ | True | False | True |
$n_\mathrm{pop}=100;\ t_\mathrm{max}=5$ | True | True | False |
In this class/notebook we have seen some key elements:
Bear in mind that:
# To install run: pip install version_information
%load_ext version_information
%version_information scipy, numpy, matplotlib, seaborn, deap
Software | Version |
---|---|
Python | 3.6.0 64bit [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] |
IPython | 5.2.2 |
OS | Darwin 16.4.0 x86_64 i386 64bit |
scipy | 0.18.1 |
numpy | 1.11.3 |
matplotlib | 2.0.0 |
seaborn | 0.7.1 |
deap | 1.1 |
Sat Mar 04 02:05:51 2017 BRT |
# this code is here for cosmetic reasons
from IPython.core.display import HTML
from urllib.request import urlopen
HTML(urlopen('https://raw.githubusercontent.com/lmarti/jupyter_custom/master/custom.include').read().decode('utf-8'))