Understanding evolutionary strategies and covariance matrix adaptation

Luis Martí, IC/UFF

http://lmarti.com; [email protected]

Advanced Evolutionary Computation: Theory and Practice

The notebook is better viewed rendered as slides. You can convert it to slides and view them with jupyter nbconvert --to slides --post serve.

This and other related IPython notebooks can be found in the course GitHub repository.

In [61]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
from matplotlib import cm 
from mpl_toolkits.mplot3d import axes3d
from scipy.stats import norm, multivariate_normal
import math

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.rc('text', usetex=True)
plt.rc('font', family='serif')
plt.rcParams['text.latex.preamble'] ='\\usepackage{libertine}\n\\usepackage[utf8]{inputenc}'

import seaborn
seaborn.set(style='whitegrid')
seaborn.set_context('notebook')

Statistics recap

  • Random variable: a variable whose value is subject to variations due to chance. A random variable can take on a set of possible different values, each with an associated probability, in contrast to other mathematical variables.

  • Probability distribution: mathematical function describing the possible values of a random variable and their associated probabilities.

  • Probability density function (pdf) of a continuous random variable is a function that describes the relative likelihood for this random variable to take on a given value.

    • The probability of the random variable falling within a particular range of values is given by the integral of this variable's density over that range (see the quick check after this list).
    • The probability density function is nonnegative everywhere, and its integral over the entire space is equal to one.
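
As a quick check of these two properties (an added sketch, not part of the original notebook; it assumes scipy.integrate.quad, everything else is already imported above):

from scipy.integrate import quad

# P(-1 <= X <= 1) for X ~ N(0, 1): the integral of the pdf over [-1, 1]
area, _ = quad(norm.pdf, -1, 1)
print(area)                        # ~0.683, matches norm.cdf(1) - norm.cdf(-1)

# the pdf is nonnegative and integrates to one over the whole real line
total, _ = quad(norm.pdf, -np.inf, np.inf)
print(total)                       # ~1.0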

Moments

The probability distribution of a random variable is often characterised by a small number of parameters, which also have a practical interpretation.

  • Mean (a.k.a. expected value) refers to one measure of the central tendency either of a probability distribution or of the random variable characterized by that distribution.
    • population mean: $\mu = \operatorname{E}[X]$.
    • estimation of sample mean: $\bar{x}$.
  • Standard deviation measures the amount of variation or dispersion from the mean.
    • population standard deviation: $$ \sigma = \sqrt{\operatorname E[X^2]-(\operatorname E[X])^2} = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2}. $$
    • unbiased estimator of the variance (see the NumPy sketch after this list): $$ s^2 = \frac{1}{N-1} \sum_{i=1}^N (x_i - \overline{x})^2. $$
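
A minimal sketch (not in the original notebook) of how these estimators map onto NumPy calls; note that np.var and np.std divide by $N$ by default, so ddof=1 must be passed to obtain the unbiased ($N-1$) estimator:

x = np.random.normal(0, 0.5, 1000)   # synthetic sample

x_bar = x.mean()            # sample mean, estimates mu
s2 = x.var(ddof=1)          # unbiased estimator of the variance (divides by N - 1)
s = x.std(ddof=1)           # corresponding standard deviation estimate
print(x_bar, s2, s)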

Two samples

In [62]:
sample1 = np.random.normal(0, 0.5, 1000)
sample2 = np.random.normal(1,1,500)
In [63]:
def plot_normal_sample(sample, mu, sigma):
    'Plots a histogram of the sample and the normal pdf corresponding to the parameters.'
    x = np.linspace(mu - 4*sigma, mu + 4*sigma, 100)
    plt.plot(x, norm.pdf(x, mu, sigma), 'b', lw=2)
    plt.hist(sample, 30, density=True, alpha=0.2)
    plt.annotate(r'3$\sigma$',
                 xy=(mu + 3*sigma, 0), xycoords='data',
                 xytext=(0, 100), textcoords='offset points',
                 fontsize=15,
                 arrowprops=dict(arrowstyle="->",
                                 connectionstyle="arc,angleA=180,armA=20,angleB=90,armB=15,rad=7"))
    plt.annotate(r'-3$\sigma$',
                 xy=(mu - 3*sigma, 0), xycoords='data',
                 xytext=(0, 100), textcoords='offset points',
                 fontsize=15,
                 arrowprops=dict(arrowstyle="->",
                                 connectionstyle="arc,angleA=180,armA=20,angleB=90,armB=15,rad=7"))
In [64]:
plt.figure(figsize=(11,4))
plt.subplot(121)
plot_normal_sample(sample1, 0, 0.5)
plt.title(r'Sample 1: $\mu=0$, $\sigma=0.5$')
plt.subplot(122)
plot_normal_sample(sample2, 1, 1)
plt.title(r'Sample 2: $\mu=1$, $\sigma=1$')
plt.tight_layout();
In [65]:
print('Sample 1; estimated mean:', sample1.mean(), ' and std. dev.: ', sample1.std())
print('Sample 2; estimated mean:', sample2.mean(), ' and std. dev.: ', sample2.std())
Sample 1; estimated mean: -0.000176330861078  and std. dev.:  0.472785600332
Sample 2; estimated mean: 1.00435499156  and std. dev.:  0.969981727482

Covariance is a measure of how much two random variables change together. $$ \operatorname{cov}(X,Y) = \operatorname{E}{\big[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])\big]}, $$ $$ \operatorname{cov}(X,X) = \operatorname{var}(X). $$

  • The sign of the covariance shows the tendency of the linear relationship between the variables.
  • The magnitude of the covariance is not easy to interpret on its own.
  • The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation (see the sketch below).
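
For concreteness, here is an added sketch (not from the original notebook) comparing np.cov and np.corrcoef on two variables with a known linear relationship:

x = np.random.normal(0, 1, 1000)
y = 2 * x + np.random.normal(0, 0.5, 1000)   # y depends linearly on x, plus noise

print(np.cov(x, y))       # off-diagonal entry is cov(x, y); its magnitude is hard to judge
print(np.corrcoef(x, y))  # normalized to [-1, 1]; close to 1 here, i.e. a strong linear relation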

Understanding covariance

In [66]:
sample_2d = np.array(list(zip(sample1, np.ones(len(sample1))))).T
In [67]:
plt.scatter(sample_2d[0,:], sample_2d[1,:], marker='x');
In [68]:
np.cov(sample_2d) # computes covariance between the two components of the sample
Out[68]:
array([[ 0.22374997,  0.        ],
       [ 0.        ,  0.        ]])

As the sample varies along only one axis, the covariance matrix does not detect any relationship between the two components.

What happens when we rotate the sample?

In [69]:
def rotate_sample(sample, angle=-45):
    'Rotates a sample by `angle` degrees.'
    theta = (angle/180.) * np.pi
    rot_matrix = np.array([[np.cos(theta), -np.sin(theta)], 
                           [np.sin(theta), np.cos(theta)]])
    return sample.T.dot(rot_matrix).T
In [70]:
rot_sample_2d = rotate_sample(sample_2d)
In [71]:
plt.scatter(rot_sample_2d[0,:], rot_sample_2d[1,:], marker='x');
In [72]:
np.cov(rot_sample_2d)
Out[72]:
array([[ 0.11187499,  0.11187499],
       [ 0.11187499,  0.11187499]])
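
Now the off-diagonal entries are non-zero: the covariance matrix has picked up the linear relationship introduced by the rotation. As an added aside (not part of the original notebook), the eigendecomposition of this matrix recovers the direction along which the sample actually varies:

# eigh returns eigenvalues in ascending order; the last column of eig_vecs is the
# eigenvector of the largest eigenvalue.
eig_vals, eig_vecs = np.linalg.eigh(np.cov(rot_sample_2d))
print(eig_vals)          # one eigenvalue is ~0: no spread orthogonal to the line
print(eig_vecs[:, -1])   # ~[0.707, 0.707] (up to sign): the diagonal along which the points lie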

A two-dimensional normally-distributed variable

In [74]:
mu = [0,1]
cov = [[1,0],[0,0.2]] # diagonal covariance: the two components are uncorrelated
sample = np.random.multivariate_normal(mu,cov,1000).T
plt.scatter(sample[0], sample[1], marker='x', alpha=0.29)

estimated_mean = sample.mean(axis=1)
estimated_cov = np.cov(sample)
e_x,e_y = np.random.multivariate_normal(estimated_mean,estimated_cov,500).T

plt.plot(e_x,e_y,'rx', alpha=0.47)
x, y = np.mgrid[-4:4:.01, -1:3:.01]
pos = np.empty(x.shape + (2,))
pos[:, :, 0] = x; pos[:, :, 1] = y
rv = multivariate_normal(estimated_mean, estimated_cov)
plt.contour(x, y, rv.pdf(pos), cmap=cm.viridis_r, linewidths=4)
plt.axis('equal');

This is better understood in 3D

In [76]:
fig = plt.figure(figsize=(11,5))
ax = fig.add_subplot(projection='3d')
ax.plot_surface(x, y, rv.pdf(pos), cmap=cm.viridis_r, rstride=30, cstride=10, linewidth=1, alpha=0.47)
ax.plot_wireframe(x, y, rv.pdf(pos), linewidth=0.47, alpha=0.47)
ax.scatter(e_x, e_y, 0.4, marker='.', alpha=0.47)
ax.axis('tight');

Again, what happens if we rotate the sample?

In [77]:
rot_sample = rotate_sample(sample)
estimated_mean = rot_sample.mean(axis=1)
estimated_cov = np.cov(rot_sample)
e_x,e_y = np.random.multivariate_normal(estimated_mean,estimated_cov,500).T
In [78]:
fig = plt.figure(figsize=(11,4))
plt.subplot(121)
plt.scatter(rot_sample[0,:], rot_sample[1,:], marker='x', alpha=0.7)
plt.title('"Original" data')
plt.axis('equal')
plt.subplot(122)
plt.scatter(e_x, e_y, marker='o', color='g', alpha=0.7)
plt.title('Sampled data')
plt.axis('equal');

The covariance matrix captures the dependency between the components and allows us to model the disposition of the "original" sample.
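
One way to see why this works (an added sketch, not from the original notebook): given a mean vector and a positive-definite covariance matrix $\Sigma$, correlated samples can be generated from independent standard normals via a Cholesky factor $L$ with $\Sigma = L L^\top$, one of the standard constructions behind multivariate normal sampling:

# reuses estimated_mean and estimated_cov computed from the rotated sample above
chol = np.linalg.cholesky(estimated_cov)               # estimated_cov = chol @ chol.T
z = np.random.normal(size=(2, 500))                    # independent standard normal noise
chol_sample = estimated_mean[:, np.newaxis] + chol @ z
print(np.cov(chol_sample))                             # close to estimated_cov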

In [79]:
x, y = np.mgrid[-4:4:.01, -3:3:.01]
pos = np.empty(x.shape + (2,))
pos[:, :, 0] = x; pos[:, :, 1] = y
rv = multivariate_normal(estimated_mean, estimated_cov)
In [80]:
fig = plt.figure(figsize=(11,5))
ax = fig.add_subplot(projection='3d')
ax.plot_surface(x, y, rv.pdf(pos), cmap=cm.viridis_r, rstride=30, cstride=10, linewidth=1, alpha=0.47)
ax.plot_wireframe(x, y, rv.pdf(pos), linewidth=0.47, alpha=0.47)
ax.scatter(e_x, e_y, 0.4, marker='.', alpha=0.47)
ax.axis('tight');