The following commands will build the slideshow:

In [91]:
!aws s3 cp s3://jupyter/notebooks/talks/DataPhilly\ February\ 2016\ Bayesian\ Optimization\ with\ Gaussian\ Processes.ipynb \
    /vagrant/tmp/dataphilly.ipynb
!jupyter nbconvert --to=slides --reveal-prefix= \
    --output=/vagrant/tmp/dataphilly /vagrant/tmp/dataphilly.ipynb
download: s3://jupyter/notebooks/talks/DataPhilly February 2016 Bayesian Optimization with Gaussian Processes.ipynb to ../../vagrant/tmp/dataphilly.ipynb
[NbConvertApp] Converting notebook /vagrant/tmp/dataphilly.ipynb to slides
[NbConvertApp] Writing 4638819 bytes to /vagrant/tmp/dataphilly.slides.html

The built slideshow is available online here.

In [1]:
%matplotlib inline
In [92]:
from __future__ import division

from collections import namedtuple
from warnings import filterwarnings

from IPython.display import display, Image
In [3]:
filterwarnings('ignore', module='matplotlib')
In [4]:
import GPy
from matplotlib import pyplot as plt
import numpy as np
from scipy import stats
import seaborn as sns
warning in stationary: failed to import cython module: falling back to numpy
In [5]:
blue, green, red, purple, gold, teal = sns.color_palette()
In [6]:
np.random.seed(5918927) # from

Bayesian Optimization with Gaussian Processes

February 18, 2016



Principal Data Scientist, Monetate

In [7]:
with open('/vagrant/tmp/dataphilly/ei.gif', 'rb') as gif_data:
    gif = Image(data=gif_data.read())
In [8]:

The problem

Minimize a function that is difficult, expensive, or time-consuming to evaluate.

The idea

  • Represent our uncertainty about the true function with a probabilistic model
  • Choose the next evaluation point that maximizes the chances of finding the function's minimum
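That loop can be sketched end-to-end in plain NumPy/SciPy. Everything here is illustrative: a toy objective, a hand-rolled RBF kernel, and an expected-improvement acquisition function stand in for the GPy machinery used in the rest of the talk.

```python
import numpy as np
from scipy import stats

def rbf(A, B, lengthscale=0.25):
    # Squared-exponential kernel on 1-D inputs of shape (n, 1)
    d = A[:, None, 0] - B[None, :, 0]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X, y, X_star, noise=1e-6):
    # Standard GP regression posterior mean and pointwise variance
    K = rbf(X, X) + noise * np.eye(len(X))
    K_star = rbf(X_star, X)
    mean = K_star @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(K_star * np.linalg.solve(K, K_star.T).T, axis=1)
    return mean, np.maximum(var, 1e-12)

def expected_improvement(mean, var, y_best):
    # EI for minimization: expected amount by which we improve on y_best
    sd = np.sqrt(var)
    z = (y_best - mean) / sd
    return sd * (z * stats.norm.cdf(z) + stats.norm.pdf(z))

def f(x):
    return np.sin(3 * x)  # toy objective; minimum -1 at x = pi/2

rng = np.random.RandomState(0)
X = rng.uniform(0, 2, size=(3, 1))        # a few random initial evaluations
y = f(X[:, 0])
grid = np.linspace(0, 2, 200)[:, None]    # candidate evaluation points
for _ in range(15):
    mean, var = gp_posterior(X, y, grid)
    ei = expected_improvement(mean, var, y.min())
    x_next = grid[np.argmax(ei)]          # evaluate where EI is largest
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next[0]))
x_best, y_best = X[np.argmin(y), 0], y.min()
```

After a handful of evaluations the best point found should land close to the true minimizer at $x = \pi/2$, even though the loop never sees the objective's formula, only its values.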

Gaussian processes

A flexible family of probability distributions over continuous functions

In [9]:
intro_plot_x = np.linspace(0, 1, 100)
intro_plot_X = intro_plot_x[:, np.newaxis]

intro_X = np.array([[0.5]])
intro_Y = np.array([[1]])
In [10]:
rbf_kernel = GPy.kern.RBF(1, lengthscale=0.25)
rbf_model = GPy.models.GPRegression(intro_X, intro_Y, kernel=rbf_kernel)

rbf_samples = rbf_model.posterior_samples_f(intro_plot_X, size=20)
 /usr/local/lib/python2.7/dist-packages/GPy/core/ RuntimeWarning:covariance is not positive-semidefinite.
In [11]:
fig, ax = plt.subplots(figsize=(8, 6))

ax.plot(intro_plot_x, rbf_samples, c=blue, alpha=0.75);
In [12]:

$f \sim GP(m, k)$ if for any $\mathbf{x} = (x_1, \ldots, x_n)$, $(f(x_1), \ldots, f(x_n))$ has a multivariate normal distribution

$$ (f(x_1), f(x_2), \ldots, f(x_n)) \sim N\left(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x})\right) $$

$$ m(\mathbf{x}) = \begin{pmatrix} m(x_1) \\ m(x_2) \\ \vdots \\ m(x_n) \end{pmatrix},\ k(\mathbf{x}, \mathbf{x}) = \begin{pmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_n) \\ k(x_2, x_1) & k(x_2, x_2) & \cdots & k(x_2, x_n) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_n, x_1) & k(x_n, x_2) & \cdots & k(x_n, x_n) \end{pmatrix} $$
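To make the definition concrete, here is the same construction in plain NumPy (an illustrative sketch, not GPy's sampler): build $k(\mathbf{x}, \mathbf{x})$ with the RBF kernel and lengthscale used above, then draw function values from the corresponding multivariate normal.

```python
import numpy as np

def rbf_kernel(x, lengthscale=0.25):
    # k(x_i, x_j) = exp(-(x_i - x_j)^2 / (2 * lengthscale^2))
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(0, 1, 100)
K = rbf_kernel(x) + 1e-6 * np.eye(len(x))  # jitter keeps K positive-definite
# 20 draws from N(m(x), k(x, x)) with m = 0; each row is one sampled function
samples = np.random.RandomState(0).multivariate_normal(np.zeros(len(x)), K, 20)
```

Each row of `samples` is a function evaluated at 100 points, exactly like the prior draws GPy produced above.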

If $f \sim GP(0, k)$, $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $y_i = f(x_i) + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$, $f\ |\ \mathcal{D}$ is also a Gaussian process with

$$ \begin{align*} f(\mathbf{x^*})\ |\ \mathcal{D} & \sim N(\tilde{m}(\mathbf{x^*}), \tilde{k}(\mathbf{x^*}, \mathbf{x^*})) \end{align*} $$

$$ \begin{align*} \tilde{m}(\mathbf{x^*}) & = k(\mathbf{x^*}, \mathbf{x}) \left(k(\mathbf{x}, \mathbf{x}) + \sigma^2 I\right)^{-1} \mathbf{y} \\ \tilde{k}(\mathbf{x^*}, \mathbf{x^*}) & = k(\mathbf{x^*}, \mathbf{x^*}) - k(\mathbf{x^*}, \mathbf{x}) \left(k(\mathbf{x}, \mathbf{x}) + \sigma^2 I\right)^{-1} k(\mathbf{x}, \mathbf{x^*}) \end{align*} $$
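These formulas translate almost line-for-line into NumPy. A sketch (not GPy's implementation) for the single observation $(0.5, 1)$ used above, with a small assumed noise variance $\sigma^2$:

```python
import numpy as np

def rbf(A, B, lengthscale=0.25):
    # Squared-exponential kernel on inputs of shape (n, 1)
    d = A[:, None, 0] - B[None, :, 0]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

X = np.array([[0.5]])                     # observed input
y = np.array([1.0])                       # observed value
X_star = np.linspace(0, 1, 100)[:, None]  # prediction points
sigma2 = 1e-6                             # assumed noise variance

K = rbf(X, X) + sigma2 * np.eye(len(X))
K_star = rbf(X_star, X)

# mean: k(x*, x) (k(x, x) + sigma^2 I)^{-1} y
post_mean = K_star @ np.linalg.solve(K, y)
# cov:  k(x*, x*) - k(x*, x) (k(x, x) + sigma^2 I)^{-1} k(x, x*)
post_cov = rbf(X_star, X_star) - K_star @ np.linalg.solve(K, K_star.T)
```

Near the observation the posterior mean is pinned to the observed value and the posterior variance collapses toward zero, which is exactly what the GPy plots below show.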
In [13]:

rbf_post_mean, _ = rbf_model.predict(intro_plot_X)
rbf_post_low, rbf_post_high = rbf_model.predict_quantiles(intro_plot_X)

rbf_post_samples = rbf_model.posterior_samples_f(intro_plot_X, size=20)
WARNING: reconstraining parameters GP_regression.Gaussian_noise
In [14]:
fig, (l_ax, r_ax) = plt.subplots(ncols=2, sharex=True, sharey=True, figsize=(24, 9))

l_ax.plot(intro_plot_x, rbf_samples, c=blue, alpha=0.75);

r_ax.fill_between(intro_plot_x, rbf_post_low[:, 0], rbf_post_high[:, 0],
                  color='gray', alpha=0.25);
r_ax.plot(intro_plot_x, rbf_post_samples, c=blue, alpha=0.75);
r_ax.scatter(intro_X, intro_Y, s=100, c='k', label='Observations');
r_ax.plot(intro_plot_x, rbf_post_mean, '--', c='k', label='GP mean');

r_ax.set_xlim(-0.01, 1.01);