The following commands will build the slideshow:

In [91]:
!aws s3 cp s3://jupyter/notebooks/talks/DataPhilly\ February\ 2016\ Bayesian\ Optimization\ with\ Gaussian\ Processes.ipynb \
/vagrant/tmp/dataphilly.ipynb
!jupyter nbconvert --to=slides --reveal-prefix=https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.2.0/ \
--output=/vagrant/tmp/dataphilly /vagrant/tmp/dataphilly.ipynb

download: s3://jupyter/notebooks/talks/DataPhilly February 2016 Bayesian Optimization with Gaussian Processes.ipynb to ../../vagrant/tmp/dataphilly.ipynb
[NbConvertApp] Converting notebook /vagrant/tmp/dataphilly.ipynb to slides
[NbConvertApp] Writing 4638819 bytes to /vagrant/tmp/dataphilly.slides.html


The built slideshow is available online here.

In [1]:
%matplotlib inline

In [92]:
from __future__ import division

from collections import namedtuple
from warnings import filterwarnings

from IPython.display import display, Image

In [3]:
filterwarnings('ignore', module='matplotlib')

In [4]:
import GPy
from matplotlib import pyplot as plt
import numpy as np
from scipy import stats
import seaborn as sns

warning in stationary: failed to import cython module: falling back to numpy

In [5]:
blue, green, red, purple, gold, teal = sns.color_palette()

In [6]:
np.random.seed(5918927) # from random.org


# Bayesian Optimization with Gaussian Processes

## DataPhilly

### Principal Data Scientist, Monetate

In [7]:
with open('/vagrant/tmp/dataphilly/ei.gif', 'rb') as gif_data:
    gif = Image(data=gif_data.read(), format='png')  # body reconstructed; the original cell likely loaded the animation shown below

In [8]:
display(gif)


## The problem

Minimize a function that is difficult, expensive, or time-consuming to evaluate.

### The idea

• Represent our uncertainty about the true function with a probabilistic model
• Choose the next evaluation point that maximizes the chances of finding the function's minimum
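The two steps above form a loop: fit a probabilistic model to the evaluations so far, then use it to pick the next point. A toy sketch of that loop, using a plain-NumPy RBF-kernel GP posterior and a simple lower-confidence-bound rule in place of the expected-improvement acquisition used later in the talk (all names here are illustrative, not GPy's API):

```python
import numpy as np

def rbf(a, b, ell=0.25):
    # RBF kernel matrix between 1-D arrays a and b
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def bayes_opt_min(f, bounds=(0.0, 1.0), n_init=3, n_iter=15, seed=0):
    """Toy Bayesian-optimization loop: fit a GP, evaluate where the
    lower confidence bound (posterior mean minus one std) is smallest."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=n_init)          # initial random evaluations
    y = np.array([f(x) for x in X])
    grid = np.linspace(lo, hi, 200)               # candidate evaluation points
    for _ in range(n_iter):
        K = rbf(X, X) + 1e-6 * np.eye(len(X))     # jitter for numerical stability
        Ks = rbf(grid, X)
        mean = Ks @ np.linalg.solve(K, y)         # GP posterior mean on the grid
        var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
        lcb = mean - np.sqrt(np.clip(var, 0.0, None))
        x_next = grid[np.argmin(lcb)]             # most promising candidate
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    best = np.argmin(y)
    return X[best], y[best]
```

On a cheap test function like `lambda x: (x - 0.3) ** 2` the loop homes in on the minimizer after a handful of evaluations; the point is that each evaluation of an *expensive* `f` is chosen deliberately rather than on a dense grid.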

## Gaussian processes

A flexible family of probability distributions over continuous functions

In [9]:
intro_plot_x = np.linspace(0, 1, 100)
intro_plot_X = intro_plot_x[:, np.newaxis]

intro_X = np.array([[0.5]])
intro_Y = np.array([[1]])

In [10]:
rbf_kernel = GPy.kern.RBF(1., lengthscale=0.25)
rbf_model = GPy.models.GPRegression(intro_X, intro_Y, kernel=rbf_kernel)

rbf_samples = rbf_model.posterior_samples_f(intro_plot_X, size=20)

 /usr/local/lib/python2.7/dist-packages/GPy/core/gp.py:456: RuntimeWarning: covariance is not positive-semidefinite.

In [11]:
fig, ax = plt.subplots(figsize=(8, 6))

ax.plot(intro_plot_x, rbf_samples, c=blue, alpha=0.75);

In [12]:
fig

Out[12]:

$f \sim GP(m, k)$ if for any $\mathbf{x} = (x_1, \ldots, x_n)$, $(f(x_1), \ldots, f(x_n))$ has a multivariate normal distribution

$$(f(x_1), f(x_2), \ldots, f(x_n)) \sim N\left(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x})\right)$$

$$m(\mathbf{x}) = \begin{pmatrix} m(x_1) \\ m(x_2) \\ \vdots \\ m(x_n) \end{pmatrix},\ k(\mathbf{x}, \mathbf{x}) = \begin{pmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_n) \\ k(x_2, x_1) & k(x_2, x_2) & \cdots & k(x_2, x_n) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_n, x_1) & k(x_n, x_2) & \cdots & k(x_n, x_n) \end{pmatrix}$$
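The definition can be checked numerically with plain NumPy, independent of GPy: build the covariance matrix $k(\mathbf{x}, \mathbf{x})$ for an RBF kernel and draw finite-dimensional samples from the corresponding multivariate normal (a sketch; the kernel and lengthscale mirror the GPy cell above):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(x1, x2, lengthscale=0.25):
    # k(x, x') = exp(-(x - x')^2 / (2 * lengthscale^2))
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / lengthscale ** 2)

x = np.linspace(0, 1, 50)
K = rbf(x, x)                 # the n x n covariance matrix from the definition
m = np.zeros_like(x)          # zero mean function

# (f(x_1), ..., f(x_n)) ~ N(m(x), k(x, x)); jitter keeps K numerically PSD
samples = rng.multivariate_normal(m, K + 1e-8 * np.eye(len(x)), size=5)
```

Each row of `samples` is one draw of $(f(x_1), \ldots, f(x_n))$, i.e. one of the wiggly prior curves plotted from `posterior_samples_f` above.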

If $f \sim GP(0, k)$, $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $y_i = f(x_i) + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$, $f\ |\ \mathcal{D}$ is also a Gaussian process with

\begin{align*} f(\mathbf{x^*})\ |\ \mathcal{D} & \sim N(\tilde{m}(\mathbf{x^*}), \tilde{k}(\mathbf{x^*}, \mathbf{x^*})) \end{align*}

\begin{align*} \tilde{m}(\mathbf{x^*}) & = k(\mathbf{x^*}, \mathbf{x}) \left(k(\mathbf{x}, \mathbf{x}) + \sigma^2 I\right)^{-1} \mathbf{y} \\ \tilde{k}(\mathbf{x^*}, \mathbf{x^*}) & = k(\mathbf{x^*}, \mathbf{x^*}) - k(\mathbf{x^*}, \mathbf{x}) \left(k(\mathbf{x}, \mathbf{x}) + \sigma^2 I\right)^{-1} k(\mathbf{x}, \mathbf{x^*}) \end{align*}

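These two equations translate directly into a few lines of NumPy, using linear solves against $k(\mathbf{x}, \mathbf{x}) + \sigma^2 I$ in place of an explicit inverse (an illustrative sketch, not GPy's internals; the `rbf` helper is an assumed stand-in for the kernel):

```python
import numpy as np

def rbf(x1, x2, lengthscale=0.25):
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / lengthscale ** 2)

def gp_posterior(X_star, X, y, noise_var=1e-8):
    K = rbf(X, X) + noise_var * np.eye(len(X))   # k(x, x) + sigma^2 I
    K_star = rbf(X_star, X)                      # k(x*, x)
    mean = K_star @ np.linalg.solve(K, y)        # posterior mean m~(x*)
    cov = rbf(X_star, X_star) - K_star @ np.linalg.solve(K, K_star.T)
    return mean, cov                             # posterior covariance k~(x*, x*)

X = np.array([0.2, 0.5, 0.8])
y = np.sin(2 * np.pi * X)
mean, cov = gp_posterior(X, X, y)
```

With the noise variance pinned near zero, as in the `constrain_fixed(0.)` cell below, the posterior mean interpolates the observations exactly and the posterior variance collapses to zero at each observed point.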
In [13]:
rbf_model.Gaussian_noise.constrain_fixed(0.)
rbf_model.optimize()

rbf_post_mean, _ = rbf_model.predict(intro_plot_X)
rbf_post_low, rbf_post_high = rbf_model.predict_quantiles(intro_plot_X)

rbf_post_samples = rbf_model.posterior_samples_f(intro_plot_X, size=20)

WARNING: reconstraining parameters GP_regression.Gaussian_noise

In [14]:
fig, (l_ax, r_ax) = plt.subplots(ncols=2, sharex=True, sharey=True, figsize=(24, 9))

l_ax.plot(intro_plot_x, rbf_samples, c=blue, alpha=0.75);

r_ax.fill_between(intro_plot_x,
                  rbf_post_low.ravel(),
                  rbf_post_high.ravel(),
                  color='gray', alpha=0.25);
r_ax.plot(intro_plot_x, rbf_post_samples, c=blue, alpha=0.75);
r_ax.scatter(intro_X, intro_Y, s=100, c='k', label='Observations');
r_ax.plot(intro_plot_x, rbf_post_mean, '--', c='k', label='GP mean');

r_ax.set_xlim(-0.01, 1.01);
r_ax.legend();