In this notebook we give a simple demonstration of the nested variational compression approach to deep Gaussian processes. First we perform some setup and load in the code. The deep GPs are currently released as an augmentation of our GPy software, so we import that software along with the files needed for running deep GPs.
import numpy as np
import GPy
from coldeep import ColDeep
from coldeep import build_supervised
from layers import *
Next we set up plotting for the notebook.
import matplotlib
from matplotlib import pyplot as plt
matplotlib.rcParams['figure.figsize'] = (16,8)
%matplotlib inline
Gaussian process models have Gaussian distributed derivatives. This means that they tend to struggle with approximating step functions, whose derivatives are either zero or infinite. Duvenaud et al. showed that as we increase the number of layers in the model, the distribution over derivatives can become more heavy tailed. Let's examine this in practice by fitting the model to a step function.
np.random.seed(0)
n = 30 # number of data
d = 1 # number of dimensions
# input variable is linearly spaced.
X = np.linspace(0,1,n)[:,None]
# response variable is step function
Y = np.where(X>0.5, 1,0) + np.random.randn(n,1)*0.02
# where to plot the model predictions
Xtest = np.linspace(-1,2,500)[:,None]
Now we attempt to model the step function with a Gaussian process. Parameters are chosen by type-II maximum likelihood.
model0 = GPy.models.GPRegression(X,Y)
model0.optimize('bfgs', max_iters=1000, messages=1)
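Type-II maximum likelihood means choosing the kernel hyperparameters by maximizing the log marginal likelihood of the data. As a self-contained sketch of the quantity being optimized (plain NumPy with an assumed RBF kernel and illustrative hyperparameter values, not GPy's internals):

```python
import numpy as np

def rbf(X1, X2, variance=1.0, lengthscale=0.2):
    # Squared-exponential (RBF) covariance between two sets of 1-d inputs.
    return variance * np.exp(-0.5 * (X1 - X2.T) ** 2 / lengthscale ** 2)

def log_marginal_likelihood(X, Y, noise=0.02 ** 2, **kern_args):
    # log p(y | X, theta) = -1/2 y' K^-1 y - 1/2 log|K| - n/2 log(2 pi),
    # computed stably via a Cholesky factorization of K.
    n = X.shape[0]
    K = rbf(X, X, **kern_args) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    y = Y[:, 0]
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2 * np.pi))
```

The optimizer above is, in effect, searching over `variance`, `lengthscale` and `noise` to maximize this quantity.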
We can plot the regression to see if it has managed to fit the data.
_ = model0.plot()
We note that the model fit is overly smooth and the predictive variance is too high.
Now we will consider a deep Gaussian process. We can't plot the output of the deep GP directly, so we'll use Monte Carlo sampling. To aid comparison, we first plot Monte Carlo samples from the original GP.
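As a self-contained sketch of what such Monte Carlo samples involve, here is a plain NumPy implementation of joint posterior sampling with an assumed RBF kernel (illustrative only; GPy provides its own routines for this):

```python
import numpy as np

def gp_posterior_samples(X, Y, Xtest, n_samples=3,
                         variance=1.0, lengthscale=0.2, noise=0.02 ** 2):
    # Condition a zero-mean RBF-kernel GP on (X, Y) and draw joint samples
    # of the latent function at the test inputs Xtest.
    def rbf(A, B):
        return variance * np.exp(-0.5 * (A - B.T) ** 2 / lengthscale ** 2)
    K = rbf(X, X) + noise * np.eye(X.shape[0])
    Ks = rbf(X, Xtest)                          # cross-covariance, (n, m)
    Kss = rbf(Xtest, Xtest)                     # test covariance, (m, m)
    mu = Ks.T @ np.linalg.solve(K, Y)           # posterior mean, (m, 1)
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)   # posterior covariance
    # Small jitter keeps the Cholesky factorization numerically stable.
    L = np.linalg.cholesky(cov + 1e-6 * np.eye(Xtest.shape[0]))
    return mu + L @ np.random.randn(Xtest.shape[0], n_samples)
```

Samples from the shallow GP are smooth everywhere, which is exactly why the step is hard to capture.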
For our first experiment we create a deep GP with one hidden layer. The model is easily constructed by creating different layer objects for the deep GP and then concatenating them.
model1 = build_supervised(X, Y, Qs=(1,), Ms=(15,15))
Now we optimize the model with the L-BFGS algorithm.
model1.optimize('bfgs', max_iters=1000, messages=1)
model1.plot(xlim=(-1, 2), Nsamples=3)
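The effect of composing layers can also be illustrated with a short prior-sampling sketch (plain NumPy, a hypothetical helper, not part of coldeep): draw a GP sample h = f1(x), then pass it through a second GP, y = f2(h).

```python
import numpy as np

def sample_gp_prior(X, variance=1.0, lengthscale=0.3, jitter=1e-6):
    # One draw from a zero-mean GP prior with an RBF kernel at inputs X.
    K = variance * np.exp(-0.5 * (X - X.T) ** 2 / lengthscale ** 2)
    L = np.linalg.cholesky(K + jitter * np.eye(X.shape[0]))
    return L @ np.random.randn(X.shape[0], 1)

np.random.seed(1)
Xgrid = np.linspace(0, 1, 100)[:, None]
h = sample_gp_prior(Xgrid)   # hidden layer: a smooth warping of the input
y = sample_gp_prior(h)       # output layer: a GP evaluated at the warped inputs
```

Where the warping h compresses its inputs, the composite function changes very rapidly, which is how depth produces the heavier-tailed derivative distributions discussed above.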
Next we consider two hidden layers.
model2 = build_supervised(X, Y, Qs=(1,1), Ms=(15,15,15))
model2.optimize('bfgs', max_iters=1000, messages=1)
model2.plot(xlim=(-1, 2), Nsamples=3)
Finally we consider three hidden layers.
model3 = build_supervised(X, Y, Qs=(2, 2, 2), Ms=(15, 15, 15, 15))
model3.optimize('bfgs', max_iters=5000, messages=1)
model3.plot(xlim=(-1, 2), Nsamples=3)
We can also explore what's going on between layers by plotting each of the Gaussian processes. The plots show how the mapping function looks and how the inducing variables propagate through.
for layer in model2.layers:
    layer.plot()
Finally, we demonstrate the model on a larger example: the robot wireless data, where we regress the recorded WiFi signal strengths on time.

import pods
data = pods.datasets.robot_wireless()
data['X'].shape
Y = data['Y']
n = Y.shape[0]
t = np.linspace(0, n-1, n)[:, None]
Details of data: created by Brian Ferris and Dieter Fox, consisting of WiFi access point strengths taken during a circuit of the Paul Allen building at the University of Washington. Please cite: "WiFi-SLAM using Gaussian Process Latent Variable Models" by Brian Ferris, Dieter Fox and Neil Lawrence, IJCAI'07, pages 2480-2485. Data used in "A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models" by Neil D. Lawrence, JMLR 13, pages 1609-1638, 2012.
model = build_supervised(t, Y, Qs=(2, 15), Ms=(40, 40, 40))
model.optimize(messages=True)