$C$ $O_2$ Mauna Loa Gaussian Process Regression¶

The goal of this notebook is to replicate the results from chapter 5, p.118, of Gaussian Processes for Machine Learning.

I did not find a way on how to make GPflow perform this additive decomposition and therefore I decided to implement it by hand. In this set-up GPflow is used to find the correct parameters and the hand writte code is used to decompose the addtitive parts.

There is currently work under-way to add this functionality to GPflow. For more details have a look at Predicting component contributions #234.

In [1]:

import warnings
from IPython.display import display_html

In [2]:

%load_ext watermark
%watermark -a 'Christian Schuhegger' -u -d -v -p numpy,pandas,matplotlib,seaborn,gpflow,GPy,bokeh,statsmodels

/home/user/cs/local/install/Anaconda3-5.1.0-Linux-x86_64/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

Christian Schuhegger 
last updated: 2018-05-05 

CPython 3.6.4
IPython 6.2.1

numpy 1.14.2
pandas 0.22.0
matplotlib 2.2.2
seaborn 0.8.1
gpflow 1.1.0
GPy 1.9.2
bokeh 0.12.15
statsmodels 0.8.0

In [3]:

%matplotlib inline
import numpy as np, pandas as pd, matplotlib.pyplot as plt, seaborn as sns, gpflow

Let's use the bokeh library to make the chart interactive and so that you can zoom in. The hints on how to use bokeh on these examples came from Looking at the Keeling Curve with GPs in PyMC3.

In [4]:

import bokeh.plotting, bokeh.models, bokeh.io, bokeh.palettes
bokeh.io.output_notebook()

BokehJS 0.12.15 successfully loaded.

We take the raw data from the statsmodels package:

In [5]:

# The data used to be available at http://cdiac.esd.ornl.gov/ftp/trends/co2/maunaloa.co2, 
# but now it seems that more up-to-date data can be found at 
# ftp://ftp.cmdl.noaa.gov/ccg/co2/trends/co2_mm_mlo.txt

import statsmodels.api as sm

mauna_loa_atmospheric_CO2_concentration_data = sm.datasets.get_rdataset("CO2")

print(mauna_loa_atmospheric_CO2_concentration_data.__doc__)

data1 = mauna_loa_atmospheric_CO2_concentration_data.data
data1.head()

 /home/user/cs/local/install/Anaconda3-5.1.0-Linux-x86_64/lib/python3.6/site-packages/statsmodels/compat/pandas.py:56: FutureWarning:The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.

+-----+-----------------+
| co2 | R Documentation |
+-----+-----------------+

Mauna Loa Atmospheric CO2 Concentration
---------------------------------------

Description
~~~~~~~~~~~

Atmospheric concentrations of CO\ *2* are expressed in parts per million
(ppm) and reported in the preliminary 1997 SIO manometric mole fraction
scale.

Usage
~~~~~

::

    co2

Format
~~~~~~

A time series of 468 observations; monthly from 1959 to 1997.

Details
~~~~~~~

The values for February, March and April of 1964 were missing and have
been obtained by interpolating linearly between the values for January
and May of 1964.

Source
~~~~~~

Keeling, C. D. and Whorf, T. P., Scripps Institution of Oceanography
(SIO), University of California, La Jolla, California USA 92093-0220.

ftp://cdiac.esd.ornl.gov/pub/maunaloa-co2/maunaloa.co2.

References
~~~~~~~~~~

Cleveland, W. S. (1993) *Visualizing Data*. New Jersey: Summit Press.

Examples
~~~~~~~~

::

    require(graphics)
    plot(co2, ylab = expression("Atmospheric concentration of CO"[2]),
         las = 1)
    title(main = "co2 data set")

Out[5]:

	time	value
0	1959.000000	315.42
1	1959.083333	316.31
2	1959.166667	316.50
3	1959.250000	317.56
4	1959.333333	318.13

We will make the data available in two formats, as row vectors as lower case x and y and as column vectors as upper case X and Y.

In [6]:

x = np.array(data1.time)
y = np.array(data1.value)
X = x.reshape(-1, 1)
Y = y.reshape(-1, 1)

In [7]:

p = bokeh.plotting.figure(title="Fit to the Mauna Loa Data", #x_axis_type='datetime', 
                          plot_width=900, plot_height=600)
p.yaxis.axis_label = 'CO2 [ppm]'
p.xaxis.axis_label = 'Date'

# true value
p.circle(x, y, color="black", legend="Observed data")
p.legend.location = "top_left"
bokeh.plotting.show(p)

Please use the GUI elements on the right top of the graph for moving and zooming into the graph.

As GPflow performs some optimization procedure to find the best fit parameters we will set-up the different aspects of the kernel functions with good starting points taken from chapter 5 of Gaussian Processes for Machine Learning. In principle you could use arbitrary starting point values and it is the job of GPflow to find the best fit parameters.

As in chapter 5 of Gaussian Processes for Machine Learning we create the total fit as an additive composition of different aspects.

$k_1$ models a long term trend.
$k_2$ models the seasonal variation due to the vegetation changes due to the seasons in the northern hemisphere.
$k_3$ I will not include, as GPflow does not provide out of the box a rational quadratic kernel function. But we will see that this does not hurt us too much.
$k_4$ finally is the short term variations plus some "white noise".

To be honest, I am not sure why $k_4$ includes the white noise at all, because in principle the gaussian process provides this white noise by default via the $\sigma_n$ parameter (see Gaussian Process Parameter Effects for details), which is called m.likelihood.variance in GPflow.

In [8]:

k1_ = gpflow.kernels.RBF(1, variance=(66.0 ** 2), lengthscales=67.0)
k2_exp_sine_squred_gamma= 2.0 / 1.3 ** 2.0
k2_exp_sine_squred_period = 1.0
k2_ = gpflow.kernels.RBF(1, variance=(2.4 ** 2.0), lengthscales=90.0) * gpflow.kernels.Periodic(1, period=k2_exp_sine_squred_period, variance=1.0,  lengthscales=1.0/k2_exp_sine_squred_gamma)
# k3 = how to do a rational quadratic term in GPflow?
k4_ = gpflow.kernels.RBF(1, variance=(0.18 ** 2), lengthscales=1.6) + gpflow.kernels.White(1, variance=0.19)
k1 = k1_
k2 = k2_
k4 = k4_
kernel = k1 + k2 + k4
k_ = kernel

In [9]:

m = gpflow.models.GPR(X, Y, kern=kernel)
m.likelihood.variance = 0.01
m.as_pandas_table()

 /home/user/cs/local/install/Anaconda3-5.1.0-Linux-x86_64/lib/python3.6/site-packages/gpflow/densities.py:89: UserWarning:Shape of x must be 2D at computation.

Out[9]:

	class	prior	transform	trainable	shape	fixed_shape	value
GPR/kern/rbf_1/variance	Parameter	None	+ve	True	()	True	4356.0
GPR/kern/rbf_1/lengthscales	Parameter	None	+ve	True	()	True	67.0
GPR/kern/product/rbf/variance	Parameter	None	+ve	True	()	True	5.76
GPR/kern/product/rbf/lengthscales	Parameter	None	+ve	True	()	True	90.0
GPR/kern/product/periodic/variance	Parameter	None	+ve	True	()	True	1.0
GPR/kern/product/periodic/lengthscales	Parameter	None	+ve	True	()	True	0.8450000000000001
GPR/kern/product/periodic/period	Parameter	None	+ve	True	()	True	1.0
GPR/kern/rbf_2/variance	Parameter	None	+ve	True	()	True	0.0324
GPR/kern/rbf_2/lengthscales	Parameter	None	+ve	True	()	True	1.6
GPR/kern/white/variance	Parameter	None	+ve	True	()	True	0.19
GPR/likelihood/variance	Parameter	None	+ve	True	()	True	0.01

In [10]:

%%time
opt = gpflow.train.ScipyOptimizer()
opt.minimize(m)

INFO:tensorflow:Optimization terminated with:
  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
  Objective function value: 129.032561
  Number of iterations: 91
  Number of functions evaluations: 133
CPU times: user 3.69 s, sys: 447 ms, total: 4.14 s
Wall time: 3.08 s

In [11]:

m.as_pandas_table()

Out[11]:

	class	prior	transform	trainable	shape	fixed_shape	value
GPR/kern/rbf_1/variance	Parameter	None	+ve	True	()	True	4356.219071780713
GPR/kern/rbf_1/lengthscales	Parameter	None	+ve	True	()	True	48.565715481551585
GPR/kern/product/rbf/variance	Parameter	None	+ve	True	()	True	5.64144286809577
GPR/kern/product/rbf/lengthscales	Parameter	None	+ve	True	()	True	91.88714901784704
GPR/kern/product/periodic/variance	Parameter	None	+ve	True	()	True	1.026393528093981
GPR/kern/product/periodic/lengthscales	Parameter	None	+ve	True	()	True	0.6709642212683404
GPR/kern/product/periodic/period	Parameter	None	+ve	True	()	True	0.9997023123140168
GPR/kern/rbf_2/variance	Parameter	None	+ve	True	()	True	0.16043370929892875
GPR/kern/rbf_2/lengthscales	Parameter	None	+ve	True	()	True	0.32753755974365617
GPR/kern/white/variance	Parameter	None	+ve	True	()	True	0.03193821948646309
GPR/likelihood/variance	Parameter	None	+ve	True	()	True	0.010237414042966209

Let's create a visualization of the fitted overall model. I will only plot the fit between 1990 and 2000, to on the one hand show how nicely the regression curve fits the existing data points and to show how gaussian processes can be used to predict the near term future behaviour together with some confidence interval.

In [12]:

xx = np.linspace(1990, 2000, 1000).reshape(-1,1)

In [13]:

def plot(m, with_data=True):
    if not isinstance(m, list):
        m = [m]
    predictions = [model.predict_y(xx) for model in m]
    
    p = bokeh.plotting.figure(title="Fit to the Mauna Loa Data", #x_axis_type='datetime', 
                          plot_width=900, plot_height=600)
    p.yaxis.axis_label = 'CO2 [ppm]'
    p.xaxis.axis_label = 'Date'
   
    if with_data:
        # true value
        p.circle(x, y, color="black", legend="Observed data")
    for mean, var in predictions:
        xpts = xx[:,0]
        p.line(xpts, mean[:,0],  line_width=1)
        lowerband = mean[:,0] - 2*np.sqrt(var[:,0])
        upperband = mean[:,0] + 2*np.sqrt(var[:,0])
        band_x = np.append(xpts, xpts[::-1])
        band_y = np.append(lowerband, upperband[::-1])
        p.patch(band_x, band_y, color='#7570B3', fill_alpha=0.2)
    p.legend.location = "top_left"
    bokeh.plotting.show(p)

As you can see in the below figure the complete model matches nicely.

In [14]:

plot(m)

Now let's split the model into components and create model $m_1$ (blue) with only the long term trend and model $m_2$ (green) with only the product term of p.120:

$\begin{eqnarray} k_2(x,x')&=&\theta_3^2\cdot\hbox{exp}\left(-\frac{(x-x')^2}{2\theta_4^2}-\frac{2\sin^2(\pi(x-x')/1.0)}{\theta_5^2}\right)\\ &=&\left(\theta_3^2\cdot\hbox{exp}\left(-\frac{(x-x')^2}{2\theta_4^2}\right)\right)\cdot\left(1.0\cdot\hbox{exp}\left(-\frac{2\sin^2(\pi(x-x')/1.0)}{\theta_5^2}\right)\right) \end{eqnarray}$

Where $\theta_3$ is the variance term in the RBF kernel and therefore the variance term in the PeriodicKernel kernel will be $1.0$ .

How not to do it¶

It turns out that the simplistic idea on how to decompose the additive components of the model by simply creating partial models is not correct. This would have looked like this:

k1 = GPflow.kernels.RBF(1, variance=m.kern.rbf_1.variance.value, lengthscales=m.kern.rbf_1.lengthscales.value)
m1 = GPflow.gpr.GPR(X, Y, kern=k1)
m1.likelihood.variance = m.likelihood.variance.value
m1.fixed = True

k2 = GPflow.kernels.RBF(1, variance=m.kern.prod.rbf.variance.value, lengthscales=m.kern.prod.rbf.lengthscales.value) \
    * \
    GPflow.kernels.PeriodicKernel(1, period=m.kern.prod.periodickernel.period.value, 
                                  variance=m.kern.prod.periodickernel.variance.value, 
                                  lengthscales=m.kern.prod.periodickernel.lengthscales.value)
m2 = GPflow.gpr.GPR(X, Y, kern=k2)
m2.likelihood.variance = m.likelihood.variance.value
m2.fixed = True

k1 = GPflow.kernels.RBF(1, variance=m.kern.rbf_1.variance.value, lengthscales=m.kern.rbf_1.lengthscales.value)
k2 = GPflow.kernels.RBF(1, variance=m.kern.prod.rbf.variance.value, lengthscales=m.kern.prod.rbf.lengthscales.value) \
    * \
    GPflow.kernels.PeriodicKernel(1, period=m.kern.prod.periodickernel.period.value, 
                                  variance=m.kern.prod.periodickernel.variance.value, 
                                  lengthscales=m.kern.prod.periodickernel.lengthscales.value)
m3 = GPflow.gpr.GPR(X, Y, kern=(k1+k2))
m3.likelihood.variance = m.likelihood.variance.value
m3.fixed = True

How to do it¶

Have a look at The Kernel Cookbook and scroll down to "Additive decomposition" or look at equation 1.17 at the kernels chapter of David Duvenaud's Phd thesis.

There it says (I leave out the $\mu$ variables as we set the mean function to be 0):

$f_1(X^*)|f_1(X) + f_2(X) \sim \mathcal{N}\left({K_1^*}^T(K_1+K_2)^{-1}[f_1(X) + f_2(X)], ...\right)$ but our simplistic approach from above would have calculated the following:

$f_1(X^*)|f_1(X) + f_2(X) \sim \mathcal{N}\left({K_1^*}^TK_1^{-1}[f_1(X) + f_2(X)], ...\right)$

I did not find a way on how to make GPflow perform this additive decomposition and therefore I decided to implement it by hand. In this set-up GPflow is used to find the correct parameters and the hand writte code is used to decompose the addtitive parts.

In [15]:

import functools

# http://www.cs.toronto.edu/~duvenaud/cookbook/index.html

class GPCovarianceFunctionWhiteNoise:
    def __init__(self, sigma):
        self.sigma = float(sigma)

    def covariance(self, x1, x2):
        lr = np.zeros(shape=(len(x1), len(x2)))
        np.fill_diagonal(lr, self.sigma ** 2)
        return lr

class GPCovarianceFunctionSquaredExponential:
    def __init__(self, l, sigma):
        self.l = float(l)
        self.sigma = float(sigma)

    def covariance(self, x1, x2):
        return (self.sigma**2) * np.exp( -0.5 * (np.subtract.outer(x1, x2)**2) / (self.l ** 2))

class GPCovarianceFunctionExpSine2Kernel:
    def __init__(self, l, period, sigma):
        self.l = float(l)
        self.period = float(period)
        self.sigma = float(sigma)

    def covariance(self, x1, x2):
        return (self.sigma**2) * np.exp( -2.0 * np.power(np.sin((np.pi / self.period) * np.abs(np.subtract.outer(x1, x2))),
                                                         2) / (self.l ** 2))

class GPCovarianceFunctionSum:
    def __init__(self, k1, k2):
        self.k1 = k1
        self.k2 = k2

    def covariance(self, x1, x2):
        s1 = self.k1.covariance(x1, x2)
        s2 = self.k2.covariance(x1, x2)
        s = s1 + s2
        return  s

class GPCovarianceFunctionProduct:
    def __init__(self, k1, k2):
        self.k1 = k1
        self.k2 = k2

    def covariance(self, x1, x2):
        p1 = self.k1.covariance(x1, x2)
        p2 = self.k2.covariance(x1, x2)
        p = p1 * p2
        return  p

def conditional1(x_new, x, y, cov, sigma_n=0):
    if not isinstance(cov, list):
        cov = [cov]

    if len(cov) < 2:
        total_covariance_function = cov[0]
    else:
        total_covariance_function = functools.reduce(lambda a, x: GPCovarianceFunctionSum(a, x), cov)

    A = total_covariance_function.covariance(x_new, x_new)
    C = total_covariance_function.covariance(x_new, x)
    B = total_covariance_function.covariance(x, x) + np.power(sigma_n,2)*np.diag(np.ones(len(x)))

    mu = [np.linalg.inv(B).dot(C.T).T.dot(y).squeeze()]
    sigma = [(A - C.dot(np.linalg.inv(B).dot(C.T))).squeeze()]

    for i in range(0, len(cov)):
        partial_covariance_function = cov[i]
        C_ = partial_covariance_function.covariance(x_new, x)
        mu_ = np.linalg.inv(B).dot(C_.T).T.dot(y).squeeze()
        mu.append(mu_)

    return (mu, sigma)

def predict1(x_new, x, y, cov, sigma_n=0):
    l_y_pred, l_sigmas = conditional1(x_new, x, y, cov=cov, sigma_n=sigma_n)
    if len(l_sigmas[0].shape) > 1:
        return l_y_pred, [np.diagonal(ls) for ls in l_sigmas]
    else:
        return l_y_pred, l_sigmas

In [16]:

m.kern.product.rbf.lengthscales.value

Out[16]:

array(91.88714902)

In [17]:

xx = np.linspace(1990, 2000, 1000)
k1 = GPCovarianceFunctionSquaredExponential(l=m.kern.rbf_1.lengthscales.value, 
                                            sigma=np.sqrt(m.kern.rbf_1.variance.value))
k2 = GPCovarianceFunctionProduct(
    GPCovarianceFunctionSquaredExponential(l=m.kern.product.rbf.lengthscales.value,
                                           sigma=np.sqrt(m.kern.product.rbf.variance.value)),
    GPCovarianceFunctionExpSine2Kernel(l=m.kern.product.periodic.lengthscales.value, 
                                       period=m.kern.product.periodic.period.value, 
                                       sigma=np.sqrt(m.kern.product.periodic.variance.value)))

k4 = GPCovarianceFunctionSum(GPCovarianceFunctionSquaredExponential(l=m.kern.rbf_2.lengthscales.value, 
                                                                    sigma=np.sqrt(m.kern.rbf_2.variance.value)),
                             GPCovarianceFunctionWhiteNoise(sigma=np.sqrt(m.kern.white.variance.value)))

k = GPCovarianceFunctionSum(k1, GPCovarianceFunctionSum(k2, k4))

y_pred, sigmas = predict1(xx, x, y, cov=[k1, k2, k4], sigma_n=np.sqrt(m.likelihood.variance.value))

And if we then plot $m_1$ (blue) and $m_1 + m_2$ (green) and the combined/full model $m$ (red) in the same graph we get the picture below.

In [18]:

p = bokeh.plotting.figure(title="Fit to the Mauna Loa Data", #x_axis_type='datetime', 
                          plot_width=900, plot_height=600)
p.yaxis.axis_label = 'CO2 [ppm]'
p.xaxis.axis_label = 'Date'

# total fit
p.line(xx, y_pred[0],  line_width=1, line_color="firebrick", legend="Total fit")

# trend
p.line(xx, y_pred[1], line_width=1, line_color="blue", legend="Long term trend")

# trend + seasonal variation
p.line(xx, y_pred[1] + y_pred[2], line_width=1, line_color="green", legend="Long term trend + seasonal variation")

# true value
p.circle(x, y, color="black", legend="Observed data")
p.legend.location = "top_left"
bokeh.plotting.show(p)

Please pay attention to the very wiggly red line, which is the full model fit. If you scroll up a bit and have a look at the fit of the full model produced by GPflow itself (in the plot function of this notebook via the GPflow predict_y() method) you will not see this wiggly pattern. Let's plot the $m_1+m_2$ (green), the $m$ from the hand created predictions (red) and the $m$ as generated by GPflow (blue) in one graph to see the differences.

In [19]:

gpflow_mean, _ = m.predict_y(xx.reshape(-1,1))

p = bokeh.plotting.figure(title="Fit to the Mauna Loa Data", #x_axis_type='datetime', 
                          plot_width=900, plot_height=600)
p.yaxis.axis_label = 'CO2 [ppm]'
p.xaxis.axis_label = 'Date'

# total fit
p.line(xx, y_pred[0],  line_width=1, line_color="firebrick", legend="Total fit by hand")

# total fit as produced by GPflow
p.line(xx, gpflow_mean[:,0], line_width=1, line_color="blue", legend="Total fit by GPflow")

# trend + seasonal variation
p.line(xx, y_pred[1] + y_pred[2], line_width=1, line_color="green", legend="Long term trend + seasonal variation")

p.legend.location = "top_left"
bokeh.plotting.show(p)

As you can see the difference is not big, but I am not entirely sure where this is coming from. Perhaps GPflow ignores white noise in its predictions (which actually makes sense)!

If we then want to look at the seasonal variation (k2) and the short term variation (k4) we can do that as follows:

In [20]:

p = bokeh.plotting.figure(title="Fit to the Mauna Loa Data", #x_axis_type='datetime', 
                          plot_width=900, plot_height=600)
p.yaxis.axis_label = 'CO2 [ppm]'
p.xaxis.axis_label = 'Date'

# total fit
p.line(xx, y_pred[2],  line_width=1, line_color="firebrick", legend="Seasonal variation")

# trend
p.line(xx, y_pred[3], line_width=1, line_color="blue", legend="Short term variation")

# true value
p.legend.location = "top_left"
bokeh.plotting.show(p)

If we want to see how the seasonal variation changed over the years we can do that as follows:

In [21]:

# plot several years

def dates_to_idx(timelist):
    reference_time = pd.to_datetime('1959-01-01')
    t = (timelist - reference_time) / pd.Timedelta(1, "Y")
    return np.asarray(t)+1959

p = bokeh.plotting.figure(title="Several years of the seasonal component",
           plot_width=900, plot_height=600)
p.yaxis.axis_label = 'Δ CO2 [ppm]'
p.xaxis.axis_label = 'Month'

colors = bokeh.palettes.brewer['Paired'][5]
years = ["1960", "1970", "1980", "1990", "2000"]

In [22]:

# xx = np.linspace(1990, 2000, 1000)
# 
for i, year in enumerate(years):
    dates = pd.date_range(start="1/1/"+year, end="12/31/"+year, freq="10D")
    tnew = dates_to_idx(dates)

    print("Predicting year", year)
    y_pred_, sigmas_ = predict1(tnew, x, y, cov=[k1, k2, k4], sigma_n=np.sqrt(m.likelihood.variance.value))

    # plot mean
    t = np.asarray((dates - dates[0])/pd.Timedelta(1, "M")) + 1
    p.line(t, y_pred_[2], line_width=1, line_color=colors[i], legend=year)

p.legend.location = "bottom_left"
bokeh.plotting.show(p)

Predicting year 1960
Predicting year 1970
Predicting year 1980
Predicting year 1990
Predicting year 2000

Alternative implementation 2¶

Here is an alternative implementation of the decomposition based on a comment in Example that shows how to separate additive effects, e.g. time series decomposition of birthdays data

In [23]:

XX = xx.reshape(-1, 1)

In [24]:

_x_new = XX
_x     = X
_y     = Y
_A = k_.compute_K_symm(_x_new)
_C = k_.compute_K(_x_new, _x)
_B = k_.compute_K_symm(_x) + m.likelihood.variance.value*np.diag(np.ones(len(_x)))

_mu1 = np.linalg.inv(_B).dot(_C.T).T.dot(_y).squeeze()
_tmp1 = np.linalg.inv(_B).dot(_C.T)
_tmp2 = np.linalg.solve(_B, _C.T) 
'{:.10f}'.format(np.abs(_tmp1 - _tmp2).max())

Out[24]:

'0.0000003427'

In [25]:

_mu2 = np.linalg.solve(_B,_C.T).T.dot(_y).squeeze()
'{:.10f}'.format(np.abs(_mu1 - _mu2).max())

Out[25]:

'0.0000001639'

In [26]:

_mu3 = _C.dot(np.linalg.inv(_B)).dot(_y).squeeze()
'{:.10f}'.format(np.abs(_mu1 - _mu3).max())

Out[26]:

'0.0000001528'

In [27]:

_mu4 = _C.dot(np.linalg.solve(_B, _y)).squeeze()
'{:.10f}'.format(np.abs(_mu1 - _mu4).max())

Out[27]:

'0.0000001638'

In [28]:

def conditional2(x_new, x, y, cov, sigma_n=0):
    if not isinstance(cov, list):
        cov = [cov]

    if len(cov) < 2:
        total_covariance_function = cov[0]
    else:
        # total_covariance_function = None
        total_covariance_function = functools.reduce(lambda a, x: a + x, cov)

    A = total_covariance_function.compute_K_symm(x_new)
    C = total_covariance_function.compute_K(x_new, x)
    B = total_covariance_function.compute_K_symm(x) + np.power(sigma_n,2)*np.diag(np.ones(len(x)))

    mu = [np.linalg.inv(B).dot(C.T).T.dot(y).squeeze()]
    sigma = [(A - C.dot(np.linalg.inv(B).dot(C.T))).squeeze()]

    for i in range(0, len(cov)):
        partial_covariance_function = cov[i]
        C_ = partial_covariance_function.compute_K(x_new, x)
        mu_ = np.linalg.inv(B).dot(C_.T).T.dot(y).squeeze()
        mu.append(mu_)

    return (mu, sigma)

def predict2(x_new, x, y, cov, sigma_n=0):
    l_y_pred, l_sigmas = conditional2(x_new, x, y, cov=cov, sigma_n=sigma_n)
    if len(l_sigmas[0].shape) > 1:
        return l_y_pred, [np.diagonal(ls) for ls in l_sigmas]
    else:
        return l_y_pred, l_sigmas

In [29]:

y_pred2, sigmas2 = predict2(XX, X, Y, cov=[k1_, k2_, k4_], sigma_n=np.sqrt(m.likelihood.variance.value))

In [30]:

len(y_pred[0])

Out[30]:

In [31]:

np.allclose(y_pred[0], y_pred2[0])

Out[31]:

False

In [32]:

df = pd.DataFrame(np.array([y_pred[0],y_pred2[0]]).T, columns=['np','tf'])
df['delta'] = np.abs(df['np'] - df['tf'])
df['%'] = df['delta'] / df['np'] * 100.0
with pd.option_context('display.max_rows', None, 'display.max_columns', 3):    
    display_html(df.head().to_html(),raw=True)

	np	tf	delta	%
0	353.433528	353.491490	0.057962	0.016400
1	353.790253	353.593367	0.196886	0.055650
2	353.594593	353.691103	0.096510	0.027294
3	353.719705	353.784908	0.065203	0.018434
4	353.803838	353.875188	0.071350	0.020166

In [33]:

df[['delta','%']].max()

Out[33]:

delta    0.413584
%        0.115987
dtype: float64

Alternative implementation 3¶

Detecting periodicities with Gaussian processes
- download (see figure3.ipynb)

In [34]:

import GPy

In [35]:

k1__  = GPy.kern.RBF        (input_dim=1, variance=float(m.kern.rbf_1.variance.value)           , lengthscale=float(m.kern.rbf_1.lengthscales.value))
k2__1 = GPy.kern.RBF        (input_dim=1, variance=float(m.kern.product.rbf.variance.value)     , lengthscale=float(m.kern.product.rbf.lengthscales.value))
k2__2 = GPy.kern.StdPeriodic(input_dim=1, variance=float(m.kern.product.periodic.variance.value), lengthscale=float(m.kern.product.periodic.lengthscales.value),period=float(m.kern.product.periodic.period.value))
k2__  = k2__1 * k2__2
k4__1 = GPy.kern.RBF        (input_dim=1, variance=float(m.kern.rbf_2.variance.value)           , lengthscale=float(m.kern.rbf_2.lengthscales.value))
k4__2 = GPy.kern.White      (input_dim=1, variance=float(m.kern.white.variance.value))
k4__  = k4__1 + k4__2
k__   = k1__ + k2__ + k4__
m__   = GPy.models.GPRegression(X,Y,k__)
m__.Gaussian_noise.variance[0] = float(m.likelihood.variance.value)
m__

 /home/user/cs/local/install/Anaconda3-5.1.0-Linux-x86_64/lib/python3.6/site-packages/paramz/transformations.py:111: RuntimeWarning:overflow encountered in expm1

Out[35]:

Model: GP regression
Objective: 129.03256335476846
Number of Parameters: 11
Number of Optimization Parameters: 11
Updates: True

GP_regression.	value	constraints
sum.rbf.variance	4356.219071780713	+ve
sum.rbf.lengthscale	48.565715481551585	+ve
sum.mul.rbf.variance	5.64144286809577	+ve
sum.mul.rbf.lengthscale	91.88714901784704	+ve
sum.mul.std_periodic.variance	1.026393528093981	+ve
sum.mul.std_periodic.period	0.9997023123140168	+ve
sum.mul.std_periodic.lengthscale	0.6709642212683404	+ve
sum.rbf_1.variance	0.16043370929892875	+ve
sum.rbf_1.lengthscale	0.32753755974365617	+ve
sum.white.variance	0.03193821948646309	+ve
Gaussian_noise.variance	0.010237414042966209	+ve

In [36]:

def conditional3(x_new, x, y, cov, sigma_n=0):
    if not isinstance(cov, list):
        cov = [cov]

    if len(cov) < 2:
        total_covariance_function = cov[0]
    else:
        # total_covariance_function = None
        total_covariance_function = functools.reduce(lambda a, x: a + x, cov)

    A = total_covariance_function.K(x_new) # total_covariance_function.K(x_new,x_new)
    C = total_covariance_function.K(x_new, x)
    B = total_covariance_function.K(x) + np.power(sigma_n,2)*np.diag(np.ones(len(x))) # total_covariance_function.K(x,x)

    mu = [np.linalg.inv(B).dot(C.T).T.dot(y).squeeze()]
    sigma = [(A - C.dot(np.linalg.inv(B).dot(C.T))).squeeze()]

    for i in range(0, len(cov)):
        partial_covariance_function = cov[i]
        C_ = partial_covariance_function.K(x_new, x)
        mu_ = np.linalg.inv(B).dot(C_.T).T.dot(y).squeeze()
        mu.append(mu_)
        
        #         kxX = C_
        #         K_1 = np.linalg.inv(B)
        #         mean = C_.dot(np.linalg.inv(B)).dot(Y)


    return (mu, sigma)

def predict3(x_new, x, y, cov, sigma_n=0):
    l_y_pred, l_sigmas = conditional3(x_new, x, y, cov=cov, sigma_n=sigma_n)
    if len(l_sigmas[0].shape) > 1:
        return l_y_pred, [np.diagonal(ls) for ls in l_sigmas]
    else:
        return l_y_pred, l_sigmas

In [37]:

y_pred3, sigmas3 = predict3(XX, X, Y, cov=[k1__, k2__, k4__], sigma_n=np.sqrt(m__.Gaussian_noise.variance[0]))

 /home/user/cs/local/install/Anaconda3-5.1.0-Linux-x86_64/lib/python3.6/site-packages/paramz/transformations.py:111: RuntimeWarning:overflow encountered in expm1

In [38]:

np.allclose(y_pred[0], y_pred3[0])

Out[38]:

False

In [39]:

df = pd.DataFrame(np.array([y_pred[0],y_pred2[0],y_pred3[0]]).T, columns=['np','gpflow','gpy'])
df['delta_gpflow'] = np.abs(df['np'] - df['gpflow'])
df['%_gpflow'] = df['delta_gpflow'] / df['np'] * 100.0
df['delta_gpy'] = np.abs(df['np'] - df['gpy'])
df['%_gpy'] = df['delta_gpy'] / df['np'] * 100.0
df['delta_gpflow_gpy'] = np.abs(df['gpflow'] - df['gpy'])
df['%_gpflow_gpy'] = df['delta_gpflow_gpy'] / df['gpflow'] * 100.0
df.head()

Out[39]:

	np	gpflow	gpy	delta_gpflow	%_gpflow	delta_gpy	%_gpy	delta_gpflow_gpy	%_gpflow_gpy
0	353.433528	353.491490	353.491490	0.057962	0.016400	0.057962	0.016400	7.407044e-08	2.095395e-08
1	353.790253	353.593367	353.593367	0.196886	0.055650	0.196886	0.055650	2.133197e-08	6.032910e-09
2	353.594593	353.691103	353.691102	0.096510	0.027294	0.096510	0.027294	1.523574e-07	4.307639e-08
3	353.719705	353.784908	353.784908	0.065203	0.018434	0.065203	0.018434	3.971718e-08	1.122636e-08
4	353.803838	353.875188	353.875188	0.071350	0.020166	0.071350	0.020166	4.792372e-08	1.354255e-08

In [40]:

df[['delta_gpflow','%_gpflow','delta_gpy','%_gpy','delta_gpflow_gpy','%_gpflow_gpy']].max()

Out[40]:

delta_gpflow        4.135843e-01
%_gpflow            1.159868e-01
delta_gpy           4.135842e-01
%_gpy               1.159867e-01
delta_gpflow_gpy    2.924092e-07
%_gpflow_gpy        8.092542e-08
dtype: float64

Using different GPy optimizers¶

GPy delivers "wrong" results compared to gpflow

In [41]:

k1__  = GPy.kern.RBF        (input_dim=1, variance=float(m.kern.rbf_1.variance.value)           , lengthscale=float(m.kern.rbf_1.lengthscales.value))
k2__1 = GPy.kern.RBF        (input_dim=1, variance=float(m.kern.product.rbf.variance.value)     , lengthscale=float(m.kern.product.rbf.lengthscales.value))
k2__2 = GPy.kern.StdPeriodic(input_dim=1, variance=float(m.kern.product.periodic.variance.value), lengthscale=float(m.kern.product.periodic.lengthscales.value),period=float(m.kern.product.periodic.period.value))
k2__  = k2__1 * k2__2
k4__1 = GPy.kern.RBF        (input_dim=1, variance=float(m.kern.rbf_2.variance.value)           , lengthscale=float(m.kern.rbf_2.lengthscales.value))
k4__2 = GPy.kern.White      (input_dim=1, variance=float(m.kern.white.variance.value))
k4__  = k4__1 + k4__2
k__   = k1__ + k2__ + k4__
m__   = GPy.models.GPRegression(X,Y,k__)
m__.Gaussian_noise.variance[0] = float(m.likelihood.variance.value)
m__.optimize()
m__

 /home/user/cs/local/install/Anaconda3-5.1.0-Linux-x86_64/lib/python3.6/site-packages/paramz/transformations.py:111: RuntimeWarning:overflow encountered in expm1

Out[41]:

Model: GP regression
Objective: 129.03256333921234
Number of Parameters: 11
Number of Optimization Parameters: 11
Updates: True

GP_regression.	value	constraints
sum.rbf.variance	4356.219071782721	+ve
sum.rbf.lengthscale	48.5657154836552	+ve
sum.mul.rbf.variance	5.641442868106639	+ve
sum.mul.rbf.lengthscale	91.88714903815146	+ve
sum.mul.std_periodic.variance	1.0263935281187573	+ve
sum.mul.std_periodic.period	0.9997023902939737	+ve
sum.mul.std_periodic.lengthscale	0.6709642212192272	+ve
sum.rbf_1.variance	0.16043370928666736	+ve
sum.rbf_1.lengthscale	0.32753755952069186	+ve
sum.white.variance	0.03193821950509535	+ve
Gaussian_noise.variance	0.010237414044922417	+ve

In [42]:

k1__  = GPy.kern.RBF        (input_dim=1, variance=float(m.kern.rbf_1.variance.value)           , lengthscale=float(m.kern.rbf_1.lengthscales.value))
k2__1 = GPy.kern.RBF        (input_dim=1, variance=float(m.kern.product.rbf.variance.value)     , lengthscale=float(m.kern.product.rbf.lengthscales.value))
k2__2 = GPy.kern.StdPeriodic(input_dim=1, variance=float(m.kern.product.periodic.variance.value), lengthscale=float(m.kern.product.periodic.lengthscales.value),period=float(m.kern.product.periodic.period.value))
k2__  = k2__1 * k2__2
k4__1 = GPy.kern.RBF        (input_dim=1, variance=float(m.kern.rbf_2.variance.value)           , lengthscale=float(m.kern.rbf_2.lengthscales.value))
k4__2 = GPy.kern.White      (input_dim=1, variance=float(m.kern.white.variance.value))
k4__  = k4__1 + k4__2
k__   = k1__ + k2__ + k4__
m__   = GPy.models.GPRegression(X,Y,k__)
m__.Gaussian_noise.variance[0] = float(m.likelihood.variance.value)
m__.optimize(optimizer='scg')
m__

 /home/user/cs/local/install/Anaconda3-5.1.0-Linux-x86_64/lib/python3.6/site-packages/paramz/transformations.py:111: RuntimeWarning:overflow encountered in expm1

Out[42]:

Model: GP regression
Objective: 129.03256333607948
Number of Parameters: 11
Number of Optimization Parameters: 11
Updates: True

GP_regression.	value	constraints
sum.rbf.variance	4356.219071782721	+ve
sum.rbf.lengthscale	48.56571548365474	+ve
sum.mul.rbf.variance	5.641442868106637	+ve
sum.mul.rbf.lengthscale	91.88714903814706	+ve
sum.mul.std_periodic.variance	1.026393528118752	+ve
sum.mul.std_periodic.period	0.9997023902770954	+ve
sum.mul.std_periodic.lengthscale	0.6709642212192378	+ve
sum.rbf_1.variance	0.16043370928667006	+ve
sum.rbf_1.lengthscale	0.32753755952074015	+ve
sum.white.variance	0.03193821950509133	+ve
Gaussian_noise.variance	0.01023741404492199	+ve

In [43]:

def conditional(X_new, X, Y, cov, sigma_n=0, framework='gpy'):
    if not isinstance(cov, list):
        cov = [cov]

    if len(cov) < 2:
        total_covariance_function = cov[0]
    else:
        # total_covariance_function = None
        total_covariance_function = functools.reduce(lambda a, x: a + x, cov)

    if framework == 'gpflow':
        A = total_covariance_function.compute_K_symm(X_new)
        C = total_covariance_function.compute_K(X_new, X)
        B = total_covariance_function.compute_K_symm(X) + np.power(sigma_n,2)*np.diag(np.ones(len(X)))
    elif framework == 'gpy':
        A = total_covariance_function.K(X_new) # total_covariance_function.K(x_new,x_new)
        C = total_covariance_function.K(X_new, X)
        B = total_covariance_function.K(X) + np.power(sigma_n,2)*np.diag(np.ones(len(X))) # total_covariance_function.K(x,x)
    else:
        raise ValueError('Unknown framework parameter: {}'.format(framework))

    # mu = [np.linalg.inv(B).dot(C.T).T.dot(y).squeeze()]
    BY = np.linalg.solve(B, Y)
    mu = [C.dot(BY).squeeze()]
    sigma = [(A - C.dot(np.linalg.inv(B).dot(C.T))).squeeze()]

    for i in range(0, len(cov)):
        partial_covariance_function = cov[i]
        C_ = None
        if framework == 'gpflow':
            C_ = partial_covariance_function.compute_K(X_new, X)
        elif framework == 'gpy':
            C_ = partial_covariance_function.K(X_new, X)
        else:
            raise ValueError('Unknown framework parameter: {}'.format(framework))
        # mu_ = np.linalg.inv(B).dot(C_.T).T.dot(y).squeeze()
        mu_ = C_.dot(BY).squeeze()
        mu.append(mu_)

    return (mu, sigma)

def predict(X_new, X, Y, cov, sigma_n=0, framework='gpy'):
    l_y_pred, l_sigmas = conditional(X_new, X, Y, cov=cov, sigma_n=sigma_n, framework=framework)
    if len(l_sigmas[0].shape) > 1:
        return l_y_pred, [np.diagonal(ls) for ls in l_sigmas]
    else:
        return l_y_pred, l_sigmas

In [44]:

with warnings.catch_warnings():
    warnings.filterwarnings("ignore")
    y_pred2, sigmas2 = predict(XX, X, Y, cov=[k1_ , k2_ , k4_] , sigma_n=np.sqrt(m.likelihood.variance.value)   , framework='gpflow')
    y_pred3, sigmas3 = predict(XX, X, Y, cov=[k1__, k2__, k4__], sigma_n=np.sqrt(m__.Gaussian_noise.variance[0]), framework='gpy')

In [45]:

'{:.10f}'.format(np.abs(y_pred2[0] - y_pred3[0]).max())

Out[45]:

'0.0000001568'

CCO2O_2 Mauna Loa Gaussian Process Regression¶