Gaussian processes are a powerful tool for nonlinear regression models.
Assume that we have predictor variables $\mathbf{X} = \{\mathbf{x}_i \in \mathbb{R}^d\}_{i=1}^n$ and response variables $\mathbf{y}=\{y_i \in \mathbb{R}\}_{i=1}^n$.
The response variables $\mathbf{y}$ are assumed to depend on the predictors $\mathbf{X}$,
\begin{equation} y_i \sim \mathcal{N}(f(\mathbf{x}_i),\sigma^2), \ i=1,\ldots,n, \end{equation}where $f$ is a mapping function. Treating $f$ as a random function, we assume that the distribution over $f$ is a Gaussian process, $$ f \sim \mathcal{GP}(m(\mathbf{x}),k(\mathbf{x},\mathbf{x}')), $$ where $m(\cdot)$ and $k(\cdot,\cdot)$ are the mean and kernel functions respectively.
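To make this prior concrete, the sketch below (plain Julia using only the `LinearAlgebra` and `Random` standard libraries; the `se_kernel` helper is our own illustration, not part of any package) draws one sample path from a zero-mean GP with a squared-exponential kernel:

```julia
using LinearAlgebra, Random
Random.seed!(1)

# Squared-exponential kernel k(x, x′) = σf² exp(-(x - x′)² / (2ℓ²))
se_kernel(x, xp; ℓ=1.0, σf=1.0) = σf^2 * exp(-(x - xp)^2 / (2ℓ^2))

xs = range(0, stop=2π, length=50)
K  = [se_kernel(a, b) for a in xs, b in xs]  # prior covariance matrix
L  = cholesky(Symmetric(K) + 1e-8I).L        # small jitter keeps K positive definite
f  = L * randn(length(xs))                   # one draw  f ~ N(0, K)
```

Each such draw `f` is a plausible function under the prior; conditioning on data, as below, narrows the distribution around the observations.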
We start by simulating some data:
using GaussianProcesses
using Random
Random.seed!(20140430)
# Training data
n=10; #number of training points
x = 2π * rand(n); #predictors
y = sin.(x) + 0.05*randn(n); #response variables (noisy sine)
The first step in modelling with Gaussian processes is to choose mean functions and kernels which describe the process.
Note that all hyperparameters for the mean and kernel functions, and the noise standard deviation $\sigma$, are given on the log scale. This is true for all strictly positive hyperparameters. Gaussian processes are represented by objects of type `GP` and are constructed from observation data, a mean function and kernel, and optionally the amount of observation noise.
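As a small illustration of the log-scale convention (our own example, not package output): `SE(0.0, 0.0)` takes the log length-scale and log signal standard deviation, so both work out to $e^0 = 1$ on the natural scale, and the log observation noise used below maps to the noise variance the same way:

```julia
log_ℓ, log_σf = 0.0, 0.0          # the two arguments of SE(0.0, 0.0)
ℓ, σf = exp(log_ℓ), exp(log_σf)   # length-scale and signal std on the natural scale

logObsNoise = -1.0                # log noise std, as passed to GP below
σ² = exp(2 * logObsNoise)         # noise variance ⇒ e⁻² ≈ 0.1353
```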
#Select mean and covariance function
mZero = MeanZero() #Zero mean function
kern = SE(0.0,0.0) #Squared exponential kernel (note that hyperparameters are on the log scale)
logObsNoise = -1.0 # log standard deviation of observation noise (this is optional)
gp = GP(x,y,mZero,kern,logObsNoise) #Fit the GP
GP Exact object:
  Dim = 1
  Number of observations = 10
  Mean function:
    Type: MeanZero, Params: Float64[]
  Kernel:
    Type: SEIso, Params: [0.0, 0.0]
  Input observations = [4.85461 5.17653 … 1.99412 3.45676]
  Output observations = [-0.967293, -1.00705, -1.0904, 0.881121, -0.333213, -0.976965, 0.915934, 0.736218, 0.950849, -0.306432]
  Variance of observation noise = 0.1353352832366127
  Marginal Log-Likelihood = -6.335
Once we've fit the `GP` function to the data, we can calculate the predicted mean and variance of the function at unobserved points $\{\mathbf{x}^\ast,y^\ast\}$, conditional on the observed data $\mathcal{D}=\{\mathbf{y},\mathbf{X}\}$. This is done with the `predict_y` function.
The `predict_y` function returns the mean vector $\mu(\mathbf{x}^\ast)$ and covariance matrix (or variance vector if `full_cov=false`) $\Sigma(\mathbf{x}^\ast,\mathbf{x}^\ast)$ of the predictive distribution,
\begin{equation}
y^\ast|\mathbf{x}^\ast,\mathcal{D} \sim \mathcal{N}(\mu(\mathbf{x}^\ast),\Sigma(\mathbf{x}^\ast,\mathbf{x}^\ast)+\sigma_n^2\mathbf{I}),
\end{equation}
where
\begin{eqnarray}
\mu(\mathbf{x}^\ast) &=& k(\mathbf{x}^\ast,\mathbf{X})(k(\mathbf{X},\mathbf{X}) + \sigma_n^2 \mathbf{I})^{-1}\mathbf{y}, \ \mbox{and} \\
\Sigma(\mathbf{x}^\ast,\mathbf{x}^\ast) &=& k(\mathbf{x}^\ast,\mathbf{x}^\ast) - k(\mathbf{x}^\ast,\mathbf{X})(k(\mathbf{X},\mathbf{X}) + \sigma_n^2 \mathbf{I})^{-1} k(\mathbf{X},\mathbf{x}^\ast).
\end{eqnarray}
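These two formulas can also be evaluated directly in plain Julia, which is a useful sanity check on the algebra. The sketch below uses only `LinearAlgebra` and `Random`; the `se` helper reimplements the unit-hyperparameter SE kernel and is not part of GaussianProcesses.jl:

```julia
using LinearAlgebra, Random
Random.seed!(20140430)

se(a, b) = exp(-(a - b)^2 / 2)           # SE kernel with ℓ = σf = 1 (log-params 0.0)

n = 10
x = 2π * rand(n)
y = sin.(x) + 0.05*randn(n)
σn² = exp(2 * -1.0)                      # logObsNoise = -1.0 ⇒ σₙ² = e⁻²

xstar = range(0, stop=2π, length=100)
Kxx = [se(a, b) for a in x, b in x]      # k(X, X)
Ksx = [se(a, b) for a in xstar, b in x]  # k(x*, X)
Kss = [se(a, b) for a in xstar, b in xstar]

A = Kxx + σn² * I
μ = Ksx * (A \ y)                        # posterior mean
Σ = Kss - Ksx * (A \ Ksx')               # posterior covariance of the latent f*
```

Here `Σ` is the covariance of the latent function $\mathbf{f}^\ast$; adding $\sigma_n^2\mathbf{I}$ to it gives the predictive covariance of the noisy responses $y^\ast$.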
Note that you can use the `predict_f` function to predict the latent function $\mathbf{f}^\ast$.
μ, σ² = predict_y(gp,range(0,stop=2π,length=100));
Plotting GPs is straightforward and utilises the recipes approach to plotting from the Plots.jl package. More information about plotting GPs and the available functionality can be found in this notebook.
The default plot function `plot(gp)` outputs the predicted mean and variance of the function (i.e. it uses `predict_f` in the background), with the uncertainty in the function represented by a confidence ribbon (set to 95% by default). All optional plotting arguments are given after `;`.
using Plots #Load Plots.jl package
plot(gp; xlabel="x", ylabel="y", title="Gaussian process", legend=false, fmt=:png) # Plot the GP