To run this notebook, start jupyter from the examples directory of the local copy of GpABC.jl:

$ cd GpABC.jl/examples
$ jupyter notebook gp-example.ipynb
This notebook was tested under Julia 1.7.2
ENV["PYTHON"]=""; import Pkg; Pkg.activate(@__DIR__); Pkg.instantiate()
using GpABC, Distributions, Plots
pyplot()
Activating project at `~/gaussian_processes/GpABC.jl/examples`
Plots.PyPlotBackend()
Define the latent function that we are going to approximate:
f(x) = x ^ 2 + 10 * sin(x)
f (generic function with 1 method)
Set up some training and test data. Random noise is added to the observations at the training points, to make the task a little harder.
n = 30
training_x = sort(rand(Uniform(-10, 10), n))
training_y = f.(training_x)
training_y += 20 * (rand(n) .- 0.5) # add some noise
test_x = range(minimum(training_x), stop=maximum(training_x), length=1000) |> collect;  # dense prediction grid covering the training range
The package is built around the type GPModel, which encapsulates all the information required for training the Gaussian Process and performing the regression. In the simplest scenario the user instantiates this type with some training data and labels, provides the hyperparameters and runs the regression. SquaredExponentialIsoKernel is used by default.
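The constructor also accepts the kernel as an explicit third argument; a minimal sketch (hypothetical variable names, equivalent to the default):

gpm_default = GPModel(training_x, training_y)
gpm_explicit = GPModel(training_x, training_y, SquaredExponentialIsoKernel())  # same kernel, stated explicitly

This is the same three-argument form we will use later to plug in a custom kernel.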
Assume we already know the kernel hyperparameters:
hypers = [37.08, 1.0, 6.58];
Run the regression and plot the results
gpm = GPModel(training_x, training_y)
set_hyperparameters(gpm, hypers)
test_y, test_var = gp_regression(test_x, gpm)
plot(test_x, test_y, ribbon=1.96 * sqrt.(test_var), c=:red, linewidth=2, label="Approximation")  # ribbon: ±1.96 standard deviations ≈ 95% confidence band
plot!(test_x, f.(test_x), c=:blue, linewidth=2, label="True function")
scatter!(training_x, training_y, m=:star4, label="Noisy training data")
Normally, kernel hyperparameters are not known in advance. In this scenario the gp_train function should be used to find the Maximum Likelihood Estimate (MLE) of the hyperparameters. By default, box-constrained Conjugate Gradient optimisation is used, provided that the gradient with respect to the hyperparameters is implemented for the kernel function. If no gradient implementation is provided, the Nelder-Mead optimiser is used instead.
gp_train(gpm)
3-element Vector{Float64}:
 59.66348821527163
  3.3226747044378633
  5.024559895725993
Re-run the regression with the optimised hyperparameters (gp_train has stored them in the GPModel), and plot the results
test_y, test_var = gp_regression(test_x, gpm)
plot(test_x, test_y, ribbon=1.96 * sqrt.(test_var), c=:red, linewidth=2, label="Approximation")
plot!(test_x, f.(test_x), c=:blue, linewidth=2, label="True function")
scatter!(training_x, training_y, m=:star4, label="Noisy training data")
Suppose we want to implement our own kernel function that adds a periodic element to the standard SE ISO kernel: $$ k(x, x') = \sigma_f^2 \exp\left(-\frac{(x - x')^2}{2l^2}\right) + \exp\left(-2\sin^2(\sigma_g\pi(x - x'))\right) $$
This kernel introduces a new hyperparameter, $\sigma_g$, in addition to the standard hyperparameters of $\sigma_f$ and $l$.
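To plug a custom kernel into GpABC, we subtype AbstractGPKernel and implement get_hyperparameters_size and covariance for it. Note that GpABC passes hyperparameters to the kernel on the log scale (the log_theta argument), which keeps them positive during optimisation. The gradient and the cached/diagonal short cuts (covariance_grad, covariance_training, covariance_diagonal) are optional, and are added further down in this notebook.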
import GpABC.covariance, GpABC.get_hyperparameters_size,
GpABC.covariance_grad, GpABC.covariance_training, GpABC.covariance_diagonal
mutable struct SeIsoPeriodicKernelCache
    last_theta::AbstractArray{Float64, 1}     # log hyperparameters the cache was computed for
    D2::AbstractArray{Float64, 2}             # scaled squared distances (x - x')² / l²
    D::AbstractArray{Float64, 2}              # pairwise differences x - x'
    se_part::AbstractArray{Float64, 2}        # squared exponential component of K
    periodic_part::AbstractArray{Float64, 2}  # periodic component of K
end
SeIsoPeriodicKernelCache() = SeIsoPeriodicKernelCache(zeros(0), zeros(0, 0), zeros(0, 0), zeros(0, 0), zeros(0, 0))
struct SeIsoPeriodicKernel <: AbstractGPKernel
cache::SeIsoPeriodicKernelCache
end
SeIsoPeriodicKernel() = SeIsoPeriodicKernel(SeIsoPeriodicKernelCache())
function get_hyperparameters_size(ker::SeIsoPeriodicKernel, training_data::AbstractArray{Float64, 2})
    3
end
function covariance(ker::SeIsoPeriodicKernel, log_theta::AbstractArray{Float64, 1},
        x1::AbstractArray{Float64, 2}, x2::AbstractArray{Float64, 2})
    D2 = scaled_squared_distance([log_theta[2]], x1, x2)  # (x - x')² / l²
    n = size(x1, 1)
    m = size(x2, 1)
    D = repeat(x1, 1, m) - repeat(x2', n, 1)  # pairwise differences x - x'
    sigma_f = exp(log_theta[1] * 2)           # σ_f²; hyperparameters arrive on the log scale
    sigma_g = exp(log_theta[3])
    K = sigma_f .* exp.(-D2 ./ 2) .+ exp.(-2 .* sin.(pi * sigma_g .* D) .^ 2)
end
covariance (generic function with 6 methods)
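As a quick sanity check (not part of the original notebook), we can evaluate the new kernel on a few points and confirm that the covariance matrix is symmetric, with $\sigma_f^2 + 1$ on the diagonal:

using LinearAlgebra
x_check = reshape([0.0, 0.5, 1.0], 3, 1)  # three 1-dimensional points
K_check = covariance(SeIsoPeriodicKernel(), zeros(3), x_check, x_check)  # all log hyperparameters zero
K_check ≈ K_check'                  # symmetric
all(diag(K_check) .≈ exp(0.0) + 1)  # diagonal is σ_f² + 1 = 2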
Note that defining the kernel gradient with respect to hyperparameters is optional, and we are skipping it for now. This means that it will not be possible to use gradient-based optimisation for GP training, and the Nelder-Mead algorithm will be used instead.
gpm = GPModel(training_x, training_y, SeIsoPeriodicKernel())
gp_train(gpm; log_level=1)
test_y, test_var = gp_regression(test_x, gpm)
plot(test_x, test_y, ribbon=1.96 * sqrt.(test_var), c=:red, linewidth=2, label="Approximation")
plot!(test_x, f.(test_x), c=:blue, linewidth=2, label="True function")
scatter!(training_x, training_y, m=:star4, label="Noisy training data")
Unbound optimisation using Optim.NelderMead{Optim.AffineSimplexer, Optim.AdaptiveParameters}.
No gradient provided. Start point: [0.0, 0.0, 0.0, 0.0].
 * Status: success

 * Candidate solution
    Final objective value:     1.120994e+02

 * Found with
    Algorithm:     Nelder-Mead

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    253
    f(x) calls:    438

Optimized log hyperparameters: [4.087406144498667, 1.2038296492655691, -0.40874810884522483, 1.5949439813933144]
Optimized hyperparameters: [59.58513585132702, 3.332856183952766, 0.6644815882781876, 4.928053001723828]
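Note that four hyperparameters are reported even though get_hyperparameters_size returned 3: in addition to the kernel hyperparameters, GpABC optimises the observation noise level of the GPModel, which is appended as the last element of the vector.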
Now, let's implement the gradient for the new kernel, as well as short-circuit functions for computing the covariance using cached computation results (covariance_training), and for diagonal-only variance (covariance_diagonal).
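For reference (this derivation is not part of the original notebook, but it is what d1, d2 and d3 below compute), the derivatives of the kernel with respect to the log hyperparameters, writing $d = x - x'$, are: $$ \frac{\partial k}{\partial \log\sigma_f} = 2\sigma_f^2 e^{-\frac{d^2}{2l^2}}, \qquad \frac{\partial k}{\partial \log l} = \sigma_f^2 e^{-\frac{d^2}{2l^2}}\,\frac{d^2}{l^2}, \qquad \frac{\partial k}{\partial \log\sigma_g} = -2\,\sigma_g\pi d\,\sin(2\sigma_g\pi d)\,e^{-2\sin^2(\sigma_g\pi d)} $$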
function update_cache(cache::SeIsoPeriodicKernelCache, log_theta::AbstractArray{Float64, 1}, x::AbstractArray{Float64, 2})
    sigma_f = exp(log_theta[1] * 2)  # σ_f²
    sigma_g = exp(log_theta[3])
    D2 = scaled_squared_distance([log_theta[2]], x, x)
    n = size(x, 1)
    D = repeat(x, 1, n) - repeat(x', n, 1)
    cache.last_theta = copy(log_theta)
    cache.se_part = sigma_f .* exp.(-D2 ./ 2)
    cache.periodic_part = exp.(-2 .* sin.(pi * sigma_g .* D) .^ 2)
    cache.D2 = D2
    cache.D = D
    0  # return value is not used
end
function covariance_grad(ker::SeIsoPeriodicKernel, log_theta::AbstractArray{Float64, 1},
        x::AbstractArray{Float64, 2}, R::AbstractArray{Float64, 2})
    cache = ker.cache
    if log_theta != cache.last_theta
        update_cache(ker.cache, log_theta, x)
    end
    KR = cache.se_part .* R
    d1 = 2 * sum(KR)           # derivative w.r.t. log σ_f
    d2 = KR[:]' * cache.D2[:]  # derivative w.r.t. log l
    sigma_g = exp(log_theta[3])
    periodic_arg = sigma_g .* pi .* cache.D
    d3 = sum(R' .* cache.periodic_part .* -2 .* sin.(2 .* periodic_arg) .* periodic_arg)  # derivative w.r.t. log σ_g
    return [d1, d2, d3]
end
function covariance_training(ker::SeIsoPeriodicKernel,
        log_theta::AbstractArray{Float64, 1}, x::AbstractArray{Float64, 2})
    if log_theta != ker.cache.last_theta
        update_cache(ker.cache, log_theta, x)
    end
    return ker.cache.periodic_part + ker.cache.se_part
end
function covariance_diagonal(ker::SeIsoPeriodicKernel, log_theta::AbstractArray{Float64, 1}, x::AbstractArray{Float64, 2})
    # on the diagonal x == x', so the SE part is σ_f² and the periodic part is exp(0) = 1
    fill(exp(log_theta[1] * 2) + 1, (size(x, 1), 1))
end
covariance_diagonal (generic function with 6 methods)
Now, when we train the GP and run the regression, we can see that Conjugate Gradient optimisation is used.
gpm = GPModel(training_x, training_y, SeIsoPeriodicKernel())
gp_train(gpm; log_level=1)
test_y, test_var = gp_regression(test_x, gpm)
plot(test_x, test_y, ribbon=1.96 * sqrt.(test_var), c=:red, linewidth=2, label="Approximation")
plot!(test_x, f.(test_x), c=:blue, linewidth=2, label="True function")
scatter!(training_x, training_y, m=:star4, label="Noisy training data")
Bound optimisation using Optim.ConjugateGradient{Float64, Nothing, Optim.var"#30#32", LineSearches.InitialHagerZhang{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}}.
Gradient provided. Start point: [1.0, 1.0, 1.0, 1.0]; lower bound: [-10.0, -10.0, -10.0, -10.0]; upper bound: [10.0, 10.0, 10.0, 10.0]
 * Status: success

 * Candidate solution
    Final objective value:     4.416590e+03

 * Found with
    Algorithm:     Fminbox with Conjugate Gradient

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = NaN ≰ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 1.00e+01 ≰ 1.0e-08

 * Work counters
    Seconds run:   1  (vs limit Inf)
    Iterations:    1
    f(x) calls:    5
    ∇f(x) calls:   4

Optimized log hyperparameters: [0.0, 0.0, 0.0, 0.0]
Optimized hyperparameters: [1.0, 1.0, 1.0, 1.0]