You are seeing the notebook output generated by Literate.jl from the Julia source file. The rendered HTML can be viewed in the docs.
In this notebook, we apply Gaussian process regression to the Mauna Loa CO₂ dataset. The example is adapted from the corresponding AbstractGPs.jl tutorial; it is instructive to read that first and then see how EasyGPs.jl simplifies the steps.
We make use of the following packages:
using CSV, DataFrames # data loading
using EasyGPs # handles all things related to GPs
using Plots # visualisation
Let's load and visualize the dataset.
(xtrain, ytrain), (xtest, ytest) = let
data = CSV.read(joinpath(@__DIR__, "CO2_data.csv"), Tables.matrix; header=0)
year = data[:, 1]
co2 = data[:, 2]
# We split the data into training and testing set:
idx_train = year .< 2004
idx_test = .!idx_train
(year[idx_train], co2[idx_train]), (year[idx_test], co2[idx_test]) # block's return value
end
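The mask-based split above can be illustrated on a hypothetical miniature dataset (the years, values, and cutoff below are made up for the sketch):

```julia
# Hypothetical miniature version of the split above: `year .< 2004` builds a
# boolean mask, and `.!` its complement, so the two index sets partition the
# data. Values below are made up.
yr  = [2001.0, 2002.0, 2003.0, 2004.0, 2005.0]
co2 = [370.0, 372.0, 374.0, 376.0, 378.0]

mask_train = yr .< 2004          # elementwise comparison -> BitVector
mask_test  = .!mask_train        # complementary mask

xtr, ytr = yr[mask_train], co2[mask_train]
xte, yte = yr[mask_test], co2[mask_test]

@assert length(xtr) + length(xte) == length(yr)
@assert xte == [2004.0, 2005.0]
```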
function plotdata()
plot(; xlabel="year", ylabel="CO₂ [ppm]", legend=:bottomright)
scatter!(xtrain, ytrain; label="training data", ms=2, markerstrokewidth=0)
return scatter!(xtest, ytest; label="test data", ms=2, markerstrokewidth=0)
end
plotdata()
We construct the GP prior using the same kernels and initial parameters as in the original tutorial.
k_smooth_trend = exp(8.0) * with_lengthscale(SEKernel(), exp(4.0))
k_seasonality =
exp(2.0) * PeriodicKernel(; r=[0.5]) * with_lengthscale(SEKernel(), exp(4.0))
k_medium_term_irregularities =
1.0 * with_lengthscale(RationalQuadraticKernel(; α=exp(-1.0)), 1.0)
k_noise_terms =
exp(-4.0) * with_lengthscale(SEKernel(), exp(-2.0)) + exp(-4.0) * WhiteKernel()
kernel = k_smooth_trend + k_seasonality + k_medium_term_irregularities + k_noise_terms
Sum of 5 kernels:
  Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
    - Scale Transform (s = 0.018315638888734182)
    - σ² = 2980.9579870417283
  Product of 2 kernels:
    Periodic Kernel, length(r) = 1
      - σ² = 7.38905609893065
    Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
      - Scale Transform (s = 0.018315638888734182)
  Rational Quadratic Kernel (α = 0.36787944117144233, metric = Distances.Euclidean(0.0))
    - Scale Transform (s = 1.0)
    - σ² = 1.0
  Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
    - Scale Transform (s = 7.3890560989306495)
    - σ² = 0.01831563888873418
  White Kernel
    - σ² = 0.01831563888873418
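As a quick sanity check (not part of the original tutorial), any valid kernel must produce a symmetric, positive-(semi)definite kernel matrix. The sketch below rebuilds the same composite kernel with KernelFunctions.jl directly and verifies this on a handful of input points from the data range:

```julia
# Sanity check: a valid kernel yields a symmetric positive-(semi)definite
# kernel matrix. We rebuild the tutorial's composite kernel and test it on a
# few inputs from the data range.
using KernelFunctions, LinearAlgebra

k = exp(8.0) * with_lengthscale(SEKernel(), exp(4.0)) +
    exp(2.0) * PeriodicKernel(; r=[0.5]) * with_lengthscale(SEKernel(), exp(4.0)) +
    1.0 * with_lengthscale(RationalQuadraticKernel(; α=exp(-1.0)), 1.0) +
    exp(-4.0) * with_lengthscale(SEKernel(), exp(-2.0)) +
    exp(-4.0) * WhiteKernel()

x = collect(1958.0:0.5:1960.0)
K = kernelmatrix(k, x)

@assert isapprox(K, K')          # symmetric
@assert all(diag(K) .> 0)        # positive variances on the diagonal
@assert isposdef(Symmetric(K))   # the white-noise term guarantees strict PD
```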
We construct the GP object with the kernel above:
gp = GP(kernel)
AbstractGPs.GP{AbstractGPs.ZeroMean{Float64}, KernelFunctions.KernelSum{…}} with a zero mean function and the sum-of-5-kernels covariance shown above.
To construct a posterior, we can call the GP object with the usual AbstractGPs.jl API:
fpost_init = posterior(gp(xtrain), ytrain)
AbstractGPs.PosteriorGP{…} wrapping the prior GP above together with the cached quantities needed for prediction: the weight vector α, the Cholesky factorization C of the kernel matrix, the training inputs x (1958.2083…2003.9583), and the observations δ (315.71…375.93).
Let's visualize what the GP fitted to the data looks like, for the initial choice of kernel hyperparameters.
We use the following function to plot a GP f on a specific range, using the AbstractGPs plotting recipes. By setting ribbon_scale=2 we visualize the uncertainty band with ±2 (instead of the default ±1) standard deviations.
plot_gp!(f; label) = plot!(f(1920:0.2:2030); ribbon_scale=2, linewidth=1, label)
plotdata()
plot_gp!(fpost_init; label="posterior f(⋅)")
A reasonable fit to the data, but poor extrapolation away from the observations!
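One way to quantify the fit beyond eyeballing the plot is the log marginal likelihood, which is also the objective that hyperparameter optimization maximizes; for the notebook's GP one would evaluate logpdf(gp(xtrain), ytrain). The sketch below demonstrates the idea on synthetic stand-in data (the sine data, lengthscales, and noise level are invented for illustration):

```julia
# Sketch: the log marginal likelihood as a goodness-of-fit score. A kernel with
# a plausible lengthscale should score (much) higher than a badly mis-specified
# one on the same data. All data and hyperparameters here are invented.
using AbstractGPs, KernelFunctions, Random

Random.seed!(42)
x = collect(0.0:0.1:5.0)
y = sin.(x) .+ 0.05 .* randn(length(x))

f_good = GP(with_lengthscale(SEKernel(), 1.0))    # lengthscale matches the data
f_bad  = GP(with_lengthscale(SEKernel(), 1e-3))   # far too short a lengthscale

lml_good = logpdf(f_good(x, 0.05^2), y)   # log marginal likelihood
lml_bad  = logpdf(f_bad(x, 0.05^2), y)

@assert lml_good > lml_bad
```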
We can now call EasyGPs.fit to optimize the hyperparameters. This takes care of all the parameterizations and automatic differentiation, and runs the optimizer for us. We pass an option to choose the exact same optimizer as in the original tutorial.
@time fitted_gp = EasyGPs.fit(
gp,
xtrain,
ytrain;
optimizer=Optim.LBFGS(;
alphaguess=Optim.LineSearches.InitialStatic(; scaled=true),
linesearch=Optim.LineSearches.BackTracking(),
),
)
31.334930 seconds (164.76 M allocations: 28.161 GiB, 55.54% gc time, 0.00% compilation time)
AbstractGPs.GP{AbstractGPs.ZeroMean{Float64}, KernelFunctions.KernelSum{…}} with a zero mean function and the optimized kernel:

Sum of 5 kernels:
  Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
    - Scale Transform (s = 0.006409359405350664)
    - σ² = 196189.1377809622
  Product of 2 kernels:
    Periodic Kernel, length(r) = 1
      - σ² = 6.031052736397145
    Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
      - Scale Transform (s = 0.01211762616986784)
  Rational Quadratic Kernel (α = 0.0008271632559866195, metric = Distances.Euclidean(0.0))
    - Scale Transform (s = 0.04882049165706758)
    - σ² = 160.20288567771527
  Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
    - Scale Transform (s = 8.753051584372757)
    - σ² = 0.028319259298516024
  White Kernel
    - σ² = 0.035763467727454694
Let's now construct the posterior GP with the optimized hyperparameters:
fpost_opt = posterior(fitted_gp(xtrain), ytrain)
AbstractGPs.PosteriorGP{…} wrapping the optimized prior GP together with the cached quantities needed for prediction: the weight vector α, the Cholesky factorization C of the kernel matrix, the training inputs x (1958.2083…2003.9583), and the observations δ (315.71…375.93).
This is the kernel with the point-estimated hyperparameters:
fpost_opt.prior.kernel
Sum of 5 kernels:
  Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
    - Scale Transform (s = 0.006409359405350664)
    - σ² = 196189.1377809622
  Product of 2 kernels:
    Periodic Kernel, length(r) = 1
      - σ² = 6.031052736397145
    Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
      - Scale Transform (s = 0.01211762616986784)
  Rational Quadratic Kernel (α = 0.0008271632559866195, metric = Distances.Euclidean(0.0))
    - Scale Transform (s = 0.04882049165706758)
    - σ² = 160.20288567771527
  Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
    - Scale Transform (s = 8.753051584372757)
    - σ² = 0.028319259298516024
  White Kernel
    - σ² = 0.035763467727454694
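The hyperparameters printed above can also be read out programmatically, with the caveat that the field names used below (kernels, σ²) are KernelFunctions internals rather than a stable public API; the kernel here is a small stand-in:

```julia
# Sketch: kernels can be inspected programmatically. NOTE: `kernels` and `σ²`
# are internal field names of KernelFunctions' KernelSum / ScaledKernel, not a
# stable public API; shown on a small stand-in kernel.
using KernelFunctions

k = 2.0 * SEKernel() + 0.5 * WhiteKernel()

summands = k.kernels                  # the terms of the KernelSum
@assert length(summands) == 2
@assert first(summands).σ²[1] == 2.0  # variance of the scaled SE term
@assert last(summands).σ²[1] == 0.5   # variance of the white-noise term
```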
And, finally, we can visualize our optimized posterior GP:
plotdata()
plot_gp!(fpost_opt; label="optimized posterior f(⋅)")
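Since we held out the post-2004 data, the optimized posterior can also be scored quantitatively: mean(fpost(x)) gives the posterior predictive mean and var(fpost(x)) the pointwise variance (the ribbon). For the notebook one would compare mean(fpost_opt(xtest)) against ytest, e.g. via RMSE; the sketch below shows the generic recipe on synthetic stand-in data:

```julia
# Sketch of scoring a posterior GP: posterior predictive mean and variance,
# plus an RMSE against reference targets. All data here is synthetic.
using AbstractGPs, KernelFunctions, Statistics

x = collect(0.0:0.5:4.0)
y = sin.(x)

f = GP(with_lengthscale(SEKernel(), 1.0))
fpost = posterior(f(x, 1e-4), y)   # small observation noise for numerical stability

xq = collect(0.0:0.25:4.0)         # query points (here: a refinement of x)
m = mean(fpost(xq))                # posterior predictive mean
v = var(fpost(xq))                 # pointwise predictive variance

rmse = sqrt(mean((m .- sin.(xq)) .^ 2))

@assert length(m) == length(xq)
@assert all(v .>= -1e-8)                     # nonnegative up to round-off
@assert isapprox(m[9], sin(2.0); atol=0.1)   # near-interpolation at a training point
```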
Status `~/work/EasyGPs.jl/EasyGPs.jl/examples/0-mauna-loa/Project.toml`
  [336ed68f] CSV v0.10.13
  [a93c6f00] DataFrames v1.6.1
  [dcfb08e9] EasyGPs v0.1.0 `/home/runner/work/EasyGPs.jl/EasyGPs.jl#master`
  [98b081ad] Literate v2.16.1
  [91a5bcdd] Plots v1.40.2

To reproduce this notebook's package environment, you can download the full Manifest.toml.
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Environment:
  JULIA_DEBUG = Documenter
  JULIA_LOAD_PATH = :/home/runner/.julia/packages/JuliaGPsDocs/7M86H/src
This notebook was generated using Literate.jl.