You are seeing the notebook output generated by Literate.jl from the Julia source file. The rendered HTML can be viewed in the docs.
In this notebook, we apply Gaussian process regression to the Mauna Loa CO₂ dataset. This showcases a rich combination of kernels, and how to handle and optimize all their parameters.
We make use of the following packages:
using CSV, DataFrames # data loading
using AbstractGPs # exact GP regression
using ParameterHandling # for nested and constrained parameters
using Optim # optimization
using Zygote # auto-diff gradient computation
using Plots # visualisation
Let's load and visualize the dataset.
Tip
The `let` block creates a new scope, so any utility variables we define in here won't leak outside. This is particularly helpful to keep notebooks tidy! The return value of the block is given by its last expression.
(xtrain, ytrain), (xtest, ytest) = let
data = CSV.read(joinpath(@__DIR__, "CO2_data.csv"), Tables.matrix; header=0)
year = data[:, 1]
co2 = data[:, 2]
# We split the data into training and testing set:
idx_train = year .< 2004
idx_test = .!idx_train
(year[idx_train], co2[idx_train]), (year[idx_test], co2[idx_test]) # block's return value
end
# The utility variables such as `idx_train` and `idx_test` are not available outside the `let` scope
function plotdata()
plot(; xlabel="year", ylabel="CO₂ [ppm]", legend=:bottomright)
scatter!(xtrain, ytrain; label="training data", ms=2, markerstrokewidth=0)
return scatter!(xtest, ytest; label="test data", ms=2, markerstrokewidth=0)
end
plotdata()
We will model this dataset using a sum of several kernels, which describe:
- a smooth long-term trend: a squared exponential kernel with a long lengthscale;
- seasonality: a periodic kernel with a fixed period of one year, multiplied by a squared exponential kernel so that the seasonal pattern can change slowly over time;
- medium-term irregularities: a rational quadratic kernel;
- noise terms: a squared exponential kernel with a short lengthscale, plus uncorrelated (white) observation noise.
For more details, see Rasmussen & Williams (2005), chapter 5.
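Schematically, writing k_long, k_per, k_decay, k_rq, and k_short for the component kernels above (these labels are just ours, not names from the packages) and σ_noise² for the white-noise variance, the combined covariance we will assemble in `build_gp_prior` below is
k(x, x′) = k_long(x, x′) + k_per(x, x′) · k_decay(x, x′) + k_rq(x, x′) + k_short(x, x′) + σ_noise² · δ(x, x′),
where δ(x, x′) is 1 if x == x′ and 0 otherwise.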
We will use ParameterHandling.jl for handling the (hyper)parameters of our model. It provides functions such as `positive` with which we can put constraints on the hyperparameters, and allows us to represent all required parameters as a nested NamedTuple:
# initial values to match http://stor-i.github.io/GaussianProcesses.jl/latest/mauna_loa/
θ_init = (;
se_long = (;
σ = positive(exp(4.0)),
ℓ = positive(exp(4.0)),
),
seasonal = (;
# product kernels only need a single overall signal variance
per = (;
ℓ = positive(exp(0.0)), # relative to period!
p = fixed(1.0), # 1 year, do not optimize over
),
se = (;
σ = positive(exp(1.0)),
ℓ = positive(exp(4.0)),
),
),
rq = (;
σ = positive(exp(0.0)),
ℓ = positive(exp(0.0)),
α = positive(exp(-1.0)),
),
se_short = (;
σ = positive(exp(-2.0)),
ℓ = positive(exp(-2.0)),
),
noise_scale = positive(exp(-2.0)),
)
(se_long = (σ = ParameterHandling.Positive{Float64, typeof(exp), Float64}(3.9999999997270757, exp, 1.4901161193847656e-8), ℓ = ParameterHandling.Positive{Float64, typeof(exp), Float64}(3.9999999997270757, exp, 1.4901161193847656e-8)), seasonal = (per = (ℓ = ParameterHandling.Positive{Float64, typeof(exp), Float64}(-1.490116130486996e-8, exp, 1.4901161193847656e-8), p = ParameterHandling.Fixed{Float64}(1.0)), se = (σ = ParameterHandling.Positive{Float64, typeof(exp), Float64}(0.9999999945181691, exp, 1.4901161193847656e-8), ℓ = ParameterHandling.Positive{Float64, typeof(exp), Float64}(3.9999999997270757, exp, 1.4901161193847656e-8))), rq = (σ = ParameterHandling.Positive{Float64, typeof(exp), Float64}(-1.490116130486996e-8, exp, 1.4901161193847656e-8), ℓ = ParameterHandling.Positive{Float64, typeof(exp), Float64}(-1.490116130486996e-8, exp, 1.4901161193847656e-8), α = ParameterHandling.Positive{Float64, typeof(exp), Float64}(-1.0000000405055565, exp, 1.4901161193847656e-8)), se_short = (σ = ParameterHandling.Positive{Float64, typeof(exp), Float64}(-2.000000110105522, exp, 1.4901161193847656e-8), ℓ = ParameterHandling.Positive{Float64, typeof(exp), Float64}(-2.000000110105522, exp, 1.4901161193847656e-8)), noise_scale = ParameterHandling.Positive{Float64, typeof(exp), Float64}(-2.000000110105522, exp, 1.4901161193847656e-8))
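As a quick aside (a minimal illustration with made-up values, not part of the original example), `ParameterHandling.value` strips these constraint wrappers and returns plain numbers; we will use it further below when building the posterior:
ParameterHandling.value((; a = positive(2.0), b = fixed(1.0)))  # returns approximately (a = 2.0, b = 1.0)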
We define a couple of helper functions to simplify the kernel construction:
SE(θ) = θ.σ^2 * with_lengthscale(SqExponentialKernel(), θ.ℓ)
Per(θ) = with_lengthscale(PeriodicKernel(; r=[θ.ℓ / 2]), θ.p) # NOTE- discrepancy with GaussianProcesses.jl
RQ(θ) = θ.σ^2 * with_lengthscale(RationalQuadraticKernel(; α=θ.α), θ.ℓ)
RQ (generic function with 1 method)
This allows us to write a function that, given the nested tuple of parameter values, constructs the GP prior:
function build_gp_prior(θ)
k_smooth_trend = SE(θ.se_long)
k_seasonality = Per(θ.seasonal.per) * SE(θ.seasonal.se)
k_medium_term_irregularities = RQ(θ.rq)
k_noise_terms = SE(θ.se_short) + θ.noise_scale^2 * WhiteKernel()
kernel = k_smooth_trend + k_seasonality + k_medium_term_irregularities + k_noise_terms
return GP(kernel) # `ZeroMean` mean function by default
end
build_gp_prior (generic function with 1 method)
To construct the posterior, we first need to build a `FiniteGP`, which represents the infinite-dimensional GP at a finite number of input features:
function build_finite_gp(θ)
f = build_gp_prior(θ)
return f(xtrain)
end
build_finite_gp (generic function with 1 method)
WhiteKernel vs. FiniteGP observation noise
In this notebook, we already included observation noise through the `WhiteKernel` as part of the GP prior covariance in `build_gp_prior`. We therefore call `f(xtrain)`, which implies zero (additional) observation noise.
Alternatively, we could have omitted the `θ.noise_scale^2 *
WhiteKernel()` term and instead passed the noise variance as a second
argument to the GP call in `build_finite_gp`, `f(xtrain,
θ.noise_scale^2)`.
These two approaches have slightly different semantics: In the first one,
the `WhiteKernel` contributes non-zero variance to the `[i, j]` element
of the covariance matrix of the `FiniteGP` if `xtrain[i] == xtrain[j]`
(based on the values of the features). In the second one, the observation
noise variance passed to `FiniteGP` only contributes to the diagonal
elements of the covariance matrix, i.e. for `i == j`.
Moreover, the variance (uncertainty) of the posterior predictions
includes the variance from the `WhiteKernel`, but does not include the
variance of the observation noise passed to the `FiniteGP`. To include
the observation noise in posterior predictions from the second approach,
call `fpost_opt(xtest, noise_scale^2)`.
Tip
For most use-cases and if in any doubt, we recommend that you pass in observation noise to the `FiniteGP`, and omit the explicit `WhiteKernel`. This is slightly faster (no need to check `xtrain[i] == xtrain[j]` for all pairs `i`, `j`), and `WhiteKernel` will not give stable gradients if you wish to compute the gradient of the log marginal likelihood `logpdf(f(x), y)` w.r.t. `x`.
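For reference, here is a minimal sketch of that recommended alternative (not used in this notebook; it assumes the `θ.noise_scale^2 * WhiteKernel()` term has been removed from `build_gp_prior`, and the function name is made up):
function build_finite_gp_with_noise(θ)
    f = build_gp_prior(θ)  # assumed here to no longer include the WhiteKernel term
    return f(xtrain, θ.noise_scale^2)  # observation noise variance added on the diagonal
end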
We obtain the posterior, conditioned on the (finite) observations, by calling `posterior`:
function build_posterior_gp(θ)
fx = build_finite_gp(θ)
return posterior(fx, ytrain)
end
build_posterior_gp (generic function with 1 method)
Now we can put it all together to obtain a `PosteriorGP`. The call to `ParameterHandling.value` is required to replace the constraints (such as `positive` in our case) with concrete numbers:
fpost_init = build_posterior_gp(ParameterHandling.value(θ_init))
AbstractGPs.PosteriorGP{AbstractGPs.GP{AbstractGPs.ZeroMean{Float64}, KernelFunctions.KernelSum{Tuple{KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}, KernelFunctions.KernelProduct{Tuple{KernelFunctions.TransformedKernel{KernelFunctions.PeriodicKernel{Float64}, KernelFunctions.ScaleTransform{Float64}}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}}}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.RationalQuadraticKernel{Float64, Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}, KernelFunctions.ScaledKernel{KernelFunctions.WhiteKernel, Float64}}}}, @NamedTuple{α::Vector{Float64}, C::LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, x::Vector{Float64}, δ::Vector{Float64}}}(AbstractGPs.GP{AbstractGPs.ZeroMean{Float64}, KernelFunctions.KernelSum{Tuple{KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}, KernelFunctions.KernelProduct{Tuple{KernelFunctions.TransformedKernel{KernelFunctions.PeriodicKernel{Float64}, KernelFunctions.ScaleTransform{Float64}}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}}}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.RationalQuadraticKernel{Float64, Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}, KernelFunctions.ScaledKernel{KernelFunctions.WhiteKernel, Float64}}}}(AbstractGPs.ZeroMean{Float64}(), Sum of 5 kernels: Squared Exponential Kernel (metric = Distances.Euclidean(0.0)) - Scale Transform (s = 0.018315638888734182) - σ² = 2980.957987041728 Product of 2 kernels: Periodic Kernel, length(r) = 1 - Scale Transform (s = 1.0) Squared Exponential Kernel (metric = Distances.Euclidean(0.0)) - Scale Transform (s = 0.018315638888734182) - σ² = 7.3890560989306495 Rational Quadratic Kernel (α = 0.36787944117144233, metric = Distances.Euclidean(0.0)) - Scale Transform (s = 1.0) - σ² = 1.0 Squared Exponential Kernel (metric = Distances.Euclidean(0.0)) - Scale Transform (s = 7.3890560989306495) - σ² = 0.018315638888734182 White Kernel - σ² = 0.018315638888734182), (α = [-9.508930877725136, 19.389883608061695, -7.197900325224828, -4.730615302853674, -15.23471364637827, 18.62620628181198, 7.526138666092433, -0.03155846836128887, -13.788654737243691, 1.6524756606264315 … -4.04659725976334, -2.9334457010779236, 2.099011615147155, 4.200011736661871, 5.696790416685783, -11.011994620197953, 9.404617195251152, -5.495561190687362, 4.121759676625851, -1.6600889945065676], C = LinearAlgebra.Cholesky{Float64, Matrix{Float64}}([54.67525650985495 54.657791476465945 … 38.45054512284434 38.41521522921301; 2988.428769237939 1.381848520498082 … 2.1695516564590633 1.8834148423466848; … ; 2102.293417535266 2104.6198692274925 … 0.2317707151386545 
0.1990883709260124; 2100.3617465385078 2102.293417535265 … 2988.4287692372563 0.2317686017127864], 'U', 0), x = [1958.2083333333333, 1958.2916666666667, 1958.375, 1958.4583333333333, 1958.5416666666667, 1958.625, 1958.7083333333333, 1958.7916666666667, 1958.875, 1958.9583333333333 … 2003.2083333333333, 2003.2916666666667, 2003.375, 2003.4583333333333, 2003.5416666666667, 2003.625, 2003.7083333333333, 2003.7916666666667, 2003.875, 2003.9583333333333], δ = [315.71, 317.45, 317.5, 317.1, 315.86, 314.93, 313.2, 312.66, 313.33, 314.67 … 376.48, 377.74, 378.5, 378.18, 376.72, 374.31, 373.2, 373.1, 374.64, 375.93]))
Let's visualize what the GP fitted to the data looks like, for the initial choice of kernel hyperparameters.
We use the following function to plot a GP `f` on a specific range, using the AbstractGPs plotting recipes. By setting `ribbon_scale=2` we visualize the uncertainty band with ±2 (instead of the default ±1) standard deviations.
plot_gp!(f; label) = plot!(f(1920:0.2:2030); ribbon_scale=2, linewidth=1, label)
plotdata()
plot_gp!(fpost_init; label="posterior f(⋅)")
A reasonable fit to the data, but poor extrapolation away from the observations!
To improve the fit, we want to maximize the (log) marginal likelihood with respect to the hyperparameters. Since Optim.jl performs minimization, we define the loss as the negative log marginal likelihood:
function loss(θ)
fx = build_finite_gp(θ)
lml = logpdf(fx, ytrain) # this computes the log marginal likelihood
return -lml
end
loss (generic function with 1 method)
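As a sanity check (an optional extra step, not in the original notebook), the objective can be evaluated directly at the initial hyperparameters:
loss(ParameterHandling.value(θ_init))  # ≈ 228.57, matching iteration 0 of the optimization trace below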
Work-in-progress
In the future, we are planning to provide the `optimize_loss` utility function as part of JuliaGaussianProcesses -- for now, we just define it inline.
The L-BFGS parameters were chosen because they seem to work well empirically. You could also try with the defaults.
default_optimizer = LBFGS(;
alphaguess=Optim.LineSearches.InitialStatic(; scaled=true),
linesearch=Optim.LineSearches.BackTracking(),
)
function optimize_loss(loss, θ_init; optimizer=default_optimizer, maxiter=1_000)
options = Optim.Options(; iterations=maxiter, show_trace=true)
θ_flat_init, unflatten = ParameterHandling.value_flatten(θ_init)
loss_packed = loss ∘ unflatten
# https://julianlsolvers.github.io/Optim.jl/stable/#user/tipsandtricks/#avoid-repeating-computations
function fg!(F, G, x)
if F !== nothing && G !== nothing
val, grad = Zygote.withgradient(loss_packed, x)
G .= only(grad)
return val
elseif G !== nothing
grad = Zygote.gradient(loss_packed, x)
G .= only(grad)
return nothing
elseif F !== nothing
return loss_packed(x)
end
end
result = optimize(Optim.only_fg!(fg!), θ_flat_init, optimizer, options; inplace=false)
return unflatten(result.minimizer), result
end
optimize_loss (generic function with 1 method)
We now run the optimization:
θ_opt, opt_result = optimize_loss(loss, θ_init)
opt_result
Iter Function value Gradient norm 0 2.285662e+02 3.466984e+02 * time: 4.506111145019531e-5 1 1.597968e+02 5.811062e+01 * time: 0.5265400409698486 2 1.516902e+02 4.934826e+01 * time: 1.0012779235839844 3 1.311976e+02 4.436594e+01 * time: 1.2326030731201172 4 1.250485e+02 2.431021e+01 * time: 1.7128939628601074 5 1.208493e+02 7.951627e+00 * time: 1.9376680850982666 6 1.190798e+02 9.555929e+00 * time: 2.411310911178589 7 1.172102e+02 7.308331e+00 * time: 2.6527140140533447 8 1.161961e+02 5.340958e+00 * time: 3.1252670288085938 9 1.157434e+02 2.651424e+00 * time: 3.3527801036834717 10 1.156009e+02 1.236270e+00 * time: 3.5666980743408203 11 1.155254e+02 7.741628e-01 * time: 4.041176080703735 12 1.154964e+02 3.221284e-01 * time: 4.284119129180908 13 1.154897e+02 1.036930e+00 * time: 4.794978141784668 14 1.154860e+02 1.210659e+00 * time: 5.05358099937439 15 1.154829e+02 3.334882e-01 * time: 5.5147199630737305 16 1.154811e+02 2.493652e-01 * time: 5.723245143890381 17 1.154783e+02 5.218759e-01 * time: 5.967154026031494 18 1.154739e+02 6.706579e-01 * time: 6.4355480670928955 19 1.154657e+02 2.451446e+00 * time: 6.688038110733032 20 1.154480e+02 1.479048e+00 * time: 6.903762102127075 21 1.154182e+02 4.592655e-01 * time: 7.37571907043457 22 1.153886e+02 1.267364e+00 * time: 7.612437009811401 23 1.153628e+02 9.849155e-01 * time: 7.826198101043701 24 1.153570e+02 1.646326e+00 * time: 8.038887023925781 25 1.153257e+02 2.741751e+00 * time: 8.51312804222107 26 1.153176e+02 8.952727e-01 * time: 8.72532606124878 27 1.153146e+02 4.115828e-01 * time: 8.945369958877563 28 1.153093e+02 4.487567e-01 * time: 9.42802906036377 29 1.153034e+02 4.656332e-01 * time: 9.683174133300781 30 1.153010e+02 2.814319e-01 * time: 9.95576000213623 31 1.152961e+02 2.402583e-01 * time: 10.42322301864624 32 1.152938e+02 5.287938e-01 * time: 10.672600984573364 33 1.152920e+02 1.309567e-01 * time: 10.88507890701294 34 1.152914e+02 7.550968e-02 * time: 11.09777307510376 35 1.152911e+02 1.123292e-01 * time: 11.599746942520142 36 1.152905e+02 1.044834e-01 * time: 11.827867031097412 37 1.152902e+02 4.074496e-01 * time: 12.10360598564148 38 1.152899e+02 1.512259e-01 * time: 12.592885971069336 39 1.152894e+02 1.065919e-01 * time: 12.86594295501709 40 1.152890e+02 2.087131e-01 * time: 13.32337212562561 41 1.152887e+02 1.105464e-01 * time: 13.557969093322754 42 1.152886e+02 1.300625e-01 * time: 13.778734922409058 43 1.152884e+02 1.726311e-01 * time: 13.996907949447632 44 1.152883e+02 1.001641e-01 * time: 14.48740005493164 45 1.152882e+02 2.566597e-02 * time: 14.753740072250366 46 1.152881e+02 3.000688e-02 * time: 14.982991933822632 47 1.152880e+02 8.008848e-02 * time: 15.468786001205444 48 1.152879e+02 7.895790e-02 * time: 15.735589981079102 49 1.152878e+02 2.456469e-01 * time: 15.98135495185852 50 1.152876e+02 1.293619e-01 * time: 16.44363498687744 51 1.152874e+02 4.788862e-02 * time: 16.659332990646362 52 1.152874e+02 3.378284e-02 * time: 16.88279891014099 53 1.152873e+02 6.138932e-02 * time: 17.101464986801147 54 1.152872e+02 1.131192e-01 * time: 17.616990089416504 55 1.152870e+02 6.160510e-02 * time: 17.85941195487976 56 1.152868e+02 7.670725e-02 * time: 18.06210494041443 57 1.152865e+02 1.136733e-01 * time: 18.2756769657135 58 1.152861e+02 1.707358e-01 * time: 18.853105068206787 59 1.152859e+02 3.296803e-01 * time: 19.10324001312256 60 1.152857e+02 1.709746e-01 * time: 19.323065042495728 61 1.152855e+02 5.870519e-02 * time: 19.795124053955078 62 1.152855e+02 5.870449e-02 * time: 20.342180013656616 63 1.152855e+02 5.870449e-02 * time: 
21.089613914489746 64 1.152855e+02 6.378579e-02 * time: 21.617460012435913 65 1.152855e+02 6.380196e-02 * time: 22.22872805595398 66 1.152855e+02 6.380196e-02 * time: 23.16610598564148
* Status: success * Candidate solution Final objective value: 1.152855e+02 * Found with Algorithm: L-BFGS * Convergence measures |x - x'| = 0.00e+00 ≤ 0.0e+00 |x - x'|/|x'| = 0.00e+00 ≤ 0.0e+00 |f(x) - f(x')| = 0.00e+00 ≤ 0.0e+00 |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00 |g(x)| = 6.38e-02 ≰ 1.0e-08 * Work counters Seconds run: 23 (vs limit Inf) Iterations: 66 f(x) calls: 122 ∇f(x) calls: 66
The final value of the log marginal likelihood is:
-opt_result.minimum
-115.28552953175688
Warning
To avoid bad local optima, we could (and should) have carried out several random restarts with different initial values for the hyperparameters, and then picked the result with the highest marginal likelihood. We omit this for simplicity. For more details on how to fit GPs in practice, check out A Practical Guide to Gaussian Processes.
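For illustration, here is a minimal sketch of such a random-restart loop (not run here; the number of restarts and the perturbation scale of 0.1 in unconstrained space are arbitrary choices):
θ_flat, unflatten = ParameterHandling.flatten(θ_init)  # unconstrained vector and its inverse
restarts = map(1:5) do _
    θ_perturbed = unflatten(θ_flat .+ 0.1 .* randn(length(θ_flat)))
    optimize_loss(loss, θ_perturbed)  # returns (θ, result) as defined above
end
θ_best, best_result = argmin(r -> r[2].minimum, restarts)  # keep the lowest negative log marginal likelihood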
Let's construct the posterior GP with the optimized hyperparameters:
fpost_opt = build_posterior_gp(ParameterHandling.value(θ_opt))
AbstractGPs.PosteriorGP{AbstractGPs.GP{AbstractGPs.ZeroMean{Float64}, KernelFunctions.KernelSum{Tuple{KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}, KernelFunctions.KernelProduct{Tuple{KernelFunctions.TransformedKernel{KernelFunctions.PeriodicKernel{Float64}, KernelFunctions.ScaleTransform{Float64}}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}}}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.RationalQuadraticKernel{Float64, Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}, KernelFunctions.ScaledKernel{KernelFunctions.WhiteKernel, Float64}}}}, @NamedTuple{α::Vector{Float64}, C::LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, x::Vector{Float64}, δ::Vector{Float64}}}(AbstractGPs.GP{AbstractGPs.ZeroMean{Float64}, KernelFunctions.KernelSum{Tuple{KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}, KernelFunctions.KernelProduct{Tuple{KernelFunctions.TransformedKernel{KernelFunctions.PeriodicKernel{Float64}, KernelFunctions.ScaleTransform{Float64}}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}}}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.RationalQuadraticKernel{Float64, Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}, KernelFunctions.ScaledKernel{KernelFunctions.WhiteKernel, Float64}}}}(AbstractGPs.ZeroMean{Float64}(), Sum of 5 kernels: Squared Exponential Kernel (metric = Distances.Euclidean(0.0)) - Scale Transform (s = 0.006433770319418152) - σ² = 193702.65168343997 Product of 2 kernels: Periodic Kernel, length(r) = 1 - Scale Transform (s = 1.0) Squared Exponential Kernel (metric = Distances.Euclidean(0.0)) - Scale Transform (s = 0.012141608741588681) - σ² = 6.021824896021731 Rational Quadratic Kernel (α = 0.000414331119017645, metric = Distances.Euclidean(0.0)) - Scale Transform (s = 0.03460188893362941) - σ² = 319.30676488252783 Squared Exponential Kernel (metric = Distances.Euclidean(0.0)) - Scale Transform (s = 8.754567915780282) - σ² = 0.028265792449666365 White Kernel - σ² = 0.03577915464642226), (α = [-6.405054233291955, 9.478105306102208, -3.9465447794003228, -2.2960659942451325, -7.262283464584002, 10.568438906760933, 4.764817643464915, -0.4524530569754075, -5.3032225545822405, 0.6818717130531068 … -2.41027458635354, -1.850374273887441, -0.08769478252709415, 2.9206620881820915, 3.8602291756555536, -4.849362941092445, 5.784746509096768, -2.596412769248565, 1.6291888793670009, -1.5585572426563472], C = LinearAlgebra.Cholesky{Float64, Matrix{Float64}}([440.48614543270895 440.4851485165153 … 421.8992014927274 421.83433071242655; 194027.60519039418 0.937152361691982 … 31.17812095066706 30.939700545184966; … ; 185840.75302666932 
185869.5510782053 … 0.2838762410681556 0.18244279688867943; 185812.17834670338 185840.75302666402 … 194027.6051903926 0.2838618686957846], 'U', 0), x = [1958.2083333333333, 1958.2916666666667, 1958.375, 1958.4583333333333, 1958.5416666666667, 1958.625, 1958.7083333333333, 1958.7916666666667, 1958.875, 1958.9583333333333 … 2003.2083333333333, 2003.2916666666667, 2003.375, 2003.4583333333333, 2003.5416666666667, 2003.625, 2003.7083333333333, 2003.7916666666667, 2003.875, 2003.9583333333333], δ = [315.71, 317.45, 317.5, 317.1, 315.86, 314.93, 313.2, 312.66, 313.33, 314.67 … 376.48, 377.74, 378.5, 378.18, 376.72, 374.31, 373.2, 373.1, 374.64, 375.93]))
This is the kernel with the point-estimated hyperparameters:
fpost_opt.prior.kernel
Sum of 5 kernels: Squared Exponential Kernel (metric = Distances.Euclidean(0.0)) - Scale Transform (s = 0.006433770319418152) - σ² = 193702.65168343997 Product of 2 kernels: Periodic Kernel, length(r) = 1 - Scale Transform (s = 1.0) Squared Exponential Kernel (metric = Distances.Euclidean(0.0)) - Scale Transform (s = 0.012141608741588681) - σ² = 6.021824896021731 Rational Quadratic Kernel (α = 0.000414331119017645, metric = Distances.Euclidean(0.0)) - Scale Transform (s = 0.03460188893362941) - σ² = 319.30676488252783 Squared Exponential Kernel (metric = Distances.Euclidean(0.0)) - Scale Transform (s = 8.754567915780282) - σ² = 0.028265792449666365 White Kernel - σ² = 0.03577915464642226
Let's print the optimized values of the hyperparameters in a more helpful format:
Work-in-progress
This is another utility function we would eventually like to move out of this notebook:
using Printf
show_params(nt::Union{Dict,NamedTuple}) = String(take!(show_params(IOBuffer(), nt)))
function show_params(io, nt::Union{Dict,NamedTuple}, indent::Int=0)
for (s, v) in pairs(nt)
if v isa Union{Dict,NamedTuple}
println(io, " "^indent, s, ":")
show_params(io, v, indent + 4)
else
println(io, " "^indent, s, " = ", @sprintf("%.3f", v))
end
end
return io
end
print(show_params(ParameterHandling.value(θ_opt)))
se_long:
    σ = 440.117
    ℓ = 155.430
seasonal:
    per:
        ℓ = 1.458
        p = 1.000
    se:
        σ = 2.454
        ℓ = 82.361
rq:
    σ = 17.869
    ℓ = 28.900
    α = 0.000
se_short:
    σ = 0.168
    ℓ = 0.114
noise_scale = 0.189
And, finally, we can visualize our optimized posterior GP:
plotdata()
plot_gp!(fpost_opt; label="optimized posterior f(⋅)")
Status `~/work/AbstractGPs.jl/AbstractGPs.jl/examples/1-mauna-loa/Project.toml`
  [99985d1d] AbstractGPs v0.5.23 `/home/runner/work/AbstractGPs.jl/AbstractGPs.jl#28319c5`
  [336ed68f] CSV v0.10.15
  [a93c6f00] DataFrames v1.7.0
  [98b081ad] Literate v2.20.1
  [429524aa] Optim v1.11.0
  [2412ca09] ParameterHandling v0.5.0
  [91a5bcdd] Plots v1.40.10
⌅ [e88e6eb3] Zygote v0.6.75
Info: Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why, use `status --outdated`.
To reproduce this notebook's package environment, you can download the full Manifest.toml.
Julia Version 1.11.4
Commit 8561cc3d68d (2025-03-10 11:36 UTC)
Build Info: Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Environment:
  JULIA_DEBUG = Documenter
  JULIA_LOAD_PATH = :/home/runner/.julia/packages/JuliaGPsDocs/7M86H/src
This notebook was generated using Literate.jl.