One reason to be interested in the spectral density of a kernel is that it allows us to approximate a GP prior. In this notebook we show the two feature functions implemented in KernelSpectralDensities.jl and how to use them.
Load required packages
using KernelSpectralDensities
using StatsBase
using LinearAlgebra
using CairoMakie
using DisplayAs #hide
In general, feature functions allow us to project an input into a higher-dimensional space, which is useful for a variety of tasks. For example, we can use them to approximate a kernel using the "kernel trick".
A special class of feature functions are "random Fourier features", derived from the Fourier transform, which we saw in the previous example.
KernelSpectralDensities implements two types of random Fourier features, ShiftedRFF and DoubleRFF.
For this example we use the simple squared exponential kernel.
ker = SqExponentialKernel()
S = SpectralDensity(ker, 1);
The ShiftedRFF feature function is somewhat more common, and has been used in papers such as "Efficiently sampling functions from Gaussian process posteriors". It is defined as
φᵢ(x) = √(2/l) cos(2π(wᵢᵀx + bᵢ))
where the frequencies wᵢ are sampled from the spectral density and the bᵢ are random phase shifts.
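To make the formula concrete, here is a minimal hand-rolled sketch for the 1-D squared exponential kernel above. The closed-form frequency distribution is a known property of this kernel, but the helper my_srff and the _demo names are our own illustration, not the package implementation.
l_demo = 4
# for k(τ) = exp(-τ²/2), the spectral density under the 2π convention is Normal(0, 1/(2π))
w_demo = randn(l_demo) ./ (2π)
# random phase shifts bᵢ, assumed uniform on [0, 1)
b_demo = rand(l_demo)
my_srff(x) = sqrt(2 / l_demo) .* cos.(2π .* (w_demo .* x .+ b_demo))
my_srff(1.0) # a 4-element feature vector, analogous to srff(1.0) below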
We generate a set of 4 feature functions, which we can evaluate at any point.
srff = ShiftedRFF(S, 4)
srff(1.0)
4-element Vector{Float64}:
 -0.678811244277198
 -0.07379871172304092
 -0.5752271467160547
 -0.3316998113692027
If we plot them, we see that each feature function is a harmonic with varying frequency and phase.
x = range(0, 5; length=100)
f = Figure(; size=(600, 400))
ax = Axis(f[1, 1]; xlabel="x", ylabel="rff(x)", title="ShiftedRFF")
series!(ax, x, reduce(hcat, srff.(x)); labels=["srff $i" for i in 1:4])
axislegend(ax; position=:rt)
f
The DoubleRFF feature function is less common, but is theoretically equivalent to the ShiftedRFF feature function. It is defined as
φ(x) = √(1/l) [cos(2πwᵀx); sin(2πwᵀx)]
where each sampled frequency contributes one cosine and one sine feature, stacked into a single feature vector.
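As a rough illustration in the same hand-rolled style (our own sketch, assuming l counts the sampled frequencies so that the feature vector has 2l entries):
m_demo = 2
# reuse the assumed SE-kernel frequency distribution from the sketch above
w2_demo = randn(m_demo) ./ (2π)
# each frequency contributes one cosine and one sine feature
my_drff(x) = sqrt(1 / m_demo) .* vcat(cos.(2π .* w2_demo .* x), sin.(2π .* w2_demo .* x))
my_drff(1.0) # a 4-element feature vector, analogous to drff(1.0) below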
We again generate a set of 4 feature functions.
drff = DoubleRFF(S, 4)
drff(1.0)
4-element Vector{Float64}:
 0.6932661920534857
 0.35352779445995697
 -0.13921920469410634
 0.6123872129170224
We plot these features as well.
f = Figure(; size=(600, 400))
ax = Axis(f[1, 1]; xlabel="x", ylabel="rff(x)", title="Double RFF")
series!(ax, x, reduce(hcat, drff.(x)); labels=["drff $i" for i in 1:4])
axislegend(ax; position=:rt)
f
We can use the feature functions to approximate a kernel, using the kernel trick k(x, x′) ≈ ⟨φ(x), φ(x′)⟩, where the approximation becomes exact as the number of features grows.
To demonstrate that this works, we generate some feature functions and see how well they recover the kernel.
rff = ShiftedRFF(S, 100)
kt(x, y) = dot(rff(x), rff(y))
x_plot = range(0, 2; length=50)
f = Figure(; size=(600, 400))
ax = Axis(f[1, 1]; xlabel="y", ylabel="ker(0, y)", title="")
lines!(ax, x_plot, ker.(0, x_plot); label="Original Kernel")
lines!(ax, x_plot, kt.(0, x_plot); label="KT, l = 100")
axislegend(ax)
f
Clearly this is not quite correct, and we can quantify this by checking the error.
norm(ker.(0, x_plot) .- kt.(0, x_plot))
0.5981992704193174
Fortunately, we can improve the approximation by using more features,
rff5000 = ShiftedRFF(S, 5000)
kt5000(x, y) = dot(rff5000(x), rff5000(y))
lines!(ax, x_plot, kt5000.(0, x_plot); label="KT, l = 5000")
axislegend(ax)
f
which also reduces the error.
norm(ker.(0, x_plot) .- kt5000.(0, x_plot))
0.08264467536433352
In the section above we used the ShiftedRFF feature function, but what about DoubleRFF? Let's compare the two! First we define some helper functions.
function kt_error(ker, rff_type, S, l, x)
    # draw a fresh set of l feature functions and measure how well the
    # induced kernel approximation matches the true kernel on the grid x
    rff = rff_type(S, l)
    kt(x, y) = dot(rff(x), rff(y))
    return norm(ker.(0, x) .- kt.(0, x))
end
function mean_kt_error(ker, rff_type, S, l, x, n)
    # average the approximation error over n independent feature draws
    return mean([kt_error(ker, rff_type, S, l, x) for _ in 1:n])
end
nothing # hide
Now we compute the mean error for both feature functions, using 100 features when recovering the original kernel. To reduce the effect of randomness, we average over 5000 runs.
srff_err = mean_kt_error(ker, ShiftedRFF, S, 100, x_plot, 5000)
0.5172357811364618
drff_err = mean_kt_error(ker, DoubleRFF, S, 100, x_plot, 5000)
0.38844224378063663
We see that the DoubleRFF features have a lower average error. This continues to hold for larger numbers of features.
srff_err = mean_kt_error(ker, ShiftedRFF, S, 1000, x_plot, 5000)
0.16641427025939007
drff_err = mean_kt_error(ker, DoubleRFF, S, 1000, x_plot, 5000)
0.12305498951692075
Lastly, we show a log-log plot of the mean error as a function of the number of features. We see that both feature functions exhibit the same order of error scaling, but the DoubleRFF error has a small constant offset, resulting in a lower error. This is especially impactful for a small number of features.
l_plot = [10, 50, 100, 500, 1000]
srff_comp = [mean_kt_error(ker, ShiftedRFF, S, l, x_plot, 5000) for l in l_plot]
drff_comp = [mean_kt_error(ker, DoubleRFF, S, l, x_plot, 5000) for l in l_plot]
f = Figure(; size=(600, 400))
ax = Axis(f[1, 1]; xlabel="l", ylabel="mean err", title="", xscale=log10, yscale=log10)
lines!(ax, l_plot, srff_comp; label="ShiftedRFF")
lines!(ax, l_plot, drff_comp; label="DoubleRFF")
axislegend(ax)
f
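As a quick sanity check of the scaling claim, we can fit a straight line in log-log space; the slope of roughly −1/2 we expect is the usual Monte Carlo rate. The helper loglog_slope is our own, not part of the package.
# least-squares fit of log10(err) against log10(l); the second coefficient is the slope
loglog_slope(ls, errs) = ([ones(length(ls)) log10.(ls)] \ log10.(errs))[2]
loglog_slope(l_plot, srff_comp), loglog_slope(l_plot, drff_comp)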