Building on linear regression, we can fit non-linear data sets by introducing a feature space. In a higher-dimensional feature space, we can overfit the data; ridge regression introduces regularization to avoid this. In this notebook we show how we can use KernelFunctions.jl for kernel ridge regression.
# Loading and setup of required packages
using KernelFunctions
using LinearAlgebra
using Distributions
# Plotting
using Plots;
default(; lw=2.0, legendfontsize=11.0, ylims=(-150, 500));
using Random: seed!
seed!(42);
Here we use a one-dimensional toy problem. We generate data using the fourth-order polynomial $f(x) = (x+4)(x+1)(x-1)(x-3)$:
f_truth(x) = (x + 4) * (x + 1) * (x - 1) * (x - 3)
x_train = -5:0.5:5
x_test = -7:0.1:7
noise = rand(Uniform(-20, 20), length(x_train))
y_train = f_truth.(x_train) + noise
y_test = f_truth.(x_test)
plot(x_test, y_test; label=raw"$f(x)$")
scatter!(x_train, y_train; seriescolor=1, label="observations")
For training inputs $\mathbf{X} = (\mathbf{x}_n)_{n=1}^N$ and observations $\mathbf{y} = (y_n)_{n=1}^N$, the weights of the least-squares estimator for linear regression are given by
$$\mathbf{w} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$
We predict at test inputs $\mathbf{X}_*$ via $\hat{\mathbf{y}}_* = \mathbf{X}_* \mathbf{w}$. This is implemented by `linear_regression`:
function linear_regression(X, y, Xstar)
    weights = (X' * X) \ (X' * y)  # least-squares weights (XᵀX)⁻¹ Xᵀ y
    return Xstar * weights         # predictions at the test inputs
end;
A linear regression fit to the above data set:
y_pred = linear_regression(x_train, y_train, x_test)
scatter(x_train, y_train; label="observations")
plot!(x_test, y_pred; label="linear fit")
We can improve the fit by including additional features, i.e. generalizing to $\tilde{\mathbf{X}} = (\phi(x_n))_{n=1}^N$, where $\phi(x)$ constructs a feature vector for each input $x$. Here we include powers of the input, $\phi(x) = (1, x, x^2, \dots, x^d)$:
function featurize_poly(x; degree=1)
return repeat(x, 1, degree + 1) .^ (0:degree)'
end
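For example, for three scalar inputs and degree 2, `featurize_poly` produces the matrix of powers (a quick illustration, not part of the original notebook):

featurize_poly([1.0, 2.0, 3.0]; degree=2)
# 3×3 Matrix{Float64}:
#  1.0  1.0  1.0
#  1.0  2.0  4.0
#  1.0  3.0  9.0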
function featurized_fit_and_plot(degree)
X = featurize_poly(x_train; degree=degree)
Xstar = featurize_poly(x_test; degree=degree)
y_pred = linear_regression(X, y_train, Xstar)
scatter(x_train, y_train; legend=false, title="fit of order $degree")
return plot!(x_test, y_pred)
end
plot((featurized_fit_and_plot(degree) for degree in 1:4)...)
Note that the fit becomes essentially perfect once we include as many orders in the features as the underlying polynomial has (degree 4).
However, when increasing the number of features, we can quickly overfit to noise in the data set:
featurized_fit_and_plot(20)
To counteract this unwanted behaviour, we can introduce regularization. This leads to ridge regression with $L_2$ regularization of the weights (Tikhonov regularization). Instead of the linear regression weights
$$\mathbf{w} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$
we compute the regularized weights
$$\mathbf{w} = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbb{1})^{-1} \mathbf{X}^\top \mathbf{y}$$
where $\lambda$ is the ridge parameter. This is implemented by `ridge_regression`:
function ridge_regression(X, y, Xstar, lambda)
    # lambda * I adds the L2 (Tikhonov) regularization term to the normal equations
    weights = (X' * X + lambda * I) \ (X' * y)
    return Xstar * weights
end
function regularized_fit_and_plot(degree, lambda)
X = featurize_poly(x_train; degree=degree)
Xstar = featurize_poly(x_test; degree=degree)
y_pred = ridge_regression(X, y_train, Xstar, lambda)
scatter(x_train, y_train; legend=false, title="\$\\lambda=$lambda\$")
return plot!(x_test, y_pred)
end
plot((regularized_fit_and_plot(20, lambda) for lambda in (1e-3, 1e-2, 1e-1, 1))...)
Instead of constructing the feature matrix explicitly, we can use kernels to replace inner products of feature vectors with a kernel evaluation: $\langle \phi(x), \phi(x') \rangle = k(x, x')$ or $\tilde{\mathbf{X}} \tilde{\mathbf{X}}^\top = \mathbf{K}$, where $K_{ij} = k(x_i, x_j)$.
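As a quick sanity check (not part of the original notebook), we can verify this identity numerically for the degree-2 polynomial kernel $k(x, x') = (x x' + 1)^2$, whose explicit feature map is $\phi(x) = (1, \sqrt{2}\,x, x^2)$:

# Illustration: feature-space inner products equal kernel evaluations.
phi(x) = [1.0, sqrt(2) * x, x^2]       # explicit feature map of (x * x' + 1)^2
xs = [-1.0, 0.5, 2.0]
Phi = reduce(hcat, phi.(xs))'          # rows are the feature vectors phi(x_n)
K_check = kernelmatrix(PolynomialKernel(; degree=2, c=1), xs)
Phi * Phi' ≈ K_check                   # true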
To apply this "kernel trick" to ridge regression, we can rewrite the ridge estimate for the weights,
$$\mathbf{w} = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbb{1})^{-1} \mathbf{X}^\top \mathbf{y},$$
using the matrix inversion lemma as
$$\mathbf{w} = \mathbf{X}^\top (\mathbf{X} \mathbf{X}^\top + \lambda \mathbb{1})^{-1} \mathbf{y},$$
so that the prediction at a test input $x_*$ only involves inner products of feature vectors, which we can replace by kernel evaluations:
$$\hat{y}(x_*) = \langle \phi(x_*), \mathbf{w} \rangle = \mathbf{k}_*^\top (\mathbf{K} + \lambda \mathbb{1})^{-1} \mathbf{y}, \qquad (\mathbf{k}_*)_n = k(x_*, x_n).$$
This is implemented by `kernel_ridge_regression`:
function kernel_ridge_regression(k, X, y, Xstar, lambda)
    K = kernelmatrix(k, X)             # training kernel matrix, K[i, j] = k(X[i], X[j])
    kstar = kernelmatrix(k, Xstar, X)  # cross-kernel matrix between test and training inputs
    return kstar * ((K + lambda * I) \ y)
end;
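As a small consistency check (not part of the original notebook), kernel ridge regression with a `LinearKernel` should reproduce ridge regression on the raw one-dimensional inputs, since the linear kernel's feature map is the identity:

# Illustration: the kernelized and feature-space formulations agree for a linear kernel.
X_lin = reshape(collect(x_train), :, 1)      # N×1 feature matrix of raw inputs
Xstar_lin = reshape(collect(x_test), :, 1)
y_krr = kernel_ridge_regression(LinearKernel(), x_train, y_train, x_test, 1e-1)
y_rr = ridge_regression(X_lin, y_train, Xstar_lin, 1e-1)
maximum(abs, y_krr - y_rr)                   # ≈ 0 up to floating-point error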
Now, instead of constructing the features explicitly, we can simply pass in a `PolynomialKernel` object:
function kernelized_fit_and_plot(kernel, lambda=1e-4)
y_pred = kernel_ridge_regression(kernel, x_train, y_train, x_test, lambda)
if kernel isa PolynomialKernel
title = string("order ", kernel.degree)
else
title = string(nameof(typeof(kernel)))
end
scatter(x_train, y_train; label=nothing)
return plot!(x_test, y_pred; label=nothing, title=title)
end
plot((kernelized_fit_and_plot(PolynomialKernel(; degree=degree, c=1)) for degree in 1:4)...)
However, we can now also use kernels that would have an infinite-dimensional feature expansion, such as the squared exponential kernel:
kernelized_fit_and_plot(SqExponentialKernel())
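The squared exponential kernel above uses its default unit length scale. As an aside (not part of the original notebook), KernelFunctions.jl provides `with_lengthscale` to rescale the inputs, which controls how quickly the fit is allowed to vary:

# Aside: fit with a longer length scale (smoother fit); `with_lengthscale(k, l)`
# wraps the kernel so that inputs are scaled by 1/l.
kernelized_fit_and_plot(with_lengthscale(SqExponentialKernel(), 1.5))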
This notebook was generated using Literate.jl.