Regression with Outliers

In the standard Gaussian process regression setting it is assumed that the observations are Normally distributed about the latent function. In the package this can be applied using either the GP or GPE functions, which fit exact Gaussian process models.

One of the drawbacks of exact GP regression is that, by assuming Normal noise, the GP is not robust to outliers. When the data contain outliers, it is more appropriate to assume that the distribution of the noise is heavy tailed. For example, with a Student-t distribution, $$ \mathbf{y} \ | \ \mathbf{f},\nu,\sigma \sim \prod_{i=1}^n \frac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2)\sqrt{\nu\pi}\sigma}\left(1+\frac{(y_i-f_i)^2}{\nu\sigma^2}\right)^{-(\nu+1)/2} $$
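
To see why this likelihood tolerates outliers, it can help to write it on the log scale. Taking the logarithm of the Student-t density term by term gives the following, which is a direct restatement of the density and introduces no new symbols: $$ \log p(\mathbf{y} \ | \ \mathbf{f},\nu,\sigma) = \sum_{i=1}^n \left[ \log\Gamma\left(\frac{\nu+1}{2}\right) - \log\Gamma\left(\frac{\nu}{2}\right) - \frac{1}{2}\log(\nu\pi) - \log\sigma - \frac{\nu+1}{2}\log\left(1+\frac{(y_i-f_i)^2}{\nu\sigma^2}\right) \right] $$

The penalty on a residual $y_i-f_i$ grows only logarithmically in its squared size, whereas under Normal noise the corresponding term, $-(y_i-f_i)^2/(2\sigma^2)$, grows quadratically. A single extreme observation therefore pulls the latent function far less under the Student-t likelihood.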

Moving away from the Gaussian likelihood function (i.e., Normally distributed noise) and using the Student-t likelihood means that we can no longer calculate the GP marginal likelihood analytically. Instead, we can take a Bayesian perspective and sample from the joint distribution of the latent function and model parameters.
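
As a sketch of what is being sampled, write $\theta$ for all model parameters collected together (the kernel hyperparameters along with $\nu$ and $\sigma$). The symbol $\theta$ is notation assumed here for illustration and does not come from the package. The marginal likelihood is the integral over the latent function, $$ p(\mathbf{y} \ | \ \theta) = \int p(\mathbf{y} \ | \ \mathbf{f},\theta)\, p(\mathbf{f} \ | \ \theta)\, d\mathbf{f}, $$ which has no closed form when $p(\mathbf{y} \ | \ \mathbf{f},\theta)$ is the Student-t likelihood. The joint posterior, however, is known up to a normalising constant, $$ p(\mathbf{f},\theta \ | \ \mathbf{y}) \propto p(\mathbf{y} \ | \ \mathbf{f},\theta)\, p(\mathbf{f} \ | \ \theta)\, p(\theta), $$ where $p(\mathbf{f} \ | \ \theta)$ is the GP prior and $p(\theta)$ is a prior on the parameters. An unnormalised density of this form is all that a sampler needs to evaluate.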

In [1]:
#Load functions from packages
using GaussianProcesses, Plots
using Distributions:Normal, TDist
using Random
using Statistics

#Simulate the data: a linear trend plus Student-t noise with 3 degrees of freedom
Random.seed!(112233)
n = 20
X = range(-3,stop=3,length=n);
sigma = 1.0
Y = X + sigma*rand(TDist(3),n);

# Plot the observations (pyplot() selects the PyPlot backend for Plots)
pyplot()
scatter(X,Y;fmt=:png, leg=false)