Notebook

Regression with Outliers¶

In the standard Gaussian process regression setting it is assumed that the observations are Normally distributed about the latent function. In the package this can applied using either the GP or GPE functions with which exact Gaussian process models.

One of the drawbacks of exact GP regression is that by assuming Normal noise the GP is not robust to outliers. In this setting, it is more appropriate to assume that the distribution of the noise is heavy tailed. For example, with a Student-t distribution, $$ \mathbf{y} \ | \ \mathbf{f},\nu,\sigma \sim \prod_{i=1}^n \frac{\Gamma(\nu+1)/2}{\Gamma(\nu/2)\sqrt{\nu\pi}\sigma}\left(1+\frac{(y_i-f_i)^2}{\nu\sigma^2}\right)^{-(\nu+1)/2} $$

Moving away from the Gaussian likelihood function (i.e. Normally distributed noise) and using the Student-t likelihood means that we can no longer analytically calculate the GP marginal likelihood. We can take a Bayesian perspective and sample from the joint distribution of the latent function and model parameters.

In [1]:

#Load functions from packages
using GaussianProcesses, Plots
using Distributions:Normal, TDist
using Random
using Statistics

#Simulate the data
Random.seed!(112233)
n = 20
X = range(-3,stop=3,length=n);
sigma = 1.0
Y = X + sigma*rand(TDist(3),n);

# Plots observations
pyplot()
scatter(X,Y;fmt=:png, leg=false)

Out[1]:

We fit a standard (exact) Gaussian process model to the Student-t data and compare this against the Monte Carlo GP which is applicable for non-Gaussian observations models.

In [2]:

#Build the models

gpe = GPE(X,vec(Y),MeanZero(),Matern(3/2,0.0,0.0),0.5) #Exact GP assuming Gaussian noise

l = StuTLik(3,0.1)
gpmc = GPMC(X, vec(Y), MeanZero(), Matern(3/2,0.0,0.0), l) #Monte Carlo GP with student-t likelihood

Out[2]:

GP Monte Carlo object:
  Dim = 1
  Number of observations = 20
  Mean function:
    Type: MeanZero, Params: Float64[]
  Kernel:
    Type: Mat32Iso, Params: [0.0, 0.0]
  Likelihood:
    Type: StuTLik, Params: [0.1]
  Input observations = 
[-3.0 -2.68421 … 2.68421 3.0]
  Output observations = [-3.98707, -2.7855, -1.74722, -2.27384, -7.12662, -1.43191, -0.840899, 0.305618, -0.444213, -0.924319, -0.0303161, -0.368707, -0.706642, 2.07517, 5.68693, -1.02634, 3.25863, 5.46091, 4.45861, 4.60741]
  Log-posterior = -77.863

Estimate the parameters of the exact GP through maximum likelihood estimation

In [3]:

optimize!(gpe)

Out[3]:

Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.5,0.0,0.0]
 * Minimizer: [0.6566151884322414,1.7118340256199516, ...]
 * Minimum: 4.578633e+01
 * Iterations: 13
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false 
     |x - x'| = 2.09e-08 
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
     |f(x) - f(x')| = 0.00e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 4.84e-12 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 38
 * Gradient Calls: 38

Taking a Bayesian perspective, we can add prior distributions to the model parameters and sample from the posterior distribution using Markov chain Monte Carlo through the mcmc function which uses a Hamiltonian Monte Carlo sampler.

In [4]:

set_priors!(gpmc.lik,[Normal(-2.0,4.0)])
set_priors!(gpmc.kernel,[Normal(-2.0,4.0),Normal(-2.0,4.0)])

samples = mcmc(gpmc;nIter=10000,burn=1000,thin=2);

Number of iterations = 10000, Thinning = 2, Burn-in = 1000 
Step size = 0.100000, Average number of leapfrog steps = 10.016300 
Number of function calls: 100164
Acceptance rate: 0.377800

In [5]:

#Plot posterior samples
xtest = range(minimum(gpmc.x),stop=maximum(gpmc.x),length=50);

#Set the parameters to the posterior values the sample random function
fsamples = [];
for i in 1:size(samples,2)
    set_params!(gpmc,samples[:,i])
    update_target!(gpmc)
    push!(fsamples, rand(gpmc,xtest))
end

#Predict
p1=plot(gpe,leg=false, title="Exact GP")   #Exact GP (assuming Gaussian noise)


sd = [std(fsamples[i]) for i in 1:50]
p2=plot(xtest,mean(fsamples),ribbon=2*sd,leg=false, title="Monte Carlo GP") #GP Monte Carlo with student-t noise
scatter!(X,Y)

plot(p1,p2;fmt=:png)

Out[5]: