Regression with Outliers

In the standard Gaussian process regression setting it is assumed that the observations are Normally distributed about the latent function. In the package this can applied using either the GP or GPE functions with which exact Gaussian process models.

One of the drawbacks of exact GP regression is that by assuming Normal noise the GP is not robust to outliers. In this setting, it is more appropriate to assume that the distribution of the noise is heavy tailed. For example, with a Student-t distribution, $$ \mathbf{y} \ | \ \mathbf{f},\nu,\sigma \sim \prod_{i=1}^n \frac{\Gamma(\nu+1)/2}{\Gamma(\nu/2)\sqrt{\nu\pi}\sigma}\left(1+\frac{(y_i-f_i)^2}{\nu\sigma^2}\right)^{-(\nu+1)/2} $$

Moving away from the Gaussian likelihood function (i.e. Normally distributed noise) and using the Student-t likelihood means that we can no longer analytically calculate the GP marginal likelihood. We can take a Bayesian perspective and sample from the joint distribution of the latent function and model parameters.

In [1]:
#Load functions from packages
using GaussianProcesses, Plots
using Distributions:Normal, TDist

#Simulate the data
n = 20
X = collect(linspace(-3,3,n))
sigma = 1.0
Y = X + sigma*rand(TDist(3),n);

# Plots observations
scatter(X,Y;fmt=:png, leg=false)

We fit a standard (exact) Gaussian process model to the Student-t data and compare this against the Monte Carlo GP which is applicable for non-Gaussian observations models.

In [2]:
#Build the models

gpe = GPE(X,vec(Y),MeanZero(),Matern(3/2,0.0,0.0),0.5) #Exact GP assuming Gaussian noise

l = StuTLik(3,0.1)
gpmc = GPMC(X, vec(Y), MeanZero(), Matern(3/2,0.0,0.0), l) #Monte Carlo GP with student-t likelihood
GP Monte Carlo object:
  Dim = 1
  Number of observations = 20
  Mean function:
    Type: GaussianProcesses.MeanZero, Params: Float64[]
    Type: GaussianProcesses.Mat32Iso, Params: [0.0, 0.0]
    Type: GaussianProcesses.StuTLik, Params: [0.1]
  Input observations = 
[-3.0 -2.68421 … 2.68421 3.0]
  Output observations = [-3.98707, -2.7855, -1.74722, -2.27384, -7.12662, -1.43191, -0.840899, 0.305618, -0.444213, -0.924319, -0.0303161, -0.368707, -0.706642, 2.07517, 5.68693, -1.02634, 3.25863, 5.46091, 4.45861, 4.60741]
  Log-posterior = -77.863

Estimate the parameters of the exact GP through maximum likelihood estimation

In [3]:
Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.5,0.0,0.0]
 * Minimizer: [0.6566151884322408,1.7118340256199516, ...]
 * Minimum: 4.578633e+01
 * Iterations: 13
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 2.09e-08 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 3.10e-16 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 4.86e-12 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 38
 * Gradient Calls: 38

Taking a Bayesian perspective, we can add prior distributions to the model parameters and sample from the posterior distribution using Markov chain Monte Carlo through the mcmc function which uses a Hamiltonian Monte Carlo sampler.

In [4]:

samples = mcmc(gpmc;nIter=10000,burn=1000,thin=2);
Number of iterations = 10000, Thinning = 2, Burn-in = 1000 
Step size = 0.100000, Average number of leapfrog steps = 10.036500 
Number of function calls: 100366
Acceptance rate: 0.508500 
In [5]:
#Plot posterior samples
xtest = linspace(minimum(gpmc.X),maximum(gpmc.X),50);

#Set the parameters to the posterior values the sample random function
fsamples = [];
for i in 1:size(samples,2)
    push!(fsamples, rand(gpmc,xtest))

p1=plot(gpe,leg=false, title="Exact GP")   #Exact GP (assuming Gaussian noise)

sd = [std(fsamples[i]) for i in 1:50]
p2=plot(xtest,mean(fsamples),ribbon=2*sd,leg=false, title="Monte Carlo GP") #GP Monte Carlo with student-t noise