For all tasks of this notebook, the following conditions apply: (1) You can reuse any code from the previous notebook. (2) You can use any standard python routines, including anything from numpy and scipy. (3) Except for evaluating the Gauss kernel, you are not allowed to use anything from the sklearn-package.
Make implemented subroutines and the data from the previous Notebook 5 available here. To this end, create a file sheet5.py where you copy the required code and import the python file.
# put setup and required imports here
import numpy as np
from scipy.linalg import cholesky
from scipy.linalg import solve_triangular
from scipy.linalg import solve
In this programming task, you will use a cross-validation scheme based on Theorem 47 to estimate a good regularization parameter.
Write a Python class RegularizedKernelRegression which can be used to compute the solution of $$ \min_{f \in \mathcal H} \sum_{i=1}^N (f(x_i) - y_i)^2 + \lambda \|f\|_k. $$ The regularization parameter $\lambda$ is chosen for given data via k-fold cross-validation, see https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation . Concretely, given a set of possible regularization parameters $\Lambda$, the algorithm selects the $\lambda \in \Lambda$ which produces the smallest $L_2$-error $$ \frac{1}{N} \sum_{i=1}^N (y_i - f^{\kappa(i)}(x_i))^2. $$ Here, $\kappa: \{1,\dots,N\} \to \{1,\dots,k\}$ maps the data index $i$ to the index $j$ of the validation set to which the data point has been randomly assigned. $f^j$ refers to the solution where the points from the $j$th validation set have been removed from the training data.
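The random fold assignment $\kappa$ can be sketched as follows (a minimal illustration, not the required implementation; the function name is made up for this example):

```python
import numpy as np

# Sketch: randomly assign N data indices to k validation folds, i.e. build
# the map kappa from the task description. Assumes N is a multiple of k.
def kfold_assignment(N, k, seed=None):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(N)          # random shuffle of the indices 0..N-1
    kappa = np.empty(N, dtype=int)
    kappa[perm] = np.arange(N) % k     # fold index j for each data point i
    return kappa

kappa = kfold_assignment(12, 3, seed=0)
# each of the 3 folds receives exactly N/k = 4 points
```

With this assignment, the points with `kappa == j` form the $j$th validation set, and the remaining points are the training data for $f^j$.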
import sheet5
from scipy.linalg import solve
class RegularizedKernelRegression:
def __init__(self, kernel, regparams):
"""
Creates a RegularizedKernelRegression object.
The regularization parameter best suited to given data is selected
from a user-specified list of regularization parameters via k-fold
cross-validation.
Parameters:
kernel - function object which takes two array-like arguments X and Y
regparams - list of potential regularization parameters
"""
# your code goes here
def __find_best_regparam(self, eigval, Q, y, k):
"""
Auxiliary method used to estimate the best regularization parameter.
Parameters:
eigval - array-like with N elements; the eigenvalues
Q - matrix of shape (N,N); the normalized (unit length) eigenvectors,
such that the column Q[:,i] is the eigenvector corresponding
to the eigenvalue eigval[i].
y - array-like with shape (N,); sample values
k - int; the number of validation sets, k=N leads to LOOCV
"""
# your code goes here
def fit(self, X, y, cv = True, k='LOOCV', regp = 0):
"""
Fits the model to the given N data points.
If cv == True, the regularization parameter is estimated using
k-fold cross-validation. By default, LOOCV is used (k=N). If N is not a
multiple of k, a ValueError is raised. If cv == False,
the user-provided regularization parameter regp is used.
Parameters:
X - array-like with shape (N,); sample points
y - array-like with shape (N,); sample values
cv - boolean; indicates if the regularization parameter should
be estimated using cross-validation.
k - int; the number of validation sets, k=N leads to LOOCV
regp - user-specified regularization parameter that is used if
cross-validation is turned off.
"""
# your code goes here
def predict(self, X):
"""
Predicts function values at the given points using the computed fit.
Parameters:
X - array-like with shape (M,)
Returns:
y_pred - array-like with shape (M,); the estimated function values
"""
# your code goes here
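One reason __find_best_regparam receives the eigendecomposition rather than the kernel matrix: once $K = Q\,\mathrm{diag}(\mathrm{eigval})\,Q^\top$ is computed, the coefficients $\alpha = (K + \lambda I)^{-1} y$ can be formed cheaply for every candidate $\lambda$. A hedged sketch of this reuse (illustrative names, not the required implementation):

```python
import numpy as np

# Sketch: with K = Q diag(eigval) Q^T computed once, the kernel ridge
# coefficients alpha = (K + lam*I)^{-1} y cost only O(N^2) per lambda.
def ridge_coefficients(eigval, Q, y, lam):
    return Q @ ((Q.T @ y) / (eigval + lam))

# toy check against a direct solve
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
K = A @ A.T                          # symmetric positive semi-definite
y = rng.standard_normal(5)
eigval, Q = np.linalg.eigh(K)        # Q has orthonormal columns
alpha = ridge_coefficients(eigval, Q, y, lam=0.1)
# alpha agrees with np.linalg.solve(K + 0.1*np.eye(5), y)
```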
Your code must run successfully through the following test:
kernel100 = lambda X,Y: sheet5.gauss_kernel(X,Y, 100)
reg = RegularizedKernelRegression(kernel100,np.linspace(0.001, 15, 100))
target_values = [0.65779935, 0.33784479, 0.89531297, 1.6475294, 1.20401756, 0.69247441,
-0.84429611, 0.19573349, 1.32045238, -0.32945063]
reg.fit(sheet5.X_sample, sheet5.toy_noisy)
print('Checking LOOCV-reg. param: '+\
str(np.allclose(reg.fitted_regparam, 0.001)))
print('Checking predicted values: '+\
str(np.allclose(reg.predict(sheet5.X_sample),target_values)))
Let X_sample = np.linspace(0,1,40) and X_full as in Notebook 5, Task 4. Generate four lines of noisy observations toy_noisy = toy(X_sample) + 0.5*Noise, where Noise $\sim \mathcal N(0,1)$. For each of the lines of observations, generate two diagrams: one for the Gauss kernel with $\gamma=100$, the other for the Gauss kernel with $\gamma = 10$. Generate plots like in Notebook 5, Task 4 and add the following to each of the plots:
What do the plots tell you? Write some text here.
# your code goes here
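Generating the four noisy observation lines could start along these lines (a sketch; `toy` below is only a placeholder for the function from Notebook 5, which you should import from sheet5 instead):

```python
import numpy as np

# Placeholder for the target function from Notebook 5 (use sheet5.toy in
# the actual task); chosen here only to make the sketch self-contained.
toy = lambda x: np.sin(2 * np.pi * x)

X_sample = np.linspace(0, 1, 40)
rng = np.random.default_rng(0)
# four independent noisy realizations toy(X_sample) + 0.5*Noise
noisy_lines = [toy(X_sample) + 0.5 * rng.standard_normal(X_sample.shape)
               for _ in range(4)]
```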
In this programming task, you will implement a Gaussian process with prior mean $m=0$ and covariance kernel $$ k_w(x,y) = \exp\left(-\frac{1}{2w^2}\|x-y\|_2^2\right) $$ with length scale $w$. The assumed noise in the data is described via the noise level $\sigma^2$.
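For 1-d inputs, the covariance kernel $k_w$ can be evaluated on two point sets via broadcasting; a minimal sketch (the function name is illustrative):

```python
import numpy as np

# Sketch: evaluate k_w(x,y) = exp(-|x-y|^2 / (2 w^2)) on all pairs from
# two 1-d point sets X and Y, returning the covariance matrix.
def gauss_cov(X, Y, w):
    X = np.asarray(X, dtype=float).reshape(-1, 1)   # column vector
    Y = np.asarray(Y, dtype=float).reshape(1, -1)   # row vector
    return np.exp(-(X - Y) ** 2 / (2 * w ** 2))

K = gauss_cov([0.0, 0.5, 1.0], [0.0, 0.5, 1.0], w=0.5)
# K is symmetric with ones on the diagonal
```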
Provide the details for the following Python class below. In contrast to previous tasks on regularized regression, work with the Cholesky decomposition here instead of an eigenvalue decomposition.
class GaussianProcess:
def __init__(self, length_scale, noise_level):
"""
Creates an instance of the GaussianProcess.
Parameters:
length_scale - float; the length scale $w$ of the covariance kernel
noise_level - float; the noise level $\sigma^2$
"""
# your code goes here
def fit(self, X,y):
"""
Fits the Gaussian process to the given N observations
Parameters:
X - array-like with shape (N,d) or (N,) if d==1
y - array-like with shape (N,)
"""
# your code goes here
def predict(self, X):
"""
Predicts the Gaussian process at the given points X.
If the Gaussian process has not been fitted to data yet,
mean and covariance of the prior in the points X are returned
Parameters:
X - array-like with shape (M,d) or (M,) if d==1
Returns:
mu - array with shape (M,1); posterior mean in the points X
sig - matrix with shape (M,M); posterior covariance in the points X
"""
# your code goes here
def generate_sample_path(self, X, n=1):
"""
Generates sample paths of the posterior distribution
computed in the points X.
If the Gaussian process has not been fitted to any data yet,
then sample paths of the prior are generated.
To ensure positive definiteness of the posterior covariance,
a noise term of 0.000001 is added to its diagonal.
Parameters:
X - array-like with shape (M,)
n - number of sample paths (default n=1)
Returns:
f - array with shape (M,n) or (M,) if n==1
"""
# your code goes here
def plot_sample_paths(self, ax, X, n=1):
"""
Adds sample paths of the Gaussian process to the plot.
If the Gaussian process has been fitted to data, then
realizations of the posterior distribution in the points X
are generated, otherwise realizations of the prior distribution
in X.
Parameters:
X - array-like with shape (M,)
n - number of sample paths (default n=1)
"""
# your code goes here
def plot_prediction(self, ax, X, legend=False):
"""
Adds a plot to the given axis object which visualizes
the posterior distribution computed in the points X.
If the Gaussian process has not been fitted to any data yet,
then the prior distribution is visualized.
The points X are depicted as blue dots, the posterior mean
is depicted as a red line, and the 95%-confidence region is
depicted as a filled area around the posterior mean in lightgray.
The length scale and the noise level are printed as title.
Parameters:
X - array-like with shape (M,)
ax - axis object
legend - if True a legend is depicted in the plot
"""
# your code goes here
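The required Cholesky-based prediction step can be sketched as follows (illustrative helper names, not the class implementation itself; the sanity check uses a made-up kernel and toy data):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Sketch of the standard Cholesky-based GP prediction:
# K = k(X,X), k_star = k(X, X_test), k_test = k(X_test, X_test).
def gp_posterior(K, k_star, k_test, y, noise_level):
    L = cholesky(K + noise_level * np.eye(len(K)), lower=True)
    # alpha = (K + sigma^2 I)^{-1} y via two triangular solves
    alpha = solve_triangular(L.T, solve_triangular(L, y, lower=True))
    mu = k_star.T @ alpha                        # posterior mean
    v = solve_triangular(L, k_star, lower=True)
    sig = k_test - v.T @ v                       # posterior covariance
    return mu, sig

# sanity check: with almost no noise, the posterior mean at the
# training points reproduces the observations
def k_w(A, B, w=1.0):
    A = np.asarray(A, float); B = np.asarray(B, float)
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * w ** 2))

X_obs = np.array([0.0, 0.5, 1.0])
y_obs = np.array([0.0, 0.8, 0.5])
K = k_w(X_obs, X_obs)
mu, sig = gp_posterior(K, K, K, y_obs, 1e-10)
```

The two triangular solves replace the eigendecomposition used in the regularized-regression task, as asked for above.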
Your code has to pass the following tests successfully:
gp = GaussianProcess(np.sqrt(1/20), 0.0001)
X_obs = [0,0.25,0.5,0.75,1.0]
y_obs = [0,0.8,0.5,0.3, 1.0]
target_mu = [[5.75901382e-05],
             [7.99894861e-01],
             [4.99981486e-01],
             [3.00057226e-01],
             [9.99871278e-01]]
target_sig = [[ 9.99842618e-05, 1.17243789e-08, -6.67412986e-09, 3.35724040e-09,
-1.29059649e-09],
[ 1.17243789e-08, 9.99756334e-05, 1.64210410e-08, -8.62783973e-09,
3.35724040e-09],
[ -6.67412986e-09, 1.64210410e-08, 9.99735192e-05, 1.64210411e-08,
-6.67412986e-09],
[ 3.35724040e-09, -8.62783973e-09, 1.64210411e-08, 9.99756334e-05,
1.17243788e-08],
[ -1.29059649e-09, 3.35724040e-09, -6.67412986e-09, 1.17243788e-08,
9.99842618e-05]]
gp.fit(X_obs,y_obs)
X = np.linspace(0.0,1.0,5)
mu, sig = gp.predict(X)
print('Checking posterior mean: '+\
str(np.allclose(mu,target_mu)))
print('Checking posterior covariance: '+\
str(np.allclose(sig,target_sig)))
Use your Gaussian Process implementation to produce a number of plots where you vary $w$ and $\sigma^2$. By referring to these plots, explain the meaning of the length scale and the noise level.