Abstract: Classical machine learning and statistical approaches to learning, such as neural networks and linear regression, assume a parametric form for functions. Gaussian process models are an alternative approach that assumes a probabilistic prior over functions. This brings benefits, in that uncertainty of function estimation is sustained throughout inference, and some challenges: algorithms for fitting Gaussian processes tend to be more complex than parametric models. In this session I will introduce Gaussian processes and explain why sustaining uncertainty is important.
$$ \newcommand{\Amatrix}{\mathbf{A}} \newcommand{\KL}[2]{\text{KL}\left( #1\,\|\,#2 \right)} \newcommand{\Kaast}{\kernelMatrix_{\mathbf{ \ast}\mathbf{ \ast}}} \newcommand{\Kastu}{\kernelMatrix_{\mathbf{ \ast} \inducingVector}} \newcommand{\Kff}{\kernelMatrix_{\mappingFunctionVector \mappingFunctionVector}} \newcommand{\Kfu}{\kernelMatrix_{\mappingFunctionVector \inducingVector}} \newcommand{\Kuast}{\kernelMatrix_{\inducingVector \bf\ast}} \newcommand{\Kuf}{\kernelMatrix_{\inducingVector \mappingFunctionVector}} \newcommand{\Kuu}{\kernelMatrix_{\inducingVector \inducingVector}} \newcommand{\Kuui}{\Kuu^{-1}} \newcommand{\Qaast}{\mathbf{Q}_{\bf \ast \ast}} \newcommand{\Qastf}{\mathbf{Q}_{\ast \mappingFunction}} \newcommand{\Qfast}{\mathbf{Q}_{\mappingFunctionVector \bf \ast}} \newcommand{\Qff}{\mathbf{Q}_{\mappingFunctionVector \mappingFunctionVector}} \newcommand{\aMatrix}{\mathbf{A}} \newcommand{\aScalar}{a} \newcommand{\aVector}{\mathbf{a}} \newcommand{\acceleration}{a} \newcommand{\bMatrix}{\mathbf{B}} \newcommand{\bScalar}{b} \newcommand{\bVector}{\mathbf{b}} \newcommand{\basisFunc}{\phi} \newcommand{\basisFuncVector}{\boldsymbol{ \basisFunc}} \newcommand{\basisFunction}{\phi} \newcommand{\basisLocation}{\mu} \newcommand{\basisMatrix}{\boldsymbol{ \Phi}} \newcommand{\basisScalar}{\basisFunction} \newcommand{\basisVector}{\boldsymbol{ \basisFunction}} \newcommand{\activationFunction}{\phi} \newcommand{\activationMatrix}{\boldsymbol{ \Phi}} \newcommand{\activationScalar}{\basisFunction} \newcommand{\activationVector}{\boldsymbol{ \basisFunction}} \newcommand{\bigO}{\mathcal{O}} \newcommand{\binomProb}{\pi} \newcommand{\cMatrix}{\mathbf{C}} \newcommand{\cbasisMatrix}{\hat{\boldsymbol{ \Phi}}} \newcommand{\cdataMatrix}{\hat{\dataMatrix}} \newcommand{\cdataScalar}{\hat{\dataScalar}} \newcommand{\cdataVector}{\hat{\dataVector}} \newcommand{\centeredKernelMatrix}{\mathbf{ \MakeUppercase{\centeredKernelScalar}}} \newcommand{\centeredKernelScalar}{b} \newcommand{\centeredKernelVector}{\centeredKernelScalar} \newcommand{\centeringMatrix}{\mathbf{H}} \newcommand{\chiSquaredDist}[2]{\chi_{#1}^{2}\left(#2\right)} \newcommand{\chiSquaredSamp}[1]{\chi_{#1}^{2}} \newcommand{\conditionalCovariance}{\boldsymbol{ \Sigma}} \newcommand{\coregionalizationMatrix}{\mathbf{B}} \newcommand{\coregionalizationScalar}{b} \newcommand{\coregionalizationVector}{\mathbf{ \coregionalizationScalar}} \newcommand{\covDist}[2]{\text{cov}_{#2}\left(#1\right)} \newcommand{\covSamp}[1]{\text{cov}\left(#1\right)} \newcommand{\covarianceScalar}{c} \newcommand{\covarianceVector}{\mathbf{ \covarianceScalar}} \newcommand{\covarianceMatrix}{\mathbf{C}} \newcommand{\covarianceMatrixTwo}{\boldsymbol{ \Sigma}} \newcommand{\croupierScalar}{s} \newcommand{\croupierVector}{\mathbf{ \croupierScalar}} \newcommand{\croupierMatrix}{\mathbf{ \MakeUppercase{\croupierScalar}}} \newcommand{\dataDim}{p} \newcommand{\dataIndex}{i} \newcommand{\dataIndexTwo}{j} \newcommand{\dataMatrix}{\mathbf{Y}} \newcommand{\dataScalar}{y} \newcommand{\dataSet}{\mathcal{D}} \newcommand{\dataStd}{\sigma} \newcommand{\dataVector}{\mathbf{ \dataScalar}} \newcommand{\decayRate}{d} \newcommand{\degreeMatrix}{\mathbf{ \MakeUppercase{\degreeScalar}}} \newcommand{\degreeScalar}{d} \newcommand{\degreeVector}{\mathbf{ \degreeScalar}} % Already defined by latex %\newcommand{\det}[1]{\left|#1\right|} \newcommand{\diag}[1]{\text{diag}\left(#1\right)} \newcommand{\diagonalMatrix}{\mathbf{D}} \newcommand{\diff}[2]{\frac{\text{d}#1}{\text{d}#2}} 
\newcommand{\diffTwo}[2]{\frac{\text{d}^2#1}{\text{d}#2^2}} \newcommand{\displacement}{x} \newcommand{\displacementVector}{\textbf{\displacement}} \newcommand{\distanceMatrix}{\mathbf{ \MakeUppercase{\distanceScalar}}} \newcommand{\distanceScalar}{d} \newcommand{\distanceVector}{\mathbf{ \distanceScalar}} \newcommand{\eigenvaltwo}{\ell} \newcommand{\eigenvaltwoMatrix}{\mathbf{L}} \newcommand{\eigenvaltwoVector}{\mathbf{l}} \newcommand{\eigenvalue}{\lambda} \newcommand{\eigenvalueMatrix}{\boldsymbol{ \Lambda}} \newcommand{\eigenvalueVector}{\boldsymbol{ \lambda}} \newcommand{\eigenvector}{\mathbf{ \eigenvectorScalar}} \newcommand{\eigenvectorMatrix}{\mathbf{U}} \newcommand{\eigenvectorScalar}{u} \newcommand{\eigenvectwo}{\mathbf{v}} \newcommand{\eigenvectwoMatrix}{\mathbf{V}} \newcommand{\eigenvectwoScalar}{v} \newcommand{\entropy}[1]{\mathcal{H}\left(#1\right)} \newcommand{\errorFunction}{E} \newcommand{\expDist}[2]{\left<#1\right>_{#2}} \newcommand{\expSamp}[1]{\left<#1\right>} \newcommand{\expectation}[1]{\left\langle #1 \right\rangle } \newcommand{\expectationDist}[2]{\left\langle #1 \right\rangle _{#2}} \newcommand{\expectedDistanceMatrix}{\mathcal{D}} \newcommand{\eye}{\mathbf{I}} \newcommand{\fantasyDim}{r} \newcommand{\fantasyMatrix}{\mathbf{ \MakeUppercase{\fantasyScalar}}} \newcommand{\fantasyScalar}{z} \newcommand{\fantasyVector}{\mathbf{ \fantasyScalar}} \newcommand{\featureStd}{\varsigma} \newcommand{\gammaCdf}[3]{\mathcal{GAMMA CDF}\left(#1|#2,#3\right)} \newcommand{\gammaDist}[3]{\mathcal{G}\left(#1|#2,#3\right)} \newcommand{\gammaSamp}[2]{\mathcal{G}\left(#1,#2\right)} \newcommand{\gaussianDist}[3]{\mathcal{N}\left(#1|#2,#3\right)} \newcommand{\gaussianSamp}[2]{\mathcal{N}\left(#1,#2\right)} \newcommand{\given}{|} \newcommand{\half}{\frac{1}{2}} \newcommand{\heaviside}{H} \newcommand{\hiddenMatrix}{\mathbf{ \MakeUppercase{\hiddenScalar}}} \newcommand{\hiddenScalar}{h} \newcommand{\hiddenVector}{\mathbf{ \hiddenScalar}} \newcommand{\identityMatrix}{\eye} \newcommand{\inducingInputScalar}{z} \newcommand{\inducingInputVector}{\mathbf{ \inducingInputScalar}} \newcommand{\inducingInputMatrix}{\mathbf{Z}} \newcommand{\inducingScalar}{u} \newcommand{\inducingVector}{\mathbf{ \inducingScalar}} \newcommand{\inducingMatrix}{\mathbf{U}} \newcommand{\inlineDiff}[2]{\text{d}#1/\text{d}#2} \newcommand{\inputDim}{q} \newcommand{\inputMatrix}{\mathbf{X}} \newcommand{\inputScalar}{x} \newcommand{\inputSpace}{\mathcal{X}} \newcommand{\inputVals}{\inputVector} \newcommand{\inputVector}{\mathbf{ \inputScalar}} \newcommand{\iterNum}{k} \newcommand{\kernel}{\kernelScalar} \newcommand{\kernelMatrix}{\mathbf{K}} \newcommand{\kernelScalar}{k} \newcommand{\kernelVector}{\mathbf{ \kernelScalar}} \newcommand{\kff}{\kernelScalar_{\mappingFunction \mappingFunction}} \newcommand{\kfu}{\kernelVector_{\mappingFunction \inducingScalar}} \newcommand{\kuf}{\kernelVector_{\inducingScalar \mappingFunction}} \newcommand{\kuu}{\kernelVector_{\inducingScalar \inducingScalar}} \newcommand{\lagrangeMultiplier}{\lambda} \newcommand{\lagrangeMultiplierMatrix}{\boldsymbol{ \Lambda}} \newcommand{\lagrangian}{L} \newcommand{\laplacianFactor}{\mathbf{ \MakeUppercase{\laplacianFactorScalar}}} \newcommand{\laplacianFactorScalar}{m} \newcommand{\laplacianFactorVector}{\mathbf{ \laplacianFactorScalar}} \newcommand{\laplacianMatrix}{\mathbf{L}} \newcommand{\laplacianScalar}{\ell} \newcommand{\laplacianVector}{\mathbf{ \ell}} \newcommand{\latentDim}{q} \newcommand{\latentDistanceMatrix}{\boldsymbol{ \Delta}} 
\newcommand{\latentDistanceScalar}{\delta} \newcommand{\latentDistanceVector}{\boldsymbol{ \delta}} \newcommand{\latentForce}{f} \newcommand{\latentFunction}{u} \newcommand{\latentFunctionVector}{\mathbf{ \latentFunction}} \newcommand{\latentFunctionMatrix}{\mathbf{ \MakeUppercase{\latentFunction}}} \newcommand{\latentIndex}{j} \newcommand{\latentScalar}{z} \newcommand{\latentVector}{\mathbf{ \latentScalar}} \newcommand{\latentMatrix}{\mathbf{Z}} \newcommand{\learnRate}{\eta} \newcommand{\lengthScale}{\ell} \newcommand{\rbfWidth}{\ell} \newcommand{\likelihoodBound}{\mathcal{L}} \newcommand{\likelihoodFunction}{L} \newcommand{\locationScalar}{\mu} \newcommand{\locationVector}{\boldsymbol{ \locationScalar}} \newcommand{\locationMatrix}{\mathbf{M}} \newcommand{\variance}[1]{\text{var}\left( #1 \right)} \newcommand{\mappingFunction}{f} \newcommand{\mappingFunctionMatrix}{\mathbf{F}} \newcommand{\mappingFunctionTwo}{g} \newcommand{\mappingFunctionTwoMatrix}{\mathbf{G}} \newcommand{\mappingFunctionTwoVector}{\mathbf{ \mappingFunctionTwo}} \newcommand{\mappingFunctionVector}{\mathbf{ \mappingFunction}} \newcommand{\scaleScalar}{s} \newcommand{\mappingScalar}{w} \newcommand{\mappingVector}{\mathbf{ \mappingScalar}} \newcommand{\mappingMatrix}{\mathbf{W}} \newcommand{\mappingScalarTwo}{v} \newcommand{\mappingVectorTwo}{\mathbf{ \mappingScalarTwo}} \newcommand{\mappingMatrixTwo}{\mathbf{V}} \newcommand{\maxIters}{K} \newcommand{\meanMatrix}{\mathbf{M}} \newcommand{\meanScalar}{\mu} \newcommand{\meanTwoMatrix}{\mathbf{M}} \newcommand{\meanTwoScalar}{m} \newcommand{\meanTwoVector}{\mathbf{ \meanTwoScalar}} \newcommand{\meanVector}{\boldsymbol{ \meanScalar}} \newcommand{\mrnaConcentration}{m} \newcommand{\naturalFrequency}{\omega} \newcommand{\neighborhood}[1]{\mathcal{N}\left( #1 \right)} \newcommand{\neilurl}{http://inverseprobability.com/} \newcommand{\noiseMatrix}{\boldsymbol{ E}} \newcommand{\noiseScalar}{\epsilon} \newcommand{\noiseVector}{\boldsymbol{ \epsilon}} \newcommand{\norm}[1]{\left\Vert #1 \right\Vert} \newcommand{\normalizedLaplacianMatrix}{\hat{\mathbf{L}}} \newcommand{\normalizedLaplacianScalar}{\hat{\ell}} \newcommand{\normalizedLaplacianVector}{\hat{\mathbf{ \ell}}} \newcommand{\numActive}{m} \newcommand{\numBasisFunc}{m} \newcommand{\numComponents}{m} \newcommand{\numComps}{K} \newcommand{\numData}{n} \newcommand{\numFeatures}{K} \newcommand{\numHidden}{h} \newcommand{\numInducing}{m} \newcommand{\numLayers}{\ell} \newcommand{\numNeighbors}{K} \newcommand{\numSequences}{s} \newcommand{\numSuccess}{s} \newcommand{\numTasks}{m} \newcommand{\numTime}{T} \newcommand{\numTrials}{S} \newcommand{\outputIndex}{j} \newcommand{\paramVector}{\boldsymbol{ \theta}} \newcommand{\parameterMatrix}{\boldsymbol{ \Theta}} \newcommand{\parameterScalar}{\theta} \newcommand{\parameterVector}{\boldsymbol{ \parameterScalar}} \newcommand{\partDiff}[2]{\frac{\partial#1}{\partial#2}} \newcommand{\precisionScalar}{j} \newcommand{\precisionVector}{\mathbf{ \precisionScalar}} \newcommand{\precisionMatrix}{\mathbf{J}} \newcommand{\pseudotargetScalar}{\widetilde{y}} \newcommand{\pseudotargetVector}{\mathbf{ \pseudotargetScalar}} \newcommand{\pseudotargetMatrix}{\mathbf{ \widetilde{Y}}} \newcommand{\rank}[1]{\text{rank}\left(#1\right)} \newcommand{\rayleighDist}[2]{\mathcal{R}\left(#1|#2\right)} \newcommand{\rayleighSamp}[1]{\mathcal{R}\left(#1\right)} \newcommand{\responsibility}{r} \newcommand{\rotationScalar}{r} \newcommand{\rotationVector}{\mathbf{ \rotationScalar}} \newcommand{\rotationMatrix}{\mathbf{R}} 
\newcommand{\sampleCovScalar}{s} \newcommand{\sampleCovVector}{\mathbf{ \sampleCovScalar}} \newcommand{\sampleCovMatrix}{\mathbf{s}} \newcommand{\scalarProduct}[2]{\left\langle{#1},{#2}\right\rangle} \newcommand{\sign}[1]{\text{sign}\left(#1\right)} \newcommand{\sigmoid}[1]{\sigma\left(#1\right)} \newcommand{\singularvalue}{\ell} \newcommand{\singularvalueMatrix}{\mathbf{L}} \newcommand{\singularvalueVector}{\mathbf{l}} \newcommand{\sorth}{\mathbf{u}} \newcommand{\spar}{\lambda} \newcommand{\trace}[1]{\text{tr}\left(#1\right)} \newcommand{\BasalRate}{B} \newcommand{\DampingCoefficient}{C} \newcommand{\DecayRate}{D} \newcommand{\Displacement}{X} \newcommand{\LatentForce}{F} \newcommand{\Mass}{M} \newcommand{\Sensitivity}{S} \newcommand{\basalRate}{b} \newcommand{\dampingCoefficient}{c} \newcommand{\mass}{m} \newcommand{\sensitivity}{s} \newcommand{\springScalar}{\kappa} \newcommand{\springVector}{\boldsymbol{ \kappa}} \newcommand{\springMatrix}{\boldsymbol{ \mathcal{K}}} \newcommand{\tfConcentration}{p} \newcommand{\tfDecayRate}{\delta} \newcommand{\tfMrnaConcentration}{f} \newcommand{\tfVector}{\mathbf{ \tfConcentration}} \newcommand{\velocity}{v} \newcommand{\sufficientStatsScalar}{g} \newcommand{\sufficientStatsVector}{\mathbf{ \sufficientStatsScalar}} \newcommand{\sufficientStatsMatrix}{\mathbf{G}} \newcommand{\switchScalar}{s} \newcommand{\switchVector}{\mathbf{ \switchScalar}} \newcommand{\switchMatrix}{\mathbf{S}} \newcommand{\tr}[1]{\text{tr}\left(#1\right)} \newcommand{\loneNorm}[1]{\left\Vert #1 \right\Vert_1} \newcommand{\ltwoNorm}[1]{\left\Vert #1 \right\Vert_2} \newcommand{\onenorm}[1]{\left\vert#1\right\vert_1} \newcommand{\twonorm}[1]{\left\Vert #1 \right\Vert} \newcommand{\vScalar}{v} \newcommand{\vVector}{\mathbf{v}} \newcommand{\vMatrix}{\mathbf{V}} \newcommand{\varianceDist}[2]{\text{var}_{#2}\left( #1 \right)} % Already defined by latex %\newcommand{\vec}{#1:} \newcommand{\vecb}[1]{\left(#1\right):} \newcommand{\weightScalar}{w} \newcommand{\weightVector}{\mathbf{ \weightScalar}} \newcommand{\weightMatrix}{\mathbf{W}} \newcommand{\weightedAdjacencyMatrix}{\mathbf{A}} \newcommand{\weightedAdjacencyScalar}{a} \newcommand{\weightedAdjacencyVector}{\mathbf{ \weightedAdjacencyScalar}} \newcommand{\onesVector}{\mathbf{1}} \newcommand{\zerosVector}{\mathbf{0}} $$[\small{[edit]}]{style="text-align:right"} ###
[@Rasmussen:book06]{style="text-align:right"}
$$ \text{data} + \text{model} \xrightarrow{\text{compute}} \text{prediction}$$
- To combine data with a model we need:
  - a prediction function $\mappingFunction(\cdot)$ that encodes our beliefs about the regularities of the universe
  - an objective function $\errorFunction(\cdot)$ that defines the cost of misprediction.
### Neural Networks and Prediction Functions
$\mappingFunction(\cdot)$ is a scalar function with vector inputs,
$\activationVector(\cdot)$ is a vector function with vector inputs.
The dimensionality of the vector function is known as the number of hidden units, or the number of neurons.
The elements of $\activationVector(\cdot)$ are the activation functions of the neural network.
The elements of $\mappingMatrix_{1}$ are the parameters of the activation functions.
In statistics, activation functions are known as basis functions.
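Putting these pieces together, the prediction function takes the form (written out in full later in this section)
$$ \mappingFunction(\inputVector) = \left.\mappingVector^{(2)}\right.^\top \activationVector(\mappingMatrix_1, \inputVector). $$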
In statistics we would think of this as a linear model: the predictions are not linear, but the model is linear in the parameters,
and $\mappingMatrix_{1}$ would be treated as fixed (static) parameters.
In machine learning we optimize $\mappingMatrix_{1}$ as well as $\mappingMatrix_{2}$ (which would normally be denoted in statistics by $\boldsymbol{\beta}$).
Revisit that decision: follow the path of @Neal:bayesian94 and @MacKay:bayesian92.
Consider the probabilistic approach.
### Probabilistic Modelling
. . .
$\mappingMatrix$ contains $\mappingMatrix_1$ and $\mappingMatrix_2$.
$p(\mappingMatrix | \dataVector, \inputMatrix)$ is the posterior density.
$p(\dataScalar|\inputVector, \mappingMatrix)$ is the likelihood of a single data point.
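These quantities are related through Bayes' rule,
$$ p(\mappingMatrix | \dataVector, \inputMatrix) = \frac{p(\dataVector|\inputMatrix, \mappingMatrix)\,p(\mappingMatrix)}{p(\dataVector|\inputMatrix)}, $$
so the posterior combines the likelihood with a prior over the parameters.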
. . .
Normally we assume independence of the observations given the parameters: $$ p(\dataVector|\inputMatrix, \mappingMatrix) = \prod_{i=1}^\numData p(\dataScalar_i|\inputVector_i, \mappingMatrix).$$
We can also consider priors over the inputs (treating them as latent variables): $$ p(\dataVector_*|\dataVector) = \int p(\dataVector_*|\inputMatrix_*, \mappingMatrix) p(\mappingMatrix | \dataVector, \inputMatrix) p(\inputMatrix) p(\inputMatrix_*) \text{d} \mappingMatrix \text{d} \inputMatrix \text{d}\inputMatrix_* $$
This gives unsupervised learning.
Data: $\dataVector$
Model: $p(\dataVector, \dataVector^*)$
Prediction: $p(\dataVector^*| \dataVector)$
### Graphical Models
import daft
from matplotlib import rc
rc("font", **{'family':'sans-serif','sans-serif':['Helvetica']}, size=30)
rc("text", usetex=True)
pgm = daft.PGM(shape=[3, 1],
origin=[0, 0],
grid_unit=5,
node_unit=1.9,
observed_style='shaded',
line_width=3)
pgm.add_node(daft.Node("y_1", r"$y_1$", 0.5, 0.5, fixed=False))
pgm.add_node(daft.Node("y_2", r"$y_2$", 1.5, 0.5, fixed=False))
pgm.add_node(daft.Node("y_3", r"$y_3$", 2.5, 0.5, fixed=False))
pgm.add_edge("y_1", "y_2")
pgm.add_edge("y_2", "y_3")
pgm.render().figure.savefig("../slides/diagrams/ml/markov.svg", transparent=True)
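The chain drawn by this snippet encodes the Markov factorization
$$ p(y_1, y_2, y_3) = p(y_1)\,p(y_2|y_1)\,p(y_3|y_2). $$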
Predict Perioperative Risk of Clostridium Difficile Infection Following Colon Surgery [@Steele:predictive12]
Easy to write in probabilities
But underlying this is a wealth of computational challenges.
High dimensional integrals typically require approximation.
In statistics, the focus is more on the linear model implied by $$ \mappingFunction(\inputVector) = \left.\mappingVector^{(2)}\right.^\top \activationVector(\mappingMatrix_1, \inputVector) $$
Hold $\mappingMatrix_1$ fixed for a given analysis.
Gaussian prior for $\mappingVector^{(2)}$, $$ \mappingVector^{(2)} \sim \gaussianSamp{\zerosVector}{\covarianceMatrix}. $$ $$ \dataScalar_i = \mappingFunction(\inputVector_i) + \noiseScalar_i, $$ where $$ \noiseScalar_i \sim \gaussianSamp{0}{\dataStd^2} $$
### Linear Gaussian Models
### Recall Univariate Gaussian Properties

1. Sum of Gaussian-distributed variables is also Gaussian-distributed.

   $$\sum_{i=1}^{\numData} \dataScalar_i \sim \gaussianSamp{\sum_{i=1}^\numData \meanScalar_i}{\sum_{i=1}^\numData\dataStd_i^2}$$

2. Scaling a Gaussian leads to a Gaussian.

   $$\mappingScalar\dataScalar\sim \gaussianSamp{\mappingScalar\meanScalar}{\mappingScalar^2 \dataStd^2}$$

[If]{style="text-align:left"} $$\inputVector \sim \gaussianSamp{\meanVector}{\covarianceMatrix}$$

[And]{style="text-align:left"} $$\dataVector= \mappingMatrix\inputVector$$

[Then]{style="text-align:left"} $$\dataVector \sim \gaussianSamp{\mappingMatrix\meanVector}{\mappingMatrix\covarianceMatrix\mappingMatrix^\top}$$
### Multivariate Gaussian Properties
If $$ \dataVector = \mappingMatrix \inputVector + \noiseVector, $$
Assume $$ \begin{align} \inputVector & \sim \gaussianSamp{\meanVector}{\covarianceMatrix}\\ \noiseVector & \sim \gaussianSamp{\zerosVector}{\covarianceMatrixTwo} \end{align} $$
Then $$ \dataVector \sim \gaussianSamp{\mappingMatrix\meanVector}{\mappingMatrix\covarianceMatrix\mappingMatrix^\top + \covarianceMatrixTwo}. $$ If $\covarianceMatrixTwo=\dataStd^2\eye$, this is Probabilistic Principal Component Analysis [@Tipping:probpca99], because we integrated out the inputs (or latent variables they would be called in that case).
Define the design matrix $$ \activationMatrix = \begin{bmatrix} \activationScalar_{1, 1} & \activationScalar_{1, 2} & \dots & \activationScalar_{1, \numHidden} \\ \activationScalar_{2, 1} & \activationScalar_{2, 2} & \dots & \activationScalar_{2, \numHidden} \\ \vdots & \vdots & \ddots & \vdots \\ \activationScalar_{\numData, 1} & \activationScalar_{\numData, 2} & \dots & \activationScalar_{\numData, \numHidden} \end{bmatrix}. $$
$$\dataVector = \activationMatrix\mappingVector + \noiseVector$$

$$\noiseVector \sim \gaussianSamp{\zerosVector}{\dataStd^2\eye}$$

Instead of making i.i.d. assumptions about the density of each data point $\dataScalar_i$,
we make a joint Gaussian assumption over our data.
The covariance matrix is now a function of both the parameters of the activation functions, $\mappingMatrix_1$, and the input variables, $\inputMatrix$.
This arises from integrating out $\mappingVector^{(2)}$.
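Applying the multivariate Gaussian property above to $\dataVector = \activationMatrix\mappingVector + \noiseVector$ with $\mappingVector \sim \gaussianSamp{\zerosVector}{\covarianceMatrix}$ gives
$$ \dataVector \sim \gaussianSamp{\zerosVector}{\activationMatrix\covarianceMatrix\activationMatrix^\top + \dataStd^2\eye}, $$
so the covariance of the data depends on $\mappingMatrix_1$ and $\inputMatrix$ only through the design matrix $\activationMatrix$.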
### Non-degenerate Gaussian Processes
If $$ \begin{align*} \mappingVector^{(1)} & \sim p(\cdot)\\ \phi_i & = \activationScalar\left(\mappingVector^{(1)}, \inputVector_i\right), \end{align*} $$ has finite variance,
then, taking the number of hidden units to infinity, the resulting function is also drawn from a Gaussian process.
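As a rough numerical sketch of this limit (not from the original notebooks; it assumes $\tanh$ activations and output weights scaled by $1/\sqrt{m}$ so the output variance stays finite):

```python
import numpy as np

np.random.seed(0)
x = np.array([0.5, -0.3])                        # a fixed input point
num_samples = 2000                               # prior draws of the network
for num_hidden in [1, 10, 1000]:
    # draw hidden-layer weights and activations for each prior sample
    W1 = np.random.randn(num_samples, num_hidden, x.shape[0])
    phi = np.tanh(W1 @ x)                        # shape (num_samples, num_hidden)
    # output weights scaled by 1/sqrt(num_hidden) keep the variance finite
    w2 = np.random.randn(num_samples, num_hidden) / np.sqrt(num_hidden)
    f = np.sum(w2 * phi, axis=1)                 # prior draws of f(x)
    print(num_hidden, float(f.mean()), float(f.var()))
```

As `num_hidden` grows, the histogram of `f` looks increasingly Gaussian, which is the central-limit argument behind Neal's result.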
Chapter 2 of Neal's thesis [@Neal:bayesian94]
Rest of Neal's thesis. [@Neal:bayesian94]
David MacKay's PhD thesis [@MacKay:bayesian92]
from mlai import Kernel
from mlai import eq_cov
kernel = Kernel(function=eq_cov,
name='Exponentiated Quadratic',
shortname='eq',
lengthscale=0.25)
import numpy as np
np.random.seed(10)
import teaching_plots as plot
plot.rejection_samples(kernel=kernel,
diagrams='../slides/diagrams/gp')
import pods
from ipywidgets import IntSlider
pods.notebook.display_plots('gp_rejection_sample{sample:0>3}.png',
directory='../slides/diagrams/gp',
sample=IntSlider(1,1,5,1))
import numpy as np
np.random.seed(4949)
### Multi-variate Gaussians
import teaching_plots as plot
from mlai import Kernel, exponentiated_quadratic
kernel=Kernel(function=exponentiated_quadratic, lengthscale=0.5)
plot.two_point_sample(kernel.K, diagrams='../slides/diagrams/gp')
import pods
from ipywidgets import IntSlider
pods.notebook.display_plots('two_point_sample{sample:0>3}.svg', '../slides/diagrams/gp', sample=IntSlider(0, 0, 8, 1))
import pods
from ipywidgets import IntSlider
pods.notebook.display_plots('two_point_sample{sample:0>3}.svg', '../slides/diagrams/gp', sample=IntSlider(9, 9, 12, 1))
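These two-point sample figures illustrate conditioning within a bivariate Gaussian. For a zero-mean pair $\mappingFunction_1, \mappingFunction_2$ with covariance $\kernelMatrix = \begin{bmatrix} \kernelScalar_{1,1} & \kernelScalar_{1,2} \\ \kernelScalar_{2,1} & \kernelScalar_{2,2}\end{bmatrix}$, observing $\mappingFunction_1$ gives
$$ p(\mappingFunction_2 | \mappingFunction_1) = \gaussianDist{\mappingFunction_2}{\frac{\kernelScalar_{1,2}}{\kernelScalar_{1,1}}\mappingFunction_1}{\kernelScalar_{2,2} - \frac{\kernelScalar_{1,2}^2}{\kernelScalar_{1,1}}}. $$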
import pods
from ipywidgets import IntSlider
pods.notebook.display_plots('two_point_sample{sample:0>3}.svg', '../slides/diagrams/gp', sample=IntSlider(13, 13, 17, 1))
Posterior mean $$\mappingFunction_D(\inputVector_*) = \kernelVector(\inputVector_*, \inputMatrix) \kernelMatrix^{-1} \dataVector$$
Posterior covariance $$\mathbf{C}_* = \kernelMatrix_{*,*} - \kernelMatrix_{*,\mappingFunctionVector} \kernelMatrix^{-1} \kernelMatrix_{\mappingFunctionVector, *}$$
The posterior mean can be written as a function parameterized by $\boldsymbol{\alpha} = \kernelMatrix^{-1}\dataVector$: $$\mappingFunction_D(\inputVector_*) = \kernelVector(\inputVector_*, \inputMatrix) \boldsymbol{\alpha}$$
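A minimal numpy sketch of these prediction equations (illustrative names and a toy exponentiated quadratic kernel, not the `mlai` implementation; a small jitter/noise term is added to the diagonal for numerical stability):

```python
import numpy as np

def eq_cov(x, x_prime, variance=1.0, lengthscale=0.2):
    # exponentiated quadratic covariance between two scalar inputs
    return variance * np.exp(-0.5 * ((x - x_prime) / lengthscale) ** 2)

def kernel_matrix(X, X2, **kwargs):
    return np.array([[eq_cov(xi, xj, **kwargs) for xj in X2] for xi in X])

def gp_posterior(X, y, X_star, noise=1e-6, **kwargs):
    K = kernel_matrix(X, X, **kwargs) + noise * np.eye(len(X))
    K_star = kernel_matrix(X_star, X, **kwargs)   # k(x_*, X)
    alpha = np.linalg.solve(K, y)                 # alpha = K^{-1} y
    mean = K_star @ alpha                         # posterior mean f_D(x_*)
    v = np.linalg.solve(K, K_star.T)
    cov = kernel_matrix(X_star, X_star, **kwargs) - K_star @ v  # posterior covariance C_*
    return mean, cov
```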
from mlai import Kernel
from mlai import eq_cov
kernel = Kernel(function=eq_cov,
name='Exponentiated Quadratic',
shortname='eq',
formula=r'\kernelScalar(\inputVector, \inputVector^\prime) = \alpha \exp\left(-\frac{\ltwoNorm{\inputVector-\inputVector^\prime}^2}{2\lengthScale^2}\right)',
lengthscale=0.2)
import teaching_plots as plot
plot.covariance_func(kernel=kernel, diagrams='../slides/diagrams/kern/')
\includesvgclass{../slides/diagrams/kern/eq_covariance.svg}

- Gold medal times for Olympic Marathon since 1896.
- Marathons before 1924 didn't have a standardised distance.
- Present results using pace per km.
- In 1904 the Marathon was badly organised, leading to very slow times.

Image from Wikimedia Commons http://bit.ly/16kMKHQ
import matplotlib.pyplot as plt
import teaching_plots as plot
import mlai
xlim = (1875,2030)
ylim = (2.5, 6.5)
yhat = (y-offset)/scale
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
_ = ax.plot(x, y, 'r.',markersize=10)
ax.set_xlabel('year', fontsize=20)
ax.set_ylabel('pace min/km', fontsize=20)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/datasets/olympic-marathon.svg',
transparent=True,
frameon=True)
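The plotting code below assumes a fitted model `m_full`; the fitting step is not shown in this extract. A minimal sketch of how such a model might be constructed with GPy (assuming `x` and `yhat` hold the data prepared above) is:

```python
import GPy

# a minimal sketch, not the original notebook code: GP regression with an
# exponentiated quadratic (RBF) covariance on the normalized marathon data
kern = GPy.kern.RBF(input_dim=1, lengthscale=20.0)  # illustrative starting length scale
m_full = GPy.models.GPRegression(x, yhat, kern)
_ = m_full.optimize()  # maximize the log marginal likelihood
```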
import teaching_plots as plot
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='year', ylabel='pace min/km', fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/olympic-marathon-gp.svg',
transparent=True, frameon=True)
Can we determine covariance parameters from the data?
The parameters are inside the covariance function (matrix). \normalsize $$\kernelScalar_{i, j} = \kernelScalar(\inputVals_i, \inputVals_j; \parameterVector)$$
$$\kernelMatrix = \rotationMatrix \eigenvalueMatrix^2 \rotationMatrix^\top$$

$\eigenvalueMatrix$ represents distance on the axes. $\rotationMatrix$ gives the rotation.
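The objective used to determine these parameters (computed in the code further below as `err`, split into `err_log_det` and `err_fit`) is the negative log marginal likelihood,
$$ E(\parameterVector) = \frac{1}{2}\log \det{\kernelMatrix} + \frac{1}{2}\dataVector^\top\kernelMatrix^{-1}\dataVector + \text{const}, $$
where the first term penalizes complexity in the covariance and the second measures the quality of the data fit.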
import matplotlib.pyplot as plt
import numpy as np
import os
import teaching_plots as plot
import mlai

diagrams = '../slides/diagrams/gp'  # output directory, as defined later in this section
fill_color = [1., 1., 0.]
black_color = [0., 0., 0.]
blue_color = [0., 0., 1.]
magenta_color = [1., 0., 1.]
fig, ax = plt.subplots(figsize=plot.big_figsize)
ax.set_axis_off()
cax = fig.add_axes([0., 0., 1., 1.])
cax.set_axis_off()
counter = 1
lambda1 = 0.5
lambda2 = 0.3
cax.set_xlim([0., 1.])
cax.set_ylim([0., 1.])
# Matrix label axes
tax2 = fig.add_axes([0, 0.47, 0.1, 0.1])
tax2.set_xlim([0, 1.])
tax2.set_ylim([0, 1.])
tax2.set_axis_off()
label_eigenvalue = tax2.text(0.5, 0.5, "\Large $\eigenvalueMatrix=$")
ax = fig.add_axes([0.5, 0.25, 0.5, 0.5])
ax.set_xlim([-0.25, 0.6])
ax.set_ylim([-0.25, 0.6])
from matplotlib.patches import Polygon
pat_hand = ax.add_patch(Polygon(np.column_stack(([0, 0, lambda1, lambda1],
[0, lambda2, lambda2, 0])),
facecolor=fill_color,
edgecolor=black_color,
visible=False))
data = pat_hand.get_path().vertices
rotation_matrix = np.asarray([[np.sqrt(2)/2, -np.sqrt(2)/2],
[np.sqrt(2)/2, np.sqrt(2)/2]])
new = np.dot(rotation_matrix,data.T)
pat_hand = ax.add_patch(Polygon(np.column_stack(([0, 0, lambda1, lambda1],
[0, lambda2, lambda2, 0])),
facecolor=fill_color,
edgecolor=black_color,
visible=False))
pat_hand_rot = ax.add_patch(Polygon(new.T,
facecolor=fill_color,
edgecolor=black_color))
pat_hand_rot.set(visible=False)
# 3D box
pat_hand3 = [ax.add_patch(Polygon(np.column_stack(([0, -0.2*lambda1, 0.8*lambda1, lambda1],
[0, -0.2*lambda2, -0.2*lambda2, 0])),
facecolor=fill_color,
edgecolor=black_color))]
pat_hand3.append(ax.add_patch(Polygon(np.column_stack(([0, -0.2*lambda1, -0.2*lambda1, 0],
[0, -0.2*lambda2, 0.8*lambda2, lambda2])),
facecolor=fill_color,
edgecolor=black_color)))
pat_hand3.append(ax.add_patch(Polygon(np.column_stack(([-0.2*lambda1, 0, lambda1, 0.8*lambda1],
[0.8*lambda2, lambda2, lambda2, 0.8*lambda2])),
facecolor=fill_color,
edgecolor=black_color)))
pat_hand3.append(ax.add_patch(Polygon(np.column_stack(([lambda1, 0.8*lambda1, 0.8*lambda1, lambda1],
[lambda2, 0.8*lambda2, -0.2*lambda2, 0])),
facecolor=fill_color,
edgecolor=black_color)))
for hand in pat_hand3:
hand.set(visible=False)
ax.set_aspect('equal')
ax.set_axis_off()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xspan = xlim[1] - xlim[0]
yspan = ylim[1] - ylim[0]
#ar_one = ax.arrow([0, lambda1], [0, 0])
ar_one = ax.arrow(x=0, y=0, dx=lambda1, dy=0)
#ar_two = ax.arrow([0, 0], [0, lambda2])
ar_two = ax.arrow(x=0, y=0, dx=0, dy=lambda2)
#ar_three = ax.arrow([0, -0.2*lambda1], [0, -0.2*lambda2])
ar_three = ax.arrow(x=0, y=0, dx=-0.2*lambda1, dy=-0.2*lambda2)
ar_one_text = ax.text(0.5*lambda1, -0.05*yspan,
'$\eigenvalue_1$',
horizontalalignment='center')
ar_two_text = ax.text(-0.05*xspan, 0.5*lambda2,
'$\eigenvalue_2$',
horizontalalignment='center')
ar_three_text = ax.text(-0.05*xspan-0.1*lambda1, -0.1*lambda2+0.05*yspan,
'$\eigenvalue_3$',
horizontalalignment='center')
ar_one.set(linewidth=3, visible=False, color=blue_color)
ar_one_text.set(visible=False)
ar_two.set(linewidth=3, visible=False, color=blue_color)
ar_two_text.set(visible=False)
ar_three.set(linewidth=3, visible=False, color=blue_color)
ar_three_text.set(visible=False)
matrix_ax = fig.add_axes([0.2, 0.35, 0.3, 0.3])
matrix_ax.set_aspect('equal')
matrix_ax.set_axis_off()
eigenvals = [['$\eigenvalue_1$', '$0$'],['$0$', '$\eigenvalue_2$']]
plot.matrix(eigenvals,
matrix_ax,
bracket_style='square',
type='entries',
bracket_color=black_color)
# First arrow
matrix_ax.cla()
plot.matrix(eigenvals,
matrix_ax,
bracket_style='square',
type='entries',
highlight=True,
highlight_row=[0, 0],
highlight_col=':',
highlight_color=magenta_color,
bracket_color=black_color)
ar_one.set(visible=True)
ar_one_text.set(visible=True)
file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)
mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)
counter += 1
# Second arrow
matrix_ax.cla()
plot.matrix(eigenvals,
matrix_ax,
bracket_style='square',
type='entries',
highlight=True,
highlight_row=[1,1],
highlight_col=':',
highlight_color=magenta_color,
bracket_color=black_color)
ar_two.set(visible=True)
ar_two_text.set(visible=True)
file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)
mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)
counter += 1
matrix_ax.cla()
plot.matrix(eigenvals, matrix_ax,
bracket_style='square',
type='entries',
bracket_color=black_color)
file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)
mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)
counter += 1
tax = fig.add_axes([0.1, 0.1, 0.8, 0.1])
tax.set_axis_off()
tax.set_xlim([0, 1])
tax.set_ylim([0, 1])
det_text = tax.text(0.5, 0.5,
r'\Large $\det{\eigenvalueMatrix} = \eigenvalue_1 \eigenvalue_2$',
horizontalalignment='center')
file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)
mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)
counter += 1
plt.sca(ax)
pat_hand.set(visible=True)
file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)
mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)
counter += 1
det_text_plot = ax.text(0.5*lambda1,
0.5*lambda2,
r'\Large $\det{\eigenvalueMatrix}$',
horizontalalignment='center')
file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)
mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)
counter += 1
eigenvals2 = [[r'$\eigenvalue_1$', '$0$', '$0$'],
              ['$0$', r'$\eigenvalue_2$', '$0$'],
              ['$0$', '$0$', r'$\eigenvalue_3$']]
plt.sca(matrix_ax)
matrix_ax.cla()
plot.matrix(eigenvals2, matrix_ax,
bracket_style='square',
type='entries',
highlight=True,
highlight_row=[2,2],
highlight_col=':',
highlight_color=magenta_color)
file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)
mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)
counter += 1
ar_three.set(visible=True)
ar_three_text.set(visible=True)
for hand in pat_hand3:
    hand.set(visible=True)
det_text.set(text=r'\Large $\det{\eigenvalueMatrix} = \eigenvalue_1 \eigenvalue_2\eigenvalue_3$')
file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)
mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)
counter += 1
matrix_ax.cla()
plot.matrix(eigenvals,
matrix_ax,
bracket_style='square',
type='entries',
bracket_color=black_color)
ar_three.set(visible=False)
ar_three_text.set(visible=False)
for hand in pat_hand3:
    hand.set(visible=False)
det_text.set(text=r'\Large $\det{\eigenvalueMatrix} = \eigenvalue_1 \eigenvalue_2$')
file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)
mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)
counter += 1
det_text.set(text=r'\Large $\det{\rotationMatrix\eigenvalueMatrix} = \eigenvalue_1 \eigenvalue_2$')
label_eigenvalue.set(text=r'\Large $\rotationMatrix\eigenvalueMatrix=$')
# rotate_object is assumed to be a helper (e.g. from teaching_plots) that applies
# the given rotation matrix to the coordinates of a matplotlib artist.
rotate_object(rotation_matrix, ar_one)
rotate_object(rotation_matrix, ar_one_text)
rotate_object(rotation_matrix, ar_two)
rotate_object(rotation_matrix, ar_two_text)
rotate_object(rotation_matrix, det_text_plot)
pat_hand_rot.set(visible=True)
pat_hand.set(visible=False)
W = [['$\mappingScalar_{1, 1}$', '$\mappingScalar_{1, 2}$'],[ '$\mappingScalar_{2, 1}$', '$\mappingScalar_{2, 2}$']]
plot.matrix(W,
matrix_ax,
bracket_style='square',
type='entries',
bracket_color=black_color)
file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)
mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)
counter += 1
import matplotlib.pyplot as plt
import numpy as np
import os
import GPy
import teaching_plots as plot
import mlai
import gp_tutorial
np.random.seed(125)
diagrams = '../slides/diagrams/gp'
black_color=[0., 0., 0.]
red_color=[1., 0., 0.]
blue_color=[0., 0., 1.]
magenta_color=[1., 0., 1.]
fontsize=18
y_lim = [-2.2, 2.2]
y_ticks = [-2, -1, 0, 1, 2]
x_lim = [-2, 2]
x_ticks = [-2, -1, 0, 1, 2]
err_y_lim = [-12, 20]
linewidth=3
markersize=15
markertype='.'
x = np.linspace(-1, 1, 6)[:, np.newaxis]
xtest = np.linspace(x_lim[0], x_lim[1], 200)[:, np.newaxis]
# True data
true_kern = GPy.kern.RBF(1) + GPy.kern.White(1)
true_kern.rbf.lengthscale = 1.0
true_kern.white.variance = 0.01
K = true_kern.K(x)
y = np.random.multivariate_normal(np.zeros((6,)), K, 1).T
# Fitted model
kern = GPy.kern.RBF(1) + GPy.kern.White(1)
kern.rbf.lengthscale = 1.0
kern.white.variance = 0.01
lengthscales = np.asarray([0.01, 0.05, 0.1, 0.25, 0.5, 1, 2, 4, 8, 16, 100])
fig1, ax1 = plt.subplots(figsize=plot.one_figsize)
fig2, ax2 = plt.subplots(figsize=plot.one_figsize)
line = ax2.semilogx(np.NaN, np.NaN, 'x-',
color=black_color)
ax2.set_ylim(err_y_lim)
ax2.set_xlim([0.025, 32])
ax2.grid(True)
ax2.set_xticks([0.01, 0.1, 1, 10, 100])
ax2.set_xticklabels(['$10^{-2}$', '$10^{-1}$', '$10^0$', '$10^1$', '$10^2$'])
err = np.zeros_like(lengthscales)
err_log_det = np.zeros_like(lengthscales)
err_fit = np.zeros_like(lengthscales)
counter = 0
for i, ls in enumerate(lengthscales):
kern.rbf.lengthscale=ls
K = kern.K(x)
invK, L, Li, log_det_K = GPy.util.linalg.pdinv(K)
err[i] = 0.5*(log_det_K + np.dot(np.dot(y.T,invK),y))
err_log_det[i] = 0.5*log_det_K
err_fit[i] = 0.5*np.dot(np.dot(y.T,invK), y)
Kx = kern.K(x, xtest)
ypred_mean = np.dot(np.dot(Kx.T, invK), y)
ypred_var = kern.Kdiag(xtest) - np.sum((np.dot(Kx.T,invK))*Kx.T, 1)
ypred_sd = np.sqrt(ypred_var)
ax1.clear()
_ = gp_tutorial.gpplot(xtest.flatten(),
ypred_mean.flatten(),
ypred_mean.flatten()-2*ypred_sd.flatten(),
ypred_mean.flatten()+2*ypred_sd.flatten(),
ax=ax1)
x_lim = ax1.get_xlim()
ax1.set_ylabel('$f(x)$', fontsize=fontsize)
ax1.set_xlabel('$x$', fontsize=fontsize)
p = ax1.plot(x, y, markertype, color=black_color, markersize=markersize, linewidth=linewidth)
ax1.set_ylim(y_lim)
ax1.set_xlim(x_lim)
ax1.set_xticks(x_ticks)
#ax.set(box=False)
ax1.plot([x_lim[0], x_lim[0]], y_lim, color=black_color)
ax1.plot(x_lim, [y_lim[0], y_lim[0]], color=black_color)
file_name = 'gp-optimise{counter:0>3}.svg'.format(counter=counter)
mlai.write_figure(os.path.join(diagrams, file_name),
figure=fig1,
transparent=True)
counter += 1
ax2.clear()
t = ax2.semilogx(lengthscales[0:i+1], err[0:i+1], 'x-',
color=magenta_color,
markersize=markersize,
linewidth=linewidth)
t2 = ax2.semilogx(lengthscales[0:i+1], err_log_det[0:i+1], 'x-',
color=blue_color,
markersize=markersize,
linewidth=linewidth)
t3 = ax2.semilogx(lengthscales[0:i+1], err_fit[0:i+1], 'x-',
color=red_color,
markersize=markersize,
linewidth=linewidth)
ax2.set_ylim(err_y_lim)
ax2.set_xlim([0.025, 32])
ax2.set_xticks([0.01, 0.1, 1, 10, 100])
ax2.set_xticklabels(['$10^{-2}$', '$10^{-1}$', '$10^0$', '$10^1$', '$10^2$'])
ax2.grid(True)
ax2.set_ylabel('negative log likelihood', fontsize=fontsize)
ax2.set_xlabel('length scale, $\ell$', fontsize=fontsize)
file_name = 'gp-optimise{counter:0>3}.svg'.format(counter=counter)
mlai.write_figure(os.path.join(diagrams, file_name),
figure=fig2,
transparent=True)
counter += 1
#ax.set_box(False)
xlim = ax2.get_xlim()
ax2.plot([xlim[0], xlim[0]], err_y_lim, color=black_color)
ax2.plot(xlim, [err_y_lim[0], err_y_lim[0]], color=black_color)
\includesvg{../slides/diagrams/gp/gp-optimise001}
\includesvg{../slides/diagrams/gp/gp-optimise003}
\includesvg{../slides/diagrams/gp/gp-optimise005}
\includesvg{../slides/diagrams/gp/gp-optimise007}
\includesvg{../slides/diagrams/gp/gp-optimise009}
\includesvg{../slides/diagrams/gp/gp-optimise011}
\includesvg{../slides/diagrams/gp/gp-optimise013}
\includesvg{../slides/diagrams/gp/gp-optimise015}
\includesvg{../slides/diagrams/gp/gp-optimise017}
\includesvg{../slides/diagrams/gp/gp-optimise019}
\includesvg{../slides/diagrams/gp/gp-optimise021}
import matplotlib.pyplot as plt
import teaching_plots as plot
import mlai
xlim = (-20,260)
ylim = (5, 7.5)
yhat = (y-offset)/scale
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
_ = ax.plot(x, y, 'r.',markersize=10)
ax.set_xlabel('time/min', fontsize=20)
ax.set_ylabel('expression', fontsize=20)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/datasets/della-gatta-gene.svg',
transparent=True,
frameon=True)
import teaching_plots as plot
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full.log_likelihood()), fontsize=20)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/della-gatta-gene-gp.svg',
transparent=True, frameon=True)
import teaching_plots as plot
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full2, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full2.log_likelihood()), fontsize=20)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/della-gatta-gene-gp2.svg',
transparent=True, frameon=True)
import teaching_plots as plot
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full3, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full3.log_likelihood()), fontsize=20)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/della-gatta-gene-gp3.svg',
transparent=True, frameon=True)
import teaching_plots as plot
plot.multiple_optima(diagrams='../slides/diagrams/gp')
[@Andrade:consistent14; @Mubangizi:malaria14]
from mlai import Kernel
from mlai import linear_cov
from mlai import eq_cov
from mlai import add_cov
kernel = Kernel(function=add_cov,
name='Additive',
shortname='add',
formula='\kernelScalar_f(\inputVector, \inputVector^\prime) = \kernelScalar_g(\inputVector, \inputVector^\prime) + \kernelScalar_h(\inputVector, \inputVector^\prime)',
kerns=[linear_cov, eq_cov],
kern_args=[{'variance': 25}, {'lengthscale' : 0.2}])
import teaching_plots as plot
plot.covariance_func(kernel=kernel, diagrams='../slides/diagrams/kern/')
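The additive form follows directly from independence of the component processes: if $\mappingFunction = g + h$ with $g$ and $h$ independent and zero mean, then
$$ \kernelScalar_\mappingFunction(\inputVector, \inputVector^\prime) = \expectation{\left(g(\inputVector)+h(\inputVector)\right)\left(g(\inputVector^\prime)+h(\inputVector^\prime)\right)} = \kernelScalar_g(\inputVector, \inputVector^\prime) + \kernelScalar_h(\inputVector, \inputVector^\prime), $$
because the cross terms vanish in expectation.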
\includesvgclass{../slides/diagrams/kern/add_covariance.svg}

### Analysis of US Birth Rates
import teaching_plots as plot
import mlai
import numpy as np
from mlai import radial, basis_cov  # assumed available in mlai, like eq_cov above
basis = mlai.Basis(function=radial,
number=3,
data_limits=[-0.5, 0.5],
width=0.125)
kernel = mlai.Kernel(function=basis_cov,
name='Basis',
shortname='basis',
formula=r'\kernel(\inputVector, \inputVector^\prime) = \basisVector(\inputVector)^\top \basisVector(\inputVector^\prime)',
basis=basis)
plot.covariance_func(kernel, diagrams='../slides/diagrams/kern/')
\includesvgclass{../slides/diagrams/kern/basis_covariance.svg}
import teaching_plots as plot
import mlai
import numpy as np
from mlai import brownian_cov  # assumed available in mlai, like eq_cov above
t=np.linspace(0, 2, 200)[:, np.newaxis]
kernel = mlai.Kernel(function=brownian_cov,
name='Brownian',
formula=r'\kernelScalar(t, t^\prime)=\alpha \min(t, t^\prime)',
shortname='brownian')
plot.covariance_func(kernel, t, diagrams='../slides/diagrams/kern/')
\includesvgclass{../slides/diagrams/kern/brownian_covariance.svg}
import teaching_plots as plot
import mlai
import numpy as np
from mlai import mlp_cov  # assumed available in mlai, like eq_cov above
kernel = mlai.Kernel(function=mlp_cov,
name='Multilayer Perceptron',
shortname='mlp',
formula=r'\kernelScalar(\inputVector, \inputVector^\prime) = \alpha \arcsin\left(\frac{w \inputVector^\top \inputVector^\prime + b}{\sqrt{\left(w \inputVector^\top \inputVector + b + 1\right)\left(w \left.\inputVector^\prime\right.^\top \inputVector^\prime + b + 1\right)}}\right)',
w=5, b=0.5)
plot.covariance_func(kernel, diagrams='../slides/diagrams/kern/')
\includesvgclass{../slides/diagrams/kern/mlp_covariance.svg}

\includesvgclass{../slides/diagrams/logo/gpss-logo.svg}
- Work by Eric Meissner and Zhenwen Dai.
- Probabilistic programming.
- Available on [Github](https://github.com/amzn/mxfusion)
![MXFusion logo](../slides/diagrams/mxfusion-logo.png)
### Acknowledgments

Stefanos Eleftheriadis, John Bronskill, Hugh Salimbeni, Rich Turner, Zhenwen Dai, Javier Gonzalez, Andreas Damianou, Mark Pullin, Michael Smith, James Hensman, John Quinn, Martin Mubangizi.

Thanks!
### References {#references .unnumbered}