Abstract: Gaussian process models provide a flexible, non-parametric approach to modelling that sustains uncertainty about the function. However, computational demands and the joint Gaussian assumption make them inappropriate for some applications. In this talk we review low rank approximations for Gaussian processes and use stochastic process composition to create non-Gaussian processes. We illustrate the models on simple regression tasks to give a sense of how uncertainty propagates through the model. We end will demonstrations on unsupervised learning of digits and motion capture data.
$$ \newcommand{\Amatrix}{\mathbf{A}} \newcommand{\KL}[2]{\text{KL}\left( #1\,\|\,#2 \right)} \newcommand{\Kaast}{\kernelMatrix_{\mathbf{ \ast}\mathbf{ \ast}}} \newcommand{\Kastu}{\kernelMatrix_{\mathbf{ \ast} \inducingVector}} \newcommand{\Kff}{\kernelMatrix_{\mappingFunctionVector \mappingFunctionVector}} \newcommand{\Kfu}{\kernelMatrix_{\mappingFunctionVector \inducingVector}} \newcommand{\Kuast}{\kernelMatrix_{\inducingVector \bf\ast}} \newcommand{\Kuf}{\kernelMatrix_{\inducingVector \mappingFunctionVector}} \newcommand{\Kuu}{\kernelMatrix_{\inducingVector \inducingVector}} \newcommand{\Kuui}{\Kuu^{-1}} \newcommand{\Qaast}{\mathbf{Q}_{\bf \ast \ast}} \newcommand{\Qastf}{\mathbf{Q}_{\ast \mappingFunction}} \newcommand{\Qfast}{\mathbf{Q}_{\mappingFunctionVector \bf \ast}} \newcommand{\Qff}{\mathbf{Q}_{\mappingFunctionVector \mappingFunctionVector}} \newcommand{\aMatrix}{\mathbf{A}} \newcommand{\aScalar}{a} \newcommand{\aVector}{\mathbf{a}} \newcommand{\acceleration}{a} \newcommand{\bMatrix}{\mathbf{B}} \newcommand{\bScalar}{b} \newcommand{\bVector}{\mathbf{b}} \newcommand{\basisFunc}{\phi} \newcommand{\basisFuncVector}{\boldsymbol{ \basisFunc}} \newcommand{\basisFunction}{\phi} \newcommand{\basisLocation}{\mu} \newcommand{\basisMatrix}{\boldsymbol{ \Phi}} \newcommand{\basisScalar}{\basisFunction} \newcommand{\basisVector}{\boldsymbol{ \basisFunction}} \newcommand{\activationFunction}{\phi} \newcommand{\activationMatrix}{\boldsymbol{ \Phi}} \newcommand{\activationScalar}{\basisFunction} \newcommand{\activationVector}{\boldsymbol{ \basisFunction}} \newcommand{\bigO}{\mathcal{O}} \newcommand{\binomProb}{\pi} \newcommand{\cMatrix}{\mathbf{C}} \newcommand{\cbasisMatrix}{\hat{\boldsymbol{ \Phi}}} \newcommand{\cdataMatrix}{\hat{\dataMatrix}} \newcommand{\cdataScalar}{\hat{\dataScalar}} \newcommand{\cdataVector}{\hat{\dataVector}} \newcommand{\centeredKernelMatrix}{\mathbf{ \MakeUppercase{\centeredKernelScalar}}} \newcommand{\centeredKernelScalar}{b} \newcommand{\centeredKernelVector}{\centeredKernelScalar} \newcommand{\centeringMatrix}{\mathbf{H}} \newcommand{\chiSquaredDist}[2]{\chi_{#1}^{2}\left(#2\right)} \newcommand{\chiSquaredSamp}[1]{\chi_{#1}^{2}} \newcommand{\conditionalCovariance}{\boldsymbol{ \Sigma}} \newcommand{\coregionalizationMatrix}{\mathbf{B}} \newcommand{\coregionalizationScalar}{b} \newcommand{\coregionalizationVector}{\mathbf{ \coregionalizationScalar}} \newcommand{\covDist}[2]{\text{cov}_{#2}\left(#1\right)} \newcommand{\covSamp}[1]{\text{cov}\left(#1\right)} \newcommand{\covarianceScalar}{c} \newcommand{\covarianceVector}{\mathbf{ \covarianceScalar}} \newcommand{\covarianceMatrix}{\mathbf{C}} \newcommand{\covarianceMatrixTwo}{\boldsymbol{ \Sigma}} \newcommand{\croupierScalar}{s} \newcommand{\croupierVector}{\mathbf{ \croupierScalar}} \newcommand{\croupierMatrix}{\mathbf{ \MakeUppercase{\croupierScalar}}} \newcommand{\dataDim}{p} \newcommand{\dataIndex}{i} \newcommand{\dataIndexTwo}{j} \newcommand{\dataMatrix}{\mathbf{Y}} \newcommand{\dataScalar}{y} \newcommand{\dataSet}{\mathcal{D}} \newcommand{\dataStd}{\sigma} \newcommand{\dataVector}{\mathbf{ \dataScalar}} \newcommand{\decayRate}{d} \newcommand{\degreeMatrix}{\mathbf{ \MakeUppercase{\degreeScalar}}} \newcommand{\degreeScalar}{d} \newcommand{\degreeVector}{\mathbf{ \degreeScalar}} % Already defined by latex %\newcommand{\det}[1]{\left|#1\right|} \newcommand{\diag}[1]{\text{diag}\left(#1\right)} \newcommand{\diagonalMatrix}{\mathbf{D}} \newcommand{\diff}[2]{\frac{\text{d}#1}{\text{d}#2}} \newcommand{\diffTwo}[2]{\frac{\text{d}^2#1}{\text{d}#2^2}} \newcommand{\displacement}{x} \newcommand{\displacementVector}{\textbf{\displacement}} \newcommand{\distanceMatrix}{\mathbf{ \MakeUppercase{\distanceScalar}}} \newcommand{\distanceScalar}{d} \newcommand{\distanceVector}{\mathbf{ \distanceScalar}} \newcommand{\eigenvaltwo}{\ell} \newcommand{\eigenvaltwoMatrix}{\mathbf{L}} \newcommand{\eigenvaltwoVector}{\mathbf{l}} \newcommand{\eigenvalue}{\lambda} \newcommand{\eigenvalueMatrix}{\boldsymbol{ \Lambda}} \newcommand{\eigenvalueVector}{\boldsymbol{ \lambda}} \newcommand{\eigenvector}{\mathbf{ \eigenvectorScalar}} \newcommand{\eigenvectorMatrix}{\mathbf{U}} \newcommand{\eigenvectorScalar}{u} \newcommand{\eigenvectwo}{\mathbf{v}} \newcommand{\eigenvectwoMatrix}{\mathbf{V}} \newcommand{\eigenvectwoScalar}{v} \newcommand{\entropy}[1]{\mathcal{H}\left(#1\right)} \newcommand{\errorFunction}{E} \newcommand{\expDist}[2]{\left<#1\right>_{#2}} \newcommand{\expSamp}[1]{\left<#1\right>} \newcommand{\expectation}[1]{\left\langle #1 \right\rangle } \newcommand{\expectationDist}[2]{\left\langle #1 \right\rangle _{#2}} \newcommand{\expectedDistanceMatrix}{\mathcal{D}} \newcommand{\eye}{\mathbf{I}} \newcommand{\fantasyDim}{r} \newcommand{\fantasyMatrix}{\mathbf{ \MakeUppercase{\fantasyScalar}}} \newcommand{\fantasyScalar}{z} \newcommand{\fantasyVector}{\mathbf{ \fantasyScalar}} \newcommand{\featureStd}{\varsigma} \newcommand{\gammaCdf}[3]{\mathcal{GAMMA CDF}\left(#1|#2,#3\right)} \newcommand{\gammaDist}[3]{\mathcal{G}\left(#1|#2,#3\right)} \newcommand{\gammaSamp}[2]{\mathcal{G}\left(#1,#2\right)} \newcommand{\gaussianDist}[3]{\mathcal{N}\left(#1|#2,#3\right)} \newcommand{\gaussianSamp}[2]{\mathcal{N}\left(#1,#2\right)} \newcommand{\given}{|} \newcommand{\half}{\frac{1}{2}} \newcommand{\heaviside}{H} \newcommand{\hiddenMatrix}{\mathbf{ \MakeUppercase{\hiddenScalar}}} \newcommand{\hiddenScalar}{h} \newcommand{\hiddenVector}{\mathbf{ \hiddenScalar}} \newcommand{\identityMatrix}{\eye} \newcommand{\inducingInputScalar}{z} \newcommand{\inducingInputVector}{\mathbf{ \inducingInputScalar}} \newcommand{\inducingInputMatrix}{\mathbf{Z}} \newcommand{\inducingScalar}{u} \newcommand{\inducingVector}{\mathbf{ \inducingScalar}} \newcommand{\inducingMatrix}{\mathbf{U}} \newcommand{\inlineDiff}[2]{\text{d}#1/\text{d}#2} \newcommand{\inputDim}{q} \newcommand{\inputMatrix}{\mathbf{X}} \newcommand{\inputScalar}{x} \newcommand{\inputSpace}{\mathcal{X}} \newcommand{\inputVals}{\inputVector} \newcommand{\inputVector}{\mathbf{ \inputScalar}} \newcommand{\iterNum}{k} \newcommand{\kernel}{\kernelScalar} \newcommand{\kernelMatrix}{\mathbf{K}} \newcommand{\kernelScalar}{k} \newcommand{\kernelVector}{\mathbf{ \kernelScalar}} \newcommand{\kff}{\kernelScalar_{\mappingFunction \mappingFunction}} \newcommand{\kfu}{\kernelVector_{\mappingFunction \inducingScalar}} \newcommand{\kuf}{\kernelVector_{\inducingScalar \mappingFunction}} \newcommand{\kuu}{\kernelVector_{\inducingScalar \inducingScalar}} \newcommand{\lagrangeMultiplier}{\lambda} \newcommand{\lagrangeMultiplierMatrix}{\boldsymbol{ \Lambda}} \newcommand{\lagrangian}{L} \newcommand{\laplacianFactor}{\mathbf{ \MakeUppercase{\laplacianFactorScalar}}} \newcommand{\laplacianFactorScalar}{m} \newcommand{\laplacianFactorVector}{\mathbf{ \laplacianFactorScalar}} \newcommand{\laplacianMatrix}{\mathbf{L}} \newcommand{\laplacianScalar}{\ell} \newcommand{\laplacianVector}{\mathbf{ \ell}} \newcommand{\latentDim}{q} \newcommand{\latentDistanceMatrix}{\boldsymbol{ \Delta}} \newcommand{\latentDistanceScalar}{\delta} \newcommand{\latentDistanceVector}{\boldsymbol{ \delta}} \newcommand{\latentForce}{f} \newcommand{\latentFunction}{u} \newcommand{\latentFunctionVector}{\mathbf{ \latentFunction}} \newcommand{\latentFunctionMatrix}{\mathbf{ \MakeUppercase{\latentFunction}}} \newcommand{\latentIndex}{j} \newcommand{\latentScalar}{z} \newcommand{\latentVector}{\mathbf{ \latentScalar}} \newcommand{\latentMatrix}{\mathbf{Z}} \newcommand{\learnRate}{\eta} \newcommand{\lengthScale}{\ell} \newcommand{\rbfWidth}{\ell} \newcommand{\likelihoodBound}{\mathcal{L}} \newcommand{\likelihoodFunction}{L} \newcommand{\locationScalar}{\mu} \newcommand{\locationVector}{\boldsymbol{ \locationScalar}} \newcommand{\locationMatrix}{\mathbf{M}} \newcommand{\variance}[1]{\text{var}\left( #1 \right)} \newcommand{\mappingFunction}{f} \newcommand{\mappingFunctionMatrix}{\mathbf{F}} \newcommand{\mappingFunctionTwo}{g} \newcommand{\mappingFunctionTwoMatrix}{\mathbf{G}} \newcommand{\mappingFunctionTwoVector}{\mathbf{ \mappingFunctionTwo}} \newcommand{\mappingFunctionVector}{\mathbf{ \mappingFunction}} \newcommand{\scaleScalar}{s} \newcommand{\mappingScalar}{w} \newcommand{\mappingVector}{\mathbf{ \mappingScalar}} \newcommand{\mappingMatrix}{\mathbf{W}} \newcommand{\mappingScalarTwo}{v} \newcommand{\mappingVectorTwo}{\mathbf{ \mappingScalarTwo}} \newcommand{\mappingMatrixTwo}{\mathbf{V}} \newcommand{\maxIters}{K} \newcommand{\meanMatrix}{\mathbf{M}} \newcommand{\meanScalar}{\mu} \newcommand{\meanTwoMatrix}{\mathbf{M}} \newcommand{\meanTwoScalar}{m} \newcommand{\meanTwoVector}{\mathbf{ \meanTwoScalar}} \newcommand{\meanVector}{\boldsymbol{ \meanScalar}} \newcommand{\mrnaConcentration}{m} \newcommand{\naturalFrequency}{\omega} \newcommand{\neighborhood}[1]{\mathcal{N}\left( #1 \right)} \newcommand{\neilurl}{http://inverseprobability.com/} \newcommand{\noiseMatrix}{\boldsymbol{ E}} \newcommand{\noiseScalar}{\epsilon} \newcommand{\noiseVector}{\boldsymbol{ \epsilon}} \newcommand{\norm}[1]{\left\Vert #1 \right\Vert} \newcommand{\normalizedLaplacianMatrix}{\hat{\mathbf{L}}} \newcommand{\normalizedLaplacianScalar}{\hat{\ell}} \newcommand{\normalizedLaplacianVector}{\hat{\mathbf{ \ell}}} \newcommand{\numActive}{m} \newcommand{\numBasisFunc}{m} \newcommand{\numComponents}{m} \newcommand{\numComps}{K} \newcommand{\numData}{n} \newcommand{\numFeatures}{K} \newcommand{\numHidden}{h} \newcommand{\numInducing}{m} \newcommand{\numLayers}{\ell} \newcommand{\numNeighbors}{K} \newcommand{\numSequences}{s} \newcommand{\numSuccess}{s} \newcommand{\numTasks}{m} \newcommand{\numTime}{T} \newcommand{\numTrials}{S} \newcommand{\outputIndex}{j} \newcommand{\paramVector}{\boldsymbol{ \theta}} \newcommand{\parameterMatrix}{\boldsymbol{ \Theta}} \newcommand{\parameterScalar}{\theta} \newcommand{\parameterVector}{\boldsymbol{ \parameterScalar}} \newcommand{\partDiff}[2]{\frac{\partial#1}{\partial#2}} \newcommand{\precisionScalar}{j} \newcommand{\precisionVector}{\mathbf{ \precisionScalar}} \newcommand{\precisionMatrix}{\mathbf{J}} \newcommand{\pseudotargetScalar}{\widetilde{y}} \newcommand{\pseudotargetVector}{\mathbf{ \pseudotargetScalar}} \newcommand{\pseudotargetMatrix}{\mathbf{ \widetilde{Y}}} \newcommand{\rank}[1]{\text{rank}\left(#1\right)} \newcommand{\rayleighDist}[2]{\mathcal{R}\left(#1|#2\right)} \newcommand{\rayleighSamp}[1]{\mathcal{R}\left(#1\right)} \newcommand{\responsibility}{r} \newcommand{\rotationScalar}{r} \newcommand{\rotationVector}{\mathbf{ \rotationScalar}} \newcommand{\rotationMatrix}{\mathbf{R}} \newcommand{\sampleCovScalar}{s} \newcommand{\sampleCovVector}{\mathbf{ \sampleCovScalar}} \newcommand{\sampleCovMatrix}{\mathbf{s}} \newcommand{\scalarProduct}[2]{\left\langle{#1},{#2}\right\rangle} \newcommand{\sign}[1]{\text{sign}\left(#1\right)} \newcommand{\sigmoid}[1]{\sigma\left(#1\right)} \newcommand{\singularvalue}{\ell} \newcommand{\singularvalueMatrix}{\mathbf{L}} \newcommand{\singularvalueVector}{\mathbf{l}} \newcommand{\sorth}{\mathbf{u}} \newcommand{\spar}{\lambda} \newcommand{\trace}[1]{\text{tr}\left(#1\right)} \newcommand{\BasalRate}{B} \newcommand{\DampingCoefficient}{C} \newcommand{\DecayRate}{D} \newcommand{\Displacement}{X} \newcommand{\LatentForce}{F} \newcommand{\Mass}{M} \newcommand{\Sensitivity}{S} \newcommand{\basalRate}{b} \newcommand{\dampingCoefficient}{c} \newcommand{\mass}{m} \newcommand{\sensitivity}{s} \newcommand{\springScalar}{\kappa} \newcommand{\springVector}{\boldsymbol{ \kappa}} \newcommand{\springMatrix}{\boldsymbol{ \mathcal{K}}} \newcommand{\tfConcentration}{p} \newcommand{\tfDecayRate}{\delta} \newcommand{\tfMrnaConcentration}{f} \newcommand{\tfVector}{\mathbf{ \tfConcentration}} \newcommand{\velocity}{v} \newcommand{\sufficientStatsScalar}{g} \newcommand{\sufficientStatsVector}{\mathbf{ \sufficientStatsScalar}} \newcommand{\sufficientStatsMatrix}{\mathbf{G}} \newcommand{\switchScalar}{s} \newcommand{\switchVector}{\mathbf{ \switchScalar}} \newcommand{\switchMatrix}{\mathbf{S}} \newcommand{\tr}[1]{\text{tr}\left(#1\right)} \newcommand{\loneNorm}[1]{\left\Vert #1 \right\Vert_1} \newcommand{\ltwoNorm}[1]{\left\Vert #1 \right\Vert_2} \newcommand{\onenorm}[1]{\left\vert#1\right\vert_1} \newcommand{\twonorm}[1]{\left\Vert #1 \right\Vert} \newcommand{\vScalar}{v} \newcommand{\vVector}{\mathbf{v}} \newcommand{\vMatrix}{\mathbf{V}} \newcommand{\varianceDist}[2]{\text{var}_{#2}\left( #1 \right)} % Already defined by latex %\newcommand{\vec}{#1:} \newcommand{\vecb}[1]{\left(#1\right):} \newcommand{\weightScalar}{w} \newcommand{\weightVector}{\mathbf{ \weightScalar}} \newcommand{\weightMatrix}{\mathbf{W}} \newcommand{\weightedAdjacencyMatrix}{\mathbf{A}} \newcommand{\weightedAdjacencyScalar}{a} \newcommand{\weightedAdjacencyVector}{\mathbf{ \weightedAdjacencyScalar}} \newcommand{\onesVector}{\mathbf{1}} \newcommand{\zerosVector}{\mathbf{0}} $$$=f\Bigg($$\Bigg)$
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
Inference in a GP has the following demands:
Complexity: $\bigO(\numData^3)$ Storage: $\bigO(\numData^2)$
Inference in a low rank GP has the following demands:
Complexity: $\bigO(\numData\numInducing^2)$ Storage: $\bigO(\numData\numInducing)$
where $\numInducing$ is a user chosen parameter.
@Snelson:pseudo05,@Quinonero:unifying05,@Lawrence:larger07,@Titsias:variational09,@Thang:unifying17
[\small{[edit]}]{style="text-align:right"} ### Variational Compression
[\small{[edit]}]{style="text-align:right"}
We’ve seen how we go from parametric to non-parametric.
The limit implies infinite dimensional $\mappingVector$.
Gaussian processes are generally non-parametric: combine data with covariance function to get model.
This representation cannot be summarized by a parameter vector of a fixed size.
Parametric models have a representation that does not respond to increasing training set size.
Bayesian posterior distributions over parameters contain the information about the training data.
- Use Bayes’ rule from training data,
$p\left(\mappingVector|\dataVector, \inputMatrix\right)$,
- Make predictions on test data
$$p\left(\dataScalar_*|\inputMatrix_*, \dataVector, \inputMatrix\right) = \int
p\left(\dataScalar_*|\mappingVector,\inputMatrix_*\right)p\left(\mappingVector|\dataVector,
\inputMatrix)\text{d}\mappingVector\right).$$
$\mappingVector$ becomes a bottleneck for information about the training set to pass to the test set.
Solution: increase $\numBasisFunc$ so that the bottleneck is so large that it no longer presents a problem.
How big is big enough for $\numBasisFunc$? Non-parametrics says $\numBasisFunc \rightarrow \infty$.
. . .
. . .
. . .
. . .
. . .
. . .
. . .
. . .
to augment our model $$ \dataScalar(\inputVector) = \mappingFunction(\inputVector) + \noiseScalar, $$ giving $$ p(\dataVector) = \int p(\dataVector|\mappingFunctionVector) p(\mappingFunctionVector) \text{d}\mappingFunctionVector, $$ where for the independent case we have $p(\dataVector | \mappingFunctionVector) = \prod_{i=1}^\numData p(\dataScalar_i|\mappingFunction_i)$.
Integrating over $\mappingFunctionVector$ $$ p(\dataVector) = \int p(\dataVector|\inducingVector) p(\inducingVector) \text{d}\inducingVector. $$
but $\inducingVector$ is not a parameter
Unfortunately computing $p(\dataVector|\inducingVector)$ is intractable
[\small{[edit]}]{style="text-align:right"}
Maximizing lower bound minimizes the KL divergence (information gain): $$ \KL{p(\mappingFunctionVector|\inducingVector)}{p(\mappingFunctionVector|\dataVector, \inducingVector)} = \int p(\mappingFunctionVector|\inducingVector) \log \frac{p(\mappingFunctionVector|\inducingVector)}{p(\mappingFunctionVector|\dataVector, \inducingVector)}\text{d}\inducingVector $$
This is minimized when the information stored about $\dataVector$ is stored already in $\inducingVector$.
The bound seeks an optimal compression from the information gain perspective.
If $\inducingVector = \mappingFunctionVector$ bound is exact ($\mappingFunctionVector$ $d$-separates $\dataVector$ from $\inducingVector$).
with $$ \kernelMatrix = \begin{bmatrix} \Kff & \Kfu \\ \Kuf & \Kuu \end{bmatrix} $$
[\small{[edit]}]{style="text-align:right"}
import matplotlib.pyplot as plt
import mlai
import teaching_plots as plot
from gp_tutorial import gpplot
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full, ax=ax, xlabel='$x$', ylabel='$y$', fontsize=20, portion=0.2)
xlim = ax.get_xlim()
ylim = ax.get_ylim()
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/sparse-demo-full-gp.svg',
transparent=True, frameon=True)
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m, ax=ax, xlabel='$x$', ylabel='$y$', fontsize=20, portion=0.2, xlim=xlim, ylim=ylim)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/sparse-demo-constrained-inducing-6-unlearned-gp.svg',
transparent=True, frameon=True)
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m, ax=ax, xlabel='$x$', ylabel='$y$', fontsize=20, portion=0.2, xlim=xlim, ylim=ylim)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/sparse-demo-constrained-inducing-6-learned-gp.svg',
transparent=True, frameon=True)
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m, ax=ax, xlabel='$x$', ylabel='$y$', fontsize=20, portion=0.2,xlim=xlim, ylim=ylim)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/sparse-demo-unconstrained-inducing-6-gp.svg',
transparent=True, frameon=True)
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m, ax=ax, xlabel='$x$', ylabel='$y$', fontsize=20, portion=0.2, xlim=xlim, ylim=ylim)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/sparse-demo-sparse-inducing-8-gp.svg',
transparent=True, frameon=True)
[\small{[edit]}]{style="text-align:right"} ### Leads to Other Approximations ...
\catdoc
[[@Hensman:bigdata13]]{style="text-align:left"} | []{style="text-align:right"} |
[[@Hensman:bigdata13]]{style="text-align:left"} | []{style="text-align:right"} |
A Unifying Framework for Gaussian Process Pseudo-Point Approximations using Power Expectation Propagation @Thang:unifying17
Deep Gaussian Processes and Variational Propagation of Uncertainty @Damianou:thesis2015
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
MacKay: NeurIPS Tutorial 1997 “Have we thrown out the baby with the bathwater?” [Published as @MacKay:gpintroduction98]
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
plot.deep_nn(diagrams='../slides/diagrams/deepgp/')
[\small{[edit]}]{style="text-align:right"} ### Overfitting
Potential problem: if number of nodes in two adjacent layers is big, corresponding $\mappingMatrix$ is also very big and there is the potential to overfit.
Proposed solution: “dropout”.
Alternative solution: parameterize $\mappingMatrix$ with its SVD. $$ \mappingMatrix = \eigenvectorMatrix\eigenvalueMatrix\eigenvectwoMatrix^\top $$ or $$ \mappingMatrix = \eigenvectorMatrix\eigenvectwoMatrix^\top $$ where if $\mappingMatrix \in \Re^{k_1\times k_2}$ then $\eigenvectorMatrix\in \Re^{k_1\times q}$ and $\eigenvectwoMatrix \in \Re^{k_2\times q}$, i.e. we have a low rank matrix factorization for the weights.
plot.low_rank_approximation(diagrams='../slides/diagrams')
plot.deep_nn_bottleneck(diagrams='../slides/diagrams/deepgp')
The network can now be written mathematically as $$ \begin{align} \latentVector_{1} &= \eigenvectwoMatrix^\top_1 \inputVector\\ \hiddenVector_{1} &= \basisFunction\left(\eigenvectorMatrix_1 \latentVector_{1}\right)\\ \latentVector_{2} &= \eigenvectwoMatrix^\top_2 \hiddenVector_{1}\\ \hiddenVector_{2} &= \basisFunction\left(\eigenvectorMatrix_2 \latentVector_{2}\right)\\ \latentVector_{3} &= \eigenvectwoMatrix^\top_3 \hiddenVector_{2}\\ \hiddenVector_{3} &= \basisFunction\left(\eigenvectorMatrix_3 \latentVector_{3}\right)\\ \dataVector &= \mappingVector_4^\top\hiddenVector_{3}. \end{align} $$
Replace each neural network with a Gaussian process $$ \begin{align} \latentVector_{1} &= \mappingFunctionVector_1\left(\inputVector\right)\\ \latentVector_{2} &= \mappingFunctionVector_2\left(\latentVector_{1}\right)\\ \latentVector_{3} &= \mappingFunctionVector_3\left(\latentVector_{2}\right)\\ \dataVector &= \mappingFunctionVector_4\left(\latentVector_{3}\right) \end{align} $$
Equivalent to prior over parameters, take width of each layer to infinity.
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
[Outline of the DeepFace architecture. A front-end of a single convolution-pooling-convolution filtering on the rectified input, followed by three locally-connected layers and two fully-connected layers. Color illustrates feature maps produced at each layer. The net includes more than 120 million parameters, where more than 95% come from the local and fully connected.]{.fragment .fade-in}
[Source: DeepFace [@Taigman:deepface14]]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
import pods
from ipywidgets import IntSlider
pods.notebook.display_plots('pinball{sample:0>3}.svg',
'../slides/diagrams',
sample=IntSlider(1, 1, 2, 1))
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
from matplotlib import rc
rc("font", **{'family':'sans-serif','sans-serif':['Helvetica'],'size':30})
rc("text", usetex=True)
pgm = plot.horizontal_chain(depth=5)
pgm.render().figure.savefig("../slides/diagrams/deepgp/deep-markov.svg", transparent=True)
from matplotlib import rc
rc("font", **{'family':'sans-serif','sans-serif':['Helvetica'], 'size':15})
rc("text", usetex=True)
pgm = plot.vertical_chain(depth=5)
pgm.render().figure.savefig("../slides/diagrams/deepgp/deep-markov-vertical.svg", transparent=True)
Gaussian processes give priors over functions.
Elegant properties:
e.g. Derivatives of process are also Gaussian distributed (if they exist).
For particular covariance functions they are ‘universal approximators’, i.e. all functions can have support under the prior.
Gaussian derivatives might ring alarm bells.
E.g. a priori they don’t believe in function ‘jumps’.
From a process perspective: process composition.
A (new?) way of constructing more complex processes based on simpler components.
pgm = plot.vertical_chain(depth=5, shape=[2, 7])
pgm.add_node(daft.Node('y_2', r'$\mathbf{y}_2$', 1.5, 3.5, observed=True))
pgm.add_edge('f_2', 'y_2')
pgm.render().figure.savefig("../slides/diagrams/deepgp/deep-markov-vertical-side.svg", transparent=True)
import teaching_plots as plot
plot.non_linear_difficulty_plot_3(diagrams='../../slides/diagrams/dimred/')
Propagate a probability distribution through a non-linear mapping.
Normalisation of distribution becomes intractable.
plot.non_linear_difficulty_plot_2(diagrams='../../slides/diagrams/dimred/')
Propagate a probability distribution through a non-linear mapping.
Normalisation of distribution becomes intractable.
plot.non_linear_difficulty_plot_1(diagrams='../../slides/diagrams/dimred')
Propagate a probability distribution through a non-linear mapping.
Normalisation of distribution becomes intractable.
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
import teaching_plots as plot
plot.stack_gp_sample(kernel=GPy.kern.Linear,
diagrams="../../slides/diagrams/deepgp")
import pods
pods.notebook.display_plots('stack-gp-sample-Linear-{sample:0>1}.svg',
directory='../../slides/diagrams/deepgp', sample=(0,4))
plot.stack_gp_sample(kernel=GPy.kern.RBF,
diagrams="../../slides/diagrams/deepgp")
pods.notebook.display_plots('stack-gp-sample-RBF-{sample:0>1}.svg',
directory='../../slides/diagrams/deepgp', sample=(0,4))
Avoiding pathologies in very deep networks @Duvenaud:pathologies14 show that the derivative distribution of the process becomes more heavy tailed as number of layers increase.
How Deep Are Deep Gaussian Processes? @Dunlop:deep2017 perform a theoretical analysis possible through conditional Gaussian Markov property.
from IPython.lib.display import YouTubeVideo
YouTubeVideo('XhIvygQYFFQ')
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
- Gold medal times for Olympic Marathon since 1896. - Marathons before 1924 didn’t have a standardised distance. - Present results using pace per km. - In 1904 Marathon was badly organised leading to very slow times. |
Image from Wikimedia Commons http://bit.ly/16kMKHQ |
import matplotlib.pyplot as plt
import teaching_plots as plot
import mlai
xlim = (1875,2030)
ylim = (2.5, 6.5)
yhat = (y-offset)/scale
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
_ = ax.plot(x, y, 'r.',markersize=10)
ax.set_xlabel('year', fontsize=20)
ax.set_ylabel('pace min/km', fontsize=20)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/datasets/olympic-marathon.svg',
transparent=True,
frameon=True)
[\small{[edit]}]{style="text-align:right"}
|
|
import teaching_plots as plot
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='year', ylabel='pace min/km', fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/olympic-marathon-gp.svg',
transparent=True, frameon=True)
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m, scale=scale, offset=offset, ax=ax, xlabel='year', ylabel='pace min/km',
fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/olympic-marathon-deep-gp.svg',
transparent=True, frameon=True)
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_sample(m, scale=scale, offset=offset, samps=10, ax=ax,
xlabel='year', ylabel='pace min/km', portion = 0.225)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/olympic-marathon-deep-gp-samples.svg',
transparent=True, frameon=True)
m.visualize(scale=scale, offset=offset, xlabel='year',
ylabel='pace min/km',xlim=xlim, ylim=ylim,
dataset='olympic-marathon',
diagrams='../slides/diagrams/deepgp')
import pods
pods.notebook.display_plots('olympic-marathon-deep-gp-layer-{sample:0>1}.svg',
'../slides/diagrams/deepgp', sample=(0,1))
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
m.visualize_pinball(ax=ax, scale=scale, offset=offset, points=30, portion=0.1,
xlabel='year', ylabel='pace km/min', vertical=True)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/olympic-marathon-deep-gp-pinball.svg',
transparent=True, frameon=True)
import matplotlib.pyplot as plt
import teaching_plots as plot
import mlai
xlim = (-20,260)
ylim = (5, 7.5)
yhat = (y-offset)/scale
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
_ = ax.plot(x, y, 'r.',markersize=10)
ax.set_xlabel('time/min', fontsize=20)
ax.set_ylabel('expression', fontsize=20)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/datasets/della-gatta-gene.svg',
transparent=True,
frameon=True)
import teaching_plots as plot
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full.log_likelihood()), fontsize=20)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/della-gatta-gene-gp.svg',
transparent=True, frameon=True)
import teaching_plots as plot
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full2, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full2.log_likelihood()), fontsize=20)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/della-gatta-gene-gp2.svg',
transparent=True, frameon=True)
import teaching_plots as plot
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full3, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full3.log_likelihood()), fontsize=20)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/della-gatta-gene-gp3.svg',
transparent=True, frameon=True)
import teaching_plots as plot
plot.multiple_optima(diagrams='../slides/diagrams/gp')
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m, scale=scale, offset=offset, ax=ax, fontsize=20, portion=0.5)
ax.set_ylim(ylim)
ax.set_xlim(xlim)
mlai.write_figure(filename='../slides/diagrams/deepgp/della-gatta-gene-deep-gp.svg',
transparent=True, frameon=True)
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
plot.model_sample(m, scale=scale, offset=offset, samps=10, ax=ax, portion = 0.5)
ax.set_ylim(ylim)
ax.set_xlim(xlim)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/della-gatta-gene-deep-gp-samples.svg',
transparent=True, frameon=True)
m.visualize(offset=offset, scale=scale, xlim=xlim, ylim=ylim,
dataset='della-gatta-gene',
diagrams='../slides/diagrams/deepgp')
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
m.visualize_pinball(offset=offset, ax=ax, scale=scale, xlim=xlim, ylim=ylim, portion=0.1, points=50)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/della-gatta-gene-deep-gp-pinball.svg',
transparent=True, frameon=True, ax=ax)
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
_ = ax.plot(x, y, 'r.',markersize=10)
_ = ax.set_xlabel('$x$', fontsize=20)
_ = ax.set_ylabel('$y$', fontsize=20)
xlim = (-2, 2)
ylim = (-0.6, 1.6)
ax.set_ylim(ylim)
ax.set_xlim(xlim)
mlai.write_figure(figure=fig, filename='../../slides/diagrams/datasets/step-function.svg',
transparent=True, frameon=True)
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full, scale=scale, offset=offset, ax=ax, fontsize=20, portion=0.5)
ax.set_ylim(ylim)
ax.set_xlim(xlim)
mlai.write_figure(figure=fig,filename='../slides/diagrams/gp/step-function-gp.svg',
transparent=True, frameon=True)
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m, scale=scale, offset=offset, ax=ax, fontsize=20, portion=0.5)
ax.set_ylim(ylim)
ax.set_xlim(xlim)
mlai.write_figure(filename='../slides/diagrams/deepgp/step-function-deep-gp.svg',
transparent=True, frameon=True)
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
plot.model_sample(m, scale=scale, offset=offset, samps=10, ax=ax, portion = 0.5)
ax.set_ylim(ylim)
ax.set_xlim(xlim)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/step-function-deep-gp-samples.svg',
transparent=True, frameon=True)
m.visualize(offset=offset, scale=scale, xlim=xlim, ylim=ylim,
dataset='step-function',
diagrams='../slides/diagrams/deepgp')
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
m.visualize_pinball(offset=offset, ax=ax, scale=scale, xlim=xlim, ylim=ylim, portion=0.1, points=50)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/step-function-deep-gp-pinball.svg',
transparent=True, frameon=True, ax=ax)
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
_ = ax.plot(x, y, 'r.',markersize=10)
_ = ax.set_xlabel('time', fontsize=20)
_ = ax.set_ylabel('acceleration', fontsize=20)
xlim = (-20, 80)
ylim = (-175, 125)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(filename='../slides/diagrams/datasets/motorcycle-helmet.svg',
transparent=True, frameon=True)
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='time', ylabel='acceleration/$g$', fontsize=20, portion=0.5)
xlim=(-20,80)
ylim=(-180,120)
ax.set_ylim(ylim)
ax.set_xlim(xlim)
mlai.write_figure(figure=fig,filename='../../slides/diagrams/gp/motorcycle-helmet-gp.svg',
transparent=True, frameon=True)
import teaching_plots as plot
import mlai
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m, scale=scale, offset=offset, ax=ax, xlabel='time', ylabel='acceleration/$g$', fontsize=20, portion=0.5)
ax.set_ylim(ylim)
ax.set_xlim(xlim)
mlai.write_figure(filename='../slides/diagrams/deepgp/motorcycle-helmet-deep-gp.svg',
transparent=True, frameon=True)
import teaching_plots as plot
import mlai
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
plot.model_sample(m, scale=scale, offset=offset, samps=10, ax=ax, xlabel='time', ylabel='acceleration/$g$', portion = 0.5)
ax.set_ylim(ylim)
ax.set_xlim(xlim)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/motorcycle-helmet-deep-gp-samples.svg',
transparent=True, frameon=True)
m.visualize(xlim=xlim, ylim=ylim, scale=scale,offset=offset,
xlabel="time", ylabel="acceleration/$g$", portion=0.5,
dataset='motorcycle-helmet',
diagrams='../slides/diagrams/deepgp')
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
m.visualize_pinball(ax=ax, xlabel='time', ylabel='acceleration/g',
points=50, scale=scale, offset=offset, portion=0.1)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/motorcycle-helmet-deep-gp-pinball.svg',
transparent=True, frameon=True)
fig, ax = plt.subplots(figsize=plot.big_figsize)
plt.plot(data['X'][:, 1], data['X'][:, 2], 'r.', markersize=5)
ax.set_xlabel('x position', fontsize=20)
ax.set_ylabel('y position', fontsize=20)
mlai.write_figure(figure=fig, filename='../../slides/diagrams/datasets/robot-wireless-ground-truth.svg', transparent=True, frameon=True)
output_dim=1
xlim = (-0.3, 1.3)
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
_ = ax.plot(x.flatten(), y[:, output_dim],
'r.', markersize=5)
ax.set_xlabel('time', fontsize=20)
ax.set_ylabel('signal strength', fontsize=20)
xlim = (-0.2, 1.2)
ylim = (-0.6, 2.0)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig, filename='../slides/diagrams/datasets/robot-wireless-dim-' + str(output_dim) + '.svg',
transparent=True, frameon=True)
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full, output_dim=output_dim, scale=scale, offset=offset, ax=ax,
xlabel='time', ylabel='signal strength', fontsize=20, portion=0.5)
ax.set_ylim(ylim)
ax.set_xlim(xlim)
mlai.write_figure(filename='../slides/diagrams/gp/robot-wireless-gp-dim-' + str(output_dim)+ '.svg',
transparent=True, frameon=True)
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m, output_dim=output_dim, scale=scale, offset=offset, ax=ax,
xlabel='time', ylabel='signal strength', fontsize=20, portion=0.5)
ax.set_ylim(ylim)
ax.set_xlim(xlim)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/robot-wireless-deep-gp-dim-' + str(output_dim)+ '.svg',
transparent=True, frameon=True)
fig, ax=plt.subplots(figsize=plot.big_wide_figsize)
plot.model_sample(m, output_dim=output_dim, scale=scale, offset=offset, samps=10, ax=ax,
xlabel='time', ylabel='signal strength', fontsize=20, portion=0.5)
ax.set_ylim(ylim)
ax.set_xlim(xlim)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/robot-wireless-deep-gp-samples-dim-' + str(output_dim)+ '.svg',
transparent=True, frameon=True)
fig, ax = plt.subplots(figsize=plot.big_figsize)
ax.plot(m.layers[-2].latent_space.mean[:, 0],
m.layers[-2].latent_space.mean[:, 1],
'r.-', markersize=5)
ax.set_xlabel('latent dimension 1', fontsize=20)
ax.set_ylabel('latent dimension 2', fontsize=20)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/robot-wireless-latent-space.svg',
transparent=True, frameon=True)
import matplotlib.pyplot as plt
from matplotlib import rc
import teaching_plots as plot
import mlai
rc("font", **{'family':'sans-serif','sans-serif':['Helvetica'],'size':20})
fig, ax = plt.subplots(figsize=plot.big_figsize)
for d in digits:
ax.plot(m.layer_1.X.mean[labels==d,0],m.layer_1.X.mean[labels==d,1],'.',label=str(d))
_ = plt.legend()
mlai.write_figure(figure=fig, filename="../slides/diagrams/deepgp/usps-digits-latent.svg", transparent=True)
import matplotlib.pyplot as plt
import mlai
fig, ax = plt.subplots(figsize=plot.big_figsize)
for i in range(5):
for j in range(i):
dims=[i, j]
ax.cla()
for d in digits:
ax.plot(m.obslayer.X.mean[labels==d,dims[0]],
m.obslayer.X.mean[labels==d,dims[1]],
'.', label=str(d))
plt.legend()
plt.xlabel('dimension ' + str(dims[0]))
plt.ylabel('dimension ' + str(dims[1]))
mlai.write_figure(figure=fig, filename="../slides/diagrams/deepgp/usps-digits-hidden-" + str(dims[0]) + '-' + str(dims[1]) + '.svg', transparent=True)
import matplotlib.pyplot as plt
import mlai
yt = m.predict(x)
fig, axs = plt.subplots(rows,cols,figsize=(10,6))
for i in range(rows):
for j in range(cols):
#v = np.random.normal(loc=yt[0][i*cols+j, :], scale=np.sqrt(yt[1][i*cols+j, :]))
v = yt[0][i*cols+j, :]
axs[i,j].imshow(v.reshape(28,28),
cmap='gray', interpolation='none',
aspect='equal')
axs[i,j].set_axis_off()
mlai.write_figure(figure=fig, filename="../slides/diagrams/deepgp/digit-samples-deep-gp.svg", transparent=True)
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
Work Adam Hirst, Software Engineering Intern and Cliff McCollum.
Tutorial on emulation.
Uncertainty quantification (UQ) is the science of quantitative characterization and reduction of uncertainties in both computational and real world applications. It tries to determine how likely certain outcomes are if some aspects of the system are not exactly known.
Designing an F1 Car requires CFD, Wind Tunnel, Track Testing etc.
How to combine them?
where $\textbf{u}_t$ is the action force, $\inputVector_t = (p_t, v_t)$ is the vehicle state
HTML(anim.to_jshtml())
mc.save_frames(frames,
diagrams='../slides/diagrams/uq',
filename='mountain_car_random.html')
HTML(anim.to_jshtml())
mc.save_frames(frames,
diagrams='../slides/diagrams/uq',
filename='mountain_car_simulated.html')
For standard Bayesian Optimization ignored dynamics of the car.
For more data efficiency, first emulate the dynamics.
Then do Bayesian optimization of the emulator.
Use a Gaussian process to model $$\Delta v_{t+1} = v_{t+1} - v_{t}$$ and $$\Delta x_{t+1} = p_{t+1} - p_{t}$$
Two processes, one with mean $v_{t}$ one with mean $p_{t}$
Used 500 randomly selected points to train emulators.
Can make proces smore efficient through experimental design.
control = mc.plot_control(velocity_model)
interact(control.plot_slices, control=(-1, 1, 0.05))
mc.emu_sim_comparison(env, controller_gains, [position_model, velocity_model],
max_steps=500, diagrams='../slides/diagrams/uq')
HTML(anim.to_jshtml())
mc.save_frames(frames,
diagrams='../slides/diagrams/uq',
filename='mountain_car_emulated.html')
Our emulator used only 500 calls to the simulator.
Optimizing the simulator directly required 37,500 calls to the simulator.
HTML(anim.to_jshtml())
mc.save_frames(frames,
diagrams='../slides/diagrams/uq',
filename='mountain_car_multi_fidelity.html')
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
[\small{[edit]}]{style="text-align:right"}
- Work by Eric Meissner and Zhenwen Dai. - Probabilistic programming. - Available on [Github](https://github.com/amzn/mxfusion) |
In [ ]:
<img class="" src="../slides/diagrams/mxfusion-logo.png" width="" height="auto" align="center" style="background:none; border:none; box-shadow:none; display:block; margin-left:auto; margin-right:auto;vertical-align:middle">
</div>
</td>
</tr>
</table>
MxFusion¶
Why another framework?¶
We needed both Key Requirements¶
Modularity¶
What does it look like?¶Modelling Inference Modelling¶Directed Graphs¶
Example¶In [ ]:
m = Model()
m.mu = Variable()
m.s = Variable(transformation=PositiveTransformation())
m.Y = Normal.define_variable(mean=m.mu, variance=m.s)
In [ ]:
infr = GradBasedInference(inference_algorithm=MAP(model=m, observed=[m.Y]))
infr.run(Y=data)
Modules¶
Long term Aim¶
Acknowledgments¶Stefanos Eleftheriadis, John Bronskill, Hugh Salimbeni, Rich Turner, Zhenwen Dai, Javier Gonzalez, Andreas Damianou, Mark Pullin, Eric Meissner. Thanks!¶
References {#references .unnumbered}¶ |