#!/usr/bin/env python
# coding: utf-8
# #### Towards Machine Learning Systems Design
#
# #### 2018-05-02
#
# #### Neil D. Lawrence
#
# #### Amazon Research Cambridge and University of Sheffield
#
# `@lawrennd` [inverseprobability.com](http://inverseprobability.com)
#
# ### What is Machine Learning?
#
# First of all, we'll consider the question, what is machine learning? By
# my definition Machine Learning is a combination of
#
# $$\text{data} + \text{model} \rightarrow \text{prediction}$$
#
# where *data* is our observations. They can be actively or passively
# acquired (meta-data). The *model* contains our assumptions, based on
# previous experience. That experience can be other data, it can come from
# transfer learning, or it can merely be our beliefs about the
# regularities of the universe. In humans our models include our inductive
# biases. The *prediction* is an action to be taken or a categorization or
# a quality score. The reason that machine learning has become a mainstay
# of artificial intelligence is the importance of predictions in
# artificial intelligence.
#
# In practice we normally perform machine learning using two functions. To
# combine data with a model we typically make use of:
#
# **a prediction function** a function which is used to make the
# predictions. It includes our beliefs about the regularities of the
# universe, our assumptions about how the world works, e.g. smoothness,
# spatial similarities, temporal similarities.
#
# **an objective function** a function which defines the cost of
# misprediction. Typically it includes knowledge about the world's
# generating processes (probabilistic objectives) or the costs we pay for
# mispredictions (empirical risk minimization).
#
# The combination of data and model through the prediction function and
# the objective function leads to a *learning algorithm*. The class of
# prediction functions and objective functions we can make use of is
# restricted by the algorithms they lead to. If the prediction function or
# the objective function are too complex, then it can be difficult to find
# an appropriate learning algorithm.
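#
# As a concrete (and deliberately minimal) sketch of these ideas, the
# following assumes nothing beyond `numpy`: a linear prediction function,
# a squared-error objective, and the closed-form least-squares solution
# playing the role of the learning algorithm. The data and coefficients
# here are invented purely for illustration.

```python
import numpy as np

# Toy data: noisy observations of a linear trend.
rng = np.random.RandomState(0)
X = np.linspace(0, 1, 20)[:, None]
y = 2.0*X[:, 0] + 1.0 + 0.1*rng.randn(20)

# Prediction function: encodes our modelling assumption (linearity).
def f(X, beta):
    return beta[0] + beta[1]*X[:, 0]

# Objective function: the cost of misprediction (squared error).
def E(beta, X, y):
    return np.sum((y - f(X, beta))**2)

# Learning algorithm: here, the closed-form least-squares solution.
Phi = np.hstack([np.ones_like(X), X])
beta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(beta_hat)           # should recover roughly [1.0, 2.0]
print(E(beta_hat, X, y))  # residual error at the fitted parameters
```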
#
# A useful reference for state of the art in machine learning is the UK
# Royal Society Report, [Machine Learning: Power and Promise of Computers
# that Learn by
# Example](https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf).
#
# You can also check my blog post on ["What is Machine
# Learning?"](http://inverseprobability.com/2017/07/17/what-is-machine-learning)
#
# ### Machine Learning as the Driver ...
#
# ... of two different domains
#
# 1. *Data Science*: arises from the fact that we now capture data by
# happenstance.
#
# 2. *Artificial Intelligence*: emulation of human behaviour.
#
# ### What does Machine Learning do?
#
# - ML Automates through Data
# - *Strongly* related to statistics.
# - Field underpins revolution in *data science* and *AI*
# - With AI: logic, *robotics*, *computer vision*, *speech*
# - With Data Science: *databases*, *data mining*, *statistics*,
# *visualization*
#
# ### "Embodiment Factors"
#
# |                        | computer        | human             |
# |------------------------|-----------------|-------------------|
# | compute                | ~10 gigaflops   | ~1000 teraflops?  |
# | communicate            | ~1 gigabit/s    | ~100 bit/s        |
# | (compute/communicate)  | 10              | ~$10^{13}$        |
#
# See ["Living Together: Mind and Machine
# Intelligence"](https://arxiv.org/abs/1705.07996)
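#
# The embodiment factors in the table are order-of-magnitude estimates,
# but the ratios themselves are simple arithmetic. A quick check, with
# figures taken from the table above (so they inherit its uncertainty):

```python
# Rough embodiment factors: compute rate divided by communication rate.
machine_compute = 10e9       # ~10 gigaflops
machine_communicate = 1e9    # ~1 gigabit/s
human_compute = 1000e12      # ~1000 teraflops (speculative estimate)
human_communicate = 100.0    # ~100 bit/s

machine_factor = machine_compute / machine_communicate
human_factor = human_compute / human_communicate
print(machine_factor)  # 10.0
print(human_factor)    # 1e+13
```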
# In[ ]:
import pods
pods.notebook.display_plots('information-flow{sample:0>3}.svg',
'../slides/diagrams/data-science', sample=(1,3))
# ### What does Machine Learning do?
#
# - We scale by codifying processes and automating them.
# - Ensure components are compatible (Whitworth threads)
# - Then interconnect them as efficiently as possible.
# - cf Colt 45, Ford Model T
#
# ### Codify Through Mathematical Functions
#
# - How does machine learning work?
#
# - Jumper (jersey/sweater) purchase with logistic regression
#
# $$ \text{odds} = \frac{\text{bought}}{\text{not bought}} $$
#
# $$ \log \text{odds} = \beta_0 + \beta_1 \text{age} + \beta_2 \text{latitude}$$
#
# ### Codify Through Mathematical Functions
#
# - How does machine learning work?
#
# - Jumper (jersey/sweater) purchase with logistic regression
#
# $$ p(\text{bought}) = {f}\left(\beta_0 + \beta_1 \text{age} + \beta_2 \text{latitude}\right)$$
#
# ### Codify Through Mathematical Functions
#
# - How does machine learning work?
#
# - Jumper (jersey/sweater) purchase with logistic regression
#
# $$ p(\text{bought}) = {f}\left(\boldsymbol{\beta}^\top {{\bf {x}}}\right)$$
#
# . . .
#
# We call ${f}(\cdot)$ the *prediction function*
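#
# A small sketch of this prediction function, with made-up numbers: the
# coefficients and feature values below are purely illustrative, and the
# logistic link squashes the log odds onto a probability.

```python
import numpy as np

def f(z):
    """Logistic link function: maps log odds to a probability."""
    return 1.0/(1.0 + np.exp(-z))

# Hypothetical coefficients and features, for illustration only.
beta = np.array([-4.0, 0.01, 0.1])  # [beta_0, beta_1 (age), beta_2 (latitude)]
x = np.array([1.0, 30.0, 52.0])     # [bias, age in years, latitude in degrees]

p_bought = f(beta @ x)
print(p_bought)  # a probability between 0 and 1
```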
#
# ### Fit to Data
#
# - Use an objective function
#
# $${E}(\boldsymbol{\beta}, {\mathbf{Y}}, {{\bf X}})$$
#
# . . .
#
# - E.g. least squares
#
# $${E}(\boldsymbol{\beta}) = \sum_{i=1}^{n}\left({y}_i - {f}({{\bf {x}}}_i)\right)^2$$
#
# ### Two Components
#
# - Prediction function, ${f}(\cdot)$
#
# - Objective function, ${E}(\cdot)$
#
# ### Deep Learning
#
# - These are interpretable models: vital for applications such as disease diagnosis.
#
# - Modern machine learning methods are less interpretable
#
# - Example: face recognition
#
# ###
#
# Outline of the DeepFace architecture. A front-end of a single
# convolution-pooling-convolution filtering on the rectified input,
# followed by three locally-connected layers and two fully-connected
# layers. Color illustrates feature maps produced at each layer. The net
# includes more than 120 million parameters, where more than 95% come from
# the local and fully connected layers.
#
#
#
#
# Source: DeepFace
#
# ###
#
#
#
# We can think of what these models are doing as being similar to early
# pin ball machines. In a neural network, we input a number (or numbers),
# whereas in pinball, we input a ball. The location of the ball on the
# left-right axis can be thought of as the number. As the ball falls
# through the machine, each layer of pins can be thought of as a different
# layer of neurons. Each layer acts to move the ball from left to right.
#
# In a pinball machine, when the ball gets to the bottom it might fall
# into a hole defining a score, in a neural network, that is equivalent to
# the decision: a classification of the input object.
#
# An image has more than one number associated with it, so it's like
# playing pinball in a *hyper-space*.
# In[ ]:
import pods
pods.notebook.display_plots('pinball{sample:0>3}.svg',
'../slides/diagrams', sample=(1,2))
# At initialization, the pins aren't in the right place to bring the ball
# to the correct decision.
#
# Learning involves moving all the pins to be in the right position, so
# that the ball falls in the right place. But moving all these pins in
# hyperspace can be difficult. In a hyperspace you have to put a lot of
# data through the machine for it to explore the positions of all the
# pins. Adversarial learning reflects the fact that a ball can be moved a
# small distance and lead to a very different result.
#
# Probabilistic methods explore more of the space by considering a range
# of possible paths for the ball through the machine.
# In[ ]:
import numpy as np
import teaching_plots as plot
# In[ ]:
get_ipython().run_line_magic('load', '-s compute_kernel mlai.py')
# In[ ]:
get_ipython().run_line_magic('load', '-s eq_cov mlai.py')
# In[ ]:
np.random.seed(10)
plot.rejection_samples(compute_kernel, kernel=eq_cov,
lengthscale=0.25, diagrams='../slides/diagrams/gp')
# In[ ]:
import pods
pods.notebook.display_plots('gp_rejection_samples{sample:0>3}.svg',
'../slides/diagrams/gp', sample=(1,5))
# In[ ]:
import pods
import mlai
import matplotlib.pyplot as plt
# ### Olympic Marathon Data
#
# The first thing we will do is load a standard data set for regression
# modelling. The data consists of the pace of Olympic Gold Medal Marathon
# winners for the Olympics from 1896 to present. First we load in the data
# and plot.
# In[ ]:
data = pods.datasets.olympic_marathon_men()
x = data['X']
y = data['Y']
offset = y.mean()
scale = np.sqrt(y.var())
xlim = (1875,2030)
ylim = (2.5, 6.5)
yhat = (y-offset)/scale
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
_ = ax.plot(x, y, 'r.',markersize=10)
ax.set_xlabel('year', fontsize=20)
ax.set_ylabel('pace min/km', fontsize=20)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig, filename='../slides/diagrams/datasets/olympic-marathon.svg', transparent=True, frameon=True)
# ### Olympic Marathon Data
#
#
#
#
# - Gold medal times for Olympic Marathon since 1896.
#
# - Marathons before 1924 didn’t have a standardised distance.
#
# - Present results using pace per km.
#
# - In 1904 Marathon was badly organised leading to very slow times.
#
# |
#
# ![image](../slides/diagrams/Stephen_Kiprotich.jpg) Image from
# Wikimedia Commons
# |
#
#
#
#
# Things to notice about the data include the outlier in 1904. In that
# year the Olympics was held in St Louis, USA. Organizational problems and
# challenges with dust kicked up by the cars following the race meant that
# participants got lost, and only a few completed the course.
#
# More recent years see more consistently quick marathons.
#
# Our first objective will be to perform a Gaussian process fit to the
# data, we'll do this using the [GPy
# software](https://github.com/SheffieldML/GPy).
# In[ ]:
import GPy
m_full = GPy.models.GPRegression(x, yhat)
_ = m_full.optimize() # Optimize parameters of covariance function
# The first command sets up the model, then `m_full.optimize()` optimizes
# the parameters of the covariance function and the noise level
# of the model. Once the fit is complete, we'll try creating some test
# points, and computing the output of the GP model in terms of the mean
# and standard deviation of the posterior functions between 1870 and 2030.
# We plot the mean function and the standard deviation at 200 locations.
# We can obtain the predictions using `y_mean, y_var = m_full.predict(xt)`:
# In[ ]:
xt = np.linspace(1870,2030,200)[:,np.newaxis]
yt_mean, yt_var = m_full.predict(xt)
yt_sd=np.sqrt(yt_var)
# Now we plot the results using the helper function in `teaching_plots`.
# In[ ]:
import teaching_plots as plot
# In[ ]:
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='year', ylabel='pace min/km', fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig,
filename='../slides/diagrams/gp/olympic-marathon-gp.svg',
transparent=True, frameon=True)
#
#
# ### Fit Quality
#
# In the fit we see that the error bars (coming mainly from the noise
# variance) are quite large. This is likely due to the outlier point in
# 1904; ignoring that point, a tighter fit is obtained. To see this we
# make a version of the model, `m_clean`, where that point is removed.
# In[ ]:
x_clean=np.vstack((x[0:2, :], x[3:, :]))
y_clean=np.vstack((y[0:2, :], y[3:, :]))
m_clean = GPy.models.GPRegression(x_clean,y_clean)
_ = m_clean.optimize()
# ### Deep GP Fit
#
# Let's see if a deep Gaussian process can help here. We will construct a
# deep Gaussian process with one hidden layer (i.e. one Gaussian process
# feeding into another).
#
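# To make the idea of one GP feeding into another concrete, here is a
# self-contained `numpy` sketch (not using the `deepgp` library): we draw
# a function from a GP prior with an exponentiated quadratic covariance,
# then use its output as the input to a second GP draw. The lengthscales
# are arbitrary choices for illustration.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0, variance=1.0):
    """Exponentiated quadratic (RBF) covariance between two input sets."""
    d2 = (X1[:, None, 0] - X2[None, :, 0])**2
    return variance*np.exp(-0.5*d2/lengthscale**2)

rng = np.random.RandomState(0)
X = np.linspace(-2, 2, 100)[:, None]

# Layer 1: sample a function from a GP prior (jitter for stability).
K1 = rbf(X, X, lengthscale=1.0) + 1e-8*np.eye(len(X))
h = rng.multivariate_normal(np.zeros(len(X)), K1)[:, None]

# Layer 2: feed layer 1's output in as the input of another GP.
K2 = rbf(h, h, lengthscale=0.5) + 1e-8*np.eye(len(h))
f = rng.multivariate_normal(np.zeros(len(h)), K2)

print(f.shape)  # one composed function evaluated at 100 points
```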
# Build a Deep GP with an additional hidden layer (one dimensional) to fit
# the model.
# In[ ]:
import GPy
import deepgp
hidden = 1
m = deepgp.DeepGP([y.shape[1], hidden, x.shape[1]], Y=yhat, X=x, inits=['PCA','PCA'],
                  kernels=[GPy.kern.RBF(hidden, ARD=True),
                           GPy.kern.RBF(x.shape[1], ARD=True)], # the kernels for each layer
                  num_inducing=50, back_constraint=False)
# Deep Gaussian process models can also require some thought in
# initialization. Here we choose to start by setting the noise variance to
# be one percent of the data variance.
#
# Optimization requires moving variational parameters in the hidden layer
# representing the mean and variance of the expected values in that layer.
# Since all those values can be scaled up, with the only effect being a
# downscaling in the output of the first GP and a downscaling of the
# input length scale to the second GP, it makes sense to first of all fix
# the scales of the covariance function in each of the GPs.
#
# Sometimes, deep Gaussian processes can find a local minimum which
# involves increasing the noise level of one or more of the GPs. This
# often occurs because it allows a minimum in the KL divergence term in
# the lower bound on the likelihood. To avoid this minimum we habitually
# train with the likelihood variance (the noise on the output of the GP)
# fixed to some lower value for some iterations.
#
# Let's create a helper function to initialize the models we use in the
# notebook.
# In[ ]:
def initialize(self, noise_factor=0.01, linear_factor=1):
    """Helper function for deep model initialization."""
    self.obslayer.likelihood.variance = self.Y.var()*noise_factor
    for layer in self.layers:
        if type(layer.X) is GPy.core.parameterization.variational.NormalPosterior:
            if layer.kern.ARD:
                var = layer.X.mean.var(0)
            else:
                var = layer.X.mean.var()
        else:
            if layer.kern.ARD:
                var = layer.X.var(0)
            else:
                var = layer.X.var()
        # Average 0.5 upcrossings in four standard deviations.
        layer.kern.lengthscale = linear_factor*np.sqrt(layer.kern.input_dim)*2*4*np.sqrt(var)/(2*np.pi)
# Bind the new method to the Deep GP object.
deepgp.DeepGP.initialize=initialize
# In[ ]:
# Call the initialization
m.initialize()
# Now optimize the model. The first stage of optimization is working on
# variational parameters and lengthscales only.
# In[ ]:
m.optimize(messages=False,max_iters=100)
# Now we remove the constraints on the scale of the covariance functions
# associated with each GP and optimize again.
# In[ ]:
for layer in m.layers:
    pass #layer.kern.variance.constrain_positive(warning=False)
m.obslayer.kern.variance.constrain_positive(warning=False)
m.optimize(messages=False, max_iters=100)
# Finally, we allow the noise variance to change and optimize for a large
# number of iterations.
# In[ ]:
for layer in m.layers:
    layer.likelihood.variance.constrain_positive(warning=False)
m.optimize(messages=True, max_iters=10000)
# For our optimization process we define a new function.
# In[ ]:
def staged_optimize(self, iters=(1000, 1000, 10000), messages=(False, False, True)):
    """Optimize with parameters constrained and then with parameters released."""
    for layer in self.layers:
        # Fix the scale of each of the covariance functions.
        layer.kern.variance.fix(warning=False)
        layer.kern.lengthscale.fix(warning=False)
        # Fix the variance of the noise in each layer.
        layer.likelihood.variance.fix(warning=False)
    self.optimize(messages=messages[0], max_iters=iters[0])
    for layer in self.layers:
        layer.kern.lengthscale.constrain_positive(warning=False)
    self.obslayer.kern.variance.constrain_positive(warning=False)
    self.optimize(messages=messages[1], max_iters=iters[1])
    for layer in self.layers:
        layer.kern.variance.constrain_positive(warning=False)
        layer.likelihood.variance.constrain_positive(warning=False)
    self.optimize(messages=messages[2], max_iters=iters[2])
# Bind the new method to the Deep GP object.
deepgp.DeepGP.staged_optimize=staged_optimize
# In[ ]:
m.staged_optimize(messages=(True,True,True))
# ### Plot the prediction
#
# The prediction of the deep GP can be extracted in a similar way to the
# normal GP, although in this case it is an approximation to the true
# distribution, because the true distribution is not Gaussian.
# In[ ]:
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_output(m, scale=scale, offset=offset, ax=ax, xlabel='year', ylabel='pace min/km',
fontsize=20, portion=0.2)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/olympic-marathon-deep-gp.svg',
transparent=True, frameon=True)
# ### Olympic Marathon Data Deep GP
#
#
# In[ ]:
def posterior_sample(self, X, **kwargs):
    """Give a sample from the posterior of the deep GP."""
    Z = X
    for i, layer in enumerate(reversed(self.layers)):
        Z = layer.posterior_samples(Z, size=1, **kwargs)[:, :, 0]
    return Z
deepgp.DeepGP.posterior_sample = posterior_sample
# In[ ]:
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot.model_sample(m, scale=scale, offset=offset, samps=10, ax=ax,
xlabel='year', ylabel='pace min/km', portion = 0.225)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/olympic-marathon-deep-gp-samples.svg',
transparent=True, frameon=True)
# ### Olympic Marathon Data Deep GP {#olympic-marathon-data-deep-gp-1 data-transition="None"}
#
#
#
# ### Fitted GP for each layer
#
# Now we explore the GPs the model has used to fit each layer. First of
# all, we look at the hidden layer.
# In[ ]:
import os
def visualize(self, scale=1.0, offset=0.0, xlabel='input', ylabel='output',
              xlim=None, ylim=None, fontsize=20, portion=0.2, dataset=None,
              diagrams='../diagrams'):
    """Visualize the layers in a deep GP with one-d input and output."""
    depth = len(self.layers)
    if dataset is None:
        fname = 'deep-gp-layer'
    else:
        fname = dataset + '-deep-gp-layer'
    filename = os.path.join(diagrams, fname)
    last_name = xlabel
    last_x = self.X
    for i, layer in enumerate(reversed(self.layers)):
        if i>0:
            plt.plot(last_x, layer.X.mean, 'r.', markersize=10)
            last_x = layer.X.mean
            ax = plt.gca()
            name = 'layer ' + str(i)
            plt.xlabel(last_name, fontsize=fontsize)
            plt.ylabel(name, fontsize=fontsize)
            last_name = name
            mlai.write_figure(filename=filename + '-' + str(i-1) + '.svg',
                              transparent=True, frameon=True)
        if i==0 and xlim is not None:
            xt = plot.pred_range(np.array(xlim), portion=0.0)
        elif i>0:
            xt = plot.pred_range(np.array(next_lim), portion=0.0)
        else:
            xt = plot.pred_range(last_x, portion=portion)
        yt_mean, yt_var = layer.predict(xt)
        if layer==self.obslayer:
            yt_mean = yt_mean*scale + offset
            yt_var *= scale*scale
        yt_sd = np.sqrt(yt_var)
        gpplot(xt, yt_mean, yt_mean-2*yt_sd, yt_mean+2*yt_sd)
        ax = plt.gca()
        if i>0:
            ax.set_xlim(next_lim)
        elif xlim is not None:
            ax.set_xlim(xlim)
        next_lim = plt.gca().get_ylim()
    plt.plot(last_x, self.Y*scale + offset, 'r.', markersize=10)
    plt.xlabel(last_name, fontsize=fontsize)
    plt.ylabel(ylabel, fontsize=fontsize)
    mlai.write_figure(filename=filename + '-' + str(i) + '.svg',
                      transparent=True, frameon=True)
    if ylim is not None:
        ax = plt.gca()
        ax.set_ylim(ylim)
# Bind the new method to the Deep GP object.
deepgp.DeepGP.visualize=visualize
# In[ ]:
m.visualize(scale=scale, offset=offset, xlabel='year',
ylabel='pace min/km',xlim=xlim, ylim=ylim,
dataset='olympic-marathon',
diagrams='../slides/diagrams/deepgp')
# In[ ]:
import pods
pods.notebook.display_plots('olympic-marathon-deep-gp-layer-{sample:0>1}.svg',
'../slides/diagrams/deepgp', sample=(0,1))
# In[ ]:
def scale_data(x, portion):
    scale = (x.max()-x.min())/(1-2*portion)
    offset = x.min() - portion*scale
    return (x-offset)/scale, scale, offset

def visualize_pinball(self, ax=None, scale=1.0, offset=0.0, xlabel='input', ylabel='output',
                      xlim=None, ylim=None, fontsize=20, portion=0.2, points=50, vertical=True):
    """Visualize the layers in a deep GP with one-d input and output."""
    if ax is None:
        fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
    depth = len(self.layers)
    last_name = xlabel
    last_x = self.X

    # Recover input and output scales from output plot.
    plot.model_output(self, scale=scale, offset=offset, ax=ax,
                      xlabel=xlabel, ylabel=ylabel,
                      fontsize=fontsize, portion=portion)
    xlim = ax.get_xlim()
    xticks = ax.get_xticks()
    xtick_labels = ax.get_xticklabels().copy()
    ylim = ax.get_ylim()
    yticks = ax.get_yticks()
    ytick_labels = ax.get_yticklabels().copy()

    # Clear axes and start again.
    ax.cla()
    if vertical:
        ax.set_xlim((0, 1))
        ax.invert_yaxis()
        ax.set_ylim((depth, 0))
    else:
        ax.set_ylim((0, 1))
        ax.set_xlim((0, depth))
    ax.set_axis_off()

    def pinball(x, y, y_std, color_scale=None,
                layer=0, depth=1, ax=None,
                alpha=1.0, portion=0.0, vertical=True):
        scaledx, xscale, xoffset = scale_data(x, portion=portion)
        scaledy, yscale, yoffset = scale_data(y, portion=portion)
        y_std /= yscale

        # Check whether data is anti-correlated on output.
        if np.dot((scaledx-0.5).T, (scaledy-0.5)) < 0:
            scaledy = 1-scaledy
            flip = -1
        else:
            flip = 1

        if color_scale is not None:
            color_scale, _, _ = scale_data(color_scale, portion=0)
        scaledy = (1-alpha)*scaledx + alpha*scaledy

        def color_value(x, cmap=None, width=None, centers=None):
            """Return color as a function of position along x axis"""
            if cmap is None:
                cmap = np.asarray([[1, 0, 0], [1, 1, 0], [0, 1, 0]])
            ncenters = cmap.shape[0]
            if centers is None:
                centers = np.linspace(0+0.5/ncenters, 1-0.5/ncenters, ncenters)
            if width is None:
                width = 0.25/ncenters
            r = (x-centers)/width
            weights = np.exp(-0.5*r*r).flatten()
            weights /= weights.sum()
            weights = weights[:, np.newaxis]
            return np.dot(cmap.T, weights).flatten()

        for i in range(x.shape[0]):
            if color_scale is not None:
                color = color_value(color_scale[i])
            else:
                color = (1, 0, 0)
            x_plot = np.asarray((scaledx[i], scaledy[i])).flatten()
            y_plot = np.asarray((layer, layer+alpha)).flatten()
            if vertical:
                ax.plot(x_plot, y_plot, color=color, alpha=0.5, linewidth=3)
                ax.plot(x_plot, y_plot, color='k', alpha=0.5, linewidth=0.5)
            else:
                ax.plot(y_plot, x_plot, color=color, alpha=0.5, linewidth=3)
                ax.plot(y_plot, x_plot, color='k', alpha=0.5, linewidth=0.5)

            # Plot error bars that increase as sqrt of distance from start.
            std_points = 50
            stdy = np.linspace(0, alpha, std_points)
            stdx = np.sqrt(stdy)*y_std[i]
            stdy += layer
            mean_vals = np.linspace(scaledx[i], scaledy[i], std_points)
            upper = mean_vals+stdx
            lower = mean_vals-stdx
            fillcolor = color
            x_errorbars = np.hstack((upper, lower[::-1]))
            y_errorbars = np.hstack((stdy, stdy[::-1]))
            if vertical:
                ax.fill(x_errorbars, y_errorbars,
                        color=fillcolor, alpha=0.1)
                # Marker to show end point.
                ax.plot(scaledy[i], layer+alpha, '.', markersize=10, color=color, alpha=0.5)
            else:
                ax.fill(y_errorbars, x_errorbars,
                        color=fillcolor, alpha=0.1)
                # Marker to show end point.
                ax.plot(layer+alpha, scaledy[i], '.', markersize=10, color=color, alpha=0.5)
        return flip

    # Whether final axis is flipped.
    flip = 1
    first_x = last_x
    for i, layer in enumerate(reversed(self.layers)):
        if i==0:
            xt = plot.pred_range(last_x, portion=portion, points=points)
            color_scale = xt
        yt_mean, yt_var = layer.predict(xt)
        if layer==self.obslayer:
            yt_mean = yt_mean*scale + offset
            yt_var *= scale*scale
        yt_sd = np.sqrt(yt_var)
        flip = flip*pinball(xt, yt_mean, yt_sd, color_scale, portion=portion,
                            layer=i, ax=ax, depth=depth, vertical=vertical)
        xt = yt_mean
    # Make room for axis labels.
    if vertical:
        ax.set_ylim((2.1, -0.1))
        ax.set_xlim((-0.02, 1.02))
        ax.set_yticks(range(depth, 0, -1))
    else:
        ax.set_xlim((-0.1, 2.1))
        ax.set_ylim((-0.02, 1.02))
        ax.set_xticks(range(0, depth))

    def draw_axis(ax, scale=1.0, offset=0.0, level=0.0, flip=1,
                  label=None, up=False, nticks=10, ticklength=0.05,
                  vertical=True, fontsize=20):
        def clean_gap(gap):
            nsf = np.log10(gap)
            if nsf > 0:
                nsf = np.ceil(nsf)
            else:
                nsf = np.floor(nsf)
            lower_gaps = np.asarray([0.005, 0.01, 0.02, 0.03, 0.04, 0.05,
                                     0.1, 0.25, 0.5,
                                     1, 1.5, 2, 2.5, 3, 4, 5, 10, 25, 50, 100])
            upper_gaps = np.asarray([1, 2, 3, 4, 5, 10])
            if nsf > 2 or nsf < -2:
                d = np.abs(gap-upper_gaps*10**nsf)
                ind = np.argmin(d)
                return upper_gaps[ind]*10**nsf
            else:
                d = np.abs(gap-lower_gaps)
                ind = np.argmin(d)
                return lower_gaps[ind]

        tickgap = clean_gap(scale/(nticks-1))
        nticks = int(scale/tickgap) + 1
        tickstart = np.round(offset/tickgap)*tickgap
        ticklabels = np.asarray(range(0, nticks))*tickgap + tickstart
        ticks = (ticklabels-offset)/scale
        axargs = {'color':'k', 'linewidth':1}

        if not up:
            ticklength = -ticklength
        tickspot = np.linspace(0, 1, nticks)
        if flip < 0:
            ticks = 1-ticks
        for tick, ticklabel in zip(ticks, ticklabels):
            if vertical:
                ax.plot([tick, tick], [level, level-ticklength], **axargs)
                ax.text(tick, level-ticklength*2, ticklabel, horizontalalignment='center',
                        fontsize=fontsize/2)
                ax.text(0.5, level-5*ticklength, label, horizontalalignment='center', fontsize=fontsize)
            else:
                ax.plot([level, level-ticklength], [tick, tick], **axargs)
                ax.text(level-ticklength*2, tick, ticklabel, horizontalalignment='center',
                        fontsize=fontsize/2)
                ax.text(level-5*ticklength, 0.5, label, horizontalalignment='center', fontsize=fontsize)
        if vertical:
            xlim = list(ax.get_xlim())
            if ticks.min() < xlim[0]:
                xlim[0] = ticks.min()
            if ticks.max() > xlim[1]:
                xlim[1] = ticks.max()
            ax.set_xlim(xlim)
            ax.plot([ticks.min(), ticks.max()], [level, level], **axargs)
        else:
            ylim = list(ax.get_ylim())
            if ticks.min() < ylim[0]:
                ylim[0] = ticks.min()
            if ticks.max() > ylim[1]:
                ylim[1] = ticks.max()
            ax.set_ylim(ylim)
            ax.plot([level, level], [ticks.min(), ticks.max()], **axargs)

    _, xscale, xoffset = scale_data(first_x, portion=portion)
    _, yscale, yoffset = scale_data(yt_mean, portion=portion)
    draw_axis(ax=ax, scale=xscale, offset=xoffset, level=0.0, label=xlabel,
              up=True, vertical=vertical)
    draw_axis(ax=ax, scale=yscale, offset=yoffset,
              flip=flip, level=depth, label=ylabel, up=False, vertical=vertical)

# Bind the new method to the Deep GP object.
deepgp.DeepGP.visualize_pinball = visualize_pinball
# In[ ]:
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
m.visualize_pinball(ax=ax, scale=scale, offset=offset, points=30, portion=0.1,
                    xlabel='year', ylabel='pace min/km', vertical=True)
mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/olympic-marathon-deep-gp-pinball.svg',
transparent=True, frameon=True)
# ### Olympic Marathon Pinball Plot
#
#
#
# The pinball plot shows the flow of any input ball through the deep
# Gaussian process. In a pinball plot a series of vertical parallel lines
# would indicate a purely linear function. For the Olympic marathon data
# we can see the first layer begins to shift from input towards the right.
# Note it also does so with some uncertainty (indicated by the shaded
# backgrounds). The second layer has less uncertainty, but bunches the
# inputs more strongly to the right. This input layer of uncertainty,
# followed by a layer that pushes inputs to the right, is what gives the
# heteroscedastic noise.
#
# ### Artificial Intelligence
#
# - Challenges in deploying AI.
#
# - Currently this is in the form of "machine learning systems"
#
# ### Internet of People
#
# - Fog computing: barrier between cloud and device blurring.
# - Computing on the Edge
# - Complex feedback between algorithm and implementation
#
# ### Deploying ML in Real World: Machine Learning Systems Design
#
# - Major new challenge for systems designers.
#
# - Internet of Intelligence but currently:
# - AI systems are *fragile*
#
# ### Fragility of AI Systems
#
# The way we are deploying artificial intelligence systems in practice is
# to build up systems of machine learning components. To build a machine
# learning system, we decompose the task into parts which we can emulate
# with ML methods. Each of these parts can be, typically, independently
# constructed and verified. For example, in a driverless car we can
# decompose the tasks into components such as "pedestrian detection" and
# "road line detection". Each of these components can be constructed with,
# for example, an independent classifier. We can then superimpose a logic
# on top. For example, "Follow the road line unless you detect a
# pedestrian in the road".
#
# This allows for verification of car performance, as long as we can
# verify the individual components. However, it also implies that the AI
# systems we deploy are *fragile*.
#
# ### Rapid Reimplementation
#
# ### Early AI
#
#
#
# ### Machine Learning Systems Design
#
#
#
# ### Adversaries
#
# - Stuxnet
# - Mischievous-Adversarial
#
# ### Turnaround And Update
#
# - There is a massive need for turnaround and update
# - A redeploy of the entire system.
# - This involves changing the way we design and deploy.
# - Interface between security engineering and machine learning.
#
# ### Peppercorns
#
# - A new name for system failures which aren't bugs.
# - Difference between finding a fly in your soup vs a peppercorn in
# your soup.
#
#
# ### Uncertainty Quantification
#
# - Deep nets are powerful approach to images, speech, language.
#
# - Proposal: Deep GPs may also be a great approach, but better to
# deploy according to natural strengths.
#
# ### Uncertainty Quantification
#
# - Probabilistic numerics, surrogate modelling, emulation, and UQ.
#
# - Not a fan of AI as a term.
#
# - But we are faced with increasing amounts of *algorithmic decision
# making*.
#
# ### ML and Decision Making
#
# - When trading off decisions: compute or acquire data?
#
# - There is a critical need for uncertainty.
#
# ### Uncertainty Quantification
#
# > Uncertainty quantification (UQ) is the science of quantitative
# > characterization and reduction of uncertainties in both computational
# > and real world applications. It tries to determine how likely certain
# > outcomes are if some aspects of the system are not exactly known.
#
# - Interaction between physical and virtual worlds of major interest
# for Amazon.
#
# We will illustrate different concepts of [Uncertainty
# Quantification](https://en.wikipedia.org/wiki/Uncertainty_quantification)
# (UQ) and the role that Gaussian processes play in this field. Based on a
# simple simulator of a car moving between a valley and a mountain, we are
# going to illustrate the following concepts:
#
# - **Systems emulation**. Many real world decisions are based on
# simulations that can be computationally very demanding. We will show
# how simulators can be replaced by *emulators*: Gaussian process
# models fitted on a few simulations that can be used to replace the
# *simulator*. Emulators are cheap to compute, fast to run, and always
# provide ways to quantify the uncertainty of how precise they are
# compared to the original simulator.
#
# - **Emulators in optimization problems**. We will show how emulators
# can be used to optimize black-box functions that are expensive to
# evaluate. This field is also called Bayesian Optimization and has
# gained an increasing relevance in machine learning as emulators can
# be used to optimize computer simulations (and machine learning
# algorithms) quite efficiently.
#
# - **Multi-fidelity emulation methods**. In many scenarios we have
# simulators of different quality about the same measure of interest.
# In these cases the goal is to merge all sources of information under
# the same model so the final emulator is cheaper and more accurate
# than an emulator fitted only using data from the most accurate and
# expensive simulator.
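#
# The emulation idea can be sketched in a few lines of `numpy`: run an
# "expensive" simulator (here just a stand-in function, `np.sin`) a
# handful of times, fit a Gaussian process to those runs, and use the GP
# posterior mean as a cheap surrogate elsewhere. This is a toy sketch,
# not the GPyOpt machinery used later in the notebook.

```python
import numpy as np

def simulator(x):
    """Stand-in for an expensive simulator (toy assumption)."""
    return np.sin(3*x)

def rbf(X1, X2, lengthscale=0.5):
    """Exponentiated quadratic covariance for 1-d inputs."""
    return np.exp(-0.5*(X1[:, None] - X2[None, :])**2/lengthscale**2)

# A few expensive runs of the simulator...
X_train = np.linspace(0, 2, 8)
y_train = simulator(X_train)

# ...then the GP posterior mean/variance act as a cheap emulator.
noise = 1e-6
K = rbf(X_train, X_train) + noise*np.eye(len(X_train))
X_test = np.linspace(0, 2, 50)
Ks = rbf(X_test, X_train)
alpha = np.linalg.solve(K, y_train)
mean = Ks @ alpha
var = 1.0 - np.sum(Ks*np.linalg.solve(K, Ks.T).T, axis=1)

# The emulator tracks the simulator between the training runs.
print(np.max(np.abs(mean - simulator(X_test))))
```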
#
# ### Example: Formula One Racing
#
# - Designing an F1 Car requires CFD, Wind Tunnel, Track Testing etc.
#
# - How to combine them?
#
# ### Mountain Car Simulator
#
# To illustrate the above mentioned concepts we use the [mountain car
# simulator](https://github.com/openai/gym/wiki/MountainCarContinuous-v0).
# This simulator is widely used in machine learning to test reinforcement
# learning algorithms. The goal is to define a control policy on a car
# whose objective is to climb a mountain. Graphically, the problem looks
# as follows:
#
#
#
# The goal is to define a sequence of actions (push the car right or left
# with certain intensity) to make the car reach the flag after a number
# $T$ of time steps.
#
# At each time step $t$, the car is characterized by a vector
# ${{\bf {x}}}_{t} = (p_t,v_t)$ of states which are respectively the
# position and velocity of the car at time $t$. For a sequence of states
# (an episode), the dynamics of the car is given by
#
# $${{\bf {x}}}_{t+1} = {f}({{\bf {x}}}_{t},\textbf{u}_{t})$$
#
# where $\textbf{u}_{t}$ is the value of an action force, which in this
# example corresponds to push car to the left (negative value) or to the
# right (positive value). The actions across a full episode are
# represented in a policy $\textbf{u}_{t} = \pi({{\bf {x}}}_{t},\theta)$
# that acts according to the current state of the car and some parameters
# $\theta$. In the following examples we will assume that the policy is
# linear which allows us to write $\pi({{\bf {x}}}_{t},\theta)$ as
#
# $$\pi({{\bf {x}}},\theta)= \theta_0 + \theta_p p + \theta_v v.$$
#
# For $t=1,\dots,T$ now given some initial state ${{\bf {x}}}_{0}$ and
# some values of each $\textbf{u}_{t}$, we can **simulate** the full
# dynamics of the car for a full episode using
# [Gym](https://gym.openai.com/envs/). The values of $\textbf{u}_{t}$ are
# fully determined by the parameters of the linear controller.
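#
# To make the controller concrete, here is a minimal sketch of the linear
# policy in Python (the function name and the array layout of `theta` are
# our own illustration, not part of the `mountain_car` module):

```python
import numpy as np

def linear_policy(state, theta):
    """Linear controller u = theta_0 + theta_p * p + theta_v * v.

    state: array (position p, velocity v)
    theta: array (theta_0, theta_p, theta_v)
    """
    p, v = state
    return theta[0] + theta[1] * p + theta[2] * v

# Example: a controller that pushes proportionally to the velocity.
u = linear_policy(np.array([-0.5, 0.01]), np.array([0.0, 0.0, 50.0]))
print(u)  # 0.5
```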
#
# After each episode of length $T$ is complete, a reward function
# $R_{T}(\theta)$ is computed. In the mountain car example the reward is
# 100 for reaching the target on the hill on the right-hand side, minus
# the sum of squared actions (negative values push to the left, positive
# values push to the right) accumulated from start to goal. Note that the
# reward depends on $\theta$ because the actions are determined by the
# parameters of the linear controller.
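#
# The reward described above can be sketched as a simple function (a
# hypothetical helper for illustration only; the Gym implementation
# additionally scales the action penalty):

```python
import numpy as np

def episode_reward(actions, reached_goal):
    """Reward: 100 for reaching the flag on the right, minus the sum of
    squared actions accumulated over the episode."""
    bonus = 100.0 if reached_goal else 0.0
    return bonus - np.sum(np.square(actions))

print(episode_reward([0.5, -0.5, 1.0], reached_goal=True))  # 98.5
```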
#
# ### Emulate the Mountain Car
# In[ ]:
import gym
# In[ ]:
env = gym.make('MountainCarContinuous-v0')
# Our goal in this section is to find the parameters $\theta$ of the
# linear controller such that
#
# $$\theta^* = \arg\max_{\theta} R_T(\theta).$$
#
# In this section, we directly use Bayesian optimization to solve this
# problem. We will use [GPyOpt](https://sheffieldml.github.io/GPyOpt/) so
# we first define the objective function:
# In[ ]:
import mountain_car as mc
import GPyOpt
# In[ ]:
obj_func = lambda x: mc.run_simulation(env, x)[0]
objective = GPyOpt.core.task.SingleObjective(obj_func)
# For each set of parameter values of the linear controller we can run an
# episode of the simulator (that we fix to have a horizon of $T=500$) to
# generate the reward. Using as input the parameters of the controller and
# as outputs the rewards we can build a Gaussian process emulator of the
# reward.
#
# We start by defining the input space, which is three-dimensional:
# In[ ]:
## --- We define the input space of the emulator
space = [{'name':'position_parameter', 'type':'continuous', 'domain':(-1.2, +1)},
         {'name':'velocity_parameter', 'type':'continuous', 'domain':(-1/0.07, +1/0.07)},
         {'name':'constant', 'type':'continuous', 'domain':(-1, +1)}]
design_space = GPyOpt.Design_space(space=space)
# Now we initialize a Gaussian process emulator.
# In[ ]:
model = GPyOpt.models.GPModel(optimize_restarts=5, verbose=False, exact_feval=True, ARD=True)
# In Bayesian optimization an acquisition function is used to balance
# exploration and exploitation to evaluate new locations close to the
# optimum of the objective. In this notebook we select the expected
# improvement (EI). For further details have a look at the review paper of
# [Shahriari et al
# (2015)](http://www.cs.ox.ac.uk/people/nando.defreitas/publications/BayesOptLoop.pdf).
# In[ ]:
aquisition_optimizer = GPyOpt.optimization.AcquisitionOptimizer(design_space)
acquisition = GPyOpt.acquisitions.AcquisitionEI(model, design_space, optimizer=aquisition_optimizer)
evaluator = GPyOpt.core.evaluators.Sequential(acquisition) # Collect points sequentially, no parallelization.
# To initialize the model we start by randomly sampling 25 initial points
# for the linear controller.
# In[ ]:
from GPyOpt.experiment_design.random_design import RandomDesign
# In[ ]:
n_initial_points = 25
random_design = RandomDesign(design_space)
initial_design = random_design.get_samples(n_initial_points)
# Before we start any optimization, let's have a look at the behavior of
# the car for the first of these randomly selected initial points.
# In[ ]:
import numpy as np
# In[ ]:
random_controller = initial_design[0,:]
_, _, _, frames = mc.run_simulation(env, np.atleast_2d(random_controller), render=True)
anim=mc.animate_frames(frames, 'Random linear controller')
# In[ ]:
from IPython.core.display import HTML
# In[ ]:
HTML(anim.to_jshtml())
# In[ ]:
mc.save_frames(frames,
               diagrams='../slides/diagrams/uq',
               filename='mountain_car_random.html')
#
# As we can see, the random linear controller does not manage to push the
# car to the top of the mountain. Now, let's optimize the reward using
# Bayesian optimization and the emulator for the reward. We try 50 new
# parameters chosen by the EI.
# In[ ]:
max_iter = 50
bo = GPyOpt.methods.ModularBayesianOptimization(model, design_space, objective, acquisition, evaluator, initial_design)
bo.run_optimization(max_iter=max_iter)
# Now we visualize the result for the best controller that we have found
# with Bayesian optimization.
# In[ ]:
_, _, _, frames = mc.run_simulation(env, np.atleast_2d(bo.x_opt), render=True)
anim=mc.animate_frames(frames, 'Best controller after 50 iterations of Bayesian optimization')
# In[ ]:
HTML(anim.to_jshtml())
# In[ ]:
mc.save_frames(frames,
               diagrams='../slides/diagrams/uq',
               filename='mountain_car_simulated.html')
#
# The car can now make it to the top of the mountain! Emulating the reward
# function and using the EI helped us to find a linear controller that
# solves the problem.
#
# ### Data Efficient Emulation
#
# In the previous section we solved the mountain car problem by directly
# emulating the reward but no considerations about the dynamics
# ${{\bf {x}}}_{t+1} = {f}({{\bf {x}}}_{t},\textbf{u}_{t})$ of the system
# were made. Note that we had to run 75 episodes of 500 steps each to
# solve the problem, which required calling the simulator
# $500\times 75 =37500$ times. In this section we will show how it is
# possible to reduce this number by building an emulator for $f$ that can
# later be used to directly optimize the control.
#
# The inputs of the model of the dynamics are the position, the velocity
# and the value of the control, so we create this space accordingly.
# In[ ]:
import gym
# In[ ]:
env = gym.make('MountainCarContinuous-v0')
# In[ ]:
import GPyOpt
# In[ ]:
space_dynamics = [{'name':'position', 'type':'continuous', 'domain':[-1.2, +0.6]},
                  {'name':'velocity', 'type':'continuous', 'domain':[-0.07, +0.07]},
                  {'name':'action', 'type':'continuous', 'domain':[-1, +1]}]
design_space_dynamics = GPyOpt.Design_space(space=space_dynamics)
# The outputs are the position and the velocity. More precisely, our model
# will capture the change in position and velocity over time. That is, we will
# model
#
# $$\Delta v_{t+1} = v_{t+1} - v_{t}$$
#
# $$\Delta p_{t+1} = p_{t+1} - p_{t}$$
#
# with Gaussian processes with prior mean $v_{t}$ and $p_{t}$
# respectively. As a covariance function we use a Matérn 5/2 kernel. We
# therefore need two models to capture the full dynamics of the system.
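#
# Once trained, the two delta emulators can reproduce a full episode by
# accumulating the predicted changes at each step. A minimal rollout
# sketch, where `predict_dp` and `predict_dv` stand in for the posterior
# means of the two Gaussian process models (the toy lambdas below are
# illustrations only, not the trained emulators):

```python
import numpy as np

def rollout(p0, v0, theta, predict_dp, predict_dv, T=500):
    """Roll the emulated dynamics forward: state_{t+1} = state_t + delta."""
    p, v = p0, v0
    trajectory = [(p, v)]
    for _ in range(T):
        u = theta[0] + theta[1] * p + theta[2] * v  # linear controller
        x = np.array([p, v, u])                     # emulator input
        p, v = p + predict_dp(x), v + predict_dv(x)
        trajectory.append((p, v))
    return np.array(trajectory)

# Toy stand-ins for the GP posterior means:
toy_dp = lambda x: 0.001 * x[1]   # position changes with velocity
toy_dv = lambda x: 0.001 * x[2]   # velocity changes with the action
traj = rollout(-0.5, 0.0, np.array([0.0, 0.0, 1.0]), toy_dp, toy_dv, T=10)
print(traj.shape)  # (11, 2)
```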
# In[ ]:
position_model = GPyOpt.models.GPModel(optimize_restarts=5, verbose=False, exact_feval=True, ARD=True)
velocity_model = GPyOpt.models.GPModel(optimize_restarts=5, verbose=False, exact_feval=True, ARD=True)
# Next, we sample some input parameters and use the simulator to compute
# the outputs. Note that in this case we are not running full episodes;
# we are just using the simulator to compute ${{\bf {x}}}_{t+1}$
# given ${{\bf {x}}}_{t}$ and $\textbf{u}_{t}$.
# In[ ]:
import numpy as np
from GPyOpt.experiment_design.random_design import RandomDesign
import mountain_car as mc
# In[ ]:
### --- Random locations of the inputs
n_initial_points = 500
random_design_dynamics = RandomDesign(design_space_dynamics)
initial_design_dynamics = random_design_dynamics.get_samples(n_initial_points)
# In[ ]:
### --- Simulation of the (normalized) outputs
y = np.zeros((initial_design_dynamics.shape[0], 2))
for i in range(initial_design_dynamics.shape[0]):
    y[i, :] = mc.simulation(initial_design_dynamics[i, :])

# Normalize the data from the simulation
y_normalisation = np.std(y, axis=0)
y_normalised = y / y_normalisation
# In general we might use much smarter strategies to design our emulation
# of the simulator. For example, we could use the variance of the
# predictive distributions of the models to collect points using
# uncertainty sampling, which will give us a better coverage of the space.
# For simplicity, we move ahead with the 500 randomly selected points.
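#
# For reference, uncertainty sampling reduces to a few lines: score a set
# of candidate inputs by the emulator's predictive variance and query the
# simulator where the variance is largest (`predictive_variance` below is
# a stand-in for the GP posterior variance, not a GPyOpt call):

```python
import numpy as np

def uncertainty_sampling(candidates, predictive_variance):
    """Return the candidate input with the largest predictive variance."""
    variances = np.array([predictive_variance(x) for x in candidates])
    return candidates[np.argmax(variances)]

# Toy variance that grows towards the edges of the input space:
toy_var = lambda x: np.sum(x**2)
candidates = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 1.0]])
print(uncertainty_sampling(candidates, toy_var))  # [1. 1.]
```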
#
# Now that we have a data set, we can update the emulators for the
# position and the velocity.
# In[ ]:
position_model.updateModel(initial_design_dynamics, y[:, [0]], None, None)
velocity_model.updateModel(initial_design_dynamics, y[:, [1]], None, None)
# We can now have a look at how the emulator and the simulator match.
# First, we show a contour plot of the car acceleration for each pair of
# car position and velocity. You can use the bar below to play with the
# values of the controller to compare the emulator and the simulator.
# In[ ]:
from ipywidgets import interact  # IPython.html.widgets is deprecated
# In[ ]:
control = mc.plot_control(velocity_model)
interact(control.plot_slices, control=(-1, 1, 0.05))
#
# We can see how the emulator is doing a fairly good job approximating the
# simulator. On the edges, however, it struggles to capture the dynamics
# of the simulator.
#
# Given some input parameters of the linear controller, how do the
# dynamics of the emulator and simulator match? In the following figure we
# show the position and velocity of the car for the 500 time steps of an
# episode in which the parameters of the linear controller have been fixed
# beforehand. The value of the input control is also shown.
# In[ ]:
controller_gains = np.atleast_2d([0, .6, 1])  # change the values of the linear controller to observe the trajectories.
# In[ ]:
mc.emu_sim_comparison(env, controller_gains, [position_model, velocity_model],
                      max_steps=500, diagrams='../slides/diagrams/uq')
#
#
# We now make explicit use of the emulator, using it to replace the
# simulator and optimize the linear controller. Note that in this
# optimization, we don't need to query the simulator anymore as we can
# reproduce the full dynamics of an episode using the emulator. For
# illustrative purposes, in this example we fix the initial location of
# the car.
#
# We define the objective reward function in terms of the emulator.
# In[ ]:
### --- Optimize control parameters with emulator
car_initial_location = np.asarray([-0.58912799, 0])
### --- Reward objective function using the emulator
obj_func_emulator = lambda x: mc.run_emulation([position_model, velocity_model], x, car_initial_location)[0]
objective_emulator = GPyOpt.core.task.SingleObjective(obj_func_emulator)
# And as before, we use Bayesian optimization to find the best possible
# linear controller.
# In[ ]:
### --- Elements of the optimization that will use the multi-fidelity emulator
model = GPyOpt.models.GPModel(optimize_restarts=5, verbose=False, exact_feval=True, ARD=True)
# The design space is the three continuous variables that make up the
# linear controller.
# In[ ]:
space = [{'name':'linear_1', 'type':'continuous', 'domain':(-1/1.2, +1)},
         {'name':'linear_2', 'type':'continuous', 'domain':(-1/0.07, +1/0.07)},
         {'name':'constant', 'type':'continuous', 'domain':(-1, +1)}]
design_space = GPyOpt.Design_space(space=space)
aquisition_optimizer = GPyOpt.optimization.AcquisitionOptimizer(design_space)
random_design = RandomDesign(design_space)
initial_design = random_design.get_samples(25)
# We set the acquisition function to be expected improvement using
# `GPyOpt`.
# In[ ]:
acquisition = GPyOpt.acquisitions.AcquisitionEI(model, design_space, optimizer=aquisition_optimizer)
evaluator = GPyOpt.core.evaluators.Sequential(acquisition)
# In[ ]:
bo_emulator = GPyOpt.methods.ModularBayesianOptimization(model, design_space, objective_emulator, acquisition, evaluator, initial_design)
bo_emulator.run_optimization(max_iter=50)
# In[ ]:
_, _, _, frames = mc.run_simulation(env, np.atleast_2d(bo_emulator.x_opt), render=True)
anim=mc.animate_frames(frames, 'Best controller using the emulator of the dynamics')
# In[ ]:
from IPython.core.display import HTML
# In[ ]:
HTML(anim.to_jshtml())
# In[ ]:
mc.save_frames(frames,
               diagrams='../slides/diagrams/uq',
               filename='mountain_car_emulated.html')
#
# And the problem is again solved, but in this case we have replaced the
# simulator of the car dynamics by a Gaussian process emulator that we
# learned by calling the simulator only 500 times. Compared to the 37500
# calls that we needed when applying Bayesian optimization directly on the
# simulator, this is a significant gain.
#
# In some scenarios we have simulators of the same environment that have
# different fidelities, that is, they reflect the dynamics of the real
# world with different levels of accuracy. Running simulations at the
# different fidelities also has a different cost: high fidelity
# simulations are more expensive than cheaper ones. If we have access to
# these simulators we can combine high and low fidelity simulations under
# the same model.
#
# So let's assume that we have two simulators of the mountain car
# dynamics, one of high fidelity (the one we have used) and another one of
# low fidelity. The traditional approach to this form of multi-fidelity
# emulation is to assume that
#
# $${f}_i\left({{\bf {x}}}\right) = \rho{f}_{i-1}\left({{\bf {x}}}\right) + \delta_i\left({{\bf {x}}}\right)$$
#
# where ${f}_{i-1}\left({{\bf {x}}}\right)$ is a low fidelity simulation
# of the problem of interest and ${f}_i\left({{\bf {x}}}\right)$ is a
# higher fidelity simulation. The function
# $\delta_i\left({{\bf {x}}}\right)$ represents the difference between the
# lower and higher fidelity simulations, which is considered additive. The
# additive form of this model means that if
# ${f}_{0}\left({{\bf {x}}}\right)$ and
# $\left\{\delta_i\left({{\bf {x}}}\right)\right\}_{i=1}^m$ are all
# Gaussian processes, then the process over all fidelities of simulation
# will be a joint Gaussian process.
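#
# Written out for two fidelities, the additive assumption gives the joint
# covariance (with $\rho$ the scaling parameter and $k_0$, $k_{\delta}$
# the covariances of ${f}_0$ and $\delta_1$ respectively)
#
# $$\begin{aligned}
# \text{cov}\left({f}_0({{\bf {x}}}), {f}_0({{\bf {x}}}')\right) &= k_0({{\bf {x}}}, {{\bf {x}}}'),\\
# \text{cov}\left({f}_1({{\bf {x}}}), {f}_0({{\bf {x}}}')\right) &= \rho \, k_0({{\bf {x}}}, {{\bf {x}}}'),\\
# \text{cov}\left({f}_1({{\bf {x}}}), {f}_1({{\bf {x}}}')\right) &= \rho^2 k_0({{\bf {x}}}, {{\bf {x}}}') + k_{\delta}({{\bf {x}}}, {{\bf {x}}}').
# \end{aligned}$$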
#
# But with Deep Gaussian processes we can consider the form
#
# $${f}_i\left({{\bf {x}}}\right) = {g}_{i}\left({f}_{i-1}\left({{\bf {x}}}\right)\right) + \delta_i\left({{\bf {x}}}\right),$$
#
# where the low fidelity representation is nonlinearly transformed by
# ${g}(\cdot)$ before use in the process. This is the approach taken in
# @Perdikaris:multifidelity17. But once we accept that these models can be
# composed, a highly flexible framework can emerge. A key point is that
# the data enters the model at different levels, and represents different
# aspects. For example these correspond to the two fidelities of the
# mountain car simulator.
#
# We start by sampling both of them at 250 random input locations.
# In[ ]:
import gym
# In[ ]:
env = gym.make('MountainCarContinuous-v0')
# In[ ]:
import GPyOpt
# In[ ]:
### --- Collect points from low and high fidelity simulator --- ###
space = GPyOpt.Design_space([
    {'name':'position', 'type':'continuous', 'domain':(-1.2, +1)},
    {'name':'velocity', 'type':'continuous', 'domain':(-0.07, +0.07)},
    {'name':'action', 'type':'continuous', 'domain':(-1, +1)}])
n_points = 250
random_design = GPyOpt.experiment_design.RandomDesign(space)
x_random = random_design.get_samples(n_points)
# Next, we evaluate the high and low fidelity simulators at those
# locations.
# In[ ]:
import numpy as np
import mountain_car as mc
# In[ ]:
d_position_hf = np.zeros((n_points, 1))
d_velocity_hf = np.zeros((n_points, 1))
d_position_lf = np.zeros((n_points, 1))
d_velocity_lf = np.zeros((n_points, 1))
# --- Collect high fidelity points
for i in range(0, n_points):
    d_position_hf[i], d_velocity_hf[i] = mc.simulation(x_random[i, :])

# --- Collect low fidelity points
for i in range(0, n_points):
    d_position_lf[i], d_velocity_lf[i] = mc.low_cost_simulation(x_random[i, :])
# It is time to build the multi-fidelity model for both the position and
# the velocity.
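#
# The model-building step itself can be sketched with the classical linear
# multi-fidelity assumption ${f}_{\text{hf}}({{\bf {x}}}) \approx \rho {f}_{\text{lf}}({{\bf {x}}}) + \delta({{\bf {x}}})$:
# fit one model to the low fidelity data and a second to the residual. The
# numpy-only sketch below is our own illustration, with $\rho$ fixed to 1
# and a plain RBF posterior mean standing in for a full GP library model:

```python
import numpy as np

def gp_posterior_mean(X, y, Xs, lengthscale=0.3, noise=1e-6):
    """Posterior mean of a zero-mean GP with an RBF kernel (numpy only)."""
    def k(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-0.5 * d2 / lengthscale**2)
    alpha = np.linalg.solve(k(X, X) + noise * np.eye(len(X)), y)
    return k(Xs, X) @ alpha

def fit_ar1_multifidelity(X, y_lf, y_hf, rho=1.0):
    """AR(1) multi-fidelity: f_hf(x) ~ rho * f_lf(x) + delta(x).

    rho is fixed here for simplicity; in practice it is estimated."""
    def predict(Xs):
        lf_mean = gp_posterior_mean(X, y_lf, Xs)
        delta_mean = gp_posterior_mean(X, y_hf - rho * y_lf, Xs)
        return rho * lf_mean + delta_mean
    return predict

# Toy check: a high fidelity signal that is a shifted low fidelity signal
# is recovered at the training points.
X = np.linspace(0, 1, 10)[:, None]
y_lf = np.sin(3 * X[:, 0])
y_hf = y_lf + 0.5
predict_hf = fit_ar1_multifidelity(X, y_lf, y_hf)
print(np.max(np.abs(predict_hf(X) - y_hf)) < 1e-2)  # True
```

# In practice one would allow separate input sets per fidelity and
# estimate $\rho$ and the kernel hyperparameters, for example with GPy;
# a deep Gaussian process replaces the linear map $\rho {f}_{\text{lf}}$
# with a learned nonlinear warping ${g}$.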
#
# As we did in the previous section we use the emulator to optimize the
# simulator. In this case we use the high fidelity output of the emulator.
# In[ ]:
### --- Optimize controller parameters
obj_func = lambda x: mc.run_simulation(env, x)[0]
obj_func_emulator = lambda x: mc.run_emulation([position_model, velocity_model], x, car_initial_location)[0]
objective_multifidelity = GPyOpt.core.task.SingleObjective(obj_func)
# And we optimize using Bayesian optimization.
# In[ ]:
from GPyOpt.experiment_design.random_design import RandomDesign
# In[ ]:
model = GPyOpt.models.GPModel(optimize_restarts=5, verbose=False, exact_feval=True, ARD=True)
space = [{'name':'linear_1', 'type':'continuous', 'domain':(-1/1.2, +1)},
         {'name':'linear_2', 'type':'continuous', 'domain':(-1/0.07, +1/0.07)},
         {'name':'constant', 'type':'continuous', 'domain':(-1, +1)}]
design_space = GPyOpt.Design_space(space=space)
aquisition_optimizer = GPyOpt.optimization.AcquisitionOptimizer(design_space)
n_initial_points = 25
random_design = RandomDesign(design_space)
initial_design = random_design.get_samples(n_initial_points)
acquisition = GPyOpt.acquisitions.AcquisitionEI(model, design_space, optimizer=aquisition_optimizer)
evaluator = GPyOpt.core.evaluators.Sequential(acquisition)
# In[ ]:
bo_multifidelity = GPyOpt.methods.ModularBayesianOptimization(model, design_space, objective_multifidelity, acquisition, evaluator, initial_design)
bo_multifidelity.run_optimization(max_iter=50)
# In[ ]:
_, _, _, frames = mc.run_simulation(env, np.atleast_2d(bo_multifidelity.x_opt), render=True)
anim=mc.animate_frames(frames, 'Best controller with multi-fidelity emulator')
# In[ ]:
from IPython.core.display import HTML
# In[ ]:
HTML(anim.to_jshtml())
# In[ ]:
mc.save_frames(frames,
               diagrams='../slides/diagrams/uq',
               filename='mountain_car_multi_fidelity.html')
#
# And the problem is solved again! This time with 250 observations of the
# high fidelity simulator and 250 observations of the low fidelity
# simulator.
#
# ### Conclusion
#
# - Artificial Intelligence and Data Science are fundamentally
# different.
#
# - In one you are dealing with data collected by happenstance.
#
# - In the other you are trying to build systems in the real world,
# often by actively collecting data.
#
# - Our approaches to systems design are building powerful machines that
# will be deployed in evolving environments.
#
# ### Thanks!
#
# - twitter: @lawrennd
# - blog:
# [http://inverseprobability.com](http://inverseprobability.com/blog.html)
# - [Natural vs Artifical
# Intelligence](http://inverseprobability.com/2018/02/06/natural-and-artificial-intelligence)
# - [Mike Jordan's Medium
# Post](https://medium.com/@mijordan3/artificial-intelligence-the-revolution-hasnt-happened-yet-5e1d5812e1e7)