import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
%load_ext watermark
az.style.use('arviz-darkgrid')
Approximate Bayesian Computation (ABC) methods, also called likelihood-free inference methods, are a group of techniques developed for inferring posterior distributions in cases where the likelihood function is intractable or costly to evaluate. This does not mean that the likelihood function is not part of the analysis; rather, it is not directly evaluated.
ABC comes in handy when modelling complex phenomena in certain fields of study, such as systems biology. Such models often contain unobservable random quantities, which make the likelihood function hard to specify, while data can still be simulated from the model.
These methods follow a general form:
1- Sample a parameter $\theta^*$ from a prior/proposal distribution $\pi(\theta)$.
2- Simulate a data set $y^*$ using a function (the simulator) that takes $\theta^*$ and returns a data set of the same dimensions as the observed data set $y_0$.
3- Compare the simulated dataset $y^*$ with the experimental data set $y_0$ using a distance function $d$ and a tolerance threshold $\epsilon$.
In some cases a distance function is computed between two summary statistics $d(S(y_0), S(y^*))$, avoiding the issue of computing distances for entire datasets.
As a result we obtain a sample of parameters from the distribution $\pi(\theta \mid d(y_0, y^*) \leqslant \epsilon)$.
If $\epsilon$ is sufficiently small this distribution will be a good approximation of the posterior distribution $\pi(\theta | y_0)$.
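The recipe above can be sketched in a few lines of plain NumPy. This is a minimal rejection-ABC illustration, not the algorithm PyMC3 uses below; the helper names (prior_sample, simulator, distance) are hypothetical:

import numpy as np

def rejection_abc(y0, prior_sample, simulator, distance, epsilon, n_draws):
    """Keep theta* whenever d(y0, y*) <= epsilon (steps 1-3 above)."""
    accepted = []
    while len(accepted) < n_draws:
        theta = prior_sample()               # 1. draw theta* from the prior
        y_star = simulator(theta)            # 2. simulate a dataset like y0
        if distance(y0, y_star) <= epsilon:  # 3. compare and accept/reject
            accepted.append(theta)
    return np.array(accepted)

# Toy usage: infer the mean of normal data with known sd=1,
# using the sample mean as a summary statistic.
y0 = np.random.normal(0.5, 1, size=200)
posterior = rejection_abc(
    y0,
    prior_sample=lambda: np.random.normal(0, 5),
    simulator=lambda mu: np.random.normal(mu, 1, size=200),
    distance=lambda a, b: abs(a.mean() - b.mean()),
    epsilon=0.05,
    n_draws=500,
)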
Sequential Monte Carlo (SMC) ABC is a method that iteratively morphs the prior into a posterior by propagating the sampled parameters through a series of proposal distributions $\phi(\theta^{(i)})$, weighting the accepted parameters $\theta^{(i)}$ as:
$$w^{(i)} \propto \frac{\pi(\theta^{(i)})}{\phi(\theta^{(i)})}$$

This combines the advantages of traditional SMC, i.e. the ability to sample from distributions with multiple peaks, without the need to evaluate the likelihood function.
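As a small numeric illustration of this weighting (the prior and proposal densities below are assumptions chosen only for the example):

import numpy as np
from scipy import stats

prior = stats.norm(0, 5)     # pi(theta), assumed for illustration
proposal = stats.norm(1, 2)  # phi(theta), assumed for illustration

thetas = proposal.rvs(size=1000)                        # accepted particles
log_w = prior.logpdf(thetas) - proposal.logpdf(thetas)  # log of pi/phi
w = np.exp(log_w - log_w.max())
w /= w.sum()                                            # normalized weights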
(Lintusaari, 2016; Toni, 2008; Nuñez and Prangle, 2015)
Estimating the mean and standard deviation of normal data
data = np.random.normal(loc=0, scale=1, size=1000)

def normal_sim(a, b):
    # simulator: returns a synthetic dataset of the same size as the observed one
    return np.random.normal(a, b, 1000)

with pm.Model() as example:
    a = pm.Normal("a", mu=0, sd=5)
    b = pm.HalfNormal("b", sd=1)
    s = pm.Simulator("s", normal_sim, params=(a, b), observed=np.sort(data))
    trace_example = pm.sample_smc(kernel="ABC", sum_stat="sorted")
Sample initial stage: ...
UserWarning: SMC-ABC methods are experimental step methods and not yet recommended for use in PyMC3!
Stage: 0 Beta: 0.000 Steps: 25 Acce: 1.000
Stage: 1 Beta: 0.002 Steps: 25 Acce: 0.493
Stage: 2 Beta: 0.008 Steps:  6 Acce: 0.393
Stage: 3 Beta: 0.026 Steps:  9 Acce: 0.343
Stage: 4 Beta: 0.087 Steps: 10 Acce: 0.385
Stage: 5 Beta: 0.288 Steps:  9 Acce: 0.307
Stage: 6 Beta: 0.883 Steps: 12 Acce: 0.266
Stage: 7 Beta: 1.000 Steps: 14 Acce: 0.171
az.plot_trace(trace_example);
az.summary(trace_example)
arviz - WARNING - Shape validation failed: input_shape: (1, 1000), minimum_shape: (chains=2, draws=4)
|   | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_mean | ess_sd | ess_bulk | ess_tail | r_hat |
|---|------|----|--------|--------|-----------|---------|----------|--------|----------|----------|-------|
| a | -0.013 | 0.045 | -0.089 | 0.072 | 0.001 | 0.001 | 1004.0 | 1004.0 | 999.0 | 1024.0 | NaN |
| b | 1.040 | 0.039 | 0.961 | 1.109 | 0.001 | 0.001 | 1029.0 | 1029.0 | 1024.0 | 947.0 | NaN |
_, ax = plt.subplots(figsize=(10, 4))
az.plot_kde(data, label="True data", ax=ax, plot_kwargs={"color": "C2"})
az.plot_kde(normal_sim(trace_example["a"].mean(), trace_example["b"].mean()), ax=ax)
for i in np.random.randint(0, 500, 25):
    az.plot_kde(
        normal_sim(trace_example["a"][i], trace_example["b"][i]),
        ax=ax,
        plot_kwargs={"zorder": 0, "alpha": 0.2},
    )
ax.legend();
In this example we will try to find parameters for the Lotka-Volterra equations, a common biological competition model describing how the number of individuals of each species changes when there is a predator/prey interaction (A Biologist's Guide to Mathematical Modeling in Ecology and Evolution, Otto and Day, 2007), for example, rabbits and foxes. Given an initial population number for each species, the integration of these ordinary differential equations (ODEs) describes curves for the progression of both populations. The ODEs take four parameters:

1- $a$: the natural growth rate of rabbits when there are no foxes.
2- $b$: the death rate of rabbits due to predation.
3- $c$: the natural death rate of foxes when there are no rabbits.
4- $d$: the factor describing how many caught rabbits produce a new fox.
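Written out (this matches the dX_dt function defined below), with rabbit population $X_0$ and fox population $X_1$, the system is:

$$\frac{dX_0}{dt} = a X_0 - b X_0 X_1, \qquad \frac{dX_1}{dt} = -c X_1 + d\,b\,X_0 X_1$$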
This example is based on the SciPy Lotka-Volterra tutorial.
from scipy.integrate import odeint
First we will generate data using known parameters.
# Definition of parameters
a = 1.0
b = 0.1
c = 1.5
d = 0.75
# initial population of rabbits and foxes
X0 = [10.0, 5.0]
# size of data
size = 100
# time lapse
time = 15
t = np.linspace(0, time, size)
# Lotka-Volterra equations
def dX_dt(X, t, a, b, c, d):
    """Return the growth rate of fox and rabbit populations."""
    return np.array([a * X[0] - b * X[0] * X[1], -c * X[1] + d * b * X[0] * X[1]])
This model is based on a simulator, a function that returns data with the same dimensions as the observed data. In this case, the function solves the ODE.
# simulator function
def competition_model(a, b):
    return odeint(dX_dt, y0=X0, t=t, rtol=0.1, args=(a, b, c, d))
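Before wrapping the simulator in pm.Simulator it is worth a quick sanity check (illustrative, not part of the original analysis) that its output has the dimensions of the observed data:

# illustrative check: the simulator output must have the same shape as the
# observed data, here (size, 2) = (100, 2) for the two species
assert competition_model(a, b).shape == (size, 2)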
Using the simulator function, we will generate a dataset with added noise to use as the observed data.
# function for generating noisy data to be used as observed data
def add_noise(a, b, c, d):
    noise = np.random.normal(size=(size, 2))
    simulated = competition_model(a, b)
    simulated += noise
    indexes = np.sort(np.random.randint(low=0, high=size, size=size))
    return simulated[indexes]
# plotting observed data.
observed = add_noise(a, b, c, d)
_, ax = plt.subplots(figsize=(12, 4))
ax.plot(observed[:, 0], "x", label="prey")
ax.plot(observed[:, 1], "x", label="predator")
ax.set_xlabel("time")
ax.set_ylabel("population")
ax.set_title("Observed data")
ax.legend();
In this model, instead of specifying a likelihood function, we use pm.Simulator(), a "container" that stores the simulator function and the observed data. During sampling, samples from the priors of a and b will be passed to the simulator function.
with pm.Model() as model:
    a = pm.Normal("a", mu=1, sd=5)
    b = pm.Normal("b", mu=1, sd=5)
    simulator = pm.Simulator(
        "simulator", competition_model, params=(a, b), observed=observed
    )
    trace = pm.sample_smc(kernel="ABC", epsilon=20)
Sample initial stage: ...
UserWarning: SMC-ABC methods are experimental step methods and not yet recommended for use in PyMC3!
ODEintWarning: Repeated error test failures (internal error). Run with full_output = 1 to get quantitative information.
ODEintWarning: Repeated convergence failures (perhaps bad Jacobian or tolerances). Run with full_output = 1 to get quantitative information.
Stage: 0 Beta: 0.042 Steps: 25 Acce: 1.000
Stage: 1 Beta: 0.080 Steps: 25 Acce: 0.419
Stage: 2 Beta: 0.120 Steps:  8 Acce: 0.243
Stage: 3 Beta: 0.190 Steps: 16 Acce: 0.103
Stage: 4 Beta: 0.280 Steps: 25 Acce: 0.043
Stage: 5 Beta: 0.463 Steps: 25 Acce: 0.032
Stage: 6 Beta: 1.000 Steps: 25 Acce: 0.045
az.plot_trace(trace);
az.plot_posterior(trace);
# plot results
_, ax = plt.subplots(figsize=(14, 6))
ax.plot(observed[:, 0], "x", label="prey", c="C0")
ax.plot(observed[:, 1], "x", label="predator", c="C1")
ax.plot(competition_model(trace["a"].mean(), trace["b"].mean()), linewidth=2.5)
for i in np.random.randint(0, size, 75):
    ax.plot(
        competition_model(trace["a"][i], trace["b"][i])[:, 0],
        alpha=0.1,
        c="C2",
        zorder=0,
    )
    ax.plot(
        competition_model(trace["a"][i], trace["b"][i])[:, 1],
        alpha=0.1,
        c="C3",
        zorder=0,
    )
ax.set_xlabel("time")
ax.set_ylabel("population")
ax.legend();
%watermark -n -u -v -iv -w
pymc3     3.9.0
numpy     1.18.5
arviz     0.8.3
last updated: Fri Jun 12 2020
CPython 3.7.7
IPython 7.15.0
watermark 2.0.2