In [1]:
%pylab inline
rc('text', usetex=True)
rc('axes', labelsize=15, titlesize=15)

Populating the interactive namespace from numpy and matplotlib

In [2]:
import seaborn as sns
sns.set_style("white")
np.random.seed(10)


# ABC PMC on a 2D Gaussian example

In this example we look at a dataset drawn from a 2D Gaussian distribution. We assume that we don't have a tractable likelihood, but that we know the covariance matrix $\Sigma$ of the distribution. Using the ABC PMC algorithm we will approximate the posterior distribution of the mean values.
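
For intuition, the core of any ABC scheme is plain rejection sampling: propose parameters from the prior, simulate data, and keep the proposal if the simulation lands close enough to the observation. Here is a minimal sketch of that idea (the function names and arguments are illustrative, not part of abcpmc; the PMC variant used below refines this with weighted particles and a shrinking threshold):

import numpy as np

def rejection_abc(draw_prior, simulate, distance, observed, eps, n):
    """Plain rejection ABC: keep prior draws whose simulations land within eps."""
    accepted = []
    while len(accepted) < n:
        theta = draw_prior()
        if distance(simulate(theta), observed) < eps:
            accepted.append(theta)
    return np.array(accepted)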

First we generate a new dataset by drawing random variables from a multivariate Gaussian around mean = [1.1, 1.5]. This is going to be our observed data set.

In [3]:
samples_size = 1000
sigma = np.eye(2) * 0.25
means = [1.1, 1.5]

data = np.random.multivariate_normal(means, sigma, samples_size)
matshow(sigma)
title("covariance matrix sigma")
colorbar()

Out[3]:
<matplotlib.colorbar.Colorbar instance at 0x10c542950>


Then we need to define our model/simulation. In this case this is simple: we again draw random variables from a multivariate Gaussian distribution, using the given mean and the sigma from above.

In [4]:
def create_new_sample(theta):
    return np.random.multivariate_normal(theta, sigma, samples_size)


Next, we need to define a distance measure. We will use the sum of the absolute differences of the means of the simulated and the observed data.

In [5]:
def dist_measure(x, y):
    return np.sum(np.abs(np.mean(x, axis=0) - np.mean(y, axis=0)))
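
To get a quick feel for the scale of this measure, it helps to compare a simulation at the true means with one at a deliberately wrong guess (the guess [0.0, 0.0] is just an illustrative choice):

print(dist_measure(data, create_new_sample(means)))       # close to zero
print(dist_measure(data, create_new_sample([0.0, 0.0])))  # much larger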


### Verification

To verify that everything works, and to see the effect of the randomness in the simulation, we compute the distance for 1000 simulations at the true mean values.

In [6]:
distances = [dist_measure(data, create_new_sample(means)) for _ in range(1000)]

In [7]:
sns.distplot(distances, axlabel="distances")
title("Variability of distance from simulations")

Out[7]:
<matplotlib.text.Text at 0x10c7317d0>

### Setup

Now we're going to set up the ABC PMC sampling.

In [8]:
import abcpmc


As prior we're going to use a Gaussian, centered on our best guess about the distribution of the means.

In [9]:
prior = abcpmc.GaussianPrior(mu=[1.0, 1.0], sigma=np.eye(2) * 0.5)
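
A quick sanity check of this choice (plain numpy and pylab, independent of abcpmc's API): draw from the same Gaussian and see how the prior mass relates to the true means.

prior_draws = np.random.multivariate_normal([1.0, 1.0], np.eye(2) * 0.5, 1000)
scatter(prior_draws[:, 0], prior_draws[:, 1], alpha=0.2, label="prior draws")
scatter(means[0], means[1], color="red", label="true means")
legend(loc="best")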


As threshold $\epsilon$ we're going to use the $\alpha^{th}$ percentile of the sorted distances of the particles of the current iteration. The simplest way to do this is to define a constant $\epsilon$ and iteratively adapt the threshold. As starting value we define a sufficiently high value so that the acceptance ratio is reasonable, and we will sample for T iterations.

In [10]:
alpha = 75
T = 20
eps_start = 1.0
eps = abcpmc.ConstEps(T, eps_start)
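
A toy illustration of the update rule (the distances here are made up, not sampler output): taking the $\alpha^{th}$ percentile of the accepted distances as the next threshold moves $\epsilon$ towards the bulk of what the current particles can actually achieve, so it can only shrink.

toy_dists = np.abs(np.random.normal(0, 0.3, 5000))  # stand-in for pool.dists
print(np.percentile(toy_dists, alpha))              # would become the next eps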


Finally, we create an instance of the sampler. We want to use 5000 particles and the functions we defined above. Additionally, we're going to make use of the built-in parallelization and use 7 cores for the sampling.

In [11]:
sampler = abcpmc.Sampler(N=5000, Y=data, postfn=create_new_sample, dist=dist_measure, threads=7)


Optionally, we can customize the proposal creation. Here we're going to use an "Optimal Local Covariance Matrix" (OLCM) kernel, as proposed by Filippi et al. (2012). This has been shown to yield a high acceptance ratio together with a faster decrease of the threshold.

In [12]:
sampler.particle_proposal_cls = abcpmc.OLCMParticleProposal
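
The idea behind the OLCM kernel, in rough form (a sketch of the scheme from Filippi et al. 2012, not abcpmc's actual implementation): each particle gets its own proposal covariance, built from the previous particles that already satisfy the new, smaller threshold.

def olcm_cov(theta_i, thetas_prev, ws_prev, dists_prev, eps):
    """Particle-specific proposal covariance (illustrative sketch)."""
    ok = dists_prev <= eps               # previous particles meeting the new threshold
    w = ws_prev[ok] / ws_prev[ok].sum()  # renormalized weights
    diff = thetas_prev[ok] - theta_i
    return np.einsum('k,ki,kj->ij', w, diff, diff)  # weighted outer products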


### Sampling

Now we're ready to sample. All we need to do is iterate over the values yielded by the sampler instance. The sample function returns a namedtuple per iteration that contains all the information we're interested in.

In [13]:
def launch():
    eps = abcpmc.ConstEps(T, eps_start)

    pools = []
    for pool in sampler.sample(prior, eps):
        print("T: {0}, eps: {1:>.4f}, ratio: {2:>.4f}".format(pool.t, eps(pool.eps), pool.ratio))

        for i, (mean, std) in enumerate(zip(*abcpmc.weighted_avg_and_std(pool.thetas, pool.ws, axis=0))):
            print(u"    theta[{0}]: {1:>.4f} \u00B1 {2:>.4f}".format(i, mean, std))

        eps.eps = np.percentile(pool.dists, alpha)  # reduce eps value
        pools.append(pool)
    sampler.close()
    return pools

In [14]:
import time
t0 = time.time()
pools = launch()
print("took", (time.time() - t0))

T: 0, eps: 1.0000, ratio: 0.3907
    theta[0]: 1.0648 ± 0.3761
    theta[1]: 1.3485 ± 0.3746
T: 1, eps: 0.8351, ratio: 0.5106
    theta[0]: 1.0694 ± 0.2965
    theta[1]: 1.3742 ± 0.2963
T: 2, eps: 0.6666, ratio: 0.5363
    theta[0]: 1.0802 ± 0.2421
    theta[1]: 1.4044 ± 0.2410
T: 3, eps: 0.5282, ratio: 0.5230
    theta[0]: 1.0808 ± 0.1926
    theta[1]: 1.4349 ± 0.1924
T: 4, eps: 0.4158, ratio: 0.5268
    theta[0]: 1.0863 ± 0.1570
    theta[1]: 1.4595 ± 0.1524
T: 5, eps: 0.3299, ratio: 0.5270
    theta[0]: 1.0855 ± 0.1257
    theta[1]: 1.4696 ± 0.1248
T: 6, eps: 0.2624, ratio: 0.4958
    theta[0]: 1.0926 ± 0.0993
    theta[1]: 1.4822 ± 0.0999
T: 7, eps: 0.2085, ratio: 0.5031
    theta[0]: 1.0904 ± 0.0780
    theta[1]: 1.4843 ± 0.0805
T: 8, eps: 0.1663, ratio: 0.5006
    theta[0]: 1.0906 ± 0.0633
    theta[1]: 1.4885 ± 0.0652
T: 9, eps: 0.1322, ratio: 0.4851
    theta[0]: 1.0905 ± 0.0520
    theta[1]: 1.4903 ± 0.0516
T: 10, eps: 0.1062, ratio: 0.4743
    theta[0]: 1.0919 ± 0.0430
    theta[1]: 1.4917 ± 0.0417
T: 11, eps: 0.0846, ratio: 0.4652
    theta[0]: 1.0915 ± 0.0346
    theta[1]: 1.4931 ± 0.0347
T: 12, eps: 0.0683, ratio: 0.4381
    theta[0]: 1.0920 ± 0.0286
    theta[1]: 1.4931 ± 0.0290
T: 13, eps: 0.0563, ratio: 0.4155
    theta[0]: 1.0923 ± 0.0250
    theta[1]: 1.4943 ± 0.0254
T: 14, eps: 0.0463, ratio: 0.3605
    theta[0]: 1.0920 ± 0.0224
    theta[1]: 1.4944 ± 0.0219
T: 15, eps: 0.0386, ratio: 0.3187
    theta[0]: 1.0924 ± 0.0200
    theta[1]: 1.4945 ± 0.0193
T: 16, eps: 0.0319, ratio: 0.2739
    theta[0]: 1.0919 ± 0.0181
    theta[1]: 1.4948 ± 0.0179
T: 17, eps: 0.0271, ratio: 0.2259
    theta[0]: 1.0917 ± 0.0174
    theta[1]: 1.4946 ± 0.0166
T: 18, eps: 0.0231, ratio: 0.1805
    theta[0]: 1.0917 ± 0.0164
    theta[1]: 1.4948 ± 0.0159
T: 19, eps: 0.0197, ratio: 0.1425
    theta[0]: 1.0922 ± 0.0157
    theta[1]: 1.4949 ± 0.0158
took 149.579810143


### Postprocessing

How did the sampled values evolve over the iterations? As the threshold decreases, we expect the errors to shrink while the means converge to the true means.

In [15]:
for i in range(len(means)):
    moments = np.array([abcpmc.weighted_avg_and_std(pool.thetas[:, i], pool.ws, axis=0) for pool in pools])
    errorbar(range(T), moments[:, 0], moments[:, 1])

hlines(means, 0, T, linestyle="dotted", linewidth=0.7)
_ = xlim([-.5, T])


What does the distribution of the distances look like after we have approximated the posterior? If we're close to the true posterior, we expect a high bin count around the values we found in the earlier distribution plot.

In [16]:
distances = np.array([pool.dists for pool in pools]).flatten()
sns.distplot(distances, axlabel="distance")

Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x10c5682d0>

How did our $\epsilon$ values behave over the iterations? Using the $\alpha^{th}$ percentile causes the threshold to decrease relatively fast in the beginning and to plateau later on.

In [17]:
eps_values = np.array([pool.eps for pool in pools])
plot(eps_values, label=r"$\epsilon$ values")
xlabel("Iteration")
ylabel(r"$\epsilon$")
legend(loc="best")

Out[17]:
<matplotlib.legend.Legend at 0x10d769110>

What about the acceptance ratio? ABC PMC with the OLCM kernel gives us a relatively high acceptance ratio.

In [18]:
acc_ratios = np.array([pool.ratio for pool in pools])
plot(acc_ratios, label="Acceptance ratio")
ylim([0, 1])
xlabel("Iteration")
ylabel("Acceptance ratio")
legend(loc="best")

Out[18]:
<matplotlib.legend.Legend at 0x10cc2e210>

Finally, what does our posterior look like? For the visualization we're using triangle.py (https://github.com/dfm/triangle.py).

In [20]:
import triangle
samples = np.vstack([pool.thetas for pool in pools])
fig = triangle.corner(samples, truths=means)



Omitting the earlier iterations and using only the particles of the last pool:

In [21]:
idx = -1
samples = pools[idx].thetas
fig = triangle.corner(samples, weights=pools[idx].ws, truths= means)

for mean, std in zip(*abcpmc.weighted_avg_and_std(samples, pools[idx].ws, axis=0)):
print(u"mean: {0:>.4f} \u00B1 {1:>.4f}".format(mean,std))

mean: 1.0922 ± 0.0157
mean: 1.4949 ± 0.0158