In [2]:
%matplotlib inline
import pymc3 as pm
import numpy as np
import seaborn as sns


First, let's run an analysis of 100 binomial samples, with zero positive outcomes:

In [3]:
n1 = 100
x1 = 0

In [6]:
with pm.Model() as first_dataset:
    θ = pm.Beta('θ', 1, 1)
    x = pm.Binomial('x', n=n1, p=θ, observed=x1)

    trace1 = pm.sample(2000)

Applied logodds-transform to θ and added transformed θ_logodds_ to model.
Assigned NUTS to θ_logodds_
100%|██████████| 2000/2000 [00:00<00:00, 2721.90it/s]


The posterior mean is 0.012, and the 95% HPD interval [0.000, 0.031] has a width of 0.031.

In [7]:
pm.summary(trace1)

θ:

Mean             SD               MC Error         95% HPD interval
-------------------------------------------------------------------

0.012            0.025            0.001            [0.000, 0.031]

Posterior quantiles:
2.5            25             50             75             97.5
|--------------|==============|==============|--------------|

0.000          0.003          0.008          0.014          0.037
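Because the Beta prior is conjugate to the binomial likelihood, this posterior is also available in closed form, which gives us a quick sanity check on the MCMC summary. The `scipy.stats` check below is not part of the original notebook, just a sketch of the conjugate update Beta(1 + x, 1 + n − x):

```python
from scipy import stats

n1, x1 = 100, 0  # first dataset: zero positives out of 100 trials

# Beta(1, 1) prior + binomial data -> Beta(1 + x1, 1 + n1 - x1) posterior
posterior = stats.beta(1 + x1, 1 + n1 - x1)

print(posterior.mean())          # analytic mean, ~0.0098 (MCMC gave 0.012)
print(posterior.interval(0.95))  # equal-tailed 95% interval
```

The analytic mean 1/102 ≈ 0.0098 agrees with the sampled estimate up to Monte Carlo error (note the equal-tailed interval differs slightly from the HPD interval reported by `pm.summary`).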



Now, let's add another 100 samples, but this time with 10 positive outcomes:

In [8]:
n2 = 100
x2 = 10

with pm.Model() as combined_dataset:
    θ = pm.Beta('θ', 1, 1)
    x = pm.Binomial('x', n=n1+n2, p=θ, observed=x1+x2)

    trace2 = pm.sample(2000)

Applied logodds-transform to θ and added transformed θ_logodds_ to model.
Assigned NUTS to θ_logodds_
100%|██████████| 2000/2000 [00:00<00:00, 3415.85it/s]

In [9]:
pm.summary(trace2)

θ:

Mean             SD               MC Error         95% HPD interval
-------------------------------------------------------------------

0.056            0.030            0.002            [0.025, 0.087]

Posterior quantiles:
2.5            25             50             75             97.5
|--------------|==============|==============|--------------|

0.027          0.042          0.053          0.064          0.091



Notice that the credible interval is twice as wide, even though the sample size has doubled! With zero positive outcomes, the posterior is squeezed against the boundary at θ = 0 and is very narrow; once some positives are observed, the estimate moves into the interior of the parameter space, where the sampling variance θ(1 − θ)/n is larger.
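The widening can be confirmed without sampling by comparing the two exact conjugate posteriors. This `scipy.stats` sketch (not part of the original notebook) contrasts the 95% equal-tailed interval widths:

```python
from scipy import stats

# Conjugate posteriors under a Beta(1, 1) prior: Beta(1 + successes, 1 + failures)
post1 = stats.beta(1 + 0, 1 + 100)    # first dataset: 0 positives / 100
post2 = stats.beta(1 + 10, 1 + 190)   # combined: 10 positives / 200

def width(dist, mass=0.95):
    """Width of the equal-tailed credible interval containing `mass`."""
    lo, hi = dist.interval(mass)
    return hi - lo

print(width(post1))  # ~0.036
print(width(post2))  # ~0.064
```

Even with twice the data, the combined posterior's interval is roughly 1.8× wider, because its mass now sits away from the θ = 0 boundary.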