Developing probabilistic models using grid methods and MCMC.
Thanks to Chris Fonnesbeck for his help with this notebook, and to Colin Carroll, who added features to pymc3 to support some of these examples.
Copyright 2018 Allen Downey
MIT License: https://opensource.org/licenses/MIT
# If we're running on Colab, install empiricaldist
# https://pypi.org/project/empiricaldist/
import sys
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    !pip install empiricaldist
    !pip install "pymc3>=3.8"
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
I'll model hockey as a Poisson process, where each team has some long-term average scoring rate, $\lambda$, in goals per game.
For the first example, I'll assume that $\lambda$ is somehow known to be 2.4. Since regulation play (as opposed to overtime) is 60 minutes, we can compute the goal scoring rate per minute.
lam_per_game = 2.4
min_per_game = 60
lam_per_min = lam_per_game / min_per_game
lam_per_min
0.04
If we assume that a goal is equally likely during any minute of the game, and we ignore the possibility of scoring more than one goal in the same minute, we can simulate a game by generating one random value each minute.
def simulate_game(p, n=60):
    """Simulate a game as n one-minute intervals with probability p of a goal in each."""
    goals = np.random.choice([0, 1], n, p=[1-p, p])
    return np.sum(goals)
And simulate 10 games.
size = 10
sample = [simulate_game(lam_per_min) for i in range(size)]
If we simulate 1000 games, we can see what the distribution looks like. The average of this sample should be close to `lam_per_game`.
size = 1000
sample_sim = [simulate_game(lam_per_min) for i in range(size)]
np.mean(sample_sim), lam_per_game
(2.35, 2.4)
To visualize distributions, I'll start with a probability mass function (PMF). I'll use the `Pmf` object from `empiricaldist`, which is a subtype of a Pandas `Series`.
from empiricaldist import Pmf
pmf_sim = Pmf.from_seq(sample_sim)
pmf_sim
| | probs |
|---|---|
| 0 | 0.116 |
| 1 | 0.218 |
| 2 | 0.228 |
| 3 | 0.207 |
| 4 | 0.142 |
| 5 | 0.055 |
| 6 | 0.029 |
| 7 | 0.003 |
| 8 | 0.001 |
| 9 | 0.001 |
Here's what the distribution looks like.
def set_colors():
    """Set the color cycle for goals."""
    plt.gca().set_prop_cycle(color=['#2ca02c', '#9467bd'])
def decorate_goals(ylabel='PMF'):
    """Decorate the axes."""
    plt.xlabel('Number of goals')
    plt.ylabel(ylabel)
    plt.title('Distribution of goals scored')
    plt.legend()
set_colors()
pmf_sim.bar(label='simulation')
decorate_goals()
plt.savefig('zigzag1.png', dpi=150)
For large values of `n`, the process we just simulated converges to the Poisson distribution with parameter `mu = n * p`, which is also `mu = lam_per_game`.
We can use NumPy to generate a sample from a Poisson distribution.
n = 60
p = lam_per_min
mu = n * p
sample_poisson = np.random.poisson(mu, size)
np.mean(sample_poisson)
2.377
And confirm that the results are similar to what we got from the model.
from empiricaldist import Pmf

def plot_pmf(sample, **options):
    """Compute and plot the PMF of a sample."""
    Pmf.from_seq(sample).bar(**options)
set_colors()
plot_pmf(sample_poisson, label='Poisson sample', alpha=0.7)
decorate_goals()
plt.savefig('zigzag2.png', dpi=150)
But plotting PMFs is a bad way to compare distributions. It's better to use the cumulative distribution function (CDF).
from empiricaldist import Cdf
def plot_cdf(sample, **options):
    """Compute and plot the CDF of a sample."""
    Cdf.from_seq(sample).plot(**options)
Comparing CDFs makes it clearer that the results from the simulation are consistent with the Poisson model.
set_colors()
plot_cdf(sample_poisson, color='gray', label='Poisson')
plot_cdf(sample_sim, label='simulation')
decorate_goals('CDF')
Soon we will want to use `pymc3` to do inference, which is really what it's for. But just to get warmed up, I will use it to generate a sample from a Poisson distribution.
model = pm.Model()

with model:
    goals = pm.Poisson('goals', mu)
    trace = pm.sample_prior_predictive(1000)
len(trace['goals'])
1000
sample_pymc = trace['goals']
np.mean(sample_pymc)
2.419
This example is like using a cannon to kill a fly. But it helps us learn to use the cannon.
set_colors()
plot_pmf(sample_poisson, label='Poisson', alpha=0.5)
plot_pmf(sample_pymc, label='PyMC', alpha=0.5)
decorate_goals()
plt.savefig('zigzag3a.png', dpi=150)
set_colors()
plot_cdf(sample_poisson, label='Poisson')
plot_cdf(sample_pymc, label='PyMC')
decorate_goals('CDF')
plt.savefig('zigzag3b.png', dpi=150)
One of the nice things about the Poisson distribution is that we can compute its PMF analytically, and SciPy provides an implementation.
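For a scoring rate $\mu$, the probability of scoring $k$ goals in a game is

$\mathrm{Pmf}(k; \mu) = e^{-\mu} \frac{\mu^k}{k!}$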
from scipy.stats import poisson
xs = np.arange(11)
mu = 2.4
ps = poisson.pmf(xs, mu)
set_colors()
plt.bar(xs, ps, label='Poisson PMF')
decorate_goals()
Here's the probability of scoring 4 goals in a game if the long-term rate is 2.4 goals per game.
data = 4
poisson.pmf(data, mu)
0.1254084986272838
We can also compute the likelihood of scoring 4 goals for each of a range of hypothetical values of `mu`:

mus = np.linspace(0, 15, num=101)
likelihood = poisson.pmf(4, mus)
likelihood
array([0.00000000e+00, 1.81555589e-05, 2.50026149e-04, 1.08944747e-03, 2.96358283e-03, 6.22748873e-03, 1.11145981e-02, 1.77229800e-02, 2.60231799e-02, 3.58778394e-02, 4.70665182e-02, 5.93114635e-02, 7.23017337e-02, 8.57142371e-02, 9.92310359e-02, 1.12552785e-01, 1.25408499e-01, 1.37562022e-01, 1.48815687e-01, 1.59011643e-01, 1.68031356e-01, 1.75793720e-01, 1.82252178e-01, 1.87391193e-01, 1.91222339e-01, 1.93780255e-01, 1.95118616e-01, 1.95306276e-01, 1.94423652e-01, 1.92559429e-01, 1.89807621e-01, 1.86264988e-01, 1.82028837e-01, 1.77195165e-01, 1.71857150e-01, 1.66103943e-01, 1.60019753e-01, 1.53683172e-01, 1.47166728e-01, 1.40536620e-01, 1.33852618e-01, 1.27168093e-01, 1.20530156e-01, 1.13979883e-01, 1.07552602e-01, 1.01278229e-01, 9.51816428e-02, 8.92830728e-02, 8.35984979e-02, 7.81400478e-02, 7.29163965e-02, 6.79331437e-02, 6.31931800e-02, 5.86970320e-02, 5.44431858e-02, 5.04283870e-02, 4.66479169e-02, 4.30958446e-02, 3.97652554e-02, 3.66484555e-02, 3.37371552e-02, 3.10226309e-02, 2.84958667e-02, 2.61476786e-02, 2.39688206e-02, 2.19500757e-02, 2.00823331e-02, 1.83566515e-02, 1.67643118e-02, 1.52968592e-02, 1.39461360e-02, 1.27043063e-02, 1.15638747e-02, 1.05176971e-02, 9.55898817e-03, 8.68132323e-03, 7.87863661e-03, 7.14521711e-03, 6.47570069e-03, 5.86506121e-03, 5.30859947e-03, 4.80193100e-03, 4.34097292e-03, 3.92193024e-03, 3.54128163e-03, 3.19576513e-03, 2.88236372e-03, 2.59829116e-03, 2.34097794e-03, 2.10805767e-03, 1.89735382e-03, 1.70686702e-03, 1.53476276e-03, 1.37935980e-03, 1.23911899e-03, 1.11263277e-03, 9.98615235e-04, 8.95892745e-04, 8.03395164e-04, 7.20147598e-04, 6.45262707e-04])
plt.plot(mus, likelihood)
plt.xlabel('Goals per game (mu)')
plt.ylabel('Likelihood');
Ok, it's finally time to do some inference! The function we just wrote computes the likelihood of the data, given a hypothetical value of `mu`:

$\mathrm{Prob}~(x ~|~ \mu)$

But what we really want is the distribution of `mu`, given the data:

$\mathrm{Prob}~(\mu ~|~ x)$
If only there were some theorem that relates these probabilities!
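There is: Bayes's theorem, which says

$\mathrm{Prob}~(\mu ~|~ x) = \frac{\mathrm{Prob}~(\mu) ~ \mathrm{Prob}~(x ~|~ \mu)}{\mathrm{Prob}~(x)}$

In the grid framework, we apply it by multiplying the prior by the likelihood, point by point, and then normalizing so the posterior adds up to 1.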
I'll start with a uniform prior just to keep things simple. We'll choose a better prior later.
uniform_prior = Pmf(1, mus)
uniform_prior.normalize()
101
Initially `uniform_prior` represents the prior distribution of `mu`.
def decorate_rate(ylabel='PDF'):
    """Decorate the axes."""
    plt.xlabel('Goals per game (mu)')
    plt.ylabel(ylabel)
    plt.title('Distribution of goal scoring rate')

    handles, labels = plt.gca().get_legend_handles_labels()
    if len(labels):
        plt.legend()
uniform_prior.plot(label='prior')
decorate_rate()
plt.savefig('zigzag4.png', dpi=150)
Now we can update it with the data and plot the posterior.
data = 4
likelihood = poisson.pmf(data, mus)
posterior1 = uniform_prior * likelihood
posterior1.normalize()
0.06595319262654087
posterior1.plot(label='posterior')
decorate_rate()
plt.savefig('zigzag5.png', dpi=150)
With a uniform prior, the posterior is the normalized likelihood function, and the MAP is the value of `mu` that maximizes likelihood, which is the observed number of goals, 4.
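We can confirm that by finding the value of `mu` that maximizes the grid posterior. Since `Pmf` is a Pandas `Series`, we can use `idxmax`; with this grid (spacing 0.15), the maximum is at 4.05, the grid point closest to 4.

posterior1.idxmax()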
This result is probably not reasonable, because the prior was not reasonable.
To construct a better prior, I'll use a Gamma distribution with parameters chosen to be consistent with previous playoff games.
We can use `make_gamma_dist` to construct a prior `Pmf` with the given parameters.
from scipy.stats import gamma
def make_gamma_dist(xs, alpha, beta):
    """Makes a Pmf based on a gamma distribution.

    xs: places to evaluate the PDF
    alpha, beta: parameters of the distribution

    returns: Pmf
    """
    ps = gamma(a=alpha, scale=1/beta).pdf(xs)
    pmf = Pmf(ps, xs)
    pmf.normalize()
    return pmf
alpha = 4.4
beta = 1.8
alpha, beta
(4.4, 1.8)
mus = np.linspace(0, 10, num=101)
gamma_prior = make_gamma_dist(mus, alpha, beta)
gamma_prior.mean(), gamma_prior.std()
(2.4441837515202747, 1.1644330431484435)
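These values match the analytic ones: a gamma distribution has mean $\alpha/\beta = 4.4 / 1.8 \approx 2.44$ and standard deviation $\sqrt{\alpha}/\beta \approx 1.17$, so the grid approximation is good.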
Here's what it looks like.
gamma_prior.plot(label='gamma prior')
decorate_rate()
plt.savefig('zigzag6.png', dpi=150)
And we can update this prior using the observed data.
data = 4
likelihood = poisson.pmf(data, mus)
posterior2 = gamma_prior * likelihood
posterior2.normalize()
0.10917679489111866
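As an aside, the gamma distribution is the conjugate prior of the Poisson distribution, so we could also compute this posterior analytically: it is a gamma distribution with parameters $\alpha + 4 = 8.4$ and $\beta + 1 = 2.8$, and therefore mean $8.4 / 2.8 = 3$ goals per game.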
gamma_prior.plot(color='gray', label='prior')
posterior2.plot(label='posterior')
decorate_rate()
plt.savefig('zigzag7.png', dpi=150)
Previously we used PyMC to draw a sample from a Poisson distribution with known `mu`. Now we'll use it to draw a sample from a gamma distribution of `mu`, with known `alpha` and `beta`, and then use that value of `mu` to draw a sample from a Poisson distribution.
Here are the parameters we used to create the gamma prior:
print(alpha, beta)
4.4 1.8
Here's what the PyMC model looks like:
model = pm.Model()

with model:
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu)
    trace = pm.sample_prior_predictive(1000)
The distribution of `mu` from this model is the prior. Let's see what it looks like:
sample_prior_pymc = trace['mu']
np.mean(sample_prior_pymc)
2.445926077198612
And compare it to a sample from the gamma prior.
sample_prior_gamma = gamma_prior.sample(1000, replace=True)
plot_cdf(sample_prior_gamma, label='prior gamma')
plot_cdf(sample_prior_pymc, label='prior pymc')
decorate_rate('CDF')
plt.savefig('zigzag10.png', dpi=150)
It looks pretty good.
The distribution of `goals` from this model is the prior predictive.
sample_prior_pred_pymc = trace['goals']
np.mean(sample_prior_pred_pymc)
2.43
And let's compare it to a prior predictive distribution estimated by sampling.
sample_prior_pred = np.random.poisson(sample_prior_gamma)
np.mean(sample_prior_pred)
2.492
def plot_goals(sample, pos='left', **options):
    """Plot the PMF of a sample as side-by-side bars (not used below).

    pos: 'left' or 'right', which side of each integer to draw the bar on
    """
    options['align'] = 'edge'
    if pos == 'left':
        options['width'] = -0.45
    else:
        options['width'] = 0.45

    pmf = Pmf.from_seq(sample)
    plt.bar(pmf.qs, pmf.ps, **options)
set_colors()
plot_cdf(sample_prior_pred, label='prior pred grid')
plot_cdf(sample_prior_pred_pymc, label='prior pred PyMC')
decorate_goals('CDF')
plt.savefig('zigzag11.png', dpi=150)
Looks good.
Finally, we are ready to use PyMC for actual inference. We just have to make one small change.
Instead of generating `goals`, we'll mark `goals` as `observed` and provide the observed data, 4. And instead of calling `sample_prior_predictive`, we'll call `sample`, which is understood to sample from the posterior distribution of `mu`.
model = pm.Model()

with model:
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu, observed=4)
    trace = pm.sample(500)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [mu] Sampling 2 chains, 0 divergences: 100%|██████████| 2000/2000 [00:00<00:00, 2301.77draws/s] The acceptance probability does not match the target. It is 0.8821095734410509, but should be close to 0.8. Try to increase the number of tuning steps.
With `goals` fixed, the only unknown is `mu`, so `trace` contains a sample drawn from the posterior distribution of `mu` (500 draws from each of the 2 chains, 1000 in total). We can plot the posterior using a function provided by PyMC:
len(trace['mu'])
1000
with model:
    pm.plot_posterior(trace)
We can extract a sample of `mu` from the trace.
sample_post_pymc = trace['mu']
np.mean(sample_post_pymc)
2.9224513318372063
And compare it to the sample we drew from the grid approximation:
sample_post = posterior2.sample(1000, replace=True)
sample_post.mean()
2.9052
plot_cdf(sample_post, label='posterior grid')
plot_cdf(sample_post_pymc, label='posterior pymc')
decorate_rate('CDF')
plt.savefig('zigzag12.png', dpi=150)
Again, it looks pretty good.
To sample from the posterior predictive distribution, we can use `sample_posterior_predictive`:
with model:
    post_pred = pm.sample_posterior_predictive(trace, samples=1000)
100%|██████████| 1000/1000 [00:00<00:00, 2524.62it/s]
Here's what it looks like:
sample_post_pred_pymc = post_pred['goals']
np.mean(sample_post_pred_pymc)
2.95
sample_post_pred = poisson(sample_post).rvs()
sample_post_pred.mean()
2.888
set_colors()
plot_cdf(sample_post_pred, label='posterior pred grid')
plot_cdf(sample_post_pred_pymc, label='posterior pred pm')
decorate_goals('CDF')
plt.savefig('zigzag13.png', dpi=150)
Looks pretty good!
We can extend the model to estimate different values of `mu` for the two teams.
model = pm.Model()

with model:
    mu_BOS = pm.Gamma('mu_BOS', alpha, beta)
    mu_TBL = pm.Gamma('mu_TBL', alpha, beta)
    goals_BOS = pm.Poisson('goals_BOS', mu_BOS, observed=3)
    goals_TBL = pm.Poisson('goals_TBL', mu_TBL, observed=2)
    trace = pm.sample(500)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [mu_TBL, mu_BOS] Sampling 2 chains, 0 divergences: 100%|██████████| 2000/2000 [00:00<00:00, 2169.16draws/s] The acceptance probability does not match the target. It is 0.8810527302399683, but should be close to 0.8. Try to increase the number of tuning steps.
We can use `traceplot` to review the results and do some visual diagnostics.
with model:
    pm.traceplot(trace);
Here are the posterior distributions for `mu_BOS` and `mu_TBL`.
mu_TBL = trace['mu_TBL']
plot_cdf(mu_TBL, label='mu_TBL posterior')
mu_BOS = trace['mu_BOS']
plot_cdf(mu_BOS, label='mu_BOS posterior')
decorate_rate('CDF')
np.mean(mu_BOS), np.mean(mu_TBL)
plt.savefig('zigzag14.png', dpi=150)
On the basis of one game, here's the probability that Boston is the better team.
np.mean(mu_BOS > mu_TBL)
0.617
np.mean(mu_BOS == mu_TBL)
0.0
Even if Boston is the better team, that doesn't mean they'll win the next game. We can use `sample_posterior_predictive` to generate predictions.
with model:
    post_pred = pm.sample_posterior_predictive(trace, samples=1000)
100%|██████████| 1000/1000 [00:00<00:00, 1443.61it/s]
Here are the posterior predictive distributions of goals scored.
goals_BOS = post_pred['goals_BOS'].flatten()
goals_TBL = post_pred['goals_TBL'].flatten()
set_colors()
plot_cdf(goals_TBL, label='TBL')
plot_cdf(goals_BOS, label='BOS')
decorate_goals('CDF')
plt.savefig('zigzag15.png', dpi=150)
Here's the chance that Boston wins the next game.
win = np.mean(goals_BOS > goals_TBL)
win
0.475
The chance that they lose.
lose = np.mean(goals_TBL > goals_BOS)
lose
0.338
And the chance of a tie.
tie = np.mean(goals_BOS == goals_TBL)
tie
0.187
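As a sanity check, the three outcomes are mutually exclusive and exhaustive, so their probabilities should add up to 1 (up to floating-point error).

win + lose + tie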
So far, all of this is based on a gamma prior. To choose the parameters of the prior, I used data from previous Stanley Cup finals and computed a maximum likelihood estimate (MLE). But that's not correct, because it treats `alpha` and `beta` as known values rather than parameters to estimate. In other words, I have ignored two important sources of uncertainty. As a result, my predictions are almost certainly too confident.
The solution is a hierarchical model, where `alpha` and `beta` are the parameters that control `mu`, and `mu` is the parameter that controls `goals`. Then we can use observed `goals` to update the distributions of all three unknown parameters.
Of course, now we need a prior distribution for `alpha` and `beta`. A common choice is the half-Cauchy distribution (see Gelman), but on advice of counsel, I'm going with exponential.
Here's a PyMC model that generates `alpha` and `beta` from an exponential distribution.
model = pm.Model()

with model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)
    trace = pm.sample_prior_predictive(1000)
Here's what the distributions of `alpha` and `beta` look like.
sample_prior_alpha = trace['alpha']
plot_cdf(sample_prior_alpha, label='alpha prior')
sample_prior_beta = trace['beta']
plot_cdf(sample_prior_beta, label='beta prior')
plt.xscale('log')
plt.xlabel('Parameter of a gamma distribution')
plt.ylabel('CDF')
np.mean(sample_prior_alpha)
0.9842304166724799
Now that we have `alpha` and `beta`, we can generate `mu`.
model = pm.Model()

with model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)
    mu = pm.Gamma('mu', alpha, beta)
    trace = pm.sample_prior_predictive(1000)
Here's what the prior distribution of `mu` looks like.
sample_prior_mu = trace['mu']
plot_cdf(sample_prior_mu, label='mu prior hierarchical')
decorate_rate('CDF')
np.mean(sample_prior_mu)
12.976167851569219
In effect, the model is saying "I have never seen a hockey game before. As far as I know, it could be soccer, could be basketball, could be pinball."
If we zoom in on the range 0 to 10, we can compare the prior implied by the hierarchical model with the gamma prior I hand picked.
plot_cdf(sample_prior_mu, label='mu prior hierarchical')
plot_cdf(sample_prior_gamma, label='mu prior gamma', color='gray')
plt.xlim(0, 10)
decorate_rate('CDF')
Obviously, they are very different. They agree that the most likely values are less than 10, but the hierarchical model admits the possibility that `mu` could be orders of magnitude bigger.
Strange as it sounds, that's probably what we want in a non-committal prior.
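To quantify how non-committal it is, here's a quick check of how much prior probability the hierarchical model puts on scoring rates greater than 10 goals per game (a sketch; the exact value depends on the random sample).

np.mean(sample_prior_mu > 10)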
Ok, last step of the forward process, let's generate some goals.
model = pm.Model()

with model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu)
    trace = pm.sample_prior_predictive(1000)
Here's the prior predictive distribution of goals.
sample_prior_goals = trace['goals']
set_colors()
plot_cdf(sample_prior_goals, label='goals prior')
decorate_goals('CDF')
np.mean(sample_prior_goals)
5.944
To see whether that distribution is right, I drew samples using SciPy.
import scipy.stats as st
def forward_hierarchical(size=1):
    """Draw one sample of goals from the hierarchical forward process."""
    alpha = st.expon().rvs(size=size)
    beta = st.expon().rvs(size=size)
    mu = st.gamma(a=alpha, scale=1/beta).rvs(size=size)
    goals = st.poisson(mu).rvs(size=size)
    return goals[0]
sample_prior_goals_st = [forward_hierarchical() for i in range(1000)];
set_colors()
plot_cdf(sample_prior_goals, label='goals prior')
plot_cdf(sample_prior_goals_st, label='goals prior scipy')
decorate_goals('CDF')
plt.xlim(0, 50)
plt.legend(loc='lower right')
np.mean(sample_prior_goals_st)
6.937
Once we have the forward process working, we only need a small change to run the reverse process.
model = pm.Model()

with model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu, observed=[3, 3])
    trace = pm.sample(1000)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [mu, beta, alpha] Sampling 2 chains, 1 divergences: 100%|██████████| 3000/3000 [00:02<00:00, 1444.74draws/s] There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
Here's the posterior distribution of `mu`. The posterior mean is close to the observed value, which is what we expect with a weakly informative prior.
sample_post_mu = trace['mu']
plot_cdf(sample_post_mu, label='mu posterior')
decorate_rate('CDF')
np.mean(sample_post_mu)
2.865235353825642
We can extend the model to estimate different values of `mu` for the two teams.
model = pm.Model()

with model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)
    mu_BOS = pm.Gamma('mu_BOS', alpha, beta)
    mu_TBL = pm.Gamma('mu_TBL', alpha, beta)
    goals_BOS = pm.Poisson('goals_BOS', mu_BOS, observed=[3, 3])
    goals_TBL = pm.Poisson('goals_TBL', mu_TBL, observed=[2, 3])
    trace = pm.sample(1000, target_accept=0.95)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [mu_TBL, mu_BOS, beta, alpha] Sampling 2 chains, 0 divergences: 100%|██████████| 3000/3000 [00:03<00:00, 861.25draws/s]
with model:
    pm.traceplot(trace)

with model:
    pm.plot_posterior(trace['mu_BOS'])

with model:
    pm.plot_posterior(trace['mu_TBL'])
Here are the posterior distributions for `mu_TBL` and `mu_BOS`.
sample_post_mu_TBL = trace['mu_TBL']
plot_cdf(sample_post_mu_TBL, label='mu_TBL posterior')
sample_post_mu_BOS = trace['mu_BOS']
plot_cdf(sample_post_mu_BOS, label='mu_BOS posterior')
decorate_rate('CDF')
np.mean(sample_post_mu_TBL), np.mean(sample_post_mu_BOS)
(2.5273712319352972, 2.857168894779832)
On the basis of two games (and never having seen a previous game), here's the probability that Boston is the better team.
np.mean(sample_post_mu_BOS > sample_post_mu_TBL)
0.603
But let's take advantage of more information. Here are the results from the eight most recent Stanley Cup finals, ignoring games that went into overtime.
data = dict(BOS13 = [2, 1, 2],
CHI13 = [0, 3, 3],
NYR14 = [0, 2],
LAK14 = [3, 1],
TBL15 = [1, 4, 3, 1, 1, 0],
CHI15 = [2, 3, 2, 2, 2, 2],
SJS16 = [2, 1, 4, 1],
PIT16 = [3, 3, 2, 3],
NSH17 = [3, 1, 5, 4, 0, 0],
PIT17 = [5, 4, 1, 1, 6, 2],
VGK18 = [6, 2, 1, 2, 3],
WSH18 = [4, 3, 3, 6, 4],
BOS19 = [2, 7, 2, 1, 5, 1],
STL19 = [2, 2, 4, 2, 1, 4],
BOS20 = [3, 2],
TBL20 = [2, 3],
)
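As a quick check on this data, we can compute the overall scoring rate, pooling all the games; it should be close to the 2.4 goals per game we started with.

np.mean(np.hstack(list(data.values())))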
Here's how we can get the data into the model.
model = pm.Model()

with model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)

    mu = dict()
    goals = dict()
    for name, observed in data.items():
        mu[name] = pm.Gamma('mu_'+name, alpha, beta)
        goals[name] = pm.Poisson(name, mu[name], observed=observed)

    trace = pm.sample(500, target_accept=0.95)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [mu_TBL20, mu_BOS20, mu_STL19, mu_BOS19, mu_WSH18, mu_VGK18, mu_PIT17, mu_NSH17, mu_PIT16, mu_SJS16, mu_CHI15, mu_TBL15, mu_LAK14, mu_NYR14, mu_CHI13, mu_BOS13, beta, alpha] Sampling 2 chains, 0 divergences: 100%|██████████| 3000/3000 [00:13<00:00, 224.03draws/s]
Here are the posterior means.
sample_post_mu_BOS = trace['mu_BOS20']
np.mean(sample_post_mu_BOS)
2.479121501369579
sample_post_mu_TBL = trace['mu_TBL20']
np.mean(sample_post_mu_TBL)
2.4738678038205064
They are lower with the background information than without, and closer together. Here's the updated chance that Boston is the better team.
np.mean(sample_post_mu_BOS > sample_post_mu_TBL)
0.4935
Finally, here are the posterior distributions of `alpha` and `beta`, along with the implied mean and variance of the gamma distribution of `mu`.

plot_cdf(trace['alpha'], label='alpha')
plot_cdf(trace['beta'], label='beta')

plt.xlabel('Parameter of a gamma distribution')
plt.ylabel('CDF')
plt.legend();
np.mean(trace['alpha'])
4.360177013060098
np.mean(trace['beta'])
1.7947170342792527
np.mean(trace['alpha'] / trace['beta'])
2.475142954843771
np.mean(trace['alpha'] / trace['beta']**2)
1.6342667744048895
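For reference, a gamma distribution has mean $\alpha/\beta$ and variance $\alpha/\beta^2$, so the last two results imply that the distribution of `mu` across teams has mean near 2.5 goals per game and standard deviation near $\sqrt{1.63} \approx 1.28$, not far from the prior I hand-picked (mean 2.44, standard deviation 1.16).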