Developing probabilistic models using grid methods and MCMC.
Thanks to Chris Fonnesbeck for his help with this notebook, and to Colin Carroll, who added features to pymc3 to support some of these examples.
Copyright 2018 Allen Downey
MIT License: https://opensource.org/licenses/MIT
# If we're running on Colab, install empiricaldist
# https://pypi.org/project/empiricaldist/
import sys
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    !pip install empiricaldist
    !pip install "pymc3>=3.8"
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
I'll model hockey as a Poisson process, where each team has some long-term average scoring rate, $\lambda$, in goals per game.
For the first example, I'll assume that $\lambda$ is somehow known to be 2.4. Since regulation play (as opposed to overtime) is 60 minutes, we can compute the goal scoring rate per minute.
lam_per_game = 2.4
min_per_game = 60
lam_per_min = lam_per_game / min_per_game
lam_per_min
0.04
If we assume that a goal is equally likely during any minute of the game, and we ignore the possibility of scoring more than one goal in the same minute, we can simulate a game by generating one random value each minute.
def simulate_game(p, n=60):
    """Simulate a game as n one-minute intervals with probability p of a goal in each."""
    goals = np.random.choice([0, 1], n, p=[1-p, p])
    return np.sum(goals)
And simulate 10 games.
size = 10
sample = [simulate_game(lam_per_min) for i in range(size)]
If we simulate 1000 games, we can see what the distribution looks like. The average of this sample should be close to `lam_per_game`.
size = 1000
sample_sim = [simulate_game(lam_per_min) for i in range(size)]
np.mean(sample_sim), lam_per_game
(2.35, 2.4)
To visualize distributions, I'll start with a probability mass function (PMF). I'll use the `Pmf` object from `empiricaldist`, which is a subtype of a Pandas `Series`.
from empiricaldist import Pmf
pmf_sim = Pmf.from_seq(sample_sim)
pmf_sim
| | probs |
|---|---|
| 0 | 0.116 |
| 1 | 0.218 |
| 2 | 0.228 |
| 3 | 0.207 |
| 4 | 0.142 |
| 5 | 0.055 |
| 6 | 0.029 |
| 7 | 0.003 |
| 8 | 0.001 |
| 9 | 0.001 |
Here's what the distribution looks like.
def set_colors():
    """Set the color cycle for goals."""
    plt.gca().set_prop_cycle(color=['#2ca02c', '#9467bd'])
def decorate_goals(ylabel='PMF'):
    """Decorate the axes."""
    plt.xlabel('Number of goals')
    plt.ylabel(ylabel)
    plt.title('Distribution of goals scored')
    plt.legend()
set_colors()
pmf_sim.bar(label='simulation')
decorate_goals()
plt.savefig('zigzag1.png', dpi=150)
For large values of `n`, the process we just simulated converges to the Poisson distribution with parameter `mu = n * p`, which is also `mu = lam_per_game`.
We can use NumPy to generate a sample from a Poisson distribution.
n = 60
p = lam_per_min
mu = n * p
sample_poisson = np.random.poisson(mu, size)
np.mean(sample_poisson)
2.377
And confirm that the results are similar to what we got from the model.
from empiricaldist import Pmf

def plot_pmf(sample, **options):
    """Compute and plot the PMF of a sample."""
    Pmf.from_seq(sample).bar(**options)
set_colors()
plot_pmf(sample_poisson, label='Poisson sample', alpha=0.7)
decorate_goals()
plt.savefig('zigzag2.png', dpi=150)
But plotting PMFs is a bad way to compare distributions. It's better to use the cumulative distribution function (CDF).
from empiricaldist import Cdf
def plot_cdf(sample, **options):
    """Compute and plot the CDF of a sample."""
    Cdf.from_seq(sample).plot(**options)
Comparing CDFs makes it clearer that the results from the simulation are consistent with the Poisson model.
set_colors()
plot_cdf(sample_poisson, color='gray', label='Poisson')
plot_cdf(sample_sim, label='simulation')
decorate_goals('CDF')
Soon we will want to use `pymc3` to do inference, which is really what it's for. But just to get warmed up, I will use it to generate a sample from a Poisson distribution.
model = pm.Model()

with model:
    goals = pm.Poisson('goals', mu)
    trace = pm.sample_prior_predictive(1000)
len(trace['goals'])
1000
sample_pymc = trace['goals']
np.mean(sample_pymc)
2.419
This example is like using a cannon to kill a fly. But it helps us learn to use the cannon.
set_colors()
plot_pmf(sample_poisson, label='Poisson', alpha=0.5)
plot_pmf(sample_pymc, label='PyMC', alpha=0.5)
decorate_goals()
plt.savefig('zigzag3a.png', dpi=150)
set_colors()
plot_cdf(sample_poisson, label='Poisson')
plot_cdf(sample_pymc, label='PyMC')
decorate_goals('CDF')
plt.savefig('zigzag3b.png', dpi=150)
One of the nice things about the Poisson distribution is that we can compute its PMF analytically, and SciPy provides an implementation.
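For a scoring rate $\mu$, the probability of scoring $k$ goals in a game is

$\mathrm{Pmf}(k; \mu) = e^{-\mu} \frac{\mu^k}{k!}$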
from scipy.stats import poisson
xs = np.arange(11)
mu = 2.4
ps = poisson.pmf(xs, mu)
set_colors()
plt.bar(xs, ps, label='Poisson PMF')
decorate_goals()
Here's the probability of scoring 4 goals in a game if the long-term rate is 2.4 goals per game.
data = 4
poisson.pmf(data, mu)
0.1254084986272838
We can also compute the likelihood of scoring 4 goals for each of a range of hypothetical values of `mu`:

mus = np.linspace(0, 15, num=101)
likelihood = poisson.pmf(4, mus)
likelihood
array([0.00000000e+00, 1.81555589e-05, 2.50026149e-04, 1.08944747e-03, 2.96358283e-03, 6.22748873e-03, 1.11145981e-02, 1.77229800e-02, 2.60231799e-02, 3.58778394e-02, 4.70665182e-02, 5.93114635e-02, 7.23017337e-02, 8.57142371e-02, 9.92310359e-02, 1.12552785e-01, 1.25408499e-01, 1.37562022e-01, 1.48815687e-01, 1.59011643e-01, 1.68031356e-01, 1.75793720e-01, 1.82252178e-01, 1.87391193e-01, 1.91222339e-01, 1.93780255e-01, 1.95118616e-01, 1.95306276e-01, 1.94423652e-01, 1.92559429e-01, 1.89807621e-01, 1.86264988e-01, 1.82028837e-01, 1.77195165e-01, 1.71857150e-01, 1.66103943e-01, 1.60019753e-01, 1.53683172e-01, 1.47166728e-01, 1.40536620e-01, 1.33852618e-01, 1.27168093e-01, 1.20530156e-01, 1.13979883e-01, 1.07552602e-01, 1.01278229e-01, 9.51816428e-02, 8.92830728e-02, 8.35984979e-02, 7.81400478e-02, 7.29163965e-02, 6.79331437e-02, 6.31931800e-02, 5.86970320e-02, 5.44431858e-02, 5.04283870e-02, 4.66479169e-02, 4.30958446e-02, 3.97652554e-02, 3.66484555e-02, 3.37371552e-02, 3.10226309e-02, 2.84958667e-02, 2.61476786e-02, 2.39688206e-02, 2.19500757e-02, 2.00823331e-02, 1.83566515e-02, 1.67643118e-02, 1.52968592e-02, 1.39461360e-02, 1.27043063e-02, 1.15638747e-02, 1.05176971e-02, 9.55898817e-03, 8.68132323e-03, 7.87863661e-03, 7.14521711e-03, 6.47570069e-03, 5.86506121e-03, 5.30859947e-03, 4.80193100e-03, 4.34097292e-03, 3.92193024e-03, 3.54128163e-03, 3.19576513e-03, 2.88236372e-03, 2.59829116e-03, 2.34097794e-03, 2.10805767e-03, 1.89735382e-03, 1.70686702e-03, 1.53476276e-03, 1.37935980e-03, 1.23911899e-03, 1.11263277e-03, 9.98615235e-04, 8.95892745e-04, 8.03395164e-04, 7.20147598e-04, 6.45262707e-04])
plt.plot(mus, likelihood)
plt.xlabel('Goals per game (mu)')
plt.ylabel('Likelihood');
Ok, it's finally time to do some inference! The function we just wrote computes the likelihood of the data, given a hypothetical value of `mu`:

$\mathrm{Prob}~(x ~|~ \mu)$

But what we really want is the distribution of `mu`, given the data:

$\mathrm{Prob}~(\mu ~|~ x)$
If only there were some theorem that relates these probabilities!
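There is: Bayes's theorem, which says

$\mathrm{Prob}~(\mu ~|~ x) = \frac{\mathrm{Prob}~(\mu) ~ \mathrm{Prob}~(x ~|~ \mu)}{\mathrm{Prob}~(x)}$

In the grid framework, we apply it by multiplying the prior by the likelihood, point by point, and then normalizing so the posterior adds up to 1.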
I'll start with a uniform prior just to keep things simple. We'll choose a better prior later.
uniform_prior = Pmf(1, mus)
uniform_prior.normalize()
101
Initially `uniform_prior` represents the prior distribution of `mu`.
def decorate_rate(ylabel='PDF'):
    """Decorate the axes."""
    plt.xlabel('Goals per game (mu)')
    plt.ylabel(ylabel)
    plt.title('Distribution of goal scoring rate')

    handles, labels = plt.gca().get_legend_handles_labels()
    if len(labels):
        plt.legend()
uniform_prior.plot(label='prior')
decorate_rate()
plt.savefig('zigzag4.png', dpi=150)
Now we can update it with the data and plot the posterior.
data = 4
likelihood = poisson.pmf(data, mus)
posterior1 = uniform_prior * likelihood
posterior1.normalize()
0.06595319262654087
posterior1.plot(label='posterior')
decorate_rate()
plt.savefig('zigzag5.png', dpi=150)
With a uniform prior, the posterior is the normalized likelihood function, and the MAP is the value of `mu` that maximizes likelihood, which is the observed number of goals, 4.
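We can confirm that by finding the value of `mu` that maximizes the grid posterior. Since `Pmf` is a Pandas `Series`, we can use `idxmax`; with this grid (spacing 0.15), the maximum is at 4.05, the grid point closest to 4.

posterior1.idxmax()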
This result is probably not reasonable, because the prior was not reasonable.
To construct a better prior, I'll use a Gamma distribution with parameters chosen to be consistent with previous playoff games.
We can use `make_gamma_dist` to construct a prior `Pmf` with the given parameters.
from scipy.stats import gamma
def make_gamma_dist(xs, alpha, beta):
    """Makes a Pmf based on a gamma distribution.

    xs: places to evaluate the PDF
    alpha, beta: parameters of the distribution

    returns: Pmf
    """
    ps = gamma(a=alpha, scale=1/beta).pdf(xs)
    pmf = Pmf(ps, xs)
    pmf.normalize()
    return pmf
alpha = 4.4
beta = 1.8
alpha, beta
(4.4, 1.8)
mus = np.linspace(0, 10, num=101)
gamma_prior = make_gamma_dist(mus, alpha, beta)
gamma_prior.mean(), gamma_prior.std()
(2.4441837515202747, 1.1644330431484435)
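These values match the analytic ones: a gamma distribution has mean $\alpha/\beta = 4.4 / 1.8 \approx 2.44$ and standard deviation $\sqrt{\alpha}/\beta \approx 1.17$, so the grid approximation is good.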
Here's what it looks like.
gamma_prior.plot(label='gamma prior')
decorate_rate()
plt.savefig('zigzag6.png', dpi=150)
And we can update this prior using the observed data.
data = 4
likelihood = poisson.pmf(data, mus)
posterior2 = gamma_prior * likelihood
posterior2.normalize()
0.10917679489111866
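As an aside, the gamma distribution is the conjugate prior of the Poisson distribution, so we could also compute this posterior analytically: it is a gamma distribution with parameters $\alpha + 4 = 8.4$ and $\beta + 1 = 2.8$, and therefore mean $8.4 / 2.8 = 3$ goals per game.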
gamma_prior.plot(color='gray', label='prior')
posterior2.plot(label='posterior')
decorate_rate()
plt.savefig('zigzag7.png', dpi=150)
Previously we used PyMC to draw a sample from a Poisson distribution with known `mu`. Now we'll use it to draw a sample from a gamma distribution of `mu`, with known `alpha` and `beta`, and then use that value of `mu` to draw a sample from a Poisson distribution.
Here are the parameters we used to create the gamma prior:
print(alpha, beta)
4.4 1.8
Here's what the PyMC model looks like:
model = pm.Model()

with model:
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu)
    trace = pm.sample_prior_predictive(1000)
The distribution of `mu` from this model is the prior. Let's see what it looks like:
sample_prior_pymc = trace['mu']
np.mean(sample_prior_pymc)
2.445926077198612
And compare it to a sample from the gamma prior.
sample_prior_gamma = gamma_prior.sample(1000, replace=True)
plot_cdf(sample_prior_gamma, label='prior gamma')
plot_cdf(sample_prior_pymc, label='prior pymc')
decorate_rate('CDF')
plt.savefig('zigzag10.png', dpi=150)
It looks pretty good.
The distribution of `goals` from this model is the prior predictive.
sample_prior_pred_pymc = trace['goals']
np.mean(sample_prior_pred_pymc)
2.43
And let's compare it to a prior predictive distribution estimated by sampling.
sample_prior_pred = np.random.poisson(sample_prior_gamma)
np.mean(sample_prior_pred)
2.492
def plot_goals(sample, pos='left', **options):
    """Plot the PMF of a sample as side-by-side bars (not used below).

    pos: 'left' or 'right', which side of each integer to draw the bar on
    """
    options['align'] = 'edge'
    if pos == 'left':
        options['width'] = -0.45
    else:
        options['width'] = 0.45

    pmf = Pmf.from_seq(sample)
    plt.bar(pmf.qs, pmf.ps, **options)
set_colors()
plot_cdf(sample_prior_pred, label='prior pred grid')
plot_cdf(sample_prior_pred_pymc, label='prior pred PyMC')
decorate_goals('CDF')
plt.savefig('zigzag11.png', dpi=150)
Looks good.
Finally, we are ready to use PyMC for actual inference. We just have to make one small change.
Instead of generating `goals`, we'll mark `goals` as `observed` and provide the observed data, 4. And instead of calling `sample_prior_predictive`, we'll call `sample`, which is understood to sample from the posterior distribution of `mu`.
model = pm.Model()

with model:
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu, observed=4)
    trace = pm.sample(500)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [mu] Sampling 2 chains, 0 divergences: 100%|██████████| 2000/2000 [00:00<00:00, 2301.77draws/s] The acceptance probability does not match the target. It is 0.8821095734410509, but should be close to 0.8. Try to increase the number of tuning steps.
With `goals` fixed, the only unknown is `mu`, so `trace` contains a sample drawn from the posterior distribution of `mu` (500 draws from each of the 2 chains, 1000 in total). We can plot the posterior using a function provided by PyMC:
len(trace['mu'])
1000
with model:
    pm.plot_posterior(trace)
We can extract a sample of `mu` from the trace.
sample_post_pymc = trace['mu']
np.mean(sample_post_pymc)
2.9224513318372063
And compare it to the sample we drew from the grid approximation:
sample_post = posterior2.sample(1000, replace=True)
sample_post.mean()
2.9052
plot_cdf(sample_post, label='posterior grid')
plot_cdf(sample_post_pymc, label='posterior pymc')
decorate_rate('CDF')
plt.savefig('zigzag12.png', dpi=150)
Again, it looks pretty good.
To sample from the posterior predictive distribution, we can use `sample_posterior_predictive`:
with model:
    post_pred = pm.sample_posterior_predictive(trace, samples=1000)
100%|██████████| 1000/1000 [00:00<00:00, 2524.62it/s]
Here's what it looks like:
sample_post_pred_pymc = post_pred['goals']
np.mean(sample_post_pred_pymc)
2.95
sample_post_pred = poisson(sample_post).rvs()
sample_post_pred.mean()
2.888
set_colors()
plot_cdf(sample_post_pred, label='posterior pred grid')
plot_cdf(sample_post_pred_pymc, label='posterior pred pm')
decorate_goals('CDF')
plt.savefig('zigzag13.png', dpi=150)
Looks pretty good!
We can extend the model to estimate different values of `mu` for the two teams.
model = pm.Model()

with model:
    mu_BOS = pm.Gamma('mu_BOS', alpha, beta)
    mu_TBL = pm.Gamma('mu_TBL', alpha, beta)
    goals_BOS = pm.Poisson('goals_BOS', mu_BOS, observed=3)
    goals_TBL = pm.Poisson('goals_TBL', mu_TBL, observed=2)
    trace = pm.sample(500)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [mu_TBL, mu_BOS] Sampling 2 chains, 0 divergences: 100%|██████████| 2000/2000 [00:00<00:00, 2169.16draws/s] The acceptance probability does not match the target. It is 0.8810527302399683, but should be close to 0.8. Try to increase the number of tuning steps.
We can use `traceplot` to review the results and do some visual diagnostics.
with model:
    pm.traceplot(trace);
Here are the posterior distributions for `mu_BOS` and `mu_TBL`.
mu_TBL = trace['mu_TBL']
plot_cdf(mu_TBL, label='mu_TBL posterior')
mu_BOS = trace['mu_BOS']
plot_cdf(mu_BOS, label='mu_BOS posterior')
decorate_rate('CDF')
np.mean(mu_BOS), np.mean(mu_TBL)
plt.savefig('zigzag14.png', dpi=150)
On the basis of one game, here's the probability that Boston is the better team.
np.mean(mu_BOS > mu_TBL)
0.617
np.mean(mu_BOS == mu_TBL)
0.0
Even if Boston is the better team, that doesn't mean they'll win the next game. We can use `sample_posterior_predictive` to generate predictions.
with model:
    post_pred = pm.sample_posterior_predictive(trace, samples=1000)
100%|██████████| 1000/1000 [00:00<00:00, 1443.61it/s]
Here are the posterior predictive distributions of goals scored.
goals_BOS = post_pred['goals_BOS'].flatten()
goals_TBL = post_pred['goals_TBL'].flatten()
set_colors()
plot_cdf(goals_TBL, label='TBL')
plot_cdf(goals_BOS, label='BOS')
decorate_goals('CDF')
plt.savefig('zigzag15.png', dpi=150)
Here's the chance that Boston wins the next game.
win = np.mean(goals_BOS > goals_TBL)
win
0.475
The chance that they lose.
lose = np.mean(goals_TBL > goals_BOS)
lose
0.338
And the chance of a tie.
tie = np.mean(goals_BOS == goals_TBL)
tie
0.187
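As a sanity check, the three outcomes are mutually exclusive and exhaustive, so their probabilities should add up to 1 (up to floating-point error).

win + lose + tie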
So far, all of this is based on a gamma prior. To choose the parameters of the prior, I used data from previous Stanley Cup finals and computed a maximum likelihood estimate (MLE). But that's not correct, because it treats `alpha` and `beta` as known values rather than parameters to estimate. In other words, I have ignored two important sources of uncertainty. As a result, my predictions are almost certainly too confident.
The solution is a hierarchical model, where `alpha` and `beta` are the parameters that control `mu`, and `mu` is the parameter that controls `goals`. Then we can use observed `goals` to update the distributions of all three unknown parameters.
Of course, now we need a prior distribution for `alpha` and `beta`. A common choice is the half-Cauchy distribution (see Gelman), but on advice of counsel, I'm going with exponential.
Here's a PyMC model that generates `alpha` and `beta` from an exponential distribution.
model = pm.Model()

with model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)
    trace = pm.sample_prior_predictive(1000)
Here's what the distributions of `alpha` and `beta` look like.
sample_prior_alpha = trace['alpha']
plot_cdf(sample_prior_alpha, label='alpha prior')
sample_prior_beta = trace['beta']
plot_cdf(sample_prior_beta, label='beta prior')
plt.xscale('log')
plt.xlabel('Parameter of a gamma distribution')
plt.ylabel('CDF')
np.mean(sample_prior_alpha)
0.9842304166724799
Now that we have `alpha` and `beta`, we can generate `mu`.
model = pm.Model()

with model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)
    mu = pm.Gamma('mu', alpha, beta)
    trace = pm.sample_prior_predictive(1000)
Here's what the prior distribution of `mu` looks like.
sample_prior_mu = trace['mu']
plot_cdf(sample_prior_mu, label='mu prior hierarchical')
decorate_rate('CDF')
np.mean(sample_prior_mu)
12.976167851569219
In effect, the model is saying "I have never seen a hockey game before. As far as I know, it could be soccer, could be basketball, could be pinball."
If we zoom in on the range 0 to 10, we can compare the prior implied by the hierarchical model with the gamma prior I hand picked.
plot_cdf(sample_prior_mu, label='mu prior hierarchical')
plot_cdf(sample_prior_gamma, label='mu prior gamma', color='gray')
plt.xlim(0, 10)
decorate_rate('CDF')
Obviously, they are very different. They agree that the most likely values are less than 10, but the hierarchical model admits the possibility that `mu` could be orders of magnitude bigger.
Strange as it sounds, that's probably what we want in a non-committal prior.
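To quantify how non-committal it is, here's a quick check of how much prior probability the hierarchical model puts on scoring rates greater than 10 goals per game (a sketch; the exact value depends on the random sample).

np.mean(sample_prior_mu > 10)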
Ok, last step of the forward process, let's generate some goals.
model = pm.Model()

with model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu)
    trace = pm.sample_prior_predictive(1000)
Here's the prior predictive distribution of goals.
sample_prior_goals = trace['goals']
set_colors()
plot_cdf(sample_prior_goals, label='goals prior')
decorate_goals('CDF')
np.mean(sample_prior_goals)
5.944
To see whether that distribution is right, I drew samples using SciPy.
import scipy.stats as st
def forward_hierarchical(size=1):
    """Draw one sample of goals from the hierarchical forward process."""
    alpha = st.expon().rvs(size=size)
    beta = st.expon().rvs(size=size)
    mu = st.gamma(a=alpha, scale=1/beta).rvs(size=size)
    goals = st.poisson(mu).rvs(size=size)
    return goals[0]
sample_prior_goals_st = [forward_hierarchical() for i in range(1000)];
set_colors()
plot_cdf(sample_prior_goals, label='goals prior')
plot_cdf(sample_prior_goals_st, label='goals prior scipy')
decorate_goals('CDF')
plt.xlim(0, 50)
plt.legend(loc='lower right')
np.mean(sample_prior_goals_st)
6.937
Once we have the forward process working, we only need a small change to run the reverse process.
model = pm.Model()

with model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu, observed=[3, 3])
    trace = pm.sample(1000)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [mu, beta, alpha] Sampling 2 chains, 1 divergences: 100%|██████████| 3000/3000 [00:02<00:00, 1444.74draws/s] There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
Here's the posterior distribution of `mu`. The posterior mean is close to the observed value, which is what we expect with a weakly informative prior.
sample_post_mu = trace['mu']
plot_cdf(sample_post_mu, label='mu posterior')
decorate_rate('CDF')
np.mean(sample_post_mu)
2.865235353825642
We can extend the model to estimate different values of `mu` for the two teams.
model = pm.Model()

with model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)
    mu_BOS = pm.Gamma('mu_BOS', alpha, beta)
    mu_TBL = pm.Gamma('mu_TBL', alpha, beta)
    goals_BOS = pm.Poisson('goals_BOS', mu_BOS, observed=[3, 3])
    goals_TBL = pm.Poisson('goals_TBL', mu_TBL, observed=[2, 3])
    trace = pm.sample(1000, target_accept=0.95)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [mu_TBL, mu_BOS, beta, alpha] Sampling 2 chains, 0 divergences: 100%|██████████| 3000/3000 [00:03<00:00, 861.25draws/s]
with model:
    pm.traceplot(trace)

with model:
    pm.plot_posterior(trace['mu_BOS'])

with model:
    pm.plot_posterior(trace['mu_TBL'])
Here are the posterior distributions for `mu_TBL` and `mu_BOS`.
sample_post_mu_TBL = trace['mu_TBL']
plot_cdf(sample_post_mu_TBL, label='mu_TBL posterior')
sample_post_mu_BOS = trace['mu_BOS']
plot_cdf(sample_post_mu_BOS, label='mu_BOS posterior')
decorate_rate('CDF')
np.mean(sample_post_mu_TBL), np.mean(sample_post_mu_BOS)
(2.5273712319352972, 2.857168894779832)
On the basis of two games (and never having seen a previous game), here's the probability that Boston is the better team.
np.mean(sample_post_mu_BOS > sample_post_mu_TBL)
0.603
But let's take advantage of more information. Here are the results from the eight most recent Stanley Cup finals, ignoring games that went into overtime.
data = dict(BOS13 = [2, 1, 2],
CHI13 = [0, 3, 3],
NYR14 = [0, 2],
LAK14 = [3, 1],
TBL15 = [1, 4, 3, 1, 1, 0],
CHI15 = [2, 3, 2, 2, 2, 2],
SJS16 = [2, 1, 4, 1],
PIT16 = [3, 3, 2, 3],
NSH17 = [3, 1, 5, 4, 0, 0],
PIT17 = [5, 4, 1, 1, 6, 2],
VGK18 = [6, 2, 1, 2, 3],
WSH18 = [4, 3, 3, 6, 4],
BOS19 = [2, 7, 2, 1, 5, 1],
STL19 = [2, 2, 4, 2, 1, 4],
BOS20 = [3, 2],
TBL20 = [2, 3],
)
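As a quick check on this data, we can compute the overall scoring rate, pooling all the games; it should be close to the 2.4 goals per game we started with.

np.mean(np.hstack(list(data.values())))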
Here's how we can get the data into the model.
model = pm.Model()

with model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Exponential('beta', lam=1)

    mu = dict()
    goals = dict()
    for name, observed in data.items():
        mu[name] = pm.Gamma('mu_'+name, alpha, beta)
        goals[name] = pm.Poisson(name, mu[name], observed=observed)

    trace = pm.sample(500, target_accept=0.95)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [mu_TBL20, mu_BOS20, mu_STL19, mu_BOS19, mu_WSH18, mu_VGK18, mu_PIT17, mu_NSH17, mu_PIT16, mu_SJS16, mu_CHI15, mu_TBL15, mu_LAK14, mu_NYR14, mu_CHI13, mu_BOS13, beta, alpha] Sampling 2 chains, 0 divergences: 100%|██████████| 3000/3000 [00:13<00:00, 224.03draws/s]
Here are the posterior means.
sample_post_mu_BOS = trace['mu_BOS20']
np.mean(sample_post_mu_BOS)
2.479121501369579
sample_post_mu_TBL = trace['mu_TBL20']
np.mean(sample_post_mu_TBL)
2.4738678038205064
They are lower with the background information than without, and closer together. Here's the updated chance that Boston is the better team.
np.mean(sample_post_mu_BOS > sample_post_mu_TBL)
0.4935
Finally, here are the posterior distributions of `alpha` and `beta`, along with the implied mean and variance of the gamma distribution of `mu`.

plot_cdf(trace['alpha'], label='alpha')
plot_cdf(trace['beta'], label='beta')

plt.xlabel('Parameter of a gamma distribution')
plt.ylabel('CDF')
plt.legend();
np.mean(trace['alpha'])
4.360177013060098
np.mean(trace['beta'])
1.7947170342792527
np.mean(trace['alpha'] / trace['beta'])
2.475142954843771
np.mean(trace['alpha'] / trace['beta']**2)
1.6342667744048895
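For reference, a gamma distribution has mean $\alpha/\beta$ and variance $\alpha/\beta^2$, so the last two results imply that the distribution of `mu` across teams has mean near 2.5 goals per game and standard deviation near $\sqrt{1.63} \approx 1.28$, not far from the prior I hand-picked (mean 2.44, standard deviation 1.16).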