The World Cup Problem: Germany v. Brazil

A PyMC solution to Allen Downey's soccer problem.

We'll use an exponential prior instead of a uniform prior, again with mean equal to the observed sample mean of goals per team per game.
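As a quick sanity check of that parameterization, here is a minimal sketch using only the standard library. It assumes the scoring rate is measured in goals per minute, so an exponential with rate `duration_of_game / avg_goals_per_team` has mean `avg_goals_per_team / duration_of_game`. The names `rate`, `draws`, and `mean_goals_per_game` are illustrative and not part of the notebook's model.

```python
import random

random.seed(0)  # fixed seed so the check is reproducible

avg_goals_per_team = 1.34
duration_of_game = 93.0

# Rate parameter of the exponential prior; its mean is 1/rate goals per minute.
rate = duration_of_game / avg_goals_per_team

# Draw scoring rates (goals per minute) from the prior.
draws = [random.expovariate(rate) for _ in range(100000)]

# Scale the average rate up to a full game; this should land near 1.34.
mean_goals_per_game = duration_of_game * sum(draws) / len(draws)
print(mean_goals_per_game)
```

If the rate were passed as `avg_goals_per_team / duration_of_game` instead, the prior mean would be about 69 goals per minute, which is why the direction of the ratio matters.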

In [12]:

import pymc as pm
%pylab inline
figsize(12,6)

Populating the interactive namespace from numpy and matplotlib

In [69]:

avg_goals_per_team = 1.34
duration_of_game = 93.

In [71]:

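# Prior on the scoring rate lambda_, in goals per minute. pm.Exponential takes a
# rate parameter, so the prior mean is avg_goals_per_team/duration_of_game.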
lambda_ = pm.Exponential('lambda_', duration_of_game/avg_goals_per_team)
sample = np.array([lambda_.random() for i in range(10000)])
hist(sample, bins=30);
plt.title('Prior distribution: Exponential with mean equal to observed mean');

In [72]:

sample_points_per_team = np.random.poisson(duration_of_game*sample)
hist(sample_points_per_team, bins=sample_points_per_team.max(), normed=True);
plt.title('Hypothetical goals/game/team,\ngiven homogeneous Poisson Process model assumptions')
plt.ylabel('Probability')
plt.xlabel('Number of goals');

print(sample_points_per_team.mean())

1.3133

In [73]:

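# Germany scored in the 11th and 23rd minutes, so the waiting times
# between goals are 11 and 12 minutes.
duration_between_goals = [11, 12]

# Waiting times in a Poisson process are exponential with rate lambda_.
obs = pm.Exponential('obs', lambda_, observed=True, value=duration_between_goals)

# Goals scored in the remaining 93 - 23 = 70 minutes.
prediction = pm.Poisson('pred', (duration_of_game-23)*lambda_)

mcmc = pm.MCMC([lambda_, obs, prediction])
# 15000 iterations, discarding the first 5000 as burn-in.
mcmc.sample(15000,5000)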

 [-----------------100%-----------------] 15000 of 15000 complete in 2.0 sec

In [74]:

prediction_trace = mcmc.trace('pred')[:]
hist(prediction_trace,bins=max(prediction_trace), normed=True);
plt.title("Predictive distribution of Germany's goals in the next 70 minutes")
plt.ylabel('Probability')
plt.xlabel('Number of goals');

In [76]:

(prediction_trace >= 5).mean()

Out[76]:

0.128

This is more than twice the probability Allen obtained for Germany scoring 5 or more goals. Why such a large difference? With only two data points, the prior still has a lot of influence on the posterior, so the result depends heavily on the prior chosen, and Allen and I chose different priors (uniform vs. exponential).
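The prior's influence can also be checked without MCMC. The sketch below relies on conjugacy: an exponential prior is a Gamma(1, rate) distribution, exponential waiting times update it to Gamma(1 + n, rate + sum of gaps), and a Poisson count with a Gamma-distributed rate is negative binomial. The helper names (`prob_at_least`, `posterior_tail`) and the alternative prior means 0.67 and 2.68 are hypothetical, chosen only for illustration.

```python
from math import factorial

def prob_at_least(k_min, shape, rate, minutes):
    """P(goals >= k_min) for goals ~ Poisson(minutes * lam), lam ~ Gamma(shape, rate).

    shape must be a positive integer (true here: 1 + number of observed gaps).
    """
    p = rate / (rate + minutes)
    below = 0.0
    for k in range(k_min):
        # Negative binomial coefficient C(shape + k - 1, k)
        coef = factorial(shape + k - 1) // (factorial(k) * factorial(shape - 1))
        below += coef * p**shape * (1.0 - p)**k
    return 1.0 - below

gaps = [11, 12]       # minutes between goals, as in the notebook
remaining = 93 - 23   # minutes left to play

def posterior_tail(prior_mean_goals_per_game):
    # Exponential prior with the given mean number of goals per 93-minute game.
    prior_rate = 93.0 / prior_mean_goals_per_game
    return prob_at_least(5, 1 + len(gaps), prior_rate + sum(gaps), remaining)

p_notebook_prior = posterior_tail(1.34)
tails = [posterior_tail(m) for m in (0.67, 1.34, 2.68)]  # hypothetical prior means

print(p_notebook_prior)
print(tails)
```

With the notebook's prior mean of 1.34, the closed-form value should come out at roughly 0.13, in line with the MCMC estimate above. Halving or doubling the prior mean moves the answer substantially, which shows how little two observations constrain the posterior.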
