import pymc as pm %pylab inline figsize(12,6)
Populating the interactive namespace from numpy and matplotlib
avg_goals_per_team = 1.34 duration_of_game = 93.
#prior lambda_ = pm.Exponential('lambda_', duration_of_game/avg_goals_per_team) sample = np.array([lambda_.random() for i in range(10000)]) hist(sample, bins=30); plt.title('Prior distribution: Exponential with mean equal to observed mean');
sample_points_per_team = np.random.poisson(duration_of_game*sample) hist(sample_points_per_team, bins=sample_points_per_team.max(), normed=True); plt.title('Hypothetical goals/game/team,\ngiven homogeneous Poisson Process model assumptions') plt.ylabel('Probability') plt.xlabel('Number of goals'); print sample_points_per_team.mean()
duration_between_goals = [11, 12] obs = pm.Exponential('obs', lambda_, observed=True, value=duration_between_goals) prediction = pm.Poisson('pred', (duration_of_game-23)*lambda_) mcmc = pm.MCMC([lambda_, obs, prediction]) mcmc.sample(15000,5000)
[-----------------100%-----------------] 15000 of 15000 complete in 2.0 sec
prediction_trace = mcmc.trace('pred')[:] hist(prediction_trace,bins=max(prediction_trace), normed=True); plt.title("Predictive distribution of Germany's goals in the next 70 minutes") plt.ylabel('Probability') plt.xlabel('Number of goals');
(prediction_trace >= 5).mean()
We get more than twice the probability that Germany scores 5 or more goals. Why such a large difference? Since we have very few data points (only two), our prior still has lots of influence on the posterior, so our results are very depenedent on prior chosen - and Allen and I choose different priors (uniform vs. exponential).