Code and exercises from my workshop on Bayesian statistics in Python.
Copyright 2016 Allen Downey
MIT License: https://opensource.org/licenses/MIT
# If we're running on Colab, install empiricaldist
# https://pypi.org/project/empiricaldist/
import sys
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    !pip install empiricaldist
import numpy as np
import pandas as pd
import seaborn as sns
sns.set_style('white')
sns.set_context('talk')
import matplotlib.pyplot as plt
from empiricaldist import Pmf
Create a Pmf object to represent a six-sided die.
d6 = Pmf()
A Pmf is a map from possible outcomes to their probabilities.
for x in [1, 2, 3, 4, 5, 6]:
    d6[x] = 1
Initially the probabilities don't add up to 1.
d6
| | probs |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
`normalize` adds up the probabilities and divides through. The return value is the total probability before normalizing.
d6.normalize()
6
Now the Pmf is normalized.
d6
| | probs |
|---|---|
| 1 | 0.166667 |
| 2 | 0.166667 |
| 3 | 0.166667 |
| 4 | 0.166667 |
| 5 | 0.166667 |
| 6 | 0.166667 |
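To see what normalizing does, here is a minimal sketch in plain Python. It uses a dictionary instead of a `Pmf`, and the function `normalize_dict` is a hypothetical helper written for illustration, not part of `empiricaldist`.

```python
def normalize_dict(pmf):
    """Scale the values of `pmf` (a dict) in place so they add up to 1.

    Returns the total before normalizing.
    """
    total = sum(pmf.values())
    for outcome in pmf:
        pmf[outcome] /= total
    return total

# Unnormalized die: every outcome has weight 1
die = {x: 1 for x in range(1, 7)}
total = normalize_dict(die)
print(total)    # 6
print(die[1])   # 0.16666666666666666
```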
And we can compute its mean (which only works if it's normalized).
d6.mean()
3.5
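The mean is the sum of each outcome times its probability. The sketch below, which assumes a plain dictionary rather than a `Pmf`, shows why the result is only meaningful after normalizing: with unnormalized weights the same sum gives 21, not 3.5.

```python
from fractions import Fraction

# Normalized die, using exact fractions to avoid floating-point error
die = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in die.items())
print(mean)   # 7/2

# The same sum over unnormalized weights is not a mean
weights = {x: 1 for x in range(1, 7)}
weighted_sum = sum(x * w for x, w in weights.items())
print(weighted_sum)   # 21
```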
`choice` chooses random values from the Pmf.
d6.choice(size=10)
array([5, 1, 3, 1, 4, 5, 1, 5, 4, 3])
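For comparison, here is a sketch of the same kind of weighted sampling with the standard library. It uses `random.choices`, which is not what `Pmf.choice` calls internally as far as this notebook shows; it is only meant to illustrate drawing outcomes in proportion to their probabilities.

```python
import random

outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

random.seed(17)   # arbitrary seed, so the sample is repeatable
sample = random.choices(outcomes, weights=probs, k=10)
print(sample)
```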
`bar` plots the Pmf as a bar chart.
def decorate_dice(title):
    """Labels the axes.

    title: string
    """
    plt.xlabel('Outcome')
    plt.ylabel('PMF')
    plt.title(title)
d6.bar()
decorate_dice('One die')
`d6.add_dist(d6)` creates a new `Pmf` that represents the sum of two six-sided dice.
twice = d6.add_dist(d6)
twice
| | probs |
|---|---|
| 2 | 0.027778 |
| 3 | 0.055556 |
| 4 | 0.083333 |
| 5 | 0.111111 |
| 6 | 0.138889 |
| 7 | 0.166667 |
| 8 | 0.138889 |
| 9 | 0.111111 |
| 10 | 0.083333 |
| 11 | 0.055556 |
| 12 | 0.027778 |
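One way to understand the distribution of a sum is to enumerate every pair of outcomes, multiply their probabilities, and accumulate the product under the sum of the pair. The sketch below does that with dictionaries. It assumes the two dice are independent, and `add_pmfs` is a hypothetical helper, not the library's implementation.

```python
from collections import defaultdict
from fractions import Fraction

def add_pmfs(pmf1, pmf2):
    """Distribution of the sum of two independent quantities.

    pmf1, pmf2: dicts that map outcomes to probabilities
    """
    total = defaultdict(Fraction)   # missing keys start at 0
    for x1, p1 in pmf1.items():
        for x2, p2 in pmf2.items():
            total[x1 + x2] += p1 * p2
    return dict(total)

die = {x: Fraction(1, 6) for x in range(1, 7)}
two_dice = add_pmfs(die, die)
print(two_dice[2])   # 1/36
print(two_dice[7])   # 1/6
```

The results agree with the table: 1/36 is about 0.027778 and 1/6 is about 0.166667.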
Exercise 1: Plot `twice` and compute its mean.
# Solution
twice.bar()
decorate_dice('Two dice')
twice.mean()
6.999999999999998
Exercise 2: Suppose I roll two dice and tell you the result is greater than 3.
Plot the `Pmf` of the remaining possible outcomes and compute its mean.
# Solution
twice_gt3 = d6.add_dist(d6)
twice_gt3[2] = 0
twice_gt3[3] = 0
twice_gt3.normalize()
twice_gt3.bar()
decorate_dice('Two dice, greater than 3')
twice_gt3.mean()
7.393939393939394
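As a check on this result, here is a sketch that enumerates all 36 equally likely rolls, keeps the ones whose sum is greater than 3, and averages them. It uses exact fractions and no `Pmf`, so it is an independent calculation rather than a copy of the solution.

```python
from fractions import Fraction

# The sums of all 36 equally likely rolls of two dice
sums = [a + b for a in range(1, 7) for b in range(1, 7)]

# Condition on the data: keep only the sums greater than 3
remaining = [s for s in sums if s > 3]
print(len(remaining))   # 33

mean_gt3 = Fraction(sum(remaining), len(remaining))
print(mean_gt3)   # 244/33
```

The exact value 244/33 is approximately 7.39, which matches the mean computed above.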
Bonus exercise: In Dungeons and Dragons, the amount of damage a goblin can withstand is the sum of two six-sided dice. The amount of damage you inflict with a short sword is determined by rolling one six-sided die.
Suppose you are fighting a goblin and you have already inflicted 3 points of damage. What is your probability of defeating the goblin with your next successful attack?
Hint: `Pmf` provides comparator functions like `gt_dist` and `le_dist`, which compare two distributions and return a probability.
# Solution
damage = d6.add_dist(3)
damage.bar()
decorate_dice('Total Damage')
# Solution
hit_points = d6.add_dist(d6)
damage.ge_dist(hit_points)
0.5
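To see where 0.5 comes from, we can count cases directly. The sketch below assumes, as the solution does, that you defeat the goblin when total damage is greater than or equal to its hit points. It enumerates the 6 possible damage totals against the 36 possible hit-point rolls.

```python
from fractions import Fraction
from itertools import product

# Total damage: 3 points already inflicted plus one six-sided die
damage = [3 + roll for roll in range(1, 7)]

# Hit points: the sum of two six-sided dice, one entry per roll
hit_points = [a + b for a, b in product(range(1, 7), repeat=2)]

# Count the equally likely (damage, hit points) pairs where damage is enough
wins = sum(1 for d, h in product(damage, hit_points) if d >= h)
prob = Fraction(wins, len(damage) * len(hit_points))
print(wins)   # 108
print(prob)   # 1/2
```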
`Pmf.from_seq` makes a `Pmf` object from a sequence of values.
Here's how we can use it to create a `Pmf` with two equally likely hypotheses.
cookie = Pmf.from_seq(['Bowl 1', 'Bowl 2'])
cookie
| | probs |
|---|---|
| Bowl 1 | 0.5 |
| Bowl 2 | 0.5 |
Now we can update each hypothesis with the likelihood of the data (a vanilla cookie).
cookie['Bowl 1'] *= 0.75
cookie['Bowl 2'] *= 0.5
cookie.normalize()
0.625
And display the posterior probabilities.
cookie
| | probs |
|---|---|
| Bowl 1 | 0.6 |
| Bowl 2 | 0.4 |
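The update is Bayes's theorem in table form: multiply each prior by the likelihood of the data under that hypothesis, then divide by the total. Here is a sketch with dictionaries and exact fractions. The variable names are chosen for illustration, and the likelihoods are the ones used in the cells above.

```python
from fractions import Fraction

prior = {'Bowl 1': Fraction(1, 2), 'Bowl 2': Fraction(1, 2)}

# Probability of a vanilla cookie under each hypothesis
likelihood = {'Bowl 1': Fraction(3, 4), 'Bowl 2': Fraction(1, 2)}

# Prior times likelihood, not yet normalized
unnorm = {h: prior[h] * likelihood[h] for h in prior}

# The normalizing constant is the total probability of the data
total = sum(unnorm.values())
posterior = {h: p / total for h, p in unnorm.items()}

print(total)                                      # 5/8
print(posterior['Bowl 1'], posterior['Bowl 2'])   # 3/5 2/5
```

The total, 5/8, is the 0.625 returned by `normalize`.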
Exercise 3: Suppose we put the first cookie back, stir, choose again from the same bowl, and get a chocolate cookie.
What are the posterior probabilities after the second cookie?
Hint: The posterior (after the first cookie) becomes the prior (before the second cookie).
# Solution
cookie['Bowl 1'] *= 0.25
cookie['Bowl 2'] *= 0.5
cookie.normalize()
cookie
| | probs |
|---|---|
| Bowl 1 | 0.428571 |
| Bowl 2 | 0.571429 |
Exercise 4: Instead of doing two updates, what if we collapse the two pieces of data into one update?
Re-initialize the `Pmf` with two equally likely hypotheses and perform one update based on two pieces of data, a vanilla cookie and a chocolate cookie.
The result should be the same regardless of how many updates you do (or the order of updates).
# Solution
cookie = Pmf.from_seq(['Bowl 1', 'Bowl 2'])
cookie['Bowl 1'] *= 0.75 * 0.25
cookie['Bowl 2'] *= 0.5 * 0.5
cookie.normalize()
cookie
| | probs |
|---|---|
| Bowl 1 | 0.428571 |
| Bowl 2 | 0.571429 |
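To confirm that the order and number of updates don't matter, here is a sketch that compares three routes to the posterior using exact fractions. The function `update` is a hypothetical helper that works on dictionaries, not a method of `Pmf`.

```python
from fractions import Fraction

def update(pmf, likelihood):
    """Multiply each probability by its likelihood and renormalize."""
    unnorm = {h: p * likelihood[h] for h, p in pmf.items()}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

prior = {'Bowl 1': Fraction(1, 2), 'Bowl 2': Fraction(1, 2)}
vanilla = {'Bowl 1': Fraction(3, 4), 'Bowl 2': Fraction(1, 2)}
chocolate = {'Bowl 1': Fraction(1, 4), 'Bowl 2': Fraction(1, 2)}

# Likelihood of both cookies at once (draws are independent given the bowl)
both = {h: vanilla[h] * chocolate[h] for h in prior}

vanilla_first = update(update(prior, vanilla), chocolate)
chocolate_first = update(update(prior, chocolate), vanilla)
one_step = update(prior, both)

print(vanilla_first == chocolate_first == one_step)      # True
print(one_step['Bowl 1'], one_step['Bowl 2'])            # 3/7 4/7
```

The fractions 3/7 and 4/7 are approximately 0.428571 and 0.571429, the values in the tables above.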