Think Bayes¶

This notebook presents example code and exercise solutions for Think Bayes.

MIT License: https://opensource.org/licenses/MIT

In [1]:

# Configure Jupyter so figures appear in the notebook
%matplotlib inline

# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

# import classes from thinkbayes2
from thinkbayes2 import Pmf, Suite

import thinkbayes2
import thinkplot

The Dungeons and Dragons club¶

Suppose there are 10 people in my Dungeons and Dragons club; on any game day, each of them has a 70% chance of showing up.

Each player has one character and each character has 6 attributes, each of which is generated by rolling and adding up 3 6-sided dice.

At the beginning of the game, I ask whose character has the lowest attribute. The wizard says, "My constitution is 5; does anyone have a lower attribute?", and no one does.

The warrior says "My strength is 16; does anyone have a higher attribute?", and no one does.

How many characters are in the party?

The prior¶

There are three ways to compute the prior distribution:

Simulation
Convolution
Analytic distribution

First, simulation. Here's a function that flips a coin with probability p:

In [2]:

from random import random

def flip(p):
    return random() < p

We can use it to flip a coin for each member of the club.

In [3]:

flips = [flip(0.7) for i in range(10)]

Out[3]:

[False, True, False, True, False, False, True, False, False, True]

And count the number that show up on game day.

In [4]:

sum(flips)

Out[4]:

Let's encapsulate that in a function that simulates a game day.

In [5]:

def game_day(n, p):
    flips = [flip(p) for i in range(n)]
    return sum(flips)

In [6]:

game_day(10, 0.7)

Out[6]:

If we run that function many times, we get a sample from the distribution of the number of players.

In [7]:

sample = [game_day(10, 0.7) for i in range(1000)]
pmf_sample = Pmf(sample)
thinkplot.Hist(pmf_sample)

The second method is convolution. Instead of flipping a coin, we can create a Pmf object that represents the distribution of outcomes from a single flip.

In [8]:

def coin(p):
    return Pmf({1:p, 0:1-p})

Here's what it looks like.

In [9]:

player = coin(0.7)
player.Print()

0 0.30000000000000004
1 0.7

If we have two players, there are three possible outcomes:

In [10]:

(player + player).Print()

0 0.09000000000000002
1 0.42000000000000004
2 0.48999999999999994

If we have 10 players, we can get the prior distribution like this:

In [11]:

prior = sum([player]*10)
prior.Print()

0 5.9049000000000085e-06
1 0.00013778100000000018
2 0.0014467005000000017
3 0.009001692000000009
4 0.036756909000000025
5 0.10291934520000004
6 0.20012094900000005
7 0.26682793200000005
8 0.2334744405
9 0.12106082099999994
10 0.028247524899999984

Now we can compare the results of simulation and convolution:

In [12]:

thinkplot.Hist(pmf_sample, color='C0')
thinkplot.Pmf(prior, color='C1')

thinkplot.decorate(xlabel='Number of players',
                   ylabel='PMF')

Finally, we can use an analytic distribution. The distribution we just computed is the binomial distribution, which has the following PMF:

$ PMF(k; n, p) = P(k ~|~ n, p) = {n \choose k}\,p^{k}(1-p)^{n-k}$

We could evalate the right hand side in Python, or use MakeBinomialPmf:

In [13]:

help(thinkbayes2.MakeBinomialPmf)

Help on function MakeBinomialPmf in module thinkbayes2.thinkbayes2:

MakeBinomialPmf(n, p)
    Evaluates the binomial PMF.
    
    n: number of trials
    p: probability of success on each trial
    
    Returns: Pmf of number of successes

And we can confirm that the analytic result matches what we computed by convolution.

In [14]:

binomial = thinkbayes2.MakeBinomialPmf(10, 0.7)
thinkplot.Pmf(prior, color='C1')
thinkplot.Pmf(binomial, color='C2', linestyle='dotted')

thinkplot.decorate(xlabel='Number of players',
                   ylabel='PMF')

Since two players spoke, we can eliminate the possibility of 0 or 1 players:

In [15]:

thinkplot.Pmf(prior, color='gray')
del prior[0]
del prior[1]
prior.Normalize()
thinkplot.Pmf(prior, color='C1')

thinkplot.decorate(xlabel='Number of players',
                   ylabel='PMF')

Likelihood¶

There are three components of the likelihood function:

The probability that the highest attribute is 16.
The probability that the lowest attribute is 5.
The probability that the lowest and highest attributes are held by different players.

To compute the first component, we have to compute the distribution of the maximum of $6n$ attributes, where $n$ is the number of players.

Here is the distribution for a single die.

In [16]:

d6 = Pmf([1,2,3,4,5,6])
d6.Print()

1 0.16666666666666666
2 0.16666666666666666
3 0.16666666666666666
4 0.16666666666666666
5 0.16666666666666666
6 0.16666666666666666

And here's the distribution for the sum of three dice.

In [17]:

thrice = sum([d6] * 3)
thinkplot.Pdf(thrice)
thinkplot.decorate(xlabel='Attribute',
                   ylabel='PMF')

Here's the CDF for the sum of three dice.

In [18]:

cdf_thrice = thrice.MakeCdf()
thinkplot.Cdf(cdf_thrice)
thinkplot.decorate(xlabel='Attribute',
                   ylabel='CDF')

The Max method raises the CDF to a power. So here's the CDF for the maximum of six attributes.

In [19]:

cdf_max_6 = cdf_thrice.Max(6)
thinkplot.Cdf(cdf_max_6)
thinkplot.decorate(xlabel='Attribute',
                   ylabel='CDF',
                   title='Maximum of 6 attributes')

If there are n players, there are 6*n attributes. Here are the distributions for the maximum attribute of n players, for a few values of n.

In [20]:

for n in range(2, 10, 2):
    cdf_max = cdf_thrice.Max(n*6)
    thinkplot.Cdf(cdf_max, label='n=%s'%n)

thinkplot.decorate(xlabel='Attribute',
                   ylabel='CDF',
                   title='Maximum of 6*n attributes')

To check that, I'll compute the CDF for 7 players, and estimate it by simulation.

In [21]:

n = 7
cdf = cdf_thrice.Max(n*6)
thinkplot.Cdf(cdf, label='n=%s'%n)

sample_max = [max(cdf_thrice.Sample(42)) for i in range(1000)]
thinkplot.Cdf(thinkbayes2.Cdf(sample_max), label='sample')

thinkplot.decorate(xlabel='Attribute',
                   ylabel='CDF',
                   title='Maximum of 6*n attributes')

Looks good.

Now, to compute the minimum, I have to write my own function, because Cdf doesn't provide a Min function.

In [22]:

def compute_cdf_min(cdf, k):
    """CDF of the min of k samples from cdf.
    
    cdf: Cdf object
    k: number of samples
    
    returns: new Cdf object
    """
    cdf_min = cdf.Copy()
    cdf_min.ps = 1 - (1 - cdf_min.ps)**k
    return cdf_min

Now we can compute the CDF of the minimum attribute for n players, for several values of n.

In [23]:

for n in range(2, 10, 2):
    cdf_min = compute_cdf_min(cdf_thrice, n*6)
    thinkplot.Cdf(cdf_min, label='n=%s'%n)

thinkplot.decorate(xlabel='Attribute',
                   ylabel='CDF',
                   title='Minimum of 6*n attributes')

And again we can check it by comparing to simulation results.

In [24]:

n = 7
cdf = compute_cdf_min(cdf_thrice, n*6)
thinkplot.Cdf(cdf, label='n=%s'%n)

sample_min = [min(cdf_thrice.Sample(42)) for i in range(1000)]
thinkplot.Cdf(thinkbayes2.Cdf(sample_min), label='sample')

thinkplot.decorate(xlabel='Attribute',
                   ylabel='CDF',
                   title='Minimum of 6*n attributes')

For efficiency and conciseness, it is helpful to precompute the distributions for the relevant values of n, and store them in dictionaries.

In [25]:

like_min = {}
like_max = {}

for n in range(2, 11):
    cdf_min = compute_cdf_min(cdf_thrice, n*6)
    like_min[n] = cdf_min.MakePmf()
    cdf_max = cdf_thrice.Max(n*6)
    like_max[n] = cdf_max.MakePmf()
    print(like_min[n][5], like_max[n][16])

0.23288163935889017 0.23288163935889017
0.28826338107405935 0.2882633810740594
0.31794402625472684 0.3179440262547268
0.32955796250238156 0.32955796250238156
0.32871475364520075 0.3287147536452008
0.3195146933518256 0.3195146933518255
0.3049352170780888 0.30493521707808885
0.2871203018328896 0.28712030183288956
0.2675970126720095 0.2675970126720096

The output shows that the particular data we saw is symmetric: the chance that 16 is the maximum is the same as the chance that 5 is the minimum.

Finally, we need the probability that the minimum and maximum are held by the same person. If there are n players, there are 6*n attributes.

Let's call the player with the highest attribute Max. What is the chance that Max also has the lowest attribute? Well Max has 5 more attributes, out of a total of 6*n-1 remaining attributes.

So here's prob_same as a function of n.

In [26]:

def prob_same(n):
    return 5 / (6*n-1)

for n in range(2, 11):
    print(n, prob_same(n))

2 0.45454545454545453
3 0.29411764705882354
4 0.21739130434782608
5 0.1724137931034483
6 0.14285714285714285
7 0.12195121951219512
8 0.10638297872340426
9 0.09433962264150944
10 0.0847457627118644

The update¶

Here's a class that implements this likelihood function.

In [27]:

class Dungeons(Suite):
    
    def Likelihood(self, data, hypo):
        """Probability of the data given the hypothesis.
        
        data: lowest attribute, highest attribute, boolean
              (whether the same person has both)
        hypo: number of players
        
        returns: probability
        """
        lowest, highest, same = data
        n = hypo
        
        p = prob_same(n)
        like = p if same else 1-p
        
        like *= like_min[n][lowest]
        like *= like_max[n][highest]

        return like

Here's the prior we computed above.

In [28]:

suite = Dungeons(prior)
thinkplot.Hist(suite)
thinkplot.decorate(xlabel='Number of players',
                   ylabel='PMF')
suite.Mean()

Out[28]:

7.000868145040201

And here's the update based on the data in the problem statement.

In [29]:

suite.Update((5, 16, False))

Out[29]:

0.08548474490284354

Here's the posterior.

In [30]:

thinkplot.Hist(suite)
thinkplot.decorate(xlabel='Number of players',
                   ylabel='PMF')
suite.Mean()

Out[30]:

6.940862784521086

In [31]:

suite.Print()

2 0.0005007044860801902
3 0.006177449626400222
4 0.03402191676923563
5 0.10823000113979393
6 0.21684925990462955
7 0.279837163496623
8 0.22697589141459018
9 0.10574761295351938
10 0.02166000020912791

Based on the data, I am 94% sure there are between 5 and 9 players.

In [32]:

suite.CredibleInterval()

Out[32]:

(5, 9)

In [33]:

sum(suite[n] for n in [5,6,7,8,9])

Out[33]:

0.9376399289091562

In [ ]: