Coins and dice

This notebook is an element of the free risk-engineering.org courseware. It can be distributed under the terms of the Creative Commons Attribution-ShareAlike licence.

Author: Eric Marsden [email protected].


In this notebook, we illustrate NumPy features for working with discrete probability distributions, such as those resulting from a coin toss or a dice roll. For an introduction to NumPy arrays, see the NumPy tutorial.

In [1]:
import numpy
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_formats=['svg']
numpy.set_printoptions(threshold=20)

Let's simulate a coin toss by a random choice between the numbers 0 and 1 (say 0 represents tails and 1 represents heads). Note that the second argument to randint is an exclusive upper bound.

In [2]:
numpy.random.randint(0, 2)
Out[2]:
0

Let's toss two coins, which gives as an array with the result of the first and second toss.

In [3]:
numpy.random.randint(0, 2, 2)
Out[3]:
array([0, 0])

The number of heads when tossing a coin twice is simply the sum of that array.

In [4]:
numpy.random.randint(0, 2, 2).sum()
Out[4]:
2

Task: simulate number of heads when tossing a coin twice. Do this 1000 times and plot the resulting Probability Mass Function.

In [5]:
N = 1000
heads = numpy.zeros(N, dtype=int)
for i in range(N):
    heads[i] = numpy.random.randint(0, 2, 2).sum()
heads
Out[5]:
array([1, 0, 1, ..., 1, 0, 1])

Let's count how many times we have scored 0, 1 or 2 in our 1000 simulations, using the bincount function from numpy.

In [6]:
numpy.bincount(heads)
Out[6]:
array([265, 476, 259])

Let's plot that Probability Mass Function.

In [7]:
plt.stem(numpy.bincount(heads), '-.')
plt.margins(0.1)

We could also have used the itemfreq function from scipy.stats, which returns a frequency table (a two-dimensional array).

In [8]:
scipy.stats.itemfreq(heads)
Out[8]:
array([[  0, 265],
       [  1, 476],
       [  2, 259]])

The SymPy library has functionality that allows us to calculate the expected number of heads analytically.

In [9]:
import sympy.stats

toss1 = sympy.stats.Bernoulli("toss1", p=0.5)
toss2 = sympy.stats.Bernoulli("toss2", p=0.5)
sympy.stats.E(toss1 + toss2)
Out[9]:
1.00000000000000
In [10]:
sympy.stats.density(toss1 + toss2)
Out[10]:
{0: 0.250000000000000, 1: 0.500000000000000, 2: 0.250000000000000}

Expected value of a dice roll

The expected value of a dice roll is

$$\sum_{i=1}^6 i \times \frac{1}{6} = 3.5$$

That means that if we toss a dice a large number of times, the mean value should converge to 3.5. Let's check that by running a simumlation in Python.

In [11]:
N = 1000
roll = numpy.zeros(N, dtype=int)
expectation = numpy.zeros(N)
for i in range(N):
    roll[i] = numpy.random.randint(1, 7)
for i in range(1, N):
    expectation[i] = numpy.mean(roll[0:i])
plt.plot(expectation);

The sympy.stats module has functionality that allows us to evaluate the expected value analytically.

In [12]:
import sympy.stats

D = sympy.stats.Die('D', 6)
sympy.stats.E(D)
Out[12]:
7/2

Simulating dice throws

In [13]:
dice = scipy.stats.randint(1, 7)
dice.rvs(1000).max()
Out[13]:
6

What is the probability of a die rolling 4?

In [14]:
dice.pmf(4)
Out[14]:
0.16666666666666666

What is the probability of rolling 4 or below?

In [15]:
dice.cdf(4)
Out[15]:
0.66666666666666663

What is the probability of rolling between 2 and 4 (inclusive)?

In [16]:
dice.cdf(4) - dice.cdf(1)
Out[16]:
0.5

Again, using sympy.stats to calculate this analytically:

The probability of a dice roll of 4:

In [17]:
sympy.stats.P(sympy.Eq(D, 4))
Out[17]:
1/6

Probability of rolling 4 or below:

In [18]:
sympy.stats.P(D <= 4)
Out[18]:
2/3

Probability of rolling between 2 and 4 (inclusive):

In [19]:
sympy.stats.P(sympy.And(D >= 2, D <= 4))
Out[19]:
1/2

The SymPy library also allows us to calculate more complicated expressions, such as the probability that the sum of three dice is greater than 10, given that the first throw is bigger than 4. This is called a conditional probability.

In [20]:
D1 = sympy.stats.Die('D1', 6)
D2 = sympy.stats.Die('D2', 6) 
D3 = sympy.stats.Die('D3', 6) 
sympy.stats.P(D1 + D2 + D3 > 10, D1 > 4)
Out[20]:
7/9
In [21]:
Z = D1 + D2 + D3
pmf = sympy.stats.density(Z)
keys = pmf.keys()
plt.stem(list(keys), [pmf[k] for k in keys]);

The gambles of the Chevalier de Méré

Let's analyze two problems which were discussed between Antoine Gombaud, chevalier de Méré, a passionate gambler, and the mathematician Blaise Pascal. This discussion helped in the development of probability theory.

The first problem: is it a good idea to gamble on the appearance of at least one 6 when a dice is thrown 4 times?

The probability of losing this gamble is easy to calculate analytically: each throw has 5 chances out of 6 of not seeing a 6, and the events are independent. So the probability of winning is

In [22]:
1 - (5/6.0)**4
Out[22]:
0.5177469135802468

The probability of winning is greater than 0.5. We can also check this problem with SymPy:

In [23]:
import sympy.stats

D1 = sympy.stats.Die('D1', 6)
D2 = sympy.stats.Die('D2', 6)
D3 = sympy.stats.Die('D3', 6)
D4 = sympy.stats.Die('D4', 6)

sympy.stats.P(sympy.Or(D1 > 5, sympy.Or(D2 > 5, sympy.Or(D3 > 5, D4 > 5)))) + 0.0
Out[23]:
0.517746913580247

The second problem: is it a good idea to gamble on the appearance of at least one double six when two dice are thrown 24 times?

The probability of losing this gamble is also easy to calculate: there are 35 chances out of 36 (6 * 6) of not seeing a double 6 on each double throw, so the probability of winning is

In [24]:
1 - (35/36.0)**24
Out[24]:
0.4914038761309034

So this is not a good gamble. We can also calculate this analytically with SymPy, by modelling a Binomial random variable:

In [25]:
A = sympy.stats.Die('A', 6)
B = sympy.stats.Die('B', 6)
doublesix = sympy.stats.Binomial('DoubleSix', 24, sympy.stats.P(sympy.And(A > 5, B > 5)))
sympy.stats.P(doublesix >= 1) + 0.0
Out[25]:
0.491403876130903