This notebook covers the basics of probability theory, with Python 3 implementations. (You should have some background in probability and Python.)
In 1814, Pierre-Simon Laplace wrote:
Probability ... is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible ... when nothing leads us to expect that any one of these cases should occur more than any other.
Laplace really nailed it, way back then! If you want to untangle a probability problem, all you have to do is be methodical about defining exactly what the cases are, and then careful in counting the number of favorable and total cases. We'll start being methodical by defining some vocabulary:
4
.{1, 2, 3, 4, 5, 6}
.{2, 4, 6}
.
For example, the probability of an even die roll is 3/6 = 1/2.
This notebook will develop all these concepts; I also have a second part that covers paradoxes in Probability Theory.
P
¶P
is the traditional name for the Probability function:
from fractions import Fraction
def P(event, space):
"The probability of an event, given a sample space of equiprobable outcomes."
return Fraction(len(event & space),
len(space))
Read this as implementing Laplace's quote directly: "Probability is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible."
What's the probability of rolling an even number with a single six-sided fair die?
We can define the sample space D
and the event even
, and compute the probability:
D = {1, 2, 3, 4, 5, 6}
even = { 2, 4, 6}
P(even, D)
Fraction(1, 2)
It is good to confirm what we already knew.
You may ask: Why does the definition of P
use len(event & space)
rather than len(event)
? Because I don't want to count outcomes that were specified in event
but aren't actually in the sample space. Consider:
even = {2, 4, 6, 8, 10, 12}
P(even, D)
Fraction(1, 2)
Here, len(event)
and len(space)
are both 6, so if just divided, then P
would be 1, which is not right.
The favorable cases are the intersection of the event and the space, which in Python is (event & space)
.
Also note that I use Fraction
rather than regular division because I want exact answers like 1/3, not 0.3333333333333333.
Around 1700, Jacob Bernoulli wrote about removing colored balls from an urn in his landmark treatise Ars Conjectandi, and ever since then, explanations of probability have relied on urn problems. (You'd think the urns would be empty by now.)
For example, here is a three-part problem adapted from mathforum.org:
An urn contains 23 balls: 8 white, 6 blue, and 9 red. We select six balls at random (each possible selection is equally likely). What is the probability of each of these possible outcomes:
- all balls are red
So, an outcome is a set of 6 balls, and the sample space is the set of all possible 6 ball combinations. We'll solve each of the 3 parts using our P
function, and also using basic arithmetic; that is, counting. Counting is a bit tricky because:
To account for the first issue, I'll have 8 different white balls labelled 'W1'
through 'W8'
, rather than having eight balls all labelled 'W'
. That makes it clear that selecting 'W1'
is different from selecting 'W2'
.
The second issue is handled automatically by the P
function, but if I want to do calculations by hand, I will sometimes first count the number of permutations of balls, then get the number of combinations by dividing the number of permutations by c!, where c is the number of balls in a combination. For example, if I want to choose 2 white balls from the 8 available, there are 8 ways to choose a first white ball and 7 ways to choose a second, and therefore 8 × 7 = 56 permutations of two white balls. But there are only 56 / 2 = 28 combinations, because (W1, W2)
is the same combination as (W2, W1)
.
We'll start by defining the contents of the urn:
def cross(A, B):
"The set of ways of concatenating one item from collection A with one from B."
return {a + b
for a in A for b in B}
urn = cross('W', '12345678') | cross('B', '123456') | cross('R', '123456789')
urn
{'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'R1', 'R2', 'R3', 'R4', 'R5', 'R6', 'R7', 'R8', 'R9', 'W1', 'W2', 'W3', 'W4', 'W5', 'W6', 'W7', 'W8'}
len(urn)
23
Now we can define the sample space, U6
, as the set of all 6-ball combinations. We use itertools.combinations
to generate the combinations, and then join each combination into a string:
import itertools
def combos(items, n):
"All combinations of n items; each combo as a concatenated str."
return {' '.join(combo)
for combo in itertools.combinations(items, n)}
U6 = combos(urn, 6)
len(U6)
100947
I don't want to print all 100,947 members of the sample space; let's just peek at a random sample of them:
import random
random.sample(U6, 10)
['W6 B1 B5 W7 R4 R6', 'W3 R9 W5 R8 B3 B2', 'R5 B6 W3 R4 R6 R2', 'R5 W3 B1 W2 B3 B2', 'R9 B4 R6 R1 B2 W8', 'W6 B1 R8 W2 W4 R6', 'B5 R3 W2 R2 R1 B2', 'W3 W6 R7 W7 R4 W2', 'R5 B6 W5 W2 W4 R2', 'R5 B6 W7 R3 B4 W8']
Is 100,947 really the right number of ways of choosing 6 out of 23 items, or "23 choose 6", as mathematicians call it? Well, we can choose any of 23 for the first item, any of 22 for the second, and so on down to 18 for the sixth. But we don't care about the ordering of the six items, so we divide the product by 6! (the number of permutations of 6 things) giving us:
$$23 ~\mbox{choose}~ 6 = \frac{23 \cdot 22 \cdot 21 \cdot 20 \cdot 19 \cdot 18}{6!} = 100947$$Note that $23 \cdot 22 \cdot 21 \cdot 20 \cdot 19 \cdot 18 = 23! \;/\; 17!$, so, generalizing, we can write:
$$n ~\mbox{choose}~ c = \frac{n!}{(n - c)! \cdot c!}$$And we can translate that to code and verify that 23 choose 6 is 100,947:
from math import factorial
def choose(n, c):
"Number of ways to choose c items from a list of n items."
return factorial(n) // (factorial(n - c) * factorial(c))
choose(23, 6)
100947
Now we're ready to answer the 4 problems:
red6 = {s for s in U6 if s.count('R') == 6}
P(red6, U6)
Fraction(4, 4807)
Let's investigate a bit more. How many ways of getting 6 red balls are there?
len(red6)
84
Why are there 84 ways? Because there are 9 red balls in the urn, and we are asking how many ways we can choose 6 of them:
choose(9, 6)
84
So the probabilty of 6 red balls is then just 9 choose 6 divided by the size of the sample space:
P(red6, U6) == Fraction(choose(9, 6),
len(U6))
True
b3w2r1 = {s for s in U6 if
s.count('B') == 3 and s.count('W') == 2 and s.count('R') == 1}
P(b3w2r1, U6)
Fraction(240, 4807)
We can get the same answer by counting how many ways we can choose 3 out of 6 blues, 2 out of 8 whites, and 1 out of 9 reds, and dividing by the number of possible selections:
P(b3w2r1, U6) == Fraction(choose(6, 3) * choose(8, 2) * choose(9, 1),
len(U6))
True
Here we don't need to divide by any factorials, because choose
has already accounted for that.
We can get the same answer by figuring: "there are 6 ways to pick the first blue, 5 ways to pick the second blue, and 4 ways to pick the third; then 8 ways to pick the first white and 7 to pick the second; then 9 ways to pick a red. But the order 'B1, B2, B3'
should count as the same as 'B2, B3, B1'
and all the other orderings; so divide by 3! to account for the permutations of blues, by 2! to account for the permutations of whites, and by 100947 to get a probability:
P(b3w2r1, U6) == Fraction((6 * 5 * 4) * (8 * 7) * 9,
factorial(3) * factorial(2) * len(U6))
True
We can interpret this as choosing 4 out of the 8 white balls, and 2 out of the 15 non-white balls. Then we can solve it the same three ways:
w4 = {s for s in U6 if
s.count('W') == 4}
P(w4, U6)
Fraction(350, 4807)
P(w4, U6) == Fraction(choose(8, 4) * choose(15, 2),
len(U6))
True
P(w4, U6) == Fraction((8 * 7 * 6 * 5) * (15 * 14),
factorial(4) * factorial(2) * len(U6))
True
# r=red bin; b=blue bin; o=orange; a=apple.
urn = ['ra']*2 + ['ro']*6 + ['bo'] + ['ba']*3
urn
['ra', 'ra', 'ro', 'ro', 'ro', 'ro', 'ro', 'ro', 'bo', 'ba', 'ba', 'ba']
def make_k(item, k):
"Create a set of k instances of item, numbered accordingly."
return {item + i for i in k}
#return {a + b
# for a in A for b in B}
make_k('ra', '12')
{'ra1', 'ra2'}
urn = make_k('ra', '12') | make_k('ro', '123456') | make_k('bo', '1') | make_k('ba', '123')
urn
{'ba1', 'ba2', 'ba3', 'bo1', 'ra1', 'ra2', 'ro1', 'ro2', 'ro3', 'ro4', 'ro5', 'ro6'}
p(fruit=apple, bin=red) = ?
a_and_r = {f for f in urn if 'ra' in f}
len(a_and_r)
2
P(a_and_r, urn)
Fraction(1, 6)
To calculate the probability of an even die roll, I originally said
even = {2, 4, 6}
But that's inelegant—I had to explicitly enumerate all the even numbers from one to six. If I ever wanted to deal with a twelve or twenty-sided die, I would have to go back and change even. I would prefer to define even once and for all like this:
def even(n): return n % 2 == 0
Now in order to make P(even, D)
work, I'll have to modify P
to accept an event as either
a set of outcomes (as before), or a predicate over outcomes—a function that returns true for an outcome that is in the event:
def P(event, space):
"""The probability of an event, given a sample space of equiprobable outcomes.
event can be either a set of outcomes, or a predicate (true for outcomes in the event)."""
if is_predicate(event):
event = such_that(event, space)
return Fraction(len(event & space), len(space))
is_predicate = callable
def such_that(predicate, collection):
"The subset of elements in the collection for which the predicate is true."
return {e for e in collection if predicate(e)}
Here we see how such_that
, the new even
predicate, and the new P
work:
such_that(even, D)
{2, 4, 6}
P(even, D)
Fraction(1, 2)
D12 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
such_that(even, D12)
{2, 4, 6, 8, 10, 12}
P(even, D12)
Fraction(1, 2)
Note: such_that
is just like the built-in function filter
, except such_that
returns a set.
We can now define more interesting events using predicates; for example we can determine the probability that the sum of a three-dice roll is prime (using a definition of is_prime
that is efficient enough for small n
):
D3 = {(d1, d2, d3) for d1 in D for d2 in D for d3 in D}
def prime_sum(outcome): return is_prime(sum(outcome))
def is_prime(n): return n > 1 and not any(n % i == 0 for i in range(2, n))
P(prime_sum, D3)
Fraction(73, 216)
How many people do you think it takes before it is more likely than not that someone shares a birthday?
How to work this out? Turns out it's easier (as it is often is!) to figure out the complement; i.e., that no two people share the same birthday, and then subtract this from 1.
$p$ that person one has unique birthday is $\frac{365}{365} = 1$
Second person is $\frac{364}{365}$ (can pick any of the other 364 days). This is a conditional probability; we can simply multiply: $\frac{365}{365} \cdot \frac{364}{365}$. Then the third person? If we multiply these out (up to some $n$ number of people), we have the probability that no one shares a birthday.
def shared_birthday_p(n):
cond_p = 1
for i in range(n-1):
cond_p = cond_p * (365-(i+1))/365
return 1-cond_p
shared_birthday_p(23)
0.5072972343239855