Peter Norvig, 12 Feb 2016. Original notebook: Modifications by Byron Wallace for DS4420 (Sp 2020).

A Concrete Introduction to Probability (using Python)

This notebook covers the basics of probability theory, with Python 3 implementations. (You should have some background in probability and Python.)

In 1814, Pierre-Simon Laplace wrote:

Probability ... is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible ... when nothing leads us to expect that any one of these cases should occur more than any other.


Pierre-Simon Laplace

Laplace really nailed it, way back then! If you want to untangle a probability problem, all you have to do is be methodical about defining exactly what the cases are, and then careful in counting the number of favorable and total cases. We'll start being methodical by defining some vocabulary:

  • Experiment: An occurrence with an uncertain outcome that we can observe.
    For example, rolling a die.
  • Outcome: The result of an experiment; one particular state of the world. What Laplace calls a "case."
    For example: 4.
  • Sample Space The set of all possible outcomes for the experiment.
    For example, {1, 2, 3, 4, 5, 6}.
  • Event A subset of possible outcomes that together have some property we are interested in.
    For example, the event "even die roll" is the set of outcomes {2, 4, 6}.
  • Probability As Laplace said, the probability of an event with respect to a sample space is the number of favorable cases (outcomes from the sample space that are in the event) divided by the total number of cases in the sample space. (This assumes that all outcomes in the sample space are equally likely.) Since it is a ratio, probability will always be a number between 0 (representing an impossible event) and 1 (representing a certain event).
    For example, the probability of an even die roll is 3/6 = 1/2.

This notebook will develop all these concepts; I also have a second part that covers paradoxes in Probability Theory.

Code for P

P is the traditional name for the Probability function:

In [15]:
from fractions import Fraction

def P(event, space): 
    "The probability of an event, given a sample space of equiprobable outcomes."
    return Fraction(len(event & space), 

Read this as implementing Laplace's quote directly: "Probability is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible."

Warm-up Problem: Die Roll

What's the probability of rolling an even number with a single six-sided fair die?

We can define the sample space D and the event even, and compute the probability:

In [16]:
D    = {1, 2, 3, 4, 5, 6}
even = {   2,    4,    6}

P(even, D)
Fraction(1, 2)

It is good to confirm what we already knew.

You may ask: Why does the definition of P use len(event & space) rather than len(event)? Because I don't want to count outcomes that were specified in event but aren't actually in the sample space. Consider:

In [17]:
even = {2, 4, 6, 8, 10, 12}

P(even, D)
Fraction(1, 2)

Here, len(event) and len(space) are both 6, so if just divided, then P would be 1, which is not right. The favorable cases are the intersection of the event and the space, which in Python is (event & space). Also note that I use Fraction rather than regular division because I want exact answers like 1/3, not 0.3333333333333333.

Urn Problems

Around 1700, Jacob Bernoulli wrote about removing colored balls from an urn in his landmark treatise Ars Conjectandi, and ever since then, explanations of probability have relied on urn problems. (You'd think the urns would be empty by now.)

Jacob Bernoulli

Jacob Bernoulli

For example, here is a three-part problem adapted from

An urn contains 23 balls: 8 white, 6 blue, and 9 red. We select six balls at random (each possible selection is equally likely). What is the probability of each of these possible outcomes:

  1. all balls are red
  2. 3 are blue, 2 are white, and 1 is red
  3. exactly 4 balls are white

So, an outcome is a set of 6 balls, and the sample space is the set of all possible 6 ball combinations. We'll solve each of the 3 parts using our P function, and also using basic arithmetic; that is, counting. Counting is a bit tricky because:

  • We have multiple balls of the same color.
  • An outcome is a set of balls, where order doesn't matter, not a sequence, where order matters.

To account for the first issue, I'll have 8 different white balls labelled 'W1' through 'W8', rather than having eight balls all labelled 'W'. That makes it clear that selecting 'W1' is different from selecting 'W2'.

The second issue is handled automatically by the P function, but if I want to do calculations by hand, I will sometimes first count the number of permutations of balls, then get the number of combinations by dividing the number of permutations by c!, where c is the number of balls in a combination. For example, if I want to choose 2 white balls from the 8 available, there are 8 ways to choose a first white ball and 7 ways to choose a second, and therefore 8 × 7 = 56 permutations of two white balls. But there are only 56 / 2 = 28 combinations, because (W1, W2) is the same combination as (W2, W1).

We'll start by defining the contents of the urn:

In [12]:
def cross(A, B):
    "The set of ways of concatenating one item from collection A with one from B."
    return {a + b 
            for a in A for b in B}

urn = cross('W', '12345678') | cross('B', '123456') | cross('R', '123456789') 

In [14]:

Now we can define the sample space, U6, as the set of all 6-ball combinations. We use itertools.combinations to generate the combinations, and then join each combination into a string:

In [18]:
import itertools

def combos(items, n):
    "All combinations of n items; each combo as a concatenated str."
    return {' '.join(combo) 
            for combo in itertools.combinations(items, n)}

U6 = combos(urn, 6)


I don't want to print all 100,947 members of the sample space; let's just peek at a random sample of them:

In [19]:
import random

random.sample(U6, 10)
['W6 B1 B5 W7 R4 R6',
 'W3 R9 W5 R8 B3 B2',
 'R5 B6 W3 R4 R6 R2',
 'R5 W3 B1 W2 B3 B2',
 'R9 B4 R6 R1 B2 W8',
 'W6 B1 R8 W2 W4 R6',
 'B5 R3 W2 R2 R1 B2',
 'W3 W6 R7 W7 R4 W2',
 'R5 B6 W5 W2 W4 R2',
 'R5 B6 W7 R3 B4 W8']

Is 100,947 really the right number of ways of choosing 6 out of 23 items, or "23 choose 6", as mathematicians call it? Well, we can choose any of 23 for the first item, any of 22 for the second, and so on down to 18 for the sixth. But we don't care about the ordering of the six items, so we divide the product by 6! (the number of permutations of 6 things) giving us:

$$23 ~\mbox{choose}~ 6 = \frac{23 \cdot 22 \cdot 21 \cdot 20 \cdot 19 \cdot 18}{6!} = 100947$$

Note that $23 \cdot 22 \cdot 21 \cdot 20 \cdot 19 \cdot 18 = 23! \;/\; 17!$, so, generalizing, we can write:

$$n ~\mbox{choose}~ c = \frac{n!}{(n - c)! \cdot c!}$$

And we can translate that to code and verify that 23 choose 6 is 100,947:

In [20]:
from math import factorial

def choose(n, c):
    "Number of ways to choose c items from a list of n items."
    return factorial(n) // (factorial(n - c) * factorial(c))
In [21]:
choose(23, 6)

Now we're ready to answer the 4 problems:

Urn Problem 1: what's the probability of selecting 6 red balls?

In [10]:
red6 = {s for s in U6 if s.count('R') == 6}

P(red6, U6)
Fraction(4, 4807)

Let's investigate a bit more. How many ways of getting 6 red balls are there?

In [11]:

Why are there 84 ways? Because there are 9 red balls in the urn, and we are asking how many ways we can choose 6 of them:

In [12]:
choose(9, 6)

So the probabilty of 6 red balls is then just 9 choose 6 divided by the size of the sample space:

In [13]:
P(red6, U6) == Fraction(choose(9, 6), 

Urn Problem 2: what is the probability of 3 blue, 2 white, and 1 red?

In [14]:
b3w2r1 = {s for s in U6 if
          s.count('B') == 3 and s.count('W') == 2 and s.count('R') == 1}

P(b3w2r1, U6)
Fraction(240, 4807)

We can get the same answer by counting how many ways we can choose 3 out of 6 blues, 2 out of 8 whites, and 1 out of 9 reds, and dividing by the number of possible selections:

In [15]:
P(b3w2r1, U6) == Fraction(choose(6, 3) * choose(8, 2) * choose(9, 1), 

Here we don't need to divide by any factorials, because choose has already accounted for that.

We can get the same answer by figuring: "there are 6 ways to pick the first blue, 5 ways to pick the second blue, and 4 ways to pick the third; then 8 ways to pick the first white and 7 to pick the second; then 9 ways to pick a red. But the order 'B1, B2, B3' should count as the same as 'B2, B3, B1' and all the other orderings; so divide by 3! to account for the permutations of blues, by 2! to account for the permutations of whites, and by 100947 to get a probability:

In [16]:
 P(b3w2r1, U6) == Fraction((6 * 5 * 4) * (8 * 7) * 9, 
                           factorial(3) * factorial(2) * len(U6))

Urn Problem 3: What is the probability of exactly 4 white balls?

We can interpret this as choosing 4 out of the 8 white balls, and 2 out of the 15 non-white balls. Then we can solve it the same three ways:

In [17]:
w4 = {s for s in U6 if
      s.count('W') == 4}

P(w4, U6)
Fraction(350, 4807)
In [18]:
P(w4, U6) == Fraction(choose(8, 4) * choose(15, 2),
In [19]:
P(w4, U6) == Fraction((8 * 7 * 6 * 5) * (15 * 14),
                      factorial(4) * factorial(2) * len(U6))

Bonus: Let's revisit our strangely colored fruit example.

In [22]:
# r=red bin; b=blue bin; o=orange; a=apple.
urn = ['ra']*2 + ['ro']*6 + ['bo'] + ['ba']*3
['ra', 'ra', 'ro', 'ro', 'ro', 'ro', 'ro', 'ro', 'bo', 'ba', 'ba', 'ba']
In [29]:
def make_k(item, k):
    "Create a set of k instances of item, numbered accordingly."
    return {item + i for i in k}
    #return {a + b 
    #        for a in A for b in B}
In [30]:
make_k('ra', '12')
{'ra1', 'ra2'}
In [33]:
urn = make_k('ra', '12') | make_k('ro', '123456') | make_k('bo', '1') | make_k('ba', '123')
In [34]:

p(fruit=apple, bin=red) = ?

In [37]:
a_and_r = {f for f in urn if 'ra' in f}
In [38]:
P(a_and_r, urn)
Fraction(1, 6)

Revised Version of P, with more general events

To calculate the probability of an even die roll, I originally said

even = {2, 4, 6}

But that's inelegant—I had to explicitly enumerate all the even numbers from one to six. If I ever wanted to deal with a twelve or twenty-sided die, I would have to go back and change even. I would prefer to define even once and for all like this:

In [39]:
def even(n): return n % 2 == 0

Now in order to make P(even, D) work, I'll have to modify P to accept an event as either a set of outcomes (as before), or a predicate over outcomes—a function that returns true for an outcome that is in the event:

In [40]:
def P(event, space): 
    """The probability of an event, given a sample space of equiprobable outcomes.
    event can be either a set of outcomes, or a predicate (true for outcomes in the event)."""
    if is_predicate(event):
        event = such_that(event, space)
    return Fraction(len(event & space), len(space))

is_predicate = callable

def such_that(predicate, collection): 
    "The subset of elements in the collection for which the predicate is true."
    return {e for e in collection if predicate(e)}

Here we see how such_that, the new even predicate, and the new P work:

In [41]:
such_that(even, D)
{2, 4, 6}
In [42]:
P(even, D)
Fraction(1, 2)
In [43]:
D12 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}

such_that(even, D12)
{2, 4, 6, 8, 10, 12}
In [44]:
P(even, D12)
Fraction(1, 2)

Note: such_that is just like the built-in function filter, except such_that returns a set.

We can now define more interesting events using predicates; for example we can determine the probability that the sum of a three-dice roll is prime (using a definition of is_prime that is efficient enough for small n):

In [26]:
D3 = {(d1, d2, d3) for d1 in D for d2 in D for d3 in D}

def prime_sum(outcome): return is_prime(sum(outcome))

def is_prime(n): return n > 1 and not any(n % i == 0 for i in range(2, n))

P(prime_sum, D3)
Fraction(73, 216)

Birthday Paradox

How many people do you think it takes before it is more likely than not that someone shares a birthday?

How to work this out? Turns out it's easier (as it is often is!) to figure out the complement; i.e., that no two people share the same birthday, and then subtract this from 1.

$p$ that person one has unique birthday is $\frac{365}{365} = 1$

Second person is $\frac{364}{365}$ (can pick any of the other 364 days). This is a conditional probability; we can simply multiply: $\frac{365}{365} \cdot \frac{364}{365}$. Then the third person? If we multiply these out (up to some $n$ number of people), we have the probability that no one shares a birthday.

In [51]:
def shared_birthday_p(n):
    cond_p = 1
    for i in range(n-1):
        cond_p = cond_p * (365-(i+1))/365
    return 1-cond_p
In [55]:


The Count

The Count

The conclusion is: be explicit about what the problem says, and then methodical about defining the sample space, and finally be careful in counting the number of outcomes in the numerator and denominator. Easy as 1-2-3.