Oliver's blood

Copyright 2018 Allen B. Downey

MIT License: https://opensource.org/licenses/MIT

In [1]:
# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

from thinkbayes2 import Pmf, Suite

Here is another problem from MacKay’s Information Theory, Inference, and Learning Algorithms:

Two people have left traces of their own blood at the scene of a crime. A suspect, Oliver, is tested and found to have type ‘O’ blood. The blood groups of the two traces are found to be of type ‘O’ (a common type in the local population, having frequency 60%) and of type ‘AB’ (a rare type, with frequency 1%). Do these data (type ‘O’ and ‘AB’ blood were found at scene) give evidence in favour of the proposition that Oliver was one of the two people present at the crime?

MacKay suggests formulating the problem like this:

Denote the proposition ‘the suspect and one unknown person were present’ by S. The alternative, S̄, states ‘two unknown people from the population were present’.

And then he computes the conditional probabilities of the data under S and S̄.

P(D | S) = p(AB)

P(D | S̄) = 2 p(O) p(AB)

Some people are initially unsure why there is a factor of two in the second equation. One way to convince yourself that it is correct is a verbal argument: "If Oliver did not leave a blood trace at the scene, then the blood traces were left by two unknown people. If we consider these unknown people in order, the first might have left type ‘O’ blood and the second might have left type ‘AB’, or the other way around. Since there are two ways to account for the data, we have to add their probabilities."

This is correct, but with probability it is easy for errors to hide in the words. I find it useful to express the idea computationally as well.

I'll create a Pmf object with the distribution of blood types.

In [2]:
# Each label ends with a tab so that labels stay separated when Pmf addition concatenates them
types = Pmf({'O\t': 0.6, 'AB\t': 0.01, 'other\t': 0.39})
types.Print()
AB	 0.01
O	 0.6
other	 0.39

Now we can compute P(D | S) = p(AB):

In [3]:
like_S = types['AB\t']
Out[3]:
0.01

Pmf provides an addition operator that computes the distribution of all pairs of outcomes:

In [4]:
pairs = types + types
pairs.Print()
AB	AB	 0.0001
AB	O	 0.006
AB	other	 0.0039000000000000003
O	AB	 0.006
O	O	 0.36
O	other	 0.23399999999999999
other	AB	 0.0039000000000000003
other	O	 0.23399999999999999
other	other	 0.1521

Reading this table, we can see more explicitly that there are two outcomes that account for the data, AB O and O AB.

So we can compute P(D | S̄):

In [5]:
like_S̄ = pairs['O\tAB\t'] + pairs['AB\tO\t']
Out[5]:
0.012
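As a cross-check that does not depend on thinkbayes2, here is a minimal simulation sketch in plain Python. It draws two unknown people per trial from the stated frequencies and counts how often one is type O and the other is type AB. The function name `simulate_not_S`, the seed, and the number of trials are my own choices for illustration, not part of the original notebook.

```python
import random

# Blood type frequencies from the problem statement
labels = ['O', 'AB', 'other']
weights = [0.6, 0.01, 0.39]

def simulate_not_S(n_trials, seed=17):
    """Estimate P(D | not S) by drawing two unknown people per trial."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    first = rng.choices(labels, weights=weights, k=n_trials)
    second = rng.choices(labels, weights=weights, k=n_trials)
    # The data match when one person is type O and the other is type AB,
    # in either order
    hits = sum(1 for a, b in zip(first, second) if {a, b} == {'O', 'AB'})
    return hits / n_trials

estimate = simulate_not_S(200_000)
print(estimate)  # should land near 2 * 0.6 * 0.01 = 0.012
```

Because the simulation treats the two people as an ordered pair, both orderings are counted automatically, which is the same reason the factor of two appears in the formula.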

As MacKay points out, the data are more likely under S̄ than under S, so they are evidence in favor of S̄; that is, they are exculpatory.
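One way to quantify how strongly the data point one way or the other is the likelihood ratio P(D | S) / P(D | S̄). The sketch below computes it from the two formulas given earlier; the function name `likelihood_ratio` and its parameter names are assumptions made for this example.

```python
def likelihood_ratio(p_O, p_AB):
    """Return P(D | S) / P(D | not S) for the two-trace problem."""
    like_S = p_AB                # Oliver accounts for O; the unknown person must be AB
    like_not_S = 2 * p_O * p_AB  # two unknown people, in either order
    return like_S / like_not_S

ratio = likelihood_ratio(0.6, 0.01)
print(ratio)  # less than 1, so the data favor the alternative
```

The frequency of AB cancels, leaving 1 / (2 p(O)). So under this formulation the data would support S only if type O had a frequency below 50%.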

Let's do the update, assuming that the prior is 50:50.

In [6]:
suite = Suite(['S', 'S̄'])
suite.Print()
S 0.5
S̄ 0.5

In [7]:
suite['S'] *= like_S
suite['S̄'] *= like_S̄
suite.Normalize()
Out[7]:
0.011

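The value returned by Normalize, 0.011, is the total probability of the data under the 50:50 prior. In light of this evidence, we are slightly more inclined to believe that Oliver is not guilty (or at least, did not leave a blood trace at the scene).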

In [8]:
suite.Print()
S 0.4545454545454546
S̄ 0.5454545454545455
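
The same update can be written out by hand, which makes it easy to try priors other than 50:50. This is a minimal sketch in plain Python rather than the Suite API; the function name `update` and its default arguments are my own, with the likelihoods taken from the values computed above.

```python
def update(prior_S, like_S=0.01, like_not_S=0.012):
    """Return the posterior probability of S given the blood evidence."""
    unnorm_S = prior_S * like_S                # prior times likelihood for S
    unnorm_not_S = (1 - prior_S) * like_not_S  # same for the alternative
    return unnorm_S / (unnorm_S + unnorm_not_S)

posterior_S = update(0.5)
print(posterior_S)  # about 0.4545, matching the Suite result
```

Whatever prior we start with (other than 0 or 1), the posterior probability of S comes out a little lower than the prior, because the likelihood ratio is less than 1.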