This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

Let $q$ be the probability of obtaining a head. Whereas $q$ was just a fixed number in the previous recipe, we consider here that it is a *random variable*. Initially, this variable follows a distribution called the **prior distribution**. It represents our knowledge about $q$ *before* we start flipping the coin. We will update this distribution after each trial (**posterior distribution**).

- First, we assume that $q$ is a
*uniform*random variable on the interval $[0, 1]$. That's our prior distribution: for all $q$, $P(q)=1$. - Then, we flip our coin $n$ times. We note $x_i$ the outcome of the $i$-th flip ($0$ for tail, $1$ for head).
- What is the probability distribution of $q$ knowing the observations $x_i$?
**Bayes' formula**allows us to compute the*posterior distribution*analytically (see the next section for the mathematical details):

$$P(q | \{x_i\}) = \frac{P(\{x_i\} | q) P(q)}{\displaystyle\int_0^1 P(\{x_i\} | q) P(q) dq} = (n+1)\binom n h q^h (1-q)^{n-h}$$

We define the posterior distribution according to the mathematical formula above. We remark this this expression is $(n+1)$ times the *probability mass function* (PMF) of the binomial distribution, which is directly available in `scipy.stats`

. (http://en.wikipedia.org/wiki/Binomial_distribution)

In [ ]:

```
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
%matplotlib inline
```

In [ ]:

```
posterior = lambda n, h, q: (n+1) * st.binom(n, q).pmf(h)
```

Let's plot this distribution for an observation of $h=61$ heads and $n=100$ total flips.

In [ ]:

```
n = 100
h = 61
q = np.linspace(0., 1., 1000)
d = posterior(n, h, q)
```

In [ ]:

```
plt.figure(figsize=(5,3));
plt.plot(q, d, '-k');
plt.xlabel('q parameter');
plt.ylabel('Posterior distribution');
plt.ylim(0, d.max()+1);
```

- This distribution indicates the plausible values for $q$ given the observations. We could use it to derive a
**credible interval**, likely to contain the actual value. (http://en.wikipedia.org/wiki/Credible_interval)

We can also derive a point estimate. For example, the **maximum a posteriori (MAP) estimation** consists in considering the *maximum* of this distribution as an estimate for $q$. We can find this maximum analytically or numerically. Here, we find analytically $\hat q = h/n$, which looks quite sensible. (http://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation)

You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).