Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".
Bayesian inference derives the posterior probability as a consequence of two antecedents: a prior probability and a "likelihood function" derived from a statistical model for the observed data. Bayesian inference computes the posterior probability according to Bayes' theorem:
$$ P(H \mid E)={\frac {P(E\mid H)\cdot P(H)}{P(E)}} $$
where
$H$ stands for any hypothesis whose probability may be affected by data (called evidence below). Often there are competing hypotheses, and the task is to determine which is the most probable.
$P(H)$, the prior probability, is the estimate of the probability of the hypothesis $H$ before the data $E$, the current evidence, is observed.
$E$, the evidence, corresponds to new data that were not used in computing the prior probability.
$P(H\mid E)$, the posterior probability, is the probability of $H$ given $E$, i.e., after $E$ is observed. This is what we want to know: the probability of a hypothesis given the observed evidence.
$P(E\mid H)$ is the probability of observing $E$ given $H$, and is called the likelihood. As a function of $E$ with $H$ fixed, it indicates the compatibility of the evidence with the given hypothesis. The likelihood function is a function of the evidence, $E$, while the posterior probability is a function of the hypothesis, $H$.
$P(E)$ is sometimes termed the marginal likelihood or "model evidence". This factor is the same for all possible hypotheses being considered (as is evident from the fact that the hypothesis $H$ does not appear in it, unlike in all the other factors), so it does not enter into determining the relative probabilities of different hypotheses.
For different values of $H$, only the factors $P(H)$ and $P(E\mid H)$, both in the numerator, affect the value of $P(H\mid E)$ – the posterior probability of a hypothesis is proportional to its prior probability (its inherent likeliness) and the newly acquired likelihood (its compatibility with the new observed evidence).
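This proportionality can be made concrete in a few lines of Python. The three hypotheses, their priors, and their likelihoods below are invented purely for illustration; the point is that $P(E)$ is a single normalizing constant shared by all hypotheses, so the posterior ranking is decided entirely by the products $P(H)\,P(E\mid H)$:

```python
# Three competing hypotheses with priors summing to 1 (illustrative numbers).
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
# Likelihood of the observed evidence E under each hypothesis (also illustrative).
likelihoods = {"H1": 0.10, "H2": 0.40, "H3": 0.70}

# Unnormalized posteriors: prior * likelihood for each hypothesis.
unnormalized = {h: priors[h] * likelihoods[h] for h in priors}

# P(E) is the sum of prior * likelihood over all hypotheses -- the same
# normalizing constant for every hypothesis.
p_e = sum(unnormalized.values())

posteriors = {h: u / p_e for h, u in unnormalized.items()}
print(posteriors)  # H3 ends up most probable: it fits the evidence best
```

Note that H3 overtakes the others despite having the smallest prior, because its likelihood is high enough to dominate the product.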
Bayes' rule can also be written as follows:
$$ P(H\mid E)={\frac {P(E\mid H)}{P(E)}}\cdot P(H) $$
where the factor ${\frac {P(E\mid H)}{P(E)}}$ can be interpreted as the impact of $E$ on the probability of $H$.
Consider a textbook toy example: computing the probability that a patient has an illness given a positive test result for that illness. A positive result means the test indicates that the patient has the illness.
We have the following prior knowledge about the illness and the test: 8% of the population has the illness; the test returns a positive result for 95% of people who have the illness, but also for 7% of people who do not.
Now suppose you randomly picked someone from the population for the test, and the test returned a positive result. What is the probability that this person has the illness?
Let $H$ denote the event that the person has the illness and $E$ the event that the test result is positive; $H^C$ and $E^C$ denote the respective complements. Then for this problem we know that
\begin{align*} P(H) &= 0.08 \\ P(E \mid H) &= 0.95 \\ P(E\mid H^C) &= 0.07 \end{align*}
Additionally,
\begin{align*} P(H^C) &= 0.92 \\ P(E^C\mid H) &= 1-P(E\mid H) = 0.05 \\ P(E^C\mid H^C) &= 1-P(E\mid H^C) = 0.93 \end{align*}
In this problem we are interested in the probability of a person having the illness when the test result is positive, so we want to compute $P(H\mid E)$ from the above information.
By Bayes' theorem,
\begin{align*} P(H\mid E) &= \frac{P(E\mid H)P(H)}{P(E)} \\ &= \frac{P(E\mid H)P(H)}{P(E\mid H)P(H)+P(E\mid H^C)P(H^C)}, \end{align*}
where the denominator expands $P(E)$ by the law of total probability.

Prob_H = 0.08              # prior: P(H), prevalence of the illness
Prob_E_given_H = 0.95      # likelihood: P(E | H), sensitivity of the test
Prob_E_given_Hc = 0.07     # P(E | H^C), false positive rate
Prob_Hc = 1 - Prob_H       # P(H^C)
# Law of total probability: P(E) = P(E|H)P(H) + P(E|H^C)P(H^C)
Prob_E = Prob_E_given_H * Prob_H + Prob_E_given_Hc * Prob_Hc
# Bayes' theorem
Prob_H_given_E = Prob_E_given_H * Prob_H / Prob_E
print(Prob_H_given_E)
0.5413105413105412
The chart below provides the intuition behind this.
import numpy as np
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
size = 0.35  # width of each ring

# Inner ring: the prior split P(H) vs P(H^C).
inner_vals = np.array([Prob_H, 1 - Prob_H])
# Outer ring: each prior wedge split by the test outcome, e.g.
# P(E | H) * P(H) is the joint probability of "ill and positive".
vals1 = np.array([Prob_E_given_H, 1 - Prob_E_given_H]) * Prob_H
vals2 = np.array([Prob_E_given_Hc, 1 - Prob_E_given_Hc]) * Prob_Hc
outer_vals = np.hstack((vals1, vals2))

cmap = plt.get_cmap("tab20c")
inner_colors = cmap([4, 0])
outer_colors = cmap([5, 6, 1, 2])
outer_labels = ('positive | ill', 'negative | ill',
                'positive | well', 'negative | well')
inner_labels = (r'$H$', r'$H^c$')

# Pull out the two "positive" wedges: together their area is P(E), and the
# share of "positive | ill" within them is the posterior P(H | E).
plt.pie(outer_vals, radius=1, colors=outer_colors,
        wedgeprops=dict(width=size, edgecolor='w'), labels=outer_labels,
        explode=(0.05, 0, 0.05, 0))
plt.pie(inner_vals, radius=1 - size, colors=inner_colors, labels=inner_labels,
        labeldistance=None,  # keep inner labels for the legend only
        wedgeprops=dict(width=size, edgecolor='w'), autopct='%1.1f%%')
plt.legend()
plt.show()
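Bayesian updating can also be chained, which is what makes it useful for a sequence of data: the posterior from one observation becomes the prior for the next. As a hypothetical extension of the example above, suppose a second, conditionally independent test with the same error rates is run on the same person and also comes back positive:

```python
# Same test characteristics as above.
p_e_given_h = 0.95   # sensitivity: P(positive | ill)
p_e_given_hc = 0.07  # false positive rate: P(positive | well)

def update(prior, likelihood_h, likelihood_hc):
    """One step of Bayes' rule: return P(H | E) from a prior and the two likelihoods."""
    evidence = likelihood_h * prior + likelihood_hc * (1 - prior)
    return likelihood_h * prior / evidence

# First positive test, starting from the 8% prevalence prior.
post1 = update(0.08, p_e_given_h, p_e_given_hc)
# Second positive test, using the first posterior as the new prior
# (valid only if the tests are independent given the illness status).
post2 = update(post1, p_e_given_h, p_e_given_hc)
print(post1)  # ~0.541, as computed above
print(post2)  # ~0.941
```

A single positive result leaves the diagnosis close to a coin flip, but a second independent positive pushes the posterior above 94%, illustrating how evidence accumulates under repeated updating.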