Bayes Theorem

Rohan L. Fernando

November 2015

Motivation

In whole-genome analyses, the number $k$ of marker covariates typically exceeds the number $n$ of observations.
With $k > n$, least squares cannot be used to estimate the effects of all $k$ covariates simultaneously.
Bayesian inference is widely used to overcome this problem by combining prior information with the observed data.
In the Bayesian approach, inferences are based on conditional probabilities, and Bayes theorem is a statement about conditional probability.

Conditional Probability of $X$ Given $Y$

Suppose $X$ and $Y$ are two random variables with joint probability distribution $\Pr(X,Y)$. Then the conditional probability of $X$ given $Y$ is given by Bayes theorem as

$$\Pr(X|Y) = \frac{\Pr(X,Y)}{\Pr(Y)} \tag{1}$$

where $\Pr(Y)$ is the marginal probability distribution of $Y$.

Similarly, the conditional probability of $Y$ given $X$ is

$$\Pr(Y|X)=\frac{\Pr(X,Y)}{\Pr(X)},$$

which upon rearranging gives

$$\Pr(X,Y)=\Pr(Y|X)\Pr(X). \tag{2}$$

Then, substituting (2) in (1) gives

$$\begin{aligned} \Pr(X|Y) &= \frac{\Pr(X,Y)}{\Pr(Y)}\\ &= \frac{\Pr(Y|X)\Pr(X)}{\Pr(Y)}, \end{aligned}$$

which is the form of the formula used for inference about $X$ given $Y$.
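As a quick numerical check of the derivation, the sketch below computes $\Pr(X|Y)$ in two ways: directly from the joint distribution as in (1), and through $\Pr(Y|X)\Pr(X)/\Pr(Y)$. The joint distribution and the labels `x1`, `x2`, `y1`, `y2` are hypothetical values chosen only for illustration, and the function names are assumptions, not part of the original notes.

```python
from fractions import Fraction

# Hypothetical joint distribution Pr(X, Y), for illustration only.
# Keys are (x, y) pairs; the probabilities sum to 1.
joint = {
    ("x1", "y1"): Fraction(1, 10),
    ("x1", "y2"): Fraction(3, 10),
    ("x2", "y1"): Fraction(2, 10),
    ("x2", "y2"): Fraction(4, 10),
}


def marginal_x(x):
    """Pr(X = x), obtained by summing the joint over all values of Y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(y):
    """Pr(Y = y), obtained by summing the joint over all values of X."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def cond_x_given_y(x, y):
    """Equation (1): Pr(X | Y) = Pr(X, Y) / Pr(Y)."""
    return joint[(x, y)] / marginal_y(y)


def cond_y_given_x(y, x):
    """Pr(Y | X) = Pr(X, Y) / Pr(X)."""
    return joint[(x, y)] / marginal_x(x)


def bayes_x_given_y(x, y):
    """Pr(X | Y) = Pr(Y | X) Pr(X) / Pr(Y), after substituting (2) in (1)."""
    return cond_y_given_x(y, x) * marginal_x(x) / marginal_y(y)


direct = cond_x_given_y("x1", "y1")
via_bayes = bayes_x_given_y("x1", "y1")
print(direct, via_bayes)  # both routes give 1/3
```

Exact fractions are used here so that the two routes can be compared with `==` rather than with a floating-point tolerance.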

Bayes Theorem by Example

Here we consider a simple example to justify the formula (1). The following table gives the joint distribution of smoking and lung cancer in a hypothetical population of 1,000,000 individuals.

$$ \begin{array}{l|rr|r} \text{Lung cancer} & \text{Smoker} & \text{Non-smoker} & \text{Total} \\ \hline \text{Yes} & 42{,}500 & 7{,}500 & 50{,}000 \\ \text{No} & 207{,}500 & 742{,}500 & 950{,}000 \\ \hline \text{Total} & 250{,}000 & 750{,}000 & 1{,}000{,}000 \end{array} $$

Given these numbers, consider how you would compute the relative frequency of lung cancer among smokers.

There are a total of 250,000 smokers in this population, and among these 250,000 individuals, 42,500 have lung cancer.

So the relative frequency of lung cancer among smokers is $\frac{42{,}500}{250{,}000}$.

As we reason next, this relative frequency is also the conditional probability of lung cancer given that the individual is a smoker.

The frequentist definition of probability of an event is the limiting value of its relative frequency in a “large” number of trials.

Suppose we sample individuals with replacement from the 250,000 smokers and compute the relative frequency of lung cancer in the sample.
It can be shown that as the sample size goes to infinity, this relative frequency will approach $\frac{42{,}500}{250{,}000}=0.17$.
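The limiting behaviour described above can be illustrated with a small simulation. This is only a sketch: the seed, the sample sizes, and the convention that the first 42,500 labels are the smokers with lung cancer are arbitrary assumptions made for the example.

```python
import random

N_SMOKERS = 250_000              # smokers in the hypothetical population
N_SMOKERS_WITH_CANCER = 42_500   # smokers who have lung cancer


def relative_frequency(n_draws, rng):
    """Draw n_draws smokers with replacement; return the fraction with lung cancer."""
    hits = 0
    for _ in range(n_draws):
        # Smokers are labelled 0 .. N_SMOKERS - 1; by an arbitrary convention
        # the first N_SMOKERS_WITH_CANCER labels are those with lung cancer.
        if rng.randrange(N_SMOKERS) < N_SMOKERS_WITH_CANCER:
            hits += 1
    return hits / n_draws


rng = random.Random(2015)  # fixed seed so the run is reproducible
freqs = {n: relative_frequency(n, rng) for n in (100, 10_000, 200_000)}
for n, freq in freqs.items():
    print(f"n = {n:>7}: relative frequency = {freq:.4f}")
```

With larger sample sizes the printed relative frequency should settle close to 0.17, although any finite sample will deviate somewhat from the limit.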

This ratio can also be written as $$\frac{42{,}500/1{,}000{,}000}{250{,}000/1{,}000{,}000}=0.17.$$
The ratio in the numerator is the joint probability of smoking and lung cancer, and the ratio in the denominator is the marginal probability of smoking.
So, with $X$ denoting lung cancer and $Y$ denoting smoking, $$\Pr(X|Y) = \frac{\Pr(X,Y)}{\Pr(Y)},$$ which is formula (1).
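The arithmetic of this example can be reproduced from the table counts. The sketch below computes the conditional probability of lung cancer given smoking directly from the joint and marginal probabilities, and again through $\Pr(Y|X)\Pr(X)/\Pr(Y)$. The dictionary layout and variable names are assumptions made for illustration; the counts are those in the table.

```python
from fractions import Fraction

# Counts from the table, keyed by (cancer status, smoking status).
counts = {
    ("cancer", "smoker"): 42_500,
    ("cancer", "non-smoker"): 7_500,
    ("no cancer", "smoker"): 207_500,
    ("no cancer", "non-smoker"): 742_500,
}
total = sum(counts.values())  # 1,000,000 individuals

# Joint probability of lung cancer and smoking.
pr_joint = Fraction(counts[("cancer", "smoker")], total)

# Marginal probabilities: sum the counts over the other variable.
pr_smoker = Fraction(
    sum(c for (_, smoking), c in counts.items() if smoking == "smoker"), total
)
pr_cancer = Fraction(
    sum(c for (cancer, _), c in counts.items() if cancer == "cancer"), total
)

# Formula (1): Pr(cancer | smoker) = Pr(cancer, smoker) / Pr(smoker).
pr_cancer_given_smoker = pr_joint / pr_smoker

# Same quantity via Pr(smoker | cancer) Pr(cancer) / Pr(smoker).
pr_smoker_given_cancer = pr_joint / pr_cancer
via_bayes = pr_smoker_given_cancer * pr_cancer / pr_smoker

print(float(pr_cancer_given_smoker), float(via_bayes))  # 0.17 0.17
```

Both routes give 0.17, in agreement with the relative frequency of lung cancer among the 250,000 smokers.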