A probability distribution is a function that gives the probability of occurrence of every possible outcome of an experiment. Consider the experiment of rolling a fair die. If the random variable $X$ denotes the outcome of the roll, then its probability distribution is $P(X = x) = \frac{1}{6}$ for each $x \in \{1, 2, 3, 4, 5, 6\}$.
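As a small sketch of the die-roll example, the distribution can be written down directly as a mapping from each outcome to its probability. Using Python's fractions.Fraction is a choice made here for exact arithmetic (it is not part of the original notebook), and it makes it easy to check that the probabilities sum to 1.

```python
from fractions import Fraction

# Fair six-sided die: every face has probability 1/6
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid probability distribution assigns non-negative probabilities that sum to 1
total = sum(die_pmf.values())
print(die_pmf[3], total)  # prints: 1/6 1
```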
The Bernoulli distribution is the probability distribution of a random variable that takes only two values, 1 (success) or 0 (failure). Its probability mass function is defined as follows.
For a possible outcome $k \in \{0, 1\}$,
$$f(k, p) = p^k (1-p)^{1-k}$$
where $p$ is the probability of outcome 1 and $1-p$ is the probability of outcome 0.
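To make the formula concrete, here is a minimal sketch that evaluates $f(k, p)$ by hand and compares it with scipy.stats.bernoulli.pmf. The helper bernoulli_pmf is hypothetical, written only for this illustration; plugging in $k = 1$ gives $p$ and $k = 0$ gives $1 - p$.

```python
from scipy.stats import bernoulli

def bernoulli_pmf(k, p):
    """Hypothetical helper: f(k, p) = p**k * (1 - p)**(1 - k) for k in {0, 1}."""
    if k not in (0, 1):
        return 0.0  # outcomes other than 0 and 1 have probability zero
    return p**k * (1 - p) ** (1 - k)

p = 0.3
for k in (0, 1):
    # The hand-written formula should agree with scipy.stats.bernoulli.pmf
    print(k, bernoulli_pmf(k, p), bernoulli.pmf(k, p))
```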
Examples: the result of a coin toss, whether a patient has a disease or not, or any other experiment whose outcome is either success or failure.
Let us now plot samples drawn from a Bernoulli distribution.
%matplotlib inline
from scipy.stats import bernoulli
import numpy as np
import seaborn as sns
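p = 0.3  # probability of outcome 1 (success)
x = bernoulli.rvs(p, size=1000)  # 1000 Bernoulli(p) draws, each 0 or 1
sns.histplot(x, discrete=True);  # one bar per outcome (distplot is deprecated in seaborn)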
Use the stats method to get the mean, which for a Bernoulli distribution equals $p$.
mu = bernoulli.stats(p, moments='m')
print(mu)
0.3
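The same stats method can also return the variance. The sketch below, assuming the same p = 0.3, requests both moments at once with moments='mv' so they can be compared with the closed forms $E[X] = p$ and $\mathrm{Var}[X] = p(1-p)$.

```python
from scipy.stats import bernoulli

p = 0.3
# moments='mv' asks scipy for the mean and the variance together
mean, var = bernoulli.stats(p, moments='mv')

# Closed forms for comparison: E[X] = p, Var[X] = p * (1 - p)
print(mean, var)
print(p, p * (1 - p))
```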
The number of successes $x$ in a fixed number $n$ of independent Bernoulli trials, each with the same probability of success $p$ (sometimes written $\theta$), follows a binomial distribution. Both $n$ and $p$ are fixed parameters of a binomial distribution.
Examples: the number of heads after tossing a coin 100 times; the number of defective bulbs found after inspecting 1000 bulbs.
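The definition above says a binomial count is just the number of successes among n independent Bernoulli trials. A minimal simulation sketch, assuming n = 40 and p = 0.5 (the values used below) and a seeded NumPy generator chosen here for reproducibility, builds binomial samples directly from Bernoulli trials and checks that their mean and variance land near $np$ and $np(1-p)$.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is reproducible
n, p, repetitions = 40, 0.5, 100_000

# Each row is one experiment of n independent Bernoulli(p) trials (True = success)
trials = rng.random((repetitions, n)) < p
successes = trials.sum(axis=1)  # number of successes in each experiment

# Sample mean and variance should be close to n*p = 20 and n*p*(1-p) = 10
print(successes.mean(), successes.var())
```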
Let us generate numbers that follow a binomial distribution. Given N = 40 and p = 0.5, draw samples from the binomial distribution, then compute its mean and variance and assign them to the variables mu and var.
%matplotlib inline
import numpy as np
import seaborn as sns
N = 40
p = 0.5
# Draw 10 samples from Binomial(N, p) and plot their histogram
binomial_x = np.random.binomial(N, p, 10)
sns.histplot(binomial_x, kde=True)
# Compute mean and variance
Use the formulas for the mean and variance of a binomial distribution: $\mu = Np$ and $\sigma^2 = Np(1-p)$.
mu = N*p
var = N*p*(1-p)
print(mu, var)
20.0 10.0
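As a cross-check of the formulas, scipy.stats.binom can report the mean and variance of Binomial(N, p) directly; this sketch assumes the same N = 40 and p = 0.5.

```python
from scipy.stats import binom

N, p = 40, 0.5
# scipy returns the exact mean and variance of Binomial(N, p)
mean, var = binom.stats(N, p, moments='mv')
print(mean, var)  # should match N*p and N*p*(1-p)
```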
Let us consider an experiment of tossing a biased coin, in which p is the probability of 'heads' and n is the number of tosses. An experiment is defined as a series of trials. In this case, an experiment will consist of n trials/tosses. We will repeat the experiment 1000 times to simulate the binomial distribution.
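We must check that the experiment satisfies the assumptions of the binomial distribution: the number of tosses n is fixed, each toss has only two outcomes (heads or tails), the probability of heads p is the same on every toss, and the tosses are independent.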
Consider n tosses of the coin. Since the tosses are independent, the probability of a particular sequence of outcomes is the product of the probabilities of the individual tosses. The probability of heads is $p$ and that of tails is $1-p$, so any particular sequence of n tosses with k heads and n-k tails has probability $p^k(1-p)^{n-k}$. There are $\binom{n}{k}$ distinct n-toss sequences that contain exactly k heads; this count is the binomial coefficient.
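The counting step can be checked by brute force for a small number of tosses. This sketch assumes n = 5 (small enough to list all $2^n$ sequences) and compares the number of sequences with exactly k heads against math.comb.

```python
from itertools import product
from math import comb

n = 5  # small n so that all 2**n sequences can be listed
sequences = list(product("HT", repeat=n))  # every possible n-toss sequence

for k in range(n + 1):
    # Count sequences with exactly k heads and compare with the binomial coefficient
    count = sum(1 for seq in sequences if seq.count("H") == k)
    print(k, count, comb(n, k))
```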
Binomial probability mass function (pmf): $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
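The pmf formula can be written almost verbatim in Python. In this sketch binomial_pmf is a hypothetical helper built on math.comb; it assumes n = 30 and p = 0.6 as in the simulation below, checks that the probabilities over k = 0, ..., n sum to 1, and compares one value with scipy.stats.binom.pmf.

```python
from math import comb
from scipy.stats import binom

def binomial_pmf(k, n, p):
    """Hypothetical helper: P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 30, 0.6
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(sum(probs))  # should be 1 up to floating-point rounding
print(binomial_pmf(18, n, p), binom.pmf(18, n, p))  # hand formula vs scipy
```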
We can now go ahead and simulate our experiment.
Assume each experiment consists of n = 30 coin tosses with probability of heads p = 0.6, and that the experiment is repeated 1000 times. Generate a list k holding the number of heads in each repetition of the experiment, calculate the pmf for each value of k, and plot k against the corresponding pmf values.
%matplotlib inline
import random
from scipy.stats import binom
import matplotlib.pyplot as plt
def toss(p, n):
    heads = 0
    for i in range(n):
        if random.random() < p:
            heads += 1
    return heads  # This gives the number of heads in a single experiment of n trials
size = range(1,1001)
n,p = 30,0.6
k = [] #Holds the number of heads from each experiment
pmf= [] #Holds the binomial pmf of each experiment
random.seed(12345)
Use the toss function to build the list of k values and sort it; then, for each k, calculate the pmf with the binom.pmf function. Plotting k against pmf gives the distribution plot.
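# Repeat the n-toss experiment 1000 times, recording the number of heads each time
for i in size:
    k.append(toss(p, n))
k.sort()
# Binomial pmf of each observed number of heads
for i in k:
    pk = binom.pmf(i, n, p)
    pmf.append(pk)
plt.plot(k, pmf)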
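The plot above shows the theoretical pmf at the simulated k values. A complementary check is to compare how often each k actually occurred with the pmf. This sketch re-runs an equivalent simulation (n = 30, p = 0.6, 1000 repetitions, seed 12345 reused from above, though the resulting counts are not guaranteed to match the earlier list exactly) and prints the relative frequency of each k next to binom.pmf.

```python
import random
from collections import Counter
from scipy.stats import binom

n, p, repetitions = 30, 0.6, 1000
random.seed(12345)

# Number of heads in each of the 1000 experiments of n tosses
heads = [sum(random.random() < p for _ in range(n)) for _ in range(repetitions)]
counts = Counter(heads)

# Relative frequency of each observed k next to the theoretical binomial pmf
for k in sorted(counts):
    print(k, counts[k] / repetitions, round(binom.pmf(k, n, p), 4))
```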
Any random variable, whether discrete or continuous, can be described by its cumulative distribution function (CDF). The CDF of a random variable $X$ is denoted by $F_X(x)$ and gives the cumulative probability up to the value $x$, that is, $F_X(x) = P(X \leq x)$ for all $x$. In other words, the CDF is the proportion of the population with values less than or equal to $x$.
Consider a binomial random variable $X \sim \mathrm{Binomial}(n, p)$ with probability mass function
$$p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n.$$
We can derive the cumulative distribution function as
$$F_X(x) = P(X \leq x) = \sum_{k=0}^{\lfloor x \rfloor} p_X(k) = \sum_{k=0}^{\lfloor x \rfloor} \binom{n}{k} p^k (1-p)^{n-k},$$
where $\lfloor x \rfloor$ is the largest integer not exceeding $x$ (for integer $x$ the upper limit is simply $x$).
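The derivation says the CDF is a running sum of the pmf. A minimal sketch of that idea, assuming n = 10 and p = 0.8 as in the example that follows, accumulates the pmf with np.cumsum and compares the result with scipy.stats.binom.cdf at every integer k.

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.8
k = np.arange(n + 1)

# F_X(x) is the running sum of the pmf values p_X(0), ..., p_X(x)
cdf_from_pmf = np.cumsum(binom.pmf(k, n, p))
print(np.allclose(cdf_from_pmf, binom.cdf(k, n, p)))  # True
```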
The number of heads in a series of coin tosses is a classic example of a binomial random variable. Consider tossing a biased coin, where p is the probability of heads (success) and n is the number of tosses. Let n = 10 and p = 0.8. The probability of 2 or fewer successes is
$F_X(2) = P(X \leq 2) = p_X(0) + p_X(1) + p_X(2)$
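Carrying out this sum numerically shows how unlikely 2 or fewer heads is when p = 0.8. The sketch adds the three pmf terms and compares the total with binom.cdf(2, n, p); by hand, $0.2^{10} + 10 \cdot 0.8 \cdot 0.2^9 + 45 \cdot 0.8^2 \cdot 0.2^8 = 0.0000779264$.

```python
from scipy.stats import binom

n, p = 10, 0.8
# F_X(2) = p_X(0) + p_X(1) + p_X(2)
f2_by_sum = sum(binom.pmf(k, n, p) for k in range(3))
f2_direct = binom.cdf(2, n, p)
print(f2_by_sum, f2_direct)  # both about 7.79e-05
```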
%matplotlib inline
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt
# Plot the CDF of X ~ Binomial(n=50, p=0.2) for 0 <= x <= 20
x = np.linspace(0,20,100)
cdf = binom.cdf(x, n=50, p=0.2)
plt.plot(x,cdf)
# Read the following example
# Find probability of k = 7 successes, with n=10 and p=0.8
pmf7 = binom.pmf(k=7, n=10, p=0.8)
print("Probability of 7 successes =", pmf7)
# Read the following example
# Find probability of k = 7 successes or less, with n=10 and p=0.8
cdf7 = binom.cdf(k=7, n=10, p=0.8)
print("Probability of 7 successes or less =",cdf7)
# Find probability of k > 6 successes, with n=10 and p=0.8
# p_x =
Probability of 7 successes = 0.201326592
Probability of 7 successes or less = 0.3222004736
Using the example code for k = 7 and k ≤ 7 as a guide, compute the probability of more than 6 heads. Since $X > 6$ is the complement of $X \leq 6$, $P(X > 6) = 1 - F_X(6)$.
p_x = 1 - binom.cdf(k=6, n=10, p=0.8)
print("Probability of more than 6 successes =", p_x)
Probability of more than 6 successes = 0.8791261184
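The complement rule is one of several equivalent routes to the same number. This sketch computes $P(X > 6)$ three ways for n = 10 and p = 0.8: via the complement, by summing the pmf over k = 7, ..., 10, and via scipy's survival function binom.sf, which returns $1 - F_X(k) = P(X > k)$.

```python
from scipy.stats import binom

n, p = 10, 0.8
# Three equivalent ways to get P(X > 6)
by_complement = 1 - binom.cdf(6, n, p)
by_summing = sum(binom.pmf(k, n, p) for k in range(7, n + 1))
by_sf = binom.sf(6, n, p)  # survival function: sf(k) = 1 - cdf(k) = P(X > k)
print(by_complement, by_summing, by_sf)
```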
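In a series of independent Bernoulli trials with success probability $p$, the number of trials needed to get the first success follows a geometric distribution with support $\{1, 2, 3, \dots\}$. An equivalent convention counts the failures before the first success, with support $\{0, 1, 2, \dots\}$; the two counts always differ by exactly 1.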
Examples: the number of coin tosses before the coin first shows heads; the number of bulbs tested before the first defective bulb is found; the number of patients screened at a hospital before the first positive case of a disease.
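The two counting conventions can be illustrated with a direct simulation. In this sketch trials_until_first_success is a hypothetical helper that runs Bernoulli(p) trials until the first success; p = 0.7 and the seed are assumptions chosen to match the next cell. The trial counts average about $1/p$, and subtracting 1 from each gives failure counts that average about $(1-p)/p$.

```python
import random

def trials_until_first_success(p):
    """Hypothetical helper: count Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while random.random() >= p:  # this trial failed, so try again
        trials += 1
    return trials

random.seed(0)
p = 0.7
samples = [trials_until_first_success(p) for _ in range(100_000)]

# Trial counts start at 1 with mean 1/p; subtracting 1 gives failures before the first success
print(sum(samples) / len(samples), 1 / p)
print(sum(s - 1 for s in samples) / len(samples), (1 - p) / p)
```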
Let us generate numbers that follow a geometric distribution. Compute the mean and variance of the geometric distribution and assign them to the variables mu and var.
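p = 0.7
# np.random.geometric counts trials up to and including the first success (support 1, 2, ...);
# subtract 1 to count failures before the first success, matching the formulas used below
geometric_x = np.random.geometric(p, 10) - 1
sns.histplot(geometric_x, discrete=True)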
# Compute mean and variance
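Use the formulas for the failures-before-first-success convention: mean $\mu = \frac{1-p}{p}$ and variance $\sigma^2 = \frac{1-p}{p^2}$. (Counting trials instead gives mean $\frac{1}{p}$; the variance is the same.)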
mu = (1-p)/p
var = (1-p)/p**2
print("Mean: ", mu, "Variance: ", var)
Mean: 0.42857142857142866 Variance: 0.6122448979591838
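scipy.stats.geom uses the trial-count convention (support starting at 1). As a sketch of how to reconcile it with the formulas above, shifting the distribution with loc=-1 turns it into the failure-count version; the shift moves the mean by 1 and leaves the variance unchanged.

```python
from scipy.stats import geom

p = 0.7
# geom counts trials (support 1, 2, ...); loc=-1 shifts it to count failures (support 0, 1, ...)
mean_trials, var_trials = geom.stats(p, moments='mv')
mean_failures, var_failures = geom.stats(p, loc=-1, moments='mv')
print(mean_trials, var_trials)      # 1/p and (1-p)/p**2
print(mean_failures, var_failures)  # (1-p)/p and (1-p)/p**2
```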
A Poisson random variable $X$ counts the number of events occurring in a fixed time interval, when the events occur independently of each other at a constant average rate.
The probability mass function of the Poisson distribution is
$$P(X = x) = e^{-\mu}\frac{\mu^x}{x!}$$
where
x = 0, 1, 2, …
e ≈ 2.71828 (Euler's number)
μ = mean number of events in the given time interval
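The pmf translates directly into code. In this sketch poisson_pmf is a hypothetical helper; it assumes μ = 2 (the rate used in the plots below), compares the first few probabilities with scipy.stats.poisson.pmf, and also asks scipy for the mean and variance, which for a Poisson distribution are both equal to μ.

```python
import math
from scipy.stats import poisson

def poisson_pmf(x, mu):
    """Hypothetical helper: P(X = x) = exp(-mu) * mu**x / x! for x = 0, 1, 2, ..."""
    return math.exp(-mu) * mu**x / math.factorial(x)

mu = 2
for x in range(5):
    # Hand-written formula next to scipy.stats.poisson.pmf
    print(x, round(poisson_pmf(x, mu), 4), round(poisson.pmf(x, mu), 4))

# For a Poisson distribution the variance equals the mean
print(poisson.stats(mu, moments='mv'))
```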
Examples: the number of cars crossing a traffic signal in 30 minutes; the number of phone calls arriving at a service desk in 5 minutes; the number of patients arriving at a hospital every hour.
Let us now plot a Poisson distribution.
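import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(18, 8))  # sns.plt no longer exists in seaborn; use matplotlib directly
x = np.random.poisson(lam=2, size=100)
sns.histplot(x, discrete=True, ax=ax)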
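The mean and the variance of a Poisson distribution both equal $\mu$ (here lam = 2). Estimate the mean from a larger sample of 1000 draws.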
fig, ax = plt.subplots(figsize=(18, 8))
x = np.random.poisson(lam=2, size=1000)
sns.histplot(x, discrete=True, ax=ax)
mu = np.mean(x)
print(mu)
2.018
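Because the variance of a Poisson distribution also equals $\mu$, the sample variance should be close to the sample mean. This sketch assumes lam = 2 and a seeded NumPy generator (chosen here for reproducibility, unlike the unseeded cell above) and uses ddof=1 for the unbiased sample variance.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducibility
x = rng.poisson(lam=2, size=1000)

# Both the sample mean and the sample variance estimate lam = 2
mu = x.mean()
var = x.var(ddof=1)  # ddof=1 gives the unbiased sample variance
print(mu, var)
```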