The Chi-square distribution is denoted $\chi^2(n)$, or sometimes $\chi^2_n$, where $n$ is the degrees of freedom. It is used throughout statistics (you have probably used it before in feature analysis), and it is closely related to the Normal distribution.
Let $V = Z_1^2 + Z_2^2 + \dots + Z_n^2$, where the $Z_j$ are i.i.d. $N(0,1)$. Then by definition, $V \sim \chi^2(n)$.
Sums of squares of i.i.d. $N(0,1)$ random variables pop up in a lot of statistics, which is why this distribution appears so often.
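As a quick sanity check of the definition, the sketch below simulates sums of squared standard Normals and compares the sample mean and variance with the $\chi^2(n)$ values $n$ and $2n$. The seed, the choice $n = 4$, and the sample size are arbitrary assumptions made for illustration.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(110)   # arbitrary seed (assumption)
n = 4                              # degrees of freedom, chosen for illustration
num_samples = 200_000

# Each row holds n i.i.d. N(0,1) draws; V is the row-wise sum of squares.
Z = rng.standard_normal((num_samples, n))
V = (Z ** 2).sum(axis=1)

sample_mean, sample_var = V.mean(), V.var()
theory_mean, theory_var = chi2.mean(n), chi2.var(n)   # n and 2n

print(f"mean: simulated {sample_mean:.3f}, theory {theory_mean:.3f}")
print(f"var:  simulated {sample_var:.3f}, theory {theory_var:.3f}")
```

The simulated values should land close to the theoretical ones, with the gap shrinking as `num_samples` grows.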
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FormatStrFormatter
from scipy.stats import chi2
%matplotlib inline
dof_values = [1,2,3,4,5,6,7,8]
x = np.linspace(0, 10, 1000)
# plot the distributions
_, ax = plt.subplots(figsize=(12,8))
for d in dof_values:
    ax.plot(x, chi2.pdf(x, d), lw=3.2, alpha=0.6, label='df={}'.format(d))
# legend styling
legend = ax.legend()
for label in legend.get_texts():
    label.set_fontsize('large')   # assumed styling value
for line in legend.get_lines():
    line.set_linewidth(1.5)       # assumed styling value
# y-axis
ax.set_ylim([0.0, 0.5])
# x-axis
ax.set_xlim([0, 10.0])
# x-axis tick formatting
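majorLocator = MultipleLocator(2.0)
majorFormatter = FormatStrFormatter('%0.1f')
minorLocator = MultipleLocator(1.0)
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_major_formatter(majorFormatter)
ax.xaxis.set_minor_locator(minorLocator)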
ax.grid(color='grey', linestyle='-', linewidth=0.3)
plt.suptitle(r'Examples of $\chi^2_n$ with varying degrees of freedom')
from scipy.stats import gamma
_, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(14,6))
x = np.linspace(0, 20, 1000)
ax1.plot(x, chi2.pdf(x, 1), lw=3.2, alpha=0.6, color='#33AAFF', label='df=1')
ax1.set_title(r'$\chi^2_1$', y=1.02)
ax1.grid(color='grey', linestyle='-', linewidth=0.3)
# gamma.pdf API: scale = 1 / lambda
lam = 0.5
ax2.plot(x, gamma.pdf(x, 0.5, scale=1/lam), lw=3.2, alpha=0.6, color='#FF9933', label=r'$\alpha$=1/2, $\lambda$=1/2')
ax2.set_title(r'$Gamma(\frac{1}{2}, \frac{1}{2})$', y=1.02)
ax2.grid(color='grey', linestyle='-', linewidth=0.3)
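It follows then that $\chi^2(n) = \text{Gamma}(\frac{n}{2}, \frac{1}{2})$, since $\chi^2(n)$ is a sum of $n$ independent $\chi^2(1) = \text{Gamma}(\frac{1}{2}, \frac{1}{2})$ random variables, and independent Gammas with the same rate add in their first parameter.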
_, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(14,6))
x = np.linspace(0, 20, 1000)
dof_values = [1, 2, 5, 10, 20]
col_alph_values = [0.8, 0.6, 0.5, 0.4, 0.3]
for df, c_alph in zip(dof_values, col_alph_values):
    ax1.plot(x, chi2.pdf(x, df), color='#33AAFF', lw=3.2, alpha=c_alph, label='df={}'.format(df))
ax1.set_title(r'$\chi^2_n$ for varying degrees of freedom', y=1.02)
ax1.grid(color='grey', linestyle='-', linewidth=0.3)
# gamma.pdf API: scale = 1 / lambda
lam = 0.5
for alph, c_alph in zip(dof_values, col_alph_values):
    ax2.plot(x, gamma.pdf(x, alph/2, scale=1/lam), lw=3.2, alpha=c_alph, color='#FF9933', label=r'$\alpha$={}/2, $\lambda$=1/2'.format(alph))
ax2.set_title(r'$Gamma(\frac{n}{2}, \frac{1}{2})$ for varying n', y=1.02)
ax2.grid(color='grey', linestyle='-', linewidth=0.3)
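The visual agreement between the two panels can also be checked numerically. The sketch below compares the two densities on a grid for the same degrees of freedom as the plot; the grid starts slightly above 0 (an assumption made here) because the $\chi^2_1$ density is unbounded at 0.

```python
import numpy as np
from scipy.stats import chi2, gamma

x = np.linspace(0.1, 20, 500)   # start above 0: the df=1 density blows up at 0
lam = 0.5                       # rate parameter lambda = 1/2

matches = []
for n in [1, 2, 5, 10, 20]:
    chi2_pdf = chi2.pdf(x, n)
    # scipy's gamma takes a shape and a scale, with scale = 1 / lambda
    gamma_pdf = gamma.pdf(x, n / 2, scale=1 / lam)
    matches.append(np.allclose(chi2_pdf, gamma_pdf))
    print(f"df={n}: densities agree on the grid -> {matches[-1]}")
```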
The Student's t-distribution can be described in terms of the standard Normal $Z \sim N(0,1)$ and $\chi^2(n)$ distributions. Since $\chi^2(n)$ is itself built from standard Normals, the t-distribution can be described entirely in terms of the standard Normal distribution.
Let $T = \frac{Z}{\sqrt{V/n}}$, with $Z \sim N(0,1)$ and $V \sim \chi^2(n)$, where $Z, V$ are independent.
Then we can write $T \sim t_n$, where $n$ is the degrees of freedom.
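To see the definition in action, the following sketch builds $T$ from independent draws of $Z$ and $V$ and compares a few empirical quantiles with those of scipy's `t` distribution. The seed, $n = 5$, the sample size, and the chosen quantile levels are illustrative assumptions.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(110)   # arbitrary seed (assumption)
n = 5                              # degrees of freedom, chosen for illustration
num_samples = 200_000

Z = rng.standard_normal(num_samples)
V = rng.chisquare(n, size=num_samples)   # drawn independently of Z
T = Z / np.sqrt(V / n)

probs = [0.5, 0.9, 0.975]
empirical = np.quantile(T, probs)
theoretical = t.ppf(probs, n)

for p, e, th in zip(probs, empirical, theoretical):
    print(f"p={p}: simulated quantile {e:.3f}, t_{n} quantile {th:.3f}")
```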
$t_1$ does not have a 1st moment; $t_2$ does not have a 2nd moment; $t_3$ does not have a 3rd moment; and so on. In general, $t_n$ only has moments of order less than $n$. Odd moments, if they exist, are 0 by symmetry.
from scipy.stats import t
dof_values = [1,2,5,10,30,1E10]
col_alph_values = [0.2, 0.3, 0.4, 0.5, 0.6, 0.8]
x = np.linspace(-5, 5, 1000)
# plot the distributions
fig, ax = plt.subplots(figsize=(8, 6))
for df, c_alph in zip(dof_values, col_alph_values):
    # label the very large df as infinity in the legend
    if df > 30:
        dl = r'$+\infty$'
    else:
        dl = df
    ax.plot(x, t.pdf(x, df), lw=3.2, color='#A93226', alpha=c_alph, label=r'df={}'.format(dl))
# legend styling
legend = ax.legend()
for label in legend.get_texts():
    label.set_fontsize('large')   # assumed styling value
for line in legend.get_lines():
    line.set_linewidth(1.5)       # assumed styling value
# y-axis
ax.set_ylim([0, .43])
# x-axis
ax.set_xlim([-3.0, 3.0])
ax.grid(color='grey', linestyle='-', linewidth=0.3)
plt.title(r'Examples of $t_n$ with varying degrees of freedom', y=1.02)
plt.text(x=3.5, y=0.22, s=r'Fatter tails with fewer degrees of freedom')
plt.text(x=3.5, y=0.19, s=r'Approaches $\mathbb{N}(0,1)$ as $df \rightarrow +\infty$')
It was proved earlier that for Z∼N(0,1), the even moments are such that
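$$E(Z^2) = 1, \qquad E(Z^4) = 1 \times 3 = 3, \qquad E(Z^6) = 1 \times 3 \times 5 = 15 \qquad \text{(skip factorial)}$$

Now, this was proven using moment-generating functions, but we can also relate this to the Gamma distribution.

$$E(Z^{2n}) = E\left((Z^2)^n\right) = E\left((\chi^2_1)^n\right) = E\left(\text{Gamma}(\tfrac{1}{2}, \tfrac{1}{2})^n\right)$$

Here $\chi^2_1$ and $\text{Gamma}(\frac{1}{2}, \frac{1}{2})$ stand for random variables with those distributions; the second equality holds because, by definition, $Z^2 \sim \chi^2_1$. After this point, we can use our knowledge of the Gamma distribution and LOTUS.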
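To make the last step concrete: for $X \sim \text{Gamma}(a, \lambda)$, LOTUS gives $E(X^n) = \frac{\Gamma(a+n)}{\Gamma(a)\,\lambda^n}$. The sketch below evaluates this with $a = \lambda = \frac{1}{2}$ and compares it with the skip factorial $1 \times 3 \times \dots \times (2n-1)$. The helper names `even_moment_via_gamma` and `skip_factorial` are hypothetical, introduced only for this illustration.

```python
import math
from scipy.special import gamma as gamma_fn

def even_moment_via_gamma(n, a=0.5, lam=0.5):
    """E(X^n) for X ~ Gamma(a, lam); with a = lam = 1/2 this is E(Z^(2n))."""
    return gamma_fn(a + n) / (gamma_fn(a) * lam ** n)

def skip_factorial(n):
    """1 * 3 * 5 * ... * (2n - 1)."""
    return math.prod(range(1, 2 * n, 2))

for n in range(1, 6):
    print(f"E(Z^{2 * n}): Gamma formula {even_moment_via_gamma(n):.6f}, "
          f"skip factorial {skip_factorial(n)}")
```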
Let's prove property 5 above ($t_n$ approaches $N(0,1)$ as $n \rightarrow \infty$), but use the Law of Large Numbers (cf. Lesson 29).
We are free to build $T_n$ from any random variables we like, as long as they have the right distributions. So let $Z, Z_1, Z_2, \dots$ be i.i.d. $N(0,1)$, let $V_n = Z_1^2 + \dots + Z_n^2 \sim \chi^2(n)$, and let $T_n = \frac{Z}{\sqrt{V_n/n}} \sim t_n$. There is nothing wrong with using the same $N(0,1)$ distribution for the numerator and for every term of the denominator.
Then $\frac{V_n}{n} \rightarrow 1$ with probability 1 by the Law of Large Numbers, since the sample average $\frac{V_n}{n}$ of the $Z_j^2$ approaches the true mean $E(Z_1^2)$ as $n$ gets large, and we know that $E(Z_1^2) = 1$.
Now, the (strong) Law of Large Numbers is a statement about pointwise convergence, so we can take square roots and further state that $\sqrt{\frac{V_n}{n}} \rightarrow 1$ with probability 1.
So $T_n \rightarrow Z$ with probability 1: the denominator goes to 1 when the number of degrees of freedom is large, and only the $Z$ in the numerator matters.
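Both halves of this argument can be illustrated numerically. The sketch below first tracks $V_n/n$ for growing $n$, then measures how far the 97.5% quantile of $t_n$ sits above the corresponding $N(0,1)$ quantile. The seed, the values of $n$, and the quantile level are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(110)   # arbitrary seed (assumption)

# V_n / n is the average of n squared N(0,1) draws; it should settle near E(Z_1^2) = 1.
for n in [10, 1_000, 100_000]:
    v_over_n = (rng.standard_normal(n) ** 2).mean()
    print(f"n={n:>6}: V_n/n = {v_over_n:.4f}")

# Correspondingly, the t_n quantile should approach the N(0,1) quantile.
q = 0.975
z_q = norm.ppf(q)
dfs = [1, 5, 30, 1_000]
gaps = [t.ppf(q, df) - z_q for df in dfs]
for df, gap in zip(dfs, gaps):
    print(f"df={df:>4}: t quantile exceeds Normal quantile by {gap:.4f}")
```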
A random vector $\vec{X} = (X_1, X_2, \dots, X_k)$ is Multivariate Normal if every linear combination $t_1X_1 + t_2X_2 + \dots + t_kX_k$ is Normal.
Let Z,W be i.i.d. N(0,1). Then (Z+2W,3Z+5W) is multivariate Normal (MVN).
Given constants $s, t$,

$$s(Z+2W) + t(3Z+5W) = (s+3t)Z + (2s+5t)W$$

But $(s+3t)$ and $(2s+5t)$ are just constants scaling the independent Normal random variables $Z$ and $W$ respectively; and since the sum of independent Normal random variables is also Normal, $(s+3t)Z + (2s+5t)W$ is necessarily a Normal r.v.
Let $Z \sim N(0,1)$, and let $S$ be a random sign ($\pm 1$ with probability $\frac{1}{2}$ each) that is independent of $Z$.
Then Z,SZ are marginally N(0,1) (consider both individually on their own).
But (Z,SZ) is not multivariate normal! Just test this by considering (Z+SZ).
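$Z + SZ$ cannot be Normal, since $P(Z + SZ = 0) = P(S = -1) = \frac{1}{2}$, whereas a non-degenerate Normal r.v. is continuous and takes any single value with probability 0.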
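A short simulation makes the counterexample tangible: $SZ$ on its own looks like $N(0,1)$, yet $Z + SZ$ is exactly 0 about half the time. The seed and sample size below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(110)   # arbitrary seed (assumption)
num_samples = 100_000

Z = rng.standard_normal(num_samples)
S = rng.choice([-1, 1], size=num_samples)   # random sign, independent of Z
SZ = S * Z

# Marginally, SZ behaves like N(0,1): mean near 0, standard deviation near 1.
print(f"SZ mean {SZ.mean():.3f}, SZ std {SZ.std():.3f}")

# But Z + SZ is exactly 0 whenever S = -1, so it has a point mass at 0.
total = Z + SZ
frac_zero = np.mean(total == 0)
print(f"fraction of samples with Z + SZ == 0: {frac_zero:.3f}")
```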
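Now with $X \sim N(\mu, \sigma^2)$, the moment generating function of $X$ is

$$M_X(t) = E(e^{tX}) = e^{t\mu + \frac{1}{2}t^2\sigma^2}$$

Extending this one-dimensional case to the multidimensional $\vec{X}$, and using the fact that $t_1X_1 + t_2X_2 + \dots + t_nX_n$ is itself Normal:

$$M_{\vec{X}}(\vec{t}) = E(e^{\vec{t}'\vec{X}}) = E(e^{t_1X_1 + t_2X_2 + \dots + t_nX_n}) = e^{t_1\mu_1 + t_2\mu_2 + \dots + t_n\mu_n + \frac{1}{2}Var(t_1X_1 + t_2X_2 + \dots + t_nX_n)}$$

Recall that in general, independence implies uncorrelatedness, but the converse is not always true. Within an MVN, however, the converse does hold.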
In other words, consider an MVN vector

$$\vec{X} = \begin{bmatrix} \vec{X}_1 \\ \vec{X}_2 \end{bmatrix}$$

If every component of $\vec{X}_1$ is uncorrelated with every component of $\vec{X}_2$, then $\vec{X}_1$ is independent of $\vec{X}_2$.
Let X,Y be i.i.d. N(0,1). Then (X+Y,X−Y) is MVN (bivariate Normal to be precise).
It is easy enough to show that X+Y and X−Y are uncorrelated:
$$Cov(X+Y, X-Y) = Var(X) + Cov(X,Y) - Cov(X,Y) - Var(Y) = Var(X) - Var(Y) = 1 - 1 = 0$$

But can we show that $X+Y$ and $X-Y$ are independent?
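The covariance calculation can be checked by simulation. The sketch below draws i.i.d. $N(0,1)$ samples for $X$ and $Y$ and estimates $Cov(X+Y, X-Y)$; the seed and sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(110)   # arbitrary seed (assumption)
num_samples = 200_000

X = rng.standard_normal(num_samples)
Y = rng.standard_normal(num_samples)
U, V = X + Y, X - Y

# np.cov returns the 2x2 sample covariance matrix; the off-diagonal entry is Cov(U, V).
sample_cov = np.cov(U, V)[0, 1]
print(f"sample Cov(X+Y, X-Y) = {sample_cov:.4f}")
```

A sample covariance near 0 agrees with the calculation, but on its own it does not establish independence.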
Let's try for something a bit more abstract.
We suppose that $X, Y$ are independent, zero-mean Normal random variables with variances $\sigma_X^2, \sigma_Y^2$.
Let $U = aX + bY$ and $V = cX + dY$, so that $U, V$ are jointly Normal; this is a more general representation of the above example, where $a=1, b=1, c=1, d=-1$.
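Say we have some scalars $t_1, t_2$, and let $Z = t_1U + t_2V$, which is Normal because $U, V$ are jointly Normal. Write $\sigma_U^2, \sigma_V^2$ for the variances of $U, V$, and suppose that $U$ and $V$ are uncorrelated. Then

$$M_{U,V}(t_1, t_2) = E(e^{t_1U + t_2V}) = E(e^Z) = e^{t_1\mu_U + t_2\mu_V + \frac{1}{2}Var(t_1U + t_2V)} = e^{\frac{1}{2}Var(t_1U + t_2V)} = e^{\frac{t_1^2\sigma_U^2 + t_2^2\sigma_V^2}{2}}$$

The means drop out because $U, V$ are zero-mean, and the last step uses $Cov(U,V) = 0$.

Now let $U', V'$ be independent zero-mean Normal random variables with the same variances $\sigma_U^2, \sigma_V^2$. Since $U', V'$ are independent, they are also uncorrelated, and so the moment generating function of their bivariate Normal distribution is given by $M_{U',V'}(t_1, t_2) = e^{\frac{t_1^2\sigma_U^2 + t_2^2\sigma_V^2}{2}}$.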
Since both $U, V$ and $U', V'$ have the same moment generating function, they are both associated with the same bivariate Normal distribution (they share the same joint PDF).
Therefore, since $U', V'$ are independent, we conclude that $U, V$ are also independent. QED.
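As a numerical illustration of the argument for the earlier example $U = X+Y$, $V = X-Y$ (so $\sigma_U^2 = \sigma_V^2 = 2$), the sketch below estimates the joint MGF by Monte Carlo and compares it with the closed form and with the product of the marginal MGFs, which is what independence predicts. The seed, sample size, and the point $(t_1, t_2)$ are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(110)   # arbitrary seed (assumption)
num_samples = 200_000

X = rng.standard_normal(num_samples)
Y = rng.standard_normal(num_samples)
U, V = X + Y, X - Y
sigma2_U, sigma2_V = 2.0, 2.0      # Var(X + Y) = Var(X - Y) = 1 + 1

t1, t2 = 0.3, -0.2                 # an arbitrary point at which to evaluate the MGF

# Monte Carlo estimate of the joint MGF E(exp(t1*U + t2*V))
empirical_mgf = np.mean(np.exp(t1 * U + t2 * V))

# Closed form for uncorrelated, zero-mean, jointly Normal U and V
closed_form = np.exp((t1 ** 2 * sigma2_U + t2 ** 2 * sigma2_V) / 2)

# Under independence the joint MGF factors into the product of the marginals
product_of_marginals = np.mean(np.exp(t1 * U)) * np.mean(np.exp(t2 * V))

print(f"empirical joint MGF:  {empirical_mgf:.4f}")
print(f"closed form:          {closed_form:.4f}")
print(f"product of marginals: {product_of_marginals:.4f}")
```

Agreement at a single point is only a spot check; the proof above is what establishes equality of the MGFs everywhere.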