import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.art3d as art3d
import numpy as np
In this chapter, we will deal with joint distributions, one of the most important topics of the whole course: joint distributions are used to formulate all kinds of probability models.
The joint probability mass function of two discrete random variables is defined as
$$P_{XY}(x,y)=P(X=x,\ Y=y)$$

It is convenient to define ranges for $X$ and $Y$, $R_X=\{x_1,x_2,\dots\}$ and $R_Y=\{y_1,y_2,\dots\}$; a subset of their Cartesian product,

$$R_{XY}\subset R_X\times R_Y=\{(x_i,y_j)\mid x_i\in R_X,\ y_j\in R_Y\}$$

is the range of the joint distribution.
The most fundamental property of any probability distribution is that the probabilities sum to one:

$$\sum_{(x_i,y_j)\in R_{XY}}P_{XY}(x_i,y_j)=1$$

Let's consider a probability mass function table.

|     | Y=0 | Y=1 | Y=2 |
|-----|-----|-----|-----|
| X=0 | 1/6 | 1/4 | 1/8 |
| X=1 | 1/8 | 1/6 | 1/6 |
from fractions import Fraction as frac
# marginal PMF of Y: sum each column of the joint PMF table
pY_0 = frac(1,6) + frac(1,8)
pY_1 = frac(1,4) + frac(1,6)
pY_2 = frac(1,8) + frac(1,6)
# marginal PMF of X: sum each row
pX_0 = frac(1,6) + frac(1,4) + frac(1,8)
pX_1 = frac(1,8) + frac(1,6) + frac(1,6)
print('Marginal PMF of pY are {0}, {1}, {2}.'.format(pY_0,pY_1,pY_2))
print('Marginal PMF of pX are {0}, {1}.'.format(pX_0,pX_1))
Marginal PMF of pY are 7/24, 5/12, 7/24. Marginal PMF of pX are 13/24, 11/24.
The reason we call them marginal is that they are written in the margins of the table.
|          | Y=0  | Y=1  | Y=2  | $P_X(x)$ |
|----------|------|------|------|----------|
| X=0      | 1/6  | 1/4  | 1/8  | 13/24    |
| X=1      | 1/8  | 1/6  | 1/6  | 11/24    |
| $P_Y(y)$ | 7/24 | 5/12 | 7/24 |          |

If $X$ and $Y$ were independent, every conditional probability would equal the corresponding marginal probability. For instance,

$$P(X=0\mid Y=1)=\frac{1/4}{1/4+1/6}=\frac{3}{5}\qquad P_X(X=0)=\frac{13}{24}$$

They are not equal, which means $X$ and $Y$ are not independent.
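We can double-check this with exact Fraction arithmetic; all values below are read directly from the table above.

from fractions import Fraction as frac

# conditional probability P(X=0 | Y=1) = P(X=0, Y=1) / P(Y=1)
p_cond = frac(1,4) / (frac(1,4) + frac(1,6))
p_marginal = frac(1,6) + frac(1,4) + frac(1,8)   # P_X(X=0)

print(p_cond, p_marginal)     # 3/5 vs 13/24
print(p_cond == p_marginal)   # False -> X and Y are not independent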
The relationship between the marginal PMF and the conditional PMF is
$$P(X\mid Y)=\frac{P(X,Y)}{P_Y(Y)}$$

i.e.

$$\text{Conditional PMF}=\frac{\text{Joint PMF}}{\text{Marginal PMF}}$$

The joint CDF of two random variables $X$ and $Y$ is defined as

$$F_{XY}(x,y)=P(X\le x,\ Y\le y)$$

where $0\le F_{XY}(x,y)\le 1$.
For instance, the value of the joint CDF $F_{XY}(1,2)=P(X\le 1,\ Y\le 2)$ is the probability of the shaded region in the plot below.
x = np.linspace(-6, 1)          # x-coordinates along the region's upper edge
y = 2*np.ones(len(x))           # the upper edge sits at y = 2
fig, ax = plt.subplots(figsize = (8, 8))
ax.plot([1, -5], [2, 2], color = 'b')   # horizontal boundary y = 2
ax.scatter(1, 2, s = 80, zorder = 3, color = 'red')
ax.plot([1, 1], [2, -5], color = 'b')   # vertical boundary x = 1
ax.axis([-5, 6, -5, 6])
ax.scatter(np.random.uniform(low = -5, high = 6, size = 50),
           np.random.uniform(low = -5, high = 6, size = 50))
ax.fill_between(x, y, -5, color = 'red', alpha = .2)  # shade the region X <= 1, Y <= 2
ax.text(1, 2.1, '$(1, 2)$', size = 15)
ax.grid()
The marginal CDFs $F_X(x)$ and $F_Y(y)$ are given by

$$F_X(x)=P(X\le x,\ Y\le\infty)\qquad F_Y(y)=P(X\le\infty,\ Y\le y)$$

If $A$ is a random event, the conditional PMF of $X$ given $A$ is defined as

$$P_{X\mid A}(X=x_i)=\frac{P(X=x_i,\ A)}{P(A)}$$

Consider the PMF below.
|      | X=−2 | X=−1 | X=0  | X=1  | X=2  |
|------|------|------|------|------|------|
| Y=2  | 0    | 0    | 1/13 | 0    | 0    |
| Y=1  | 0    | 1/13 | 1/13 | 1/13 | 0    |
| Y=0  | 1/13 | 1/13 | 1/13 | 1/13 | 1/13 |
| Y=−1 | 0    | 1/13 | 1/13 | 1/13 | 0    |
| Y=−2 | 0    | 0    | 1/13 | 0    | 0    |

Mathematically, its support is defined as $G=\{(x,y)\mid x,y\in\mathbb{Z},\ |x|+|y|\le 2\}$.
# marginal PMF of Y: sum across each row of the table
pY_2 = frac(1,13)
pY_1 = frac(1,13)*3
pY_0 = frac(1,13)*5
pY_m1 = frac(1,13)*3
pY_m2 = frac(1,13)
# marginal PMF of X: sum down each column
pX_2 = frac(1,13)
pX_1 = frac(1,13)*3
pX_0 = frac(1,13)*5
pX_m1 = frac(1,13)*3
pX_m2 = frac(1,13)
print('Marginal PMF of pY are {0}, {1}, {2}, {3}, {4}.'.format(pY_2,pY_1,pY_0,pY_m1,pY_m2))
print('Marginal PMF of pX are {0}, {1}, {2}, {3}, {4}.'.format(pX_2,pX_1,pX_0,pX_m1,pX_m2))
Marginal PMF of pY are 1/13, 3/13, 5/13, 3/13, 1/13. Marginal PMF of pX are 1/13, 3/13, 5/13, 3/13, 1/13.
We add the marginals to the table:

|          | X=−2 | X=−1 | X=0  | X=1  | X=2  | $P_Y(y)$ |
|----------|------|------|------|------|------|----------|
| Y=2      | 0    | 0    | 1/13 | 0    | 0    | 1/13     |
| Y=1      | 0    | 1/13 | 1/13 | 1/13 | 0    | 3/13     |
| Y=0      | 1/13 | 1/13 | 1/13 | 1/13 | 1/13 | 5/13     |
| Y=−1     | 0    | 1/13 | 1/13 | 1/13 | 0    | 3/13     |
| Y=−2     | 0    | 0    | 1/13 | 0    | 0    | 1/13     |
| $P_X(x)$ | 1/13 | 3/13 | 5/13 | 3/13 | 1/13 |          |

It shows that, given $Y=1$, $X$ is uniformly distributed over $\{-1,0,1\}$.
Are $X$ and $Y$ independent? No; for instance, $P(X=0\mid Y=1)=\frac{1/13}{3/13}=\frac{1}{3}$, while $P_X(X=0)=\frac{5}{13}$.
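As a cross-check, here is a small sketch that stores the same table as a NumPy array, recomputes the marginals as axis sums, and tests independence by comparing the joint PMF against the outer product of the marginals.

import numpy as np

# rows: Y = 2, 1, 0, -1, -2; columns: X = -2, -1, 0, 1, 2
P = np.array([[0, 0, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [1, 1, 1, 1, 1],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 0, 0]]) / 13

pY = P.sum(axis=1)   # marginal PMF of Y (sum over X)
pX = P.sum(axis=0)   # marginal PMF of X (sum over Y)
print(pY, pX)

# X and Y are independent iff P_XY(x, y) = P_X(x) * P_Y(y) for every cell
print(np.allclose(P, np.outer(pY, pX)))   # False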
If the random event $A$ is replaced by a discrete random variable $Y$, the conditional PMFs are defined as

$$P_{X\mid Y}(x_i\mid y_j)=\frac{P_{XY}(x_i,y_j)}{P_Y(y_j)}\qquad P_{Y\mid X}(y_j\mid x_i)=\frac{P_{XY}(x_i,y_j)}{P_X(x_i)}$$

where $x_i$ and $y_j$ are realizations of $X$ and $Y$.
The expectation can be conditional on a random event or on a realization of a random variable:

$$E[X\mid A]=\sum_{x_i\in R_X}x_iP_{X\mid A}(x_i\mid A)\qquad E[X\mid Y=y_j]=\sum_{x_i\in R_X}x_iP_{X\mid Y}(x_i\mid Y=y_j)$$

Using the PMF example from the last section, let's answer the question: what is $E[X\mid -1<Y<2]$?
To calculate the conditional expectation, we must use the conditional probabilities as weights.

First, calculate the conditional PMF. Since $Y$ is integer-valued, the event $-1<Y<2$ means $Y\in\{0,1\}$, so $P(-1<Y<2)=\frac{5}{13}+\frac{3}{13}=\frac{8}{13}$, and the joint probabilities $P(X=x_i,\ -1<Y<2)$ are the $Y=0$ and $Y=1$ rows of the table summed column by column. Then

$$E[X\mid -1<Y<2]=-2\cdot\frac{1/13}{8/13}-1\cdot\frac{2/13}{8/13}+0\cdot\frac{2/13}{8/13}+1\cdot\frac{2/13}{8/13}+2\cdot\frac{1/13}{8/13}=0$$
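The same computation in exact arithmetic, reusing the values above:

from fractions import Fraction as frac

xs = [-2, -1, 0, 1, 2]
# joint probabilities P(X = x, -1 < Y < 2): rows Y = 0 and Y = 1 summed
joint = [frac(1,13), frac(2,13), frac(2,13), frac(2,13), frac(1,13)]
pA = sum(joint)                                   # P(-1 < Y < 2) = 8/13
E = sum(x * p / pA for x, p in zip(xs, joint))    # conditional expectation
print(pA, E)                                      # 8/13, 0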
If you look again at the conditional expectation expression

$$E[X\mid Y=y_j]=\sum_{x_i\in R_X}x_iP_{X\mid Y}(x_i\mid Y=y_j)$$

you will find that it is actually a function of $Y$; call this random variable $Z=E[X\mid Y]$.
Consider the joint PMF below.

|          | X=0 | X=1 | $P_Y(y)$ |
|----------|-----|-----|----------|
| Y=0      | 1/5 | 2/5 | 3/5      |
| Y=1      | 2/5 | 0   | 2/5      |
| $P_X(x)$ | 3/5 | 2/5 |          |

Remember that $Z$ is a function of $Y$. To calculate the conditional expectation, we again use conditional probabilities as weights:
$$E[X\mid Y=0]=0\cdot\frac{1/5}{1/5+2/5}+1\cdot\frac{2/5}{1/5+2/5}=\frac{2}{3}\qquad E[X\mid Y=1]=0$$

Because $E[X\mid Y]$ is itself a random variable, it must have an expectation as well:

$$E[Z]=E[E[X\mid Y]]=P_Y(Y=0)E[X\mid Y=0]+P_Y(Y=1)E[X\mid Y=1]=\frac{3}{5}\cdot\frac{2}{3}+\frac{2}{5}\cdot 0=\frac{2}{5}$$

In fact, $E[Z]=E[E[X\mid Y]]=E[X]$ always holds; this is the law of iterated expectation. Here it checks out: $E[X]=0\cdot\frac{3}{5}+1\cdot\frac{2}{5}=\frac{2}{5}$.
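A quick numerical check of the law of iterated expectation on this table:

from fractions import Fraction as frac

# joint PMF from the table: rows Y = 0, 1; columns X = 0, 1
P = [[frac(1,5), frac(2,5)],
     [frac(2,5), frac(0)]]

pY = [sum(row) for row in P]                       # marginals of Y: 3/5, 2/5
E_X_given_Y = [sum(x * p for x, p in zip([0, 1], row)) / py
               for row, py in zip(P, pY)]          # E[X|Y=0] = 2/3, E[X|Y=1] = 0
E_Z = sum(py * e for py, e in zip(pY, E_X_given_Y))
E_X = sum(x * (P[0][j] + P[1][j]) for j, x in enumerate([0, 1]))
print(E_Z, E_X)                                    # both 2/5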
If $X$ and $Y$ are independent, the rules of conditional expectation are fairly straightforward, because conditioning on $Y$ does not provide any extra information: $E[X\mid Y]=E[X]$, and more generally $E[g(X)\mid Y]=E[g(X)]$.
The joint PDF of $X$ and $Y$ is a non-negative function $f_{XY}(x,y)$, mapping $\mathbb{R}^2$ to $\mathbb{R}$, defined by

$$P((X,Y)\in A)=\iint_A f_{XY}(x,y)\,dx\,dy$$

which equals $1$ when $A$ is the entire plane.
However, we are particularly interested in the case where $A$ is a rectangle,

$$P(a\le X\le b,\ c\le Y\le d)=\int_c^d\int_a^b f_{XY}(x,y)\,dx\,dy$$

and, for a small rectangle with side length $\delta$,

$$P(a\le X\le a+\delta,\ c\le Y\le c+\delta)\approx f_{XY}(a,c)\,\delta^2$$

Let's consider an example other than the normal distribution.
$$f_{XY}(x,y)=\begin{cases}x+cy^2 & 0\le x\le 1,\ 0\le y\le 1\\ 0 & \text{otherwise}\end{cases}$$

Use the property $\iint f_{XY}(x,y)\,dx\,dy=1$ to find $c$:

$$\begin{aligned}
\int_0^1\int_0^1(x+cy^2)\,dx\,dy&=1\\
\int_0^1\left[\frac{x^2}{2}+cxy^2\right]_0^1 dy&=1\\
\int_0^1\left[\frac{1}{2}+cy^2\right]dy&=1\\
\left[\frac{y}{2}+\frac{cy^3}{3}\right]_0^1&=1\\
\frac{1}{2}+\frac{c}{3}&=1\\
c&=\frac{3}{2}
\end{aligned}$$

Plug in $c$ and perform a double integration, for instance to find $P(0\le X\le\frac{1}{2},\ 0\le Y\le\frac{1}{2})$:

$$\int_0^{1/2}\int_0^{1/2}\left(x+\frac{3}{2}y^2\right)dx\,dy=\int_0^{1/2}\left[\frac{x^2}{2}+\frac{3}{2}y^2x\right]_0^{1/2}dy=\int_0^{1/2}\left[\frac{1}{8}+\frac{3}{4}y^2\right]dy=\left[\frac{y}{8}+\frac{y^3}{4}\right]_0^{1/2}=\frac{3}{32}$$

The joint distribution is depicted below; the volume between the curved surface and the $xy$-plane is $1$.
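Both results can be verified numerically. The sketch below assumes SciPy is available and uses scipy.integrate.dblquad, whose integrand takes its arguments in (y, x) order:

from scipy import integrate

f = lambda y, x: x + 1.5*y**2   # joint PDF with c = 3/2

# total probability over the unit square should be 1
total, _ = integrate.dblquad(f, 0, 1, lambda x: 0, lambda x: 1)
# P(0 <= X <= 1/2, 0 <= Y <= 1/2) should be 3/32
p, _ = integrate.dblquad(f, 0, 0.5, lambda x: 0, lambda x: 0.5)
print(total, p, 3/32)           # 1.0, 0.09375, 0.09375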
x, y = np.linspace(0, 1), np.linspace(0, 1)
X, Y = np.meshgrid(x, y)
Z = X + 3/2*Y**2   # joint PDF f_XY(x, y) = x + (3/2)y^2 on the unit square
fig = plt.figure(figsize = (8, 8))
ax = fig.add_subplot(projection = '3d')   # fig.gca(projection='3d') is deprecated
ax.plot_surface(X, Y, Z, cmap = 'coolwarm')
ax.contourf(X, Y, Z, zdir = 'z', offset = 0, cmap = 'coolwarm')  # contours projected onto the xy-plane
plt.show()
The marginal PDFs of $X$ and $Y$ are

$$f_X(x)=\int_{-\infty}^{\infty}f_{XY}(x,y)\,dy\ \text{ for all }x\qquad f_Y(y)=\int_{-\infty}^{\infty}f_{XY}(x,y)\,dx\ \text{ for all }y$$

Let's use the same example as in the last section to find $f_X(x)$ and $f_Y(y)$:

$$f_X(x)=\int_0^1\left(x+\frac{3}{2}y^2\right)dy=x+\frac{1}{2}\qquad f_Y(y)=\int_0^1\left(x+\frac{3}{2}y^2\right)dx=\frac{3}{2}y^2+\frac{1}{2}$$
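A symbolic check of these marginals, assuming SymPy is available:

import sympy as sp

x, y = sp.symbols('x y', nonnegative=True)
f = x + sp.Rational(3, 2)*y**2   # the joint PDF from above

fX = sp.integrate(f, (y, 0, 1))  # x + 1/2
fY = sp.integrate(f, (x, 0, 1))  # 3*y**2/2 + 1/2
print(fX, fY)
print(sp.integrate(fX, (x, 0, 1)), sp.integrate(fY, (y, 0, 1)))  # both 1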
The joint CDF and joint PDF are related as follows:

$$F_{XY}(x,y)=\int_{-\infty}^{y}\int_{-\infty}^{x}f_{XY}(u,v)\,du\,dv\qquad f_{XY}(x,y)=\frac{\partial^2}{\partial x\,\partial y}F_{XY}(x,y)$$

For the same PDF as above, let's find the CDF.
$$f_{XY}(x,y)=\begin{cases}x+\frac{3}{2}y^2 & 0\le x\le 1,\ 0\le y\le 1\\ 0 & \text{otherwise}\end{cases}$$

For $0\le x\le 1$ and $0\le y\le 1$,

$$F_{XY}(x,y)=\int_0^y\int_0^x\left(u+\frac{3}{2}v^2\right)du\,dv=\int_0^y\left(\frac{x^2}{2}+\frac{3}{2}v^2x\right)dv=\frac{x^2y}{2}+\frac{xy^3}{2}$$
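We can confirm that the mixed partial derivative of this CDF recovers the PDF (again assuming SymPy):

import sympy as sp

x, y = sp.symbols('x y')
F = x**2*y/2 + x*y**3/2      # CDF on the unit square, derived above

f = sp.diff(F, x, y)         # mixed partial derivative d^2 F / dx dy
print(sp.simplify(f))        # x + 3*y**2/2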
Now consider the conditional PDF of $X$ given that $X\in A$. For $x\in A$,

$$P(x\le X\le x+\delta\mid X\in A)\approx f_{X\mid X\in A}(x)\cdot\delta=\frac{P(x\le X\le x+\delta,\ X\in A)}{P(A)}=\frac{P(x\le X\le x+\delta)}{P(A)}\approx\frac{f_X(x)\,\delta}{P(A)}$$

We have shown that

$$f_{X\mid X\in A}(x)=\frac{f_X(x)}{P(A)},\quad x\in A$$

You can imagine $P(A)$ as a scaling factor that normalizes the conditional PDF so that it integrates to $1$.
For two jointly continuous random variables $X$ and $Y$, we can define the following conditional concepts:

$$f_{X\mid Y}(x\mid y)=\frac{f_{XY}(x,y)}{f_Y(y)}\qquad F_{X\mid Y}(x\mid y)=\int_{-\infty}^{x}f_{X\mid Y}(u\mid y)\,du\qquad E[X\mid Y=y]=\int_{-\infty}^{\infty}x\,f_{X\mid Y}(x\mid y)\,dx$$
The intuition behind the first expression, the conditional PDF, is

$$P(x\le X\le x+\delta\mid y\le Y\le y+\epsilon)\approx\frac{f_{XY}(x,y)\,\delta\epsilon}{f_Y(y)\,\epsilon}=f_{X\mid Y}(x\mid y)\,\delta$$

The conditional PDF must satisfy the basic rules of probability as well:
$$\int_{-\infty}^{\infty}f_{X\mid Y}(x\mid y)\,dx=1$$

because

$$\frac{\int_{-\infty}^{\infty}f_{XY}(x,y)\,dx}{f_Y(y)}=\frac{f_Y(y)}{f_Y(y)}=1$$
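For the running example, this is easy to verify symbolically: the conditional PDF is the joint PDF divided by $f_Y(y)$, and it integrates to $1$ over $x$ (SymPy assumed available):

import sympy as sp

x, y = sp.symbols('x y')
f_joint = x + sp.Rational(3, 2)*y**2
f_Y = sp.Rational(1, 2) + sp.Rational(3, 2)*y**2      # marginal of Y from before

f_cond = f_joint / f_Y                                # f_{X|Y}(x|y)
print(sp.simplify(sp.integrate(f_cond, (x, 0, 1))))   # 1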
Rearranging the conditional PDF expression, we obtain the multiplication rule

$$f_{XY}(x,y)=f_{X\mid Y}(x\mid y)\,f_Y(y)$$

If continuous variables $X$ and $Y$ are independent, then knowing either of them provides no information about the other. That is,
$$f_{X\mid Y}(x\mid y)=f_X(x)\quad\text{or}\quad f_{Y\mid X}(y\mid x)=f_Y(y)$$

Thus the multiplication rule for independent variables is

$$f_{XY}(x,y)=f_X(x)\,f_Y(y)$$

Other rules derived from this are

$$E[XY]=E[X]E[Y]\qquad \operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)\qquad E[g(X)h(Y)]=E[g(X)]E[h(Y)]$$
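A Monte Carlo sketch of these rules, using two independent standard uniform variables as an assumed example (the functions g and h below are arbitrary illustrative choices); each printed pair agrees up to sampling noise:

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=1_000_000)
Y = rng.uniform(size=1_000_000)   # drawn independently of X

print(np.mean(X*Y), np.mean(X)*np.mean(Y))                 # both ~ 0.25
print(np.var(X + Y), np.var(X) + np.var(Y))                # both ~ 1/6
print(np.mean(X**2 * Y**3), np.mean(X**2)*np.mean(Y**3))   # g(X)=X^2, h(Y)=Y^3; both ~ 1/12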