Maximum Likelihood Parameter Estimation of some 1-dimensional Distributions

Required python modules:

In [1]:
import numpy as np
from matplotlib import pyplot as plt
np.set_printoptions(precision=2)

Bernoulli Distribution

Generate a random sample of 15 binary values:

In [2]:
NumSamp=15.0
BinSamp=np.random.randint(0,2,(NumSamp))
print "Random Sample:"
print BinSamp
Random Sample:
[1 0 0 0 0 0 0 0 0 1 1 1 0 1 1]

Determine frequency of ones and estimate the probability of ones, $p_1$, and the probability of zeros, $p_0$:

In [3]:
freqOne=len(BinSamp[BinSamp==1])
p1=freqOne/NumSamp
p0=1-p1
print "Number of ones in the sample: ",freqOne
print "Probability of ones:        : ",p1
print "Probability of zeros:       : ",p0
Number of ones in the sample:  6
Probability of ones:        :  0.4
Probability of zeros:       :  0.6

Plot the estimated discrete probability distribution $P(x)=p_1^x(1-p_1)^{(1-x)}$:

In [4]:
plt.stem(range(2),[p0,p1],'b')
plt.axis([-0.05,1.2,0,1.2])
plt.text(0.1,1.0,"Sample: "+np.array2string(BinSamp))
plt.title("Bernoulli Distribution")
plt.grid(True)

Multinomial Distribution

Generate a random sample of 25 discrete values from $\{0,1,2,3\}$:

In [5]:
NumSamp=25.0
MAX=4
MultSamp=np.random.randint(0,MAX,(NumSamp))
print "Sample:"
print MultSamp
Sample:
[3 1 1 3 3 3 2 0 1 3 3 0 3 3 3 2 2 1 1 3 1 3 3 3 0]

Determine frequency and probability for all of the 4 values in the sample:

In [6]:
freqs=[len(MultSamp[MultSamp==i]) for i in range(MAX)]
probs=[freqs[i]/NumSamp for i in range(MAX)]
for i in range(MAX):
    print "Number of %d in the sample:      %2d" % (i,freqs[i])
    print "  Probability of %d in the sample: %1.2f" % (i,probs[i])
Number of 0 in the sample:       3
  Probability of 0 in the sample: 0.12
Number of 1 in the sample:       6
  Probability of 1 in the sample: 0.24
Number of 2 in the sample:       3
  Probability of 2 in the sample: 0.12
Number of 3 in the sample:      13
  Probability of 3 in the sample: 0.52

Plot the estimated discrete probability distribution $P(x^{(0)},x^{(1)},x^{(2)},x^{(3)}) = \prod\limits_{i=0}^3 p_i^{x^{(i)}}$:

In [7]:
plt.stem(range(MAX),probs,'b')
plt.axis([-0.05,MAX-0.8,0,1.2])
plt.text(0.1,1.0,"Sample: "+np.array2string(MultSamp))
plt.title("Multinomial Distribution")
plt.grid(True)

Gaussian Distribution

Generate a random sample of 30 continuous values:

In [8]:
NumSamp=30.0
MultSamp=np.random.normal(3,1,(NumSamp))
print "Sample:"
print MultSamp
Sample:
[ 3.77  2.78  4.89  2.73  3.11  2.19  2.48  4.36  2.68  1.91  2.73  3.02
  2.66  1.95  3.93  3.43  3.17  3.39  2.89  2.24  1.82  3.56  2.92  4.86
  2.91  3.8   2.82  3.02  0.6   3.5 ]

If the samples can be assumed to be Gaussian distributed estimate the mean $m$ and the standard deviation $s$ of the distribution from the sample:

In [9]:
m=np.mean(MultSamp)
s=np.std(MultSamp)
print "Estimated Mean:                ",m
print "Estimated Standard Deviation:  ",s
Estimated Mean:                 3.00387171583
Estimated Standard Deviation:   0.881070277823

Plot the gaussian distribution $$p(x)=\frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, $$ where the estimation of the mean $\mu$ is $m$ and the estimation of the standard deviation $\sigma$ is $s$.

In [10]:
x=np.arange(0,6,0.1) # Define the x-values at which the probability distribution shall be plotted
prob=1/(s * np.sqrt(2 * np.pi))*np.exp( - (x - m)**2 / (2 * s**2) )
plt.plot(x,prob,'b')
plt.text(0.1,0.35,"Estimated Mean: %2.2f  \nEstimated std. dev. %2.2f" % (m,s))
plt.title("Gaussian Distribution")
plt.grid(True)
plt.show()

2-Dimensional Gaussian Distribution

In [11]:
from numpy import mgrid
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
from scipy.stats import norm
from matplotlib import pyplot as plt
from matplotlib import cm

Generate a random sample of 200 2-dimensional Gaussian distributed samples. The mean vector and the covariance matrix of the multivariate Gaussian that generates the samples are configured:

In [12]:
Numpoints=200
m1=1    #Mean of first dimension
m2=1    #Mean of first dimension
mean=np.array(([m1,m2]))
s11=1 #standard deviation of x1
s22=1.4 #standard deviation of x2 
rho=0.8 #correlation coefficient between s11 and s22
#Determine covariance matrix from standard deviations
var11=s11**2
var22=s22**2
var12=rho*s11*s22
var21=var12
cov=np.array(([var11,var12],[var21,var22]))
print "Configured mean: "
print mean
print "Configured covariance: "
print cov
Configured mean: 
[1 1]
Configured covariance: 
[[ 1.    1.12]
 [ 1.12  1.96]]

Now the 200 samples of the multivariate Gaussian are generated. The mean vector and the covariance matrix are estimated from the generated sample. As can be seen the estimation is close to the real, configured mean and covariance.

In [13]:
pointset=np.random.multivariate_normal(mean,cov,Numpoints)
est_cov=np.cov(pointset,rowvar=0)
est_mean=np.mean(pointset,axis=0)
print "Estimated mean: "
print est_mean
print "Estimated covariance: "
print est_cov
Estimated mean: 
[ 1.05  1.01]
Estimated covariance: 
[[ 0.96  1.02]
 [ 1.02  1.92]]

The generated set of 2-dimensional samples is plotted:

In [18]:
plt.figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')
plt.plot(pointset[:,0],pointset[:,1],'o',ms=6,label="Random Points")
plt.hold(True)
plt.grid(True)
plt.title("Multivariate Gaussian Distribution of "+str(Numpoints)+" samples")
plt.xlabel('x1',fontsize=16)
plt.ylabel('x2',fontsize=16)
plt.xlim([-6,6])
plt.ylim([-6,6])
plt.plot([m1],[m2],'sr',ms=10,label="Mean: ("+str(m1)+","+str(m2)+")")
#plt.text(-5,2,'$\sigma_{11}$='+str(s11)+'\n$\sigma_{22}$='+str(s22)+'\n$corr$='+str(rho),fontsize=16,backgroundcolor='white')
plt.text(-5,2,'s11='+str(s11)+'\ns22='+str(s22)+'\ncorr='+str(rho),fontsize=16,backgroundcolor='white')
plt.legend(loc=2,numpoints=1)
plt.show()

Finally the estimated 2-dimensional Gaussian Distribution function is plotted in a 3-dimensional graph:

In [20]:
x,y = mgrid[-7:7:100j, -7:7:100j]
[m1,m2]=est_mean
s11=np.sqrt(est_cov[0,0])
s22=np.sqrt(est_cov[1,1])
corr=est_cov[0,1]/(s11*s22)


#2-dimensionale Gau├čverteilung nach Definition http://de.wikipedia.org/wiki/Mehrdimensionale_Normalverteilung
a=1.0/(2*np.pi*s11*s22*np.sqrt(1-corr**2))*np.exp(-1.0/(2*(1-corr**2))*(((x-m1)**2)/(s11**2)+((y-m2)**2)/(s22**2)-2*corr*(x-m1)*(y-m2)/(s11*s22)))  
##############################################################################################################
fig = plt.figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')
ax = fig.gca(projection='3d')
surf = ax.plot_surface(x,y,a,rstride=1, cstride=1, cmap=cm.jet,
        linewidth=0.0, antialiased=False)
#plt.text(0.5,0.5,0.2,'$\sigma_{11}$='+str(s11)+'\n$\sigma_{22}$='+str(s22)+'\n$corr$='+str(corr),fontsize=14,backgroundcolor='white')
plt.title('s11='+str(s11)+'  s22 = '+str(s22)+'  corr ='+str(corr))
plt.xlabel('x')
plt.ylabel('y')
plt.show()
In [15]: