Comparison of a Generative Model (Parametric Classification) and Discriminative Models

Generative Model (Parametric Classification)

This section demonstrates that parametric classification yields poor results if the assumption about the data distribution is wrong. In the example, the data of both classes is assumed to be Gaussian distributed, which is actually not the case.

In [1]:
from numpy.random import multivariate_normal
import matplotlib.pyplot as plt
import numpy as np

The training data is stored in the file 2dim2classManual. Each sample consists of two input features and a class label: instances of class 0 are labeled 0 and instances of class 1 are labeled 1. The code below imports the data from the file and partitions it into two class-specific numpy arrays.

In [2]:
Trainset=np.loadtxt("./Res/2dim2classManual")
Trainset1=Trainset[Trainset[:,2]<0.5] # Get class 0 data, which is labeled by 0
numClass1=Trainset1.shape[0]
NumTrain=[numClass1]
Trainset2=Trainset[Trainset[:,2]>0.5] # Get class 1 data, which is labeled by 1
numClass2=Trainset2.shape[0]
NumTrain.append(numClass2)
print "    x1   x2   label"
print Trainset
    x1   x2   label
[[-0.85 -0.01  0.  ]
 [-0.59 -0.11  0.  ]
 [-0.47 -0.24  0.  ]
 [-0.28 -0.4   0.  ]
 [-0.06 -0.57  0.  ]
 [ 0.1  -0.69  0.  ]
 [ 0.32 -0.89  0.  ]
 [-0.72 -0.26  0.  ]
 [-0.56 -0.42  0.  ]
 [-0.92 -1.44  0.  ]
 [-0.22 -1.38  0.  ]
 [-0.68 -0.82  0.  ]
 [-0.41 -1.02  0.  ]
 [-1.03 -0.75  0.  ]
 [-0.93 -0.53  0.  ]
 [-0.89 -0.26  0.  ]
 [-0.3  -0.81  0.  ]
 [ 0.   -1.12  0.  ]
 [-0.85 -1.05  0.  ]
 [-0.19 -0.44  0.  ]
 [-0.29 -0.15  1.  ]
 [-0.03 -0.21  1.  ]
 [-0.2   0.2   1.  ]
 [ 0.08 -0.24  1.  ]
 [-0.1   0.3   1.  ]
 [-0.1  -0.3   1.  ]
 [-0.18 -0.2   1.  ]
 [-0.15 -0.23  1.  ]
 [ 0.01 -0.3   1.  ]
 [-0.1  -0.21  1.  ]
 [-0.18 -0.31  1.  ]
 [-0.07 -0.11  1.  ]
 [ 0.48  0.52  1.  ]]

Next, the training data is plotted in a matplotlib figure. It can easily be seen that the data of both classes is not Gaussian distributed, but that the two point sets are linearly separable.

In [3]:
plt.figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')
plt.plot(Trainset1[:,0],Trainset1[:,1],'o',ms=6,label="Class0 Training")
plt.plot(Trainset2[:,0],Trainset2[:,1],'or',ms=6,label="Class1 Training")
plt.xlabel('$x_1$',fontsize=16)
plt.ylabel('$x_2$',fontsize=16)
plt.xlim([-2,2])
plt.ylim([-2,2])
plt.legend(loc=2,numpoints=1)
plt.title("Given Training Data")
plt.show()

Although the data is not Gaussian distributed, we nevertheless assume a Gaussian distribution and estimate the corresponding parameters, mean and covariance, for each class.
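
The estimators in the next cell are the maximum-likelihood estimators of the mean and the (biased, i.e. divided by $N_c$ rather than $N_c-1$) covariance of each class $c$ with $N_c$ training samples $\mathbf{x}_i=(x_{i1},x_{i2})$:

$$\hat{\mu}_{cj}=\frac{1}{N_c}\sum_{i=1}^{N_c}x_{ij},\qquad \hat{\sigma}_{cjk}=\frac{1}{N_c}\sum_{i=1}^{N_c}(x_{ij}-\hat{\mu}_{cj})(x_{ik}-\hat{\mu}_{ck}),\qquad j,k\in\{1,2\}.$$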

In [4]:
# Parameter estimation for class 0 training data
m0 = 1.0/NumTrain[0]*np.sum(Trainset1,axis=0)
Evar011=1.0/NumTrain[0]*sum([(p[0]-m0[0])**2 for p in Trainset1])
Evar022=1.0/NumTrain[0]*sum([(p[1]-m0[1])**2 for p in Trainset1])
Evar012=1.0/NumTrain[0]*sum([(p[0]-m0[0])*(p[1]-m0[1]) for p in Trainset1])
print "############################### Class 0 ################################"
print "-----------------------------Mean Values Estimation---------------------"
print "Estimated: m00=%5.4f" % (m0[0])
print "Estimated: m01=%5.4f" % (m0[1])
print "-----------------------------Variance Estimation------------------------"
print "Estimated: var11=%5.4f" % (Evar011)
print "Estimated: var22=%5.4f" % (Evar022)
print "Estimated: cov12=%5.4f" % (Evar012)

# Parameter estimation for class 1 training data
m1 = 1.0/NumTrain[1]*np.sum(Trainset2,axis=0)
Evar111=1.0/NumTrain[1]*sum([(p[0]-m1[0])**2 for p in Trainset2])
Evar122=1.0/NumTrain[1]*sum([(p[1]-m1[1])**2 for p in Trainset2])
Evar112=1.0/NumTrain[1]*sum([(p[0]-m1[0])*(p[1]-m1[1]) for p in Trainset2])
print "############################### Class 1 ################################"
print "-----------------------------Mean Values Estimation---------------------"
print "Estimated: m10=%5.4f" % (m1[0])
print "Estimated: m11=%5.4f" % (m1[1])
print "-----------------------------Variance Estimation------------------------"
print "Estimated: var11=%5.4f" % (Evar111)
print "Estimated: var22=%5.4f" % (Evar122)
print "Estimated: cov12=%5.4f" % (Evar112)
############################### Class 0 ################################
-----------------------------Mean Values Estimation---------------------
Estimated: m00=-0.4765
Estimated: m01=-0.6605
-----------------------------Variance Estimation------------------------
Estimated: var11=0.1434
Estimated: var22=0.1575
Estimated: cov12=-0.0287
############################### Class 1 ################################
-----------------------------Mean Values Estimation---------------------
Estimated: m10=-0.0638
Estimated: m11=-0.0954
-----------------------------Variance Estimation------------------------
Estimated: var11=0.0332
Estimated: var22=0.0640
Estimated: cov12=0.0241

The parameters estimated from the training data fully define the learned model of each class. We apply these models to assign each point of a meshgrid to the class whose model yields the higher likelihood, which, under the assumption of equal priors, is also the class with the higher a posteriori probability.
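
For each grid point $(x_1,x_2)$ the next cell evaluates the bivariate Gaussian density of each class in the correlation parametrisation

$$p(x_1,x_2\mid C_k)=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\!\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2}+\frac{(x_2-\mu_2)^2}{\sigma_2^2}-\frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}\right]\right),$$

where $\mu_1,\mu_2,\sigma_1,\sigma_2$ and the correlation coefficient $\rho=\sigma_{12}/(\sigma_1\sigma_2)$ are the class-specific estimates from above.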

In [5]:
ticks = np.arange(-2.,2.,0.03)
X,Y = np.meshgrid(ticks, ticks)
testpoints=[]
for i in xrange(X.size):
    testpoints.append([X.ravel()[i],Y.ravel()[i]])

s011=np.sqrt(Evar011)
s022=np.sqrt(Evar022)
corr0=Evar012/(s011*s022)
s111=np.sqrt(Evar111)
s122=np.sqrt(Evar122)
corr1=Evar112/(s111*s122)

class0Points=[]
class1Points=[]
for point in testpoints:
    x=point[0]
    y=point[1]
    # evaluate the bivariate Gaussian density of class 0 (a0) and class 1 (a1) at (x,y)
    a0=1.0/(2*np.pi*s011*s022*np.sqrt(1-corr0**2))*np.exp(-1.0/(2*(1-corr0**2))*(((x-m0[0])**2)/(s011**2)+((y-m0[1])**2)/(s022**2)-2*corr0*(x-m0[0])*(y-m0[1])/(s011*s022)))
    a1=1.0/(2*np.pi*s111*s122*np.sqrt(1-corr1**2))*np.exp(-1.0/(2*(1-corr1**2))*(((x-m1[0])**2)/(s111**2)+((y-m1[1])**2)/(s122**2)-2*corr1*(x-m1[0])*(y-m1[1])/(s111*s122)))
    if a0>a1:
        class0Points.append(point)
    else:
        class1Points.append(point)

Grid points for which the class-0 density is larger are coloured magenta; grid points for which the class-1 density is larger are coloured yellow.

In [6]:
plt.figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')
plt.hold(True)
plt.plot([w[0] for w in class0Points],[w[1] for w in class0Points],"sm",ms=7)
plt.plot([w[0] for w in class1Points],[w[1] for w in class1Points],"sy",ms=7)
plt.plot(Trainset1[:,0],Trainset1[:,1],'o',ms=6,label="Class0 Training")
plt.plot(Trainset2[:,0],Trainset2[:,1],'or',ms=6,label="Class1 Training")
plt.xlabel('$x_1$',fontsize=16)
plt.ylabel('$x_2$',fontsize=16)
plt.xlim([-2,2])
plt.ylim([-2,2])

plt.title('Parametric classification')
plt.legend(loc=2,numpoints=1)
plt.show()

As shown in the plot, the trained model is not able to classify all training instances correctly: Two instances of class 0 are assigned to class 1.
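
For a numerical comparison with the error counts reported for the discriminative models below, the same likelihood comparison can also be applied to the training points themselves. The following lines are a minimal sketch that reuses the parameters estimated above and, like the grid classification, implicitly assumes equal priors; the helper gauss2d is introduced here only for readability and is not part of any library.

def gauss2d(x, y, m, s1, s2, rho):
    # bivariate Gaussian density in the same parametrisation as used for the grid points
    z=((x-m[0])**2)/(s1**2)+((y-m[1])**2)/(s2**2)-2*rho*(x-m[0])*(y-m[1])/(s1*s2)
    return 1.0/(2*np.pi*s1*s2*np.sqrt(1-rho**2))*np.exp(-z/(2*(1-rho**2)))

errors=0
for x,y,label in Trainset:
    p0=gauss2d(x,y,m0,s011,s022,corr0)   # class 0 likelihood
    p1=gauss2d(x,y,m1,s111,s122,corr1)   # class 1 likelihood
    if (0 if p0>p1 else 1)!=int(label):
        errors+=1
print "Number of Errors on training data: ",errors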

Discriminative Models

For the given data it is much better to apply a discriminative model. In this section we apply Logistic Regression, Stochastic Gradient Descent and a single-layer Perceptron. All of these learners are contained in the scikit-learn module linear_model.
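
All three learners fit a linear decision function of the form

$$f(\mathbf{x})=\mathbf{w}^T\mathbf{x}+b,$$

assigning class 1 if $f(\mathbf{x})>0$ and class 0 otherwise; they differ only in the loss function and the optimisation procedure used to determine the weights $\mathbf{w}$ (coef_) and the bias $b$ (intercept_).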

In [7]:
from sklearn import linear_model

Logistic Regression

All instances from the file are used for training. In the Logistic Regression module the parameter $C$ should be chosen large if the error function is to be minimized with as little regularization as possible; smaller values of $C$ put a stronger emphasis on regularization.
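
With its default $\ell_2$ penalty, scikit-learn's LogisticRegression minimizes the regularized negative log-likelihood

$$\min_{\mathbf{w},b}\;\frac{1}{2}\mathbf{w}^T\mathbf{w}+C\sum_{i=1}^{N}\log\!\left(1+\exp\!\bigl(-y_i(\mathbf{w}^T\mathbf{x}_i+b)\bigr)\right),\qquad y_i\in\{-1,+1\}.$$

The regularization term is fixed while $C$ scales the data term, so a large value such as the $C=1000$ used below effectively switches regularization off.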

In [8]:
features=Trainset[:,:2]
labels=Trainset[:,2]
clf=linear_model.LogisticRegression(C=1000)
clf.fit(features, labels)
"Logistic Regression - Learned Parameters:"
print "   intercept:   ",clf.intercept_
print "   coeffs:      ",clf.coef_
Logistic Regression - Learned Parameters:
   intercept:    [ 14.32282198]
   coeffs:       [[ 22.97454486  28.7392451 ]]

Plot the learned decision boundary together with the training data.
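
The decision boundary is the line on which the decision function is zero. Solving $w_1x_1+w_2x_2+b_0=0$ for $x_2$ yields the slope $m$ and intercept $b$ computed in the next cell:

$$x_2=-\frac{w_1}{w_2}\,x_1-\frac{b_0}{w_2},$$

with $(w_1,w_2)$ taken from clf.coef_ and $b_0$ from clf.intercept_.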

In [9]:
m=-1.0*clf.coef_[0,0]/clf.coef_[0,1]
b=-1.0*clf.intercept_/clf.coef_[0,1]
pred=clf.predict(features)
print "Number of Errors on training data: ",np.sum(np.abs(pred-labels))

xr=np.array([-1,1])
yvals=m*xr+b
plt.figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')
plt.plot(Trainset1[:,0],Trainset1[:,1],'o',ms=6,label="Class0 Training")
plt.plot(Trainset2[:,0],Trainset2[:,1],'or',ms=6,label="Class1 Training")
plt.xlabel('$x_1$',fontsize=16)
plt.ylabel('$x_2$',fontsize=16)
plt.xlim([-2,2])
plt.ylim([-2,2])
plt.legend(loc=2,numpoints=1)
plt.title("Logistic Regression")
#plt.show()
plt.hold(True)
plt.plot(xr,yvals,'b')
plt.show()
Number of Errors on training data:  0.0

Stochastic Gradient Descent (SGD Classifier)
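
With its default settings (hinge loss, $\ell_2$ penalty), SGDClassifier fits a linear SVM-type model by plain stochastic gradient descent on a regularized objective of the form

$$\frac{1}{N}\sum_{i=1}^{N}\max\bigl(0,\,1-y_i(\mathbf{w}^T\mathbf{x}_i+b)\bigr)+\frac{\alpha}{2}\|\mathbf{w}\|_2^2,$$

where $\alpha$ is the regularization strength; n_iter sets the number of passes over the training data.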

In [10]:
clf=linear_model.SGDClassifier(n_iter=25)
clf.fit(features, labels)
"Stochastic Gradient Descend - Learned Parameters:"
print "   intercept:   ",clf.intercept_
print "   coeffs:      ",clf.coef_
Stochastic Gradient Descent - Learned Parameters:
   intercept:    [ 14.18365654]
   coeffs:       [[ 21.10745614  27.24780702]]

In [11]:
m=-1.0*clf.coef_[0,0]/clf.coef_[0,1]
b=-1.0*clf.intercept_/clf.coef_[0,1]
pred=clf.predict(features)
print "Number of Errors on training data: ",np.sum(np.abs(pred-labels))

xr=np.array([-1,1])
yvals=m*xr+b
plt.figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')
plt.plot(Trainset1[:,0],Trainset1[:,1],'o',ms=6,label="Class0 Training")
plt.plot(Trainset2[:,0],Trainset2[:,1],'or',ms=6,label="Class1 Training")
plt.xlabel('$x_1$',fontsize=16)
plt.ylabel('$x_2$',fontsize=16)
plt.xlim([-2,2])
plt.ylim([-2,2])
plt.legend(loc=2,numpoints=1)
plt.title("Stochastic Gradient Descent")
#plt.show()
plt.hold(True)
plt.plot(xr,yvals,'b')
plt.show()
Number of Errors on training data:  0.0

Single Layer Perceptron
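
The Perceptron learner applies the classical perceptron rule: a training sample triggers an update only when it is misclassified, in which case (with labels $y_i\in\{-1,+1\}$ and learning rate $\eta$)

$$\mathbf{w}\leftarrow\mathbf{w}+\eta\,y_i\,\mathbf{x}_i,\qquad b\leftarrow b+\eta\,y_i.$$

Since the training data is linearly separable, the algorithm finds a separating line, as the zero training error below confirms.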

In [12]:
clf=linear_model.Perceptron(n_iter=20)
clf.fit(features, labels)
"Single Layer Perceptron - Learned Parameters:"
print "   intercept:   ",clf.intercept_
print "   coeffs:      ",clf.coef_
Single Layer Perceptron - Learned Parameters:
   intercept:    [ 1.]
   coeffs:       [[ 1.49  1.81]]

In [13]:
m=-1.0*clf.coef_[0,0]/clf.coef_[0,1]
b=-1.0*clf.intercept_/clf.coef_[0,1]
pred=clf.predict(features)
print "Number of Errors on training data: ",np.sum(np.abs(pred-labels))

xr=np.array([-1,1])
yvals=m*xr+b
plt.figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')
plt.plot(Trainset1[:,0],Trainset1[:,1],'o',ms=6,label="Class0 Training")
plt.plot(Trainset2[:,0],Trainset2[:,1],'or',ms=6,label="Class1 Training")
plt.xlabel('$x_1$',fontsize=16)
plt.ylabel('$x_2$',fontsize=16)
plt.xlim([-2,2])
plt.ylim([-2,2])
plt.legend(loc=2,numpoints=1)
plt.title("Single Layer Perceptron")
#plt.show()
plt.hold(True)
plt.plot(xr,yvals,'b')
plt.show()
Number of Errors on training data:  0.0