# Parametric Classification of 1-dimensional Input

In this notebook a parametric classifier for 1-dimensional input data is developed. The task is to predict which category of car a customer will purchase, given his or her annual income. In the training phase, the Gaussian likelihood $p(x|C_i)$ and the a-priori probability $p(C_i)$ of each of the 3 car classes $C_i$ are estimated from a sample of 27 training instances, each containing the annual income and the purchased car class of a former customer. The file containing the training data can be obtained from here.

Required Python modules:

```python
import numpy as np
import matplotlib.pyplot as plt
np.set_printoptions(precision=1, suppress=True)
```


Read the customer data from file. Each row in the file represents one customer. The first column is the customer ID, the second column is the annual income of the customer and the third column is the class of car he or she bought, coded as 0, 1 or 2:

• 0 = Low Class
• 1 = Middle Class
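
```python
# Each row of the file: customer ID, annual income, car class
customerArray = np.fromfile("./Res/AutoKunden.txt", sep=' ').reshape(-1, 3)
print("     ID   Income   Car-class")
print(customerArray)
Ntot = len(customerArray)  # total number of training instances
```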

```
     ID   Income   Car-class
[[     1.  77017.      2.]
 [     2.  69062.      1.]
 [     3.  16558.      0.]
 [     4.  88625.      2.]
 [     5.  93726.      2.]
 [     6.  66035.      1.]
 [     7.  72293.      2.]
 [     8.  14595.      0.]
 [     9.  36797.      1.]
 [    10.  65124.      2.]
 [    11.  15454.      0.]
 [    12.  21161.      1.]
 [    13.  56231.      1.]
 [    14.  44612.      1.]
 [    15.  31415.      1.]
 [    16.  26697.      0.]
 [    17.  23132.      1.]
 [    18.  17469.      0.]
 [    19.  64526.      1.]
 [    20.  17936.      0.]
 [    21.  33298.      1.]
 [    22.  25335.      1.]
 [    23.  67334.      2.]
 [    24.  20405.      0.]
 [    25.  28099.      0.]
 [    26.  37016.      1.]
 [    27.  81069.      2.]]
```
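For readers without access to the data file, the following self-contained sketch embeds the 27 (income, class) pairs printed above as plain NumPy arrays and computes, per class, the number of customers, the a-priori probability, the mean and the standard deviation. The array names `income` and `car_class` are introduced here for illustration only; the results should match the estimates printed by the training cell further down.

```python
import numpy as np

# (income, car class) of the 27 training customers, copied in ID order from the printout above
income = np.array([77017, 69062, 16558, 88625, 93726, 66035, 72293, 14595, 36797,
                   65124, 15454, 21161, 56231, 44612, 31415, 26697, 23132, 17469,
                   64526, 17936, 33298, 25335, 67334, 20405, 28099, 37016, 81069],
                  dtype=float)
car_class = np.array([2, 1, 0, 2, 2, 1, 2, 0, 1,
                      2, 0, 1, 1, 1, 1, 0, 1, 0,
                      1, 0, 1, 1, 2, 0, 0, 1, 2])

counts = np.bincount(car_class, minlength=3)  # N_i: number of customers in each class
priors = counts / len(car_class)              # a-priori probabilities p(C_i) = N_i / N
means = np.array([income[car_class == c].mean() for c in range(3)])
# ddof=0 (division by N_i), the same default that np.std uses in the notebook
stds = np.array([income[car_class == c].std() for c in range(3)])

for c in range(3):
    print("class %d: N=%d  prior=%.2f  mean=%.2f  std=%.2f"
          % (c, counts[c], priors[c], means[c], stds[c]))
```

The class sizes come out as 8, 12 and 7, so the printed a-priori values 0.30, 0.44 and 0.26 are the fractions 8/27, 12/27 and 7/27 rounded to two decimals.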


Determine for each car class $C_i$ the likelihood function $p(x|C_i)$ and the a-priori probability $p(C_i)$. The likelihoods are assumed to be Gaussian (normal) distributions, so for each class only the mean and the standard deviation must be estimated from the given data. The a-priori probability of a class is estimated as the fraction of training instances that belong to it. The modified a-posteriori value is the product of likelihood and a-priori probability (modified because the evidence $p(x)$ is ignored):

$$p(C_i|x) \propto p(x|C_i) \cdot p(C_i)$$

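The decision based on this modified a-posteriori value is the same as the decision based on the true a-posteriori probability, because the evidence $p(x)$ does not depend on $C_i$ and therefore cannot change which class attains the maximum: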

$$C_{pred} = \arg\max_{C_i}\left( \frac{p(x|C_i) \cdot p(C_i)}{p(x)}\right) = \arg\max_{C_i}\left( p(x|C_i) \cdot p(C_i)\right)$$
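Written out, the per-class quantities that enter this decision rule are the maximum-likelihood estimates of a Gaussian. Here $N_i$ denotes the number of training instances of class $C_i$, $N = 27$ the total number of instances, $x^t$ the income of the $t$-th instance, and $t \in C_i$ the instances of class $C_i$ (symbols introduced for this illustration):

$$p(C_i) = \frac{N_i}{N}, \qquad m_i = \frac{1}{N_i}\sum_{t \in C_i} x^t, \qquad s_i = \sqrt{\frac{1}{N_i}\sum_{t \in C_i} \left(x^t - m_i\right)^2}$$

$$p(x|C_i) = \frac{1}{s_i\sqrt{2\pi}} \exp\left(-\frac{(x - m_i)^2}{2 s_i^2}\right)$$

Dividing by $N_i$ rather than $N_i - 1$ corresponds to the default behaviour of `np.std` (`ddof=0`), which the code below uses.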

The estimated likelihood functions, each multiplied by the a-priori probability of its class, are plotted for all 3 classes.

Once the models of all 3 classes have been estimated, they can be applied to predict the most likely car class from the annual income of a potential customer.

```python
AnnualIncomeList = [25000, 29000, 63000, 69000]  # incomes for which a prediction is required
Aposteriori = []                                 # modified a-posteriori curves, one per class
x = np.arange(0, 100000, 100)                    # income grid with step 100
for c in range(3):  # iterate over the 3 classes
    print('-' * 40)
    print('Class %d' % c)
    A = customerArray[customerArray[:, 2] == c]  # filter customers of current class
    p = len(A) / Ntot    # a-priori probability
    m = np.mean(A[:, 1])  # mean value
    s = np.std(A[:, 1])   # standard deviation (ddof=0)
    print("   A-Priori probability of this class: %1.2f " % p)
    print("   Estimated mean of this class' likelihood: %4.2f " % m)
    print("   Estimated standard deviation of this class' likelihood: %4.2f " % s)
    # Gaussian likelihood evaluated on the income grid
    likelihood = 1 / (s * np.sqrt(2 * np.pi)) * np.exp(-(x - m)**2 / (2 * s**2))
    # Likelihood multiplied by a-priori probability
    aposterioriMod = p * likelihood
    Aposteriori.append(aposterioriMod)
    plt.plot(x, aposterioriMod, label='class ' + str(c))
plt.grid(True)
for AnnualIncome in AnnualIncomeList:  # vertical lines at the incomes to be classified
    plt.axvline(x=AnnualIncome, color='m', ls='dashed')
plt.legend()
plt.xlabel("Annual Income")
plt.ylabel("Likelihood times a-priori probability")
plt.title("Likelihood times A-Priori Probability for all 3 classes")
plt.show()
```

```
----------------------------------------
Class 0
   A-Priori probability of this class: 0.30
   Estimated mean of this class' likelihood: 19651.62
   Estimated standard deviation of this class' likelihood: 4770.09
----------------------------------------
Class 1
   A-Priori probability of this class: 0.44
   Estimated mean of this class' likelihood: 42385.00
   Estimated standard deviation of this class' likelihood: 16665.05
----------------------------------------
Class 2
   A-Priori probability of this class: 0.26
   Estimated mean of this class' likelihood: 77884.00
   Estimated standard deviation of this class' likelihood: 9875.03
```
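The printed parameters also determine at which incomes the classifier switches from one class to another. The following sketch, assuming the rounded estimates above and the priors 8/27, 12/27 and 7/27, evaluates the decision rule on the same income grid as the notebook and reports every grid point where the predicted class changes. The helper `predict` is hypothetical and not part of the notebook. Because the class-0 Gaussian is much narrower than the class-1 Gaussian, the predicted class at very low incomes is class 1, so the scan also reports a class-1 to class-0 switch there.

```python
import numpy as np

# Assumed inputs: estimates printed above, priors as exact fractions N_i / N
priors = np.array([8, 12, 7]) / 27
means = np.array([19651.62, 42385.00, 77884.00])
stds = np.array([4770.09, 16665.05, 9875.03])

def predict(incomes):
    """Hypothetical helper: class with the largest p(x|C_i) * p(C_i) for each income."""
    x = np.asarray(incomes, dtype=float)[:, None]  # shape (n, 1) broadcasts against the 3 classes
    scores = priors * np.exp(-(x - means) ** 2 / (2 * stds ** 2)) / (stds * np.sqrt(2 * np.pi))
    return np.argmax(scores, axis=1)

grid = np.arange(0, 100000, 100)  # same income grid as in the notebook
pred = predict(grid)
# Indices k where pred[k] differs from pred[k - 1]
switch = np.nonzero(np.diff(pred))[0] + 1
for k in switch:
    print("class %d -> class %d at about %d" % (pred[k - 1], pred[k], grid[k]))

print(predict([25000, 29000, 63000, 69000]))  # [0 1 1 2], as in the predictions of the next cell
```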

```python
for AnnualIncome in AnnualIncomeList:
    print('-' * 20)
    print("Annual Income = %7.2f" % AnnualIncome)
    idx = int(round(AnnualIncome / 100))  # grid index of this income (grid step is 100)
    proVal = [post[idx] for post in Aposteriori]  # modified a-posteriori value of each class
    sumProbs = np.sum(proVal)  # evidence p(x), used for normalization
    for i, p in enumerate(proVal):
        print('A-posteriori probability of class %d = %1.4f' % (i, p / sumProbs))
    print('Most probable class for customer with income %5.2f Euro is %d ' % (AnnualIncome, np.argmax(proVal)))
```

```
--------------------
Annual Income = 25000.00
A-posteriori probability of class 0 = 0.6816
A-posteriori probability of class 1 = 0.3184
A-posteriori probability of class 2 = 0.0000
Most probable class for customer with income 25000.00 Euro is 0
--------------------
Annual Income = 29000.00
A-posteriori probability of class 0 = 0.3203
A-posteriori probability of class 1 = 0.6797
A-posteriori probability of class 2 = 0.0000
Most probable class for customer with income 29000.00 Euro is 1
--------------------
Annual Income = 63000.00
A-posteriori probability of class 0 = 0.0000
A-posteriori probability of class 1 = 0.5954
A-posteriori probability of class 2 = 0.4046
Most probable class for customer with income 63000.00 Euro is 1
--------------------
Annual Income = 69000.00
A-posteriori probability of class 0 = 0.0000
A-posteriori probability of class 1 = 0.2984
A-posteriori probability of class 2 = 0.7016
Most probable class for customer with income 69000.00 Euro is 2
```
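The normalization by `sumProbs` turns the modified values back into true a-posteriori probabilities, because the sum over all classes of likelihood times a-priori probability is exactly the evidence, $p(x) = \sum_{k=0}^{2} p(x|C_k)\, p(C_k)$. The following sketch checks this for an income of 25000, assuming the rounded estimates printed above and the priors 8/27, 12/27 and 7/27 (the helper `gaussian` is introduced here for illustration):

```python
import numpy as np

# Assumed inputs: estimates printed above, priors as exact fractions N_i / N
priors = np.array([8, 12, 7]) / 27
means = np.array([19651.62, 42385.00, 77884.00])
stds = np.array([4770.09, 16665.05, 9875.03])

def gaussian(x, m, s):
    """Gaussian density with mean m and standard deviation s, evaluated at x."""
    return np.exp(-(x - m) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

x = 25000.0
modified = gaussian(x, means, stds) * priors  # p(x|C_i) * p(C_i) for i = 0, 1, 2
evidence = modified.sum()                     # p(x), the quantity called sumProbs above
posterior = modified / evidence               # a-posteriori probabilities p(C_i|x)

# Dividing by the positive constant p(x) does not change the most probable class
assert np.argmax(modified) == np.argmax(posterior)
print(np.round(posterior, 4))
```

The result reproduces, up to rounding of the parameters, the values 0.6816 and 0.3184 printed for an income of 25000.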
