**Author:** Johannes Maucher

**Last Update:** 22nd October 2014

In this notebook a parametric classifier for 1-dimensional input data is developed. The task is to predict the category of car a customer will purchase, given his or her annual income. In the training phase, the Gaussian likelihood $p(x|C_i)$ and the a-priori probability $p(C_i)$ of each of the 3 car classes $C_i$ are estimated from a sample of 27 training instances, each containing the annual income and the purchased car of a former customer. The file containing the training data can be obtained from here
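Written out, the model described above could look as follows. This is a sketch under assumptions: the symbols $m_i$, $s_i$, $N_i$ and $N$ are introduced here for illustration ($m_i$ and $s_i$ are the estimated mean and standard deviation of class $C_i$, $N_i$ is the number of training instances in class $C_i$, and $N = 27$ is the total number of training instances), and the variance is assumed to be the maximum-likelihood estimate that divides by $N_i$.

$$ p(x|C_i) = \frac{1}{s_i \sqrt{2\pi}} \exp\left( -\frac{(x-m_i)^2}{2 s_i^2} \right), \qquad p(C_i) = \frac{N_i}{N} $$

$$ m_i = \frac{1}{N_i} \sum_{x_t \in C_i} x_t, \qquad s_i^2 = \frac{1}{N_i} \sum_{x_t \in C_i} (x_t - m_i)^2 $$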

Required Python modules:

In [1]:

```
import numpy as np
import matplotlib.pyplot as plt
np.set_printoptions(precision=1,suppress=True)
```

Read customer data from file. Each row in the file represents one customer. The first column is the customer ID, the second column is the annual income of the customer, and the third column is the class of car he or she bought:

- 0 = Low Class
- 1 = Middle Class
- 2 = Premium Class

In [2]:

```
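# Each row of the file: customer ID, annual income, car class
customerArray=np.fromfile("./Res/AutoKunden.txt",sep=' ').reshape(-1,3)
print(" ID Income Car-class")
print(customerArray)
Ntot=len(customerArray) # total number of training instances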
```
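To make the three-column layout concrete without the original file, the following sketch parses a few rows from an in-memory string and selects the customers of one class with a boolean mask. The rows in `sample_text` are invented for illustration and are not the lecture data; `np.loadtxt` is used here only because it reads conveniently from a string buffer.

```python
import io
import numpy as np

# Hypothetical rows in the assumed layout: ID, annual income, car class.
# These values are invented for illustration and are not the lecture data.
sample_text = """1 21000 0
2 24500 0
3 41000 1
4 47500 1
5 82000 2
"""

# loadtxt splits on whitespace and returns one row per customer
sample = np.loadtxt(io.StringIO(sample_text))

# A boolean mask on the third column selects the customers of one class
middle = sample[sample[:, 2] == 1]

print(sample.shape)   # (number of customers, 3)
print(middle[:, 1])   # incomes of the customers who bought a middle-class car
```

The same masking idea, applied once per class, is what the training step relies on to split the data before estimating the class parameters.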

For each car class $C_i$, determine the likelihood function $p(x|C_i)$ and the a-priori probability $p(C_i)$. The likelihoods are assumed to be Gaussian, so for each class the mean and the standard deviation must be estimated from the given data. The a-priori probability is estimated as the fraction of training instances that belong to the class. The modified a-posteriori value is the product of likelihood and a-priori probability (modified because the evidence $p(x)$ is ignored):

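$$ p(C_i|x) \propto p(x|C_i) \cdot p(C_i) $$

The decision based on this modified a-posteriori value is the same as the decision based on the true a-posteriori probability, because the evidence $p(x)$ is the same for all classes and therefore does not affect the maximization:

$$ C_{pred} = \operatorname{argmax}_{C_i}\left( \frac{p(x|C_i) \cdot p(C_i)}{p(x)}\right) = \operatorname{argmax}_{C_i}\left( p(x|C_i) \cdot p(C_i)\right) $$

The estimated likelihood functions, each weighted by its a-priori probability, are plotted in cell In [7].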
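As a quick numeric check of the argument that the evidence can be ignored, the following sketch normalizes three hypothetical values of $p(x|C_i) \cdot p(C_i)$ and compares the index of the maximum before and after normalization. The numbers in `unnormalized` are made up and do not come from the training data.

```python
import numpy as np

# Hypothetical values of p(x|C_i) * p(C_i) for three classes at one fixed x
unnormalized = np.array([2.0e-6, 7.5e-6, 0.5e-6])

# The evidence p(x) is the sum of these products over all classes
evidence = unnormalized.sum()
posterior = unnormalized / evidence

print(posterior)                # true a-posteriori probabilities, they sum to 1
print(np.argmax(unnormalized))  # index of the winning class before normalization
print(np.argmax(posterior))     # same index after dividing by the evidence
```

Dividing every entry by the same positive constant preserves their order, which is why the predicted class does not change.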

In [7]:

```
AnnualIncomeList=[25000,29000,63000,69000] # incomes for which classification is required
Aposteriori=[]
x=np.arange(0,100000,100) # income grid with a step size of 100
for c in range(0,3): #iterate over the 3 classes
    print('-'*40)
    print('Class %d'%c)
    A=customerArray[customerArray[:,2]==c] #Filter customers of current class
    p=float(len(A))/Ntot # a-priori probability
    m=np.mean(A[:,1]) # mean-value
    s=np.std(A[:,1]) # standard deviation (np.std default: divides by the number of samples)
    print(" A-Priori probability of this class: %1.2f "%p)
    print(" Estimated mean of this class' likelihood: %4.2f "%m)
    print(" Estimated standard deviation of this class' likelihood: %4.2f "%s)
    # Gaussian likelihood evaluated on the income grid
    likelihood = 1/(s * np.sqrt(2 * np.pi))*np.exp( - (x - m)**2 / (2 * s**2) )
    # Likelihood multiplied by a-priori probability
    aposterioriMod=p*likelihood
    Aposteriori.append(aposterioriMod)
    plt.plot(x,aposterioriMod,label='class '+str(c))
plt.grid(True)
for AnnualIncome in AnnualIncomeList: #plot vertical lines at the annual incomes for which classification is required
    plt.axvline(x=AnnualIncome,color='m',ls='dashed')
plt.legend()
plt.xlabel("Annual Income")
plt.ylabel("p(x|C_i) * p(C_i)")
plt.title("Likelihood times A-Priori Probability for all 3 classes")
```

Out[7]:

In [5]:

```
for AnnualIncome in AnnualIncomeList:
    print('-'*20)
    print("Annual Income = %7.2f"%AnnualIncome)
    idx=int(round(AnnualIncome/100.0)) # index into the income grid (step size 100)
    proVal=[post[idx] for post in Aposteriori] # p(x|C_i)*p(C_i) for each class
    sumProbs=np.sum(proVal) # evidence p(x)
    for c,p in enumerate(proVal):
        print('A-posteriori probability of class %d = %1.4f'% (c,p/sumProbs))
    print('Most probable class for customer with income %5.2f Euro is %d '% (AnnualIncome,np.argmax(np.array(proVal))))
```
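The lookup above reads the values from the precomputed income grid, so it is exact only for incomes that fall on a grid point. A possible alternative, sketched here under assumptions, evaluates the Gaussian density directly at the requested income. The helper names `gaussian_pdf` and `classify` and the per-class parameters are hypothetical and chosen for illustration only; they are not the values estimated from the training file.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    return np.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (std * np.sqrt(2 * np.pi))

def classify(income, priors, means, stds):
    """Return (predicted class index, a-posteriori probabilities)."""
    # Likelihood times a-priori probability for every class at once
    weighted = priors * gaussian_pdf(income, means, stds)
    # Dividing by the sum (the evidence) yields the a-posteriori probabilities
    return int(np.argmax(weighted)), weighted / weighted.sum()

# Hypothetical per-class parameters, invented for illustration only
priors = np.array([0.4, 0.4, 0.2])
means = np.array([25000.0, 45000.0, 80000.0])
stds = np.array([6000.0, 9000.0, 12000.0])

label, posterior = classify(29050.0, priors, means, stds)
print(label, posterior)
```

With parameters estimated as in the training cell, the same function could be called for any income, including values outside the plotted range.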
