A Naive Bayes classifier is a machine learning model that applies Bayes' theorem, derived by Rev. Thomas Bayes in the 18th century. Bayes' theorem is a simple formula:

$$P(A|B) = P(B|A)\frac{P(A)}{P(B)}$$

The formula reads as follows: the probability of A given B is the probability of B given A times the ratio of the probabilities of A and B (Linoff and Berry, 2011:211).
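As a quick sanity check of the formula, here is a worked example with made-up numbers (the 1%, 80% and 10% figures are purely illustrative):

```python
# Worked example of Bayes' theorem with made-up numbers:
# A = "email is spam", B = "email contains the word 'offer'".
p_a = 0.01          # P(A): prior probability that an email is spam
p_b_given_a = 0.80  # P(B|A): probability that a spam email contains "offer"
p_b = 0.10          # P(B): probability that any email contains "offer"

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 2))  # 0.08: seeing "offer" raises the spam probability from 1% to 8%
```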

The Bayesian part of Naive Bayesian models refers to the technique's use of Bayes' theorem. The naive part refers to the assumption that the input variables are independent of each other, given the target. Naive Bayes classifiers belong to the family of probabilistic classifiers. This family of machine learning models is well suited to problems where the dimensionality of the input is high (more predictors than observations), and Naive Bayes models provide a way out of this dilemma when you are trying to predict a probability.
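The independence assumption is what makes the model tractable: the joint likelihood of all features factorizes into a product of per-feature likelihoods. A minimal sketch, using entirely made-up priors and word probabilities:

```python
from math import prod  # Python 3.8+

# Hypothetical per-class priors and per-word likelihoods, assumed independent:
priors = {'spam': 0.3, 'ham': 0.7}
likelihoods = {
    'spam': {'offer': 0.6, 'meeting': 0.1},
    'ham':  {'offer': 0.05, 'meeting': 0.4},
}

def unnormalized_posterior(cls, words):
    # P(class) * product of P(word | class) -- this product is the "naive" step
    return priors[cls] * prod(likelihoods[cls][w] for w in words)

words = ['offer', 'meeting']
scores = {c: unnormalized_posterior(c, words) for c in priors}
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}
print(posteriors)  # 'spam' narrowly wins for this word combination
```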

Typical applications of Naive Bayes Algorithms:

- Real-time prediction;
- Text classification / spam filtering / sentiment analysis;
- Recommendation systems.


Advantages:

- Well suited for high dimensionality.
- Easy to implement.
- Can be trained with a small data set.

Disadvantages:

- Dependencies among variables cannot be modelled.
- Can only be used for predicting (multiple) classes, not continuous values.
- Naive Bayes is also known to be a bad estimator, so the probability outputs from `predict_proba` should not be taken too seriously.
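The last point is worth seeing directly: the predicted class (the argmax) is often correct even when the raw probabilities are over-confident. A minimal sketch on made-up, overlapping 1-D data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy, made-up data: two overlapping 1-D Gaussian classes.
rng = np.random.RandomState(0)
X = np.concatenate([rng.normal(0.0, 1.0, 100),
                    rng.normal(1.5, 1.0, 100)]).reshape(-1, 1)
y = np.array([0] * 100 + [1] * 100)

clf = GaussianNB().fit(X, y)
proba = clf.predict_proba([[3.0]])[0]
# The ranking of the classes is usually trustworthy; the raw probability
# values, however, are often pushed towards 0 or 1 in practice.
print(proba, proba.sum())
```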

Be aware that this is a simplified example to give a brief introduction into using a Naive Bayes Classifier.

In [3]:

```
# Import libraries
%matplotlib inline
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split
# Import the Gaussian Naive Bayes model from scikit-learn
from sklearn.naive_bayes import GaussianNB
```

In this example we're going to use scikit-learn to build a Naive Bayes model in Python. There are three types of Naive Bayes models available in the scikit-learn library:

Gaussian: used for classification; it assumes that the features follow a normal distribution.

Multinomial: used for discrete counts. In a text classification problem, for example, this goes one step beyond Bernoulli trials: instead of recording "word occurs in the document", we count how often the word occurs in the document. You can think of it as the number of times outcome x_i is observed over n trials.

Bernoulli: useful if your feature vectors are binary (i.e. zeros and ones). One application would be text classification with a 'bag of words' model, where the 1s and 0s are "word occurs in the document" and "word does not occur in the document" respectively.

For this example we're going to use the Gaussian Naive Bayes.
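All three variants share the same fit/predict API and differ only in how they model the per-feature likelihood. A minimal sketch on tiny, made-up word-count data (the counts and labels are purely illustrative):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Tiny made-up "bag of words" counts: two documents per class, three words.
X_counts = np.array([[3, 0, 1],
                     [2, 0, 0],
                     [0, 2, 3],
                     [0, 1, 2]])
y = np.array([0, 0, 1, 1])

# MultinomialNB models the counts directly; BernoulliNB only looks at
# presence/absence (it binarizes the input by default); GaussianNB treats
# each count as a real-valued, normally distributed feature.
results = {}
for Model in (GaussianNB, MultinomialNB, BernoulliNB):
    clf = Model().fit(X_counts, y)
    results[Model.__name__] = int(clf.predict([[2, 0, 1]])[0])
print(results)  # all three agree on class 0 for this easy example
```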

In [4]:

```
churn = pd.read_csv('churn.csv')
# Replace yes/no with 1/0.
churn = churn.replace(['yes', 'no'], [1, 0])
```

In [5]:

```
churn.head()
```

Out[5]:

In [6]:

```
# First, look at the ratio between True and False in the Churn column,
# to check that both classes are actually present.
counter = Counter(churn['Churn?'])
names = counter.keys()
counts = counter.values()
# Plot histogram using matplotlib bar().
indexes = np.arange(len(names))
width = 0.7
plt.bar(indexes, counts, width)
plt.xticks(indexes + width * 0.5, names)
plt.show()
```

In [7]:

```
# We also need to change the target variable into a binary variable.
churn = churn.replace(['True.', 'False.'], [1, 0])
```

In [8]:

```
# Select the input variables for the model and create a NumPy array.
x = np.array(churn[['Day Calls','Day Charge','Eve Calls','Eve Charge', 'Night Calls', 'Night Charge', 'Intl Calls','Intl Charge', 'CustServ Calls', 'VMail Plan', 'VMail Message']])
print(x)
```

In [9]:

```
# Select the target variable and create a 1-D NumPy array.
y = np.array(churn['Churn?'])
print(y)
```

In [10]:

```
# Split the data into training and test sets (80/20).
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
```

In [11]:

```
#Select the model (Gaussian Naive Bayes Classifier)
nbc = GaussianNB()
```

In [16]:

```
# Fit the model and check its accuracy on the training set.
model = nbc.fit(x_train, y_train)
accuracy = nbc.score(x_train, y_train)
accuracy
```

Out[16]:

In [14]:

```
# Test the accuracy of the model on the held-out test set.
accuracy2 = nbc.score(x_test, y_test)
accuracy2
```

In [27]:

```
# Example of classifying a new observation.
# scikit-learn expects a 2-D array: one row per observation,
# with values in the same column order as x.
example_predict = np.array([[0, 1, 110, 45.07, 99, 16.78, 91, 11.01, 3, 2.70, 1]])
prediction = nbc.predict(example_predict)
print(prediction)
```

References:

Linoff and Berry (2011) Data Mining Techniques. Indianapolis: Wiley Publishing, Inc.