Let's pretend we have an email with three words: "Send money now." We'll use Naive Bayes to classify it as ham or spam.
$$P(spam \ | \ \text{send money now}) = \frac {P(\text{send money now} \ | \ spam) \times P(spam)} {P(\text{send money now})}$$

By assuming that the features (the words) are conditionally independent given the class, we can factor the likelihood into a product of per-word probabilities:
$$P(spam \ | \ \text{send money now}) \approx \frac {P(\text{send} \ | \ spam) \times P(\text{money} \ | \ spam) \times P(\text{now} \ | \ spam) \times P(spam)} {P(\text{send money now})}$$

We can calculate all of the values in the numerator by examining a corpus of spam email:
$$P(spam \ | \ \text{send money now}) \approx \frac {0.2 \times 0.1 \times 0.1 \times 0.9} {P(\text{send money now})} = \frac {0.0018} {P(\text{send money now})}$$

(Here $0.2 = P(\text{send} \ | \ spam)$, and the final factor $0.9$ is the prior $P(spam)$.) We would repeat this process with a corpus of ham email:
$$P(ham \ | \ \text{send money now}) \approx \frac {0.05 \times 0.01 \times 0.1 \times 0.1} {P(\text{send money now})} = \frac {0.000005} {P(\text{send money now})}$$

The denominator $P(\text{send money now})$ is identical in both cases, so all we care about is which numerator is larger. Since $0.0018 > 0.000005$, we predict that the email is spam.
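The hand calculation above can be sketched in a few lines of Python. The word likelihoods and priors are the illustrative numbers from this example, not values learned from a real corpus:

```python
# P(word | class) for each class, using the example's hypothetical values.
likelihoods = {
    "spam": {"send": 0.20, "money": 0.10, "now": 0.10},
    "ham":  {"send": 0.05, "money": 0.01, "now": 0.10},
}
priors = {"spam": 0.9, "ham": 0.1}

def unnormalized_posterior(label, words):
    """Numerator of Bayes' rule under the naive independence assumption:
    P(class) * product of P(word | class) over the words in the email."""
    score = priors[label]
    for word in words:
        score *= likelihoods[label][word]
    return score

email = ["send", "money", "now"]
scores = {label: unnormalized_posterior(label, email) for label in priors}
prediction = max(scores, key=scores.get)
print(scores)       # spam ≈ 0.0018, ham ≈ 0.000005
print(prediction)   # spam
```

Because the evidence term $P(\text{send money now})$ is the same for both classes, comparing these unnormalized scores gives the same prediction as comparing the full posteriors.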
Advantages of Naive Bayes:

- Training and prediction are fast: training is essentially a single pass over the data to count word frequencies.
- It works well with high-dimensional, sparse data such as text.
- It performs reasonably well even with relatively little training data.
- It naturally handles more than two classes.
Disadvantages of Naive Bayes:

- The conditional independence assumption rarely holds for real text, so the predicted probabilities are often poorly calibrated even when the predicted class is correct.
- A word that never appears in the training corpus for a class gives a zero likelihood, which wipes out the entire product unless we apply smoothing (e.g., Laplace/add-one smoothing).
- Multiplying many small probabilities can underflow floating-point numbers, so implementations typically sum log-probabilities instead.
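The last two disadvantages have standard fixes. Below is a minimal sketch, using tiny hypothetical corpora, of Laplace (add-one) smoothing to avoid zero likelihoods and log-probabilities to avoid underflow:

```python
import math

# Tiny hypothetical corpora of tokenized emails (not real data).
spam_docs = [["send", "money", "now"], ["money", "now"]]
ham_docs = [["meeting", "now"], ["send", "notes"]]
vocab = {w for doc in spam_docs + ham_docs for w in doc}

def log_posterior(words, docs, prior):
    """Log of the unnormalized posterior: log P(class) + sum of
    log P(word | class), with Laplace-smoothed word probabilities."""
    counts, total = {}, 0
    for doc in docs:
        for w in doc:
            counts[w] = counts.get(w, 0) + 1
            total += 1
    score = math.log(prior)
    for w in words:
        # Laplace smoothing: add 1 to every count (and |vocab| to the
        # denominator) so unseen words never produce a zero likelihood.
        score += math.log((counts.get(w, 0) + 1) / (total + len(vocab)))
    return score

email = ["send", "money", "now"]
scores = {
    "spam": log_posterior(email, spam_docs, 0.5),
    "ham": log_posterior(email, ham_docs, 0.5),
}
print(max(scores, key=scores.get))  # spam
```

Summing logs gives the same ranking as multiplying probabilities, since the logarithm is monotonic, but it stays numerically stable even for long emails.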