Multi-Class Classification

In the previous lectures we learned how to use logistic regression to perform binary classification. In many real-life situations, however, we need to classify data into more than two classes.

For this series of lectures, we'll go through the following steps:

1.) Introduction to the Iris Data Set
2.) Introduction to Multi-Class Classification (Logistic Regression)
3.) Data Formatting
4.) Data Visualization Analysis
5.) Multi-Class Classification with scikit-learn
6.) Explanation of K Nearest Neighbors
7.) K Nearest Neighbors with scikit-learn
8.) Conclusion


Let's get started!

Step 1: Introduction to the Iris Data Set

For this series of lectures, we will be using the famous Iris flower data set.

The Iris flower data set, also known as Fisher's Iris data set, is a multivariate data set introduced by Sir Ronald Fisher in 1936 as an example of discriminant analysis.

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor), so 150 total samples. Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres.
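
As a quick sanity check of the description above, here is a minimal sketch that loads the copy of the Iris data bundled with scikit-learn and counts the samples and features. It assumes scikit-learn is installed; the variable names (X, y, samples_per_species) are only illustrative, and the lectures may load the data in a different way.

```python
from collections import Counter

from sklearn.datasets import load_iris

# Load the copy of the Iris data that ships with scikit-learn (no download needed)
iris = load_iris()

X = iris.data    # feature matrix: one row per flower, one column per measurement
y = iris.target  # integer class label (0, 1 or 2) for each flower

# Convert the names to plain Python strings for readable output
feature_names = list(iris.feature_names)
species_names = [str(name) for name in iris.target_names]

# Count how many samples belong to each species
samples_per_species = Counter(species_names[label] for label in y)

print(X.shape)
print(feature_names)
print(dict(samples_per_species))
```

The shape printed first should match the 150 samples and four features described above, and the counts should show 50 flowers for each of the three species.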

Here are pictures of the three Iris species:

In [150]:
# The Iris Setosa
from IPython.display import Image
url = 'http://upload.wikimedia.org/wikipedia/commons/5/56/Kosaciec_szczecinkowaty_Iris_setosa.jpg'
Image(url, width=300, height=300)
Out[150]:
In [151]:
# The Iris Versicolor
from IPython.display import Image
url = 'http://upload.wikimedia.org/wikipedia/commons/4/41/Iris_versicolor_3.jpg'
Image(url, width=300, height=300)
Out[151]: