$\newcommand{\xv}{\mathbf{x}} \newcommand{\Xv}{\mathbf{X}} \newcommand{\yv}{\mathbf{y}} \newcommand{\Yv}{\mathbf{Y}} \newcommand{\zv}{\mathbf{z}} \newcommand{\av}{\mathbf{a}} \newcommand{\Wv}{\mathbf{W}} \newcommand{\wv}{\mathbf{w}} \newcommand{\betav}{\mathbf{\beta}} \newcommand{\gv}{\mathbf{g}} \newcommand{\Hv}{\mathbf{H}} \newcommand{\dv}{\mathbf{d}} \newcommand{\Vv}{\mathbf{V}} \newcommand{\vv}{\mathbf{v}} \newcommand{\Uv}{\mathbf{U}} \newcommand{\uv}{\mathbf{u}} \newcommand{\tv}{\mathbf{t}} \newcommand{\Tv}{\mathbf{T}} \newcommand{\Sv}{\mathbf{S}} \newcommand{\Gv}{\mathbf{G}} \newcommand{\zv}{\mathbf{z}} \newcommand{\Zv}{\mathbf{Z}} \newcommand{\Norm}{\mathcal{N}} \newcommand{\muv}{\boldsymbol{\mu}} \newcommand{\sigmav}{\boldsymbol{\sigma}} \newcommand{\phiv}{\boldsymbol{\phi}} \newcommand{\Phiv}{\boldsymbol{\Phi}} \newcommand{\Sigmav}{\boldsymbol{\Sigma}} \newcommand{\Lambdav}{\boldsymbol{\Lambda}} \newcommand{\half}{\frac{1}{2}} \newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}} \newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}} \newcommand{\dimensionbar}[1]{\underset{#1}{\operatorname{|}}} \newcommand{\grad}{\mathbf{\nabla}} \newcommand{\ebx}[1]{e^{\betav_{#1}^T \xv_n}} \newcommand{\eby}[1]{e^{y_{n,#1}}} \newcommand{\Tiv}{\mathbf{Ti}} \newcommand{\Fv}{\mathbf{F}} \newcommand{\ones}[1]{\mathbf{1}_{#1}}$

Nonparametric Classification with K Nearest Neighbors

In [1]:
import numpy as np
import scipy.stats as ss # for mode
import matplotlib.pyplot as plt
%matplotlib inline
import qdalda as ql
import neuralnetworks as nn

To implement k nearest neighbors, we must calculate the squared distance between each sample to be classified and every sample in our training data. We could do this with two nested for loops, but numpy broadcasting lets us do it in a single vectorized expression.

First, here are four two-dimensional samples, and target class labels for each. These will be our saved training data.

In [2]:
X = np.array([[1,2], [3,4], [5,6], [7,8]])
T = np.array([[1],[1],[2],[2]])

Now here are two new samples for which we want to predict class labels.

We will do so by finding the $k$ closest samples in our training set and examining their class labels.

In [3]:
Xnew = np.array([[3,3], [6,6]])

The squared distances between each of these two new samples and all four of our training samples are

In [4]:
distsSquared = np.sum( (Xnew[:, np.newaxis, :] - X)**2, axis = -1 )
distsSquared
Out[4]:
array([[ 5,  1, 13, 41],
       [41, 13,  1,  5]])
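As a sanity check (an aside, not in the original notebook), the same matrix can be computed with the two nested loops mentioned earlier. This cell assumes X, Xnew, and distsSquared from the cells above. The broadcast works because Xnew[:, np.newaxis, :] has shape (2, 1, 2), which broadcasts against X with shape (4, 2) to give a (2, 4, 2) array of differences; summing over the last axis leaves the (2, 4) matrix of squared distances.

In [ ]:
# Explicit-loop version of the squared-distance computation, for comparison.
dists_loop = np.zeros((Xnew.shape[0], X.shape[0]))
for i in range(Xnew.shape[0]):
    for j in range(X.shape[0]):
        dists_loop[i, j] = np.sum((Xnew[i] - X[j])**2)
print(np.allclose(dists_loop, distsSquared))   # True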

Now we must find the $k$ smallest distances in each row. One way to do this is to apply argsort to each row. The results are indices into the rows of X, telling us which training samples are closest to each sample in Xnew.

In [5]:
np.argsort(distsSquared, axis=1)
Out[5]:
array([[1, 0, 2, 3],
       [2, 3, 1, 0]])

Now we just take the first $k$ columns of each row. Say $k=3$.

In [6]:
k=3
indices = np.argsort(distsSquared,axis=1)[:, :k]
indices
Out[6]:
array([[1, 0, 2],
       [2, 3, 1]])
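An aside, not in the original notebook: when only the $k$ smallest distances are needed, np.argpartition avoids fully sorting each row. The $k$ indices it returns are not ordered by distance, but that does not matter for a majority vote.

In [ ]:
# Partial sort: guarantees the k smallest distances occupy the first k columns.
np.argpartition(distsSquared, k - 1, axis=1)[:, :k]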

So now we just translate these indices into class labels, and pick the class label appearing most often in each row.

In [11]:
T
Out[11]:
array([[1],
       [1],
       [2],
       [2]])
In [8]:
indices
Out[8]:
array([[1, 0, 2],
       [2, 3, 1]])
In [12]:
T[indices, :]
Out[12]:
array([[[1],
        [1],
        [2]],

       [[2],
        [2],
        [1]]])
In [7]:
T[indices,:].shape
Out[7]:
(2, 3, 1)
In [13]:
T[indices,:][:,:,0].shape
Out[13]:
(2, 3)
In [14]:
T[indices, :][:, :, 0]
Out[14]:
array([[1, 1, 2],
       [2, 2, 1]])
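Equivalently (a small aside, not in the original), we can index the single column of T directly, which avoids the extra trailing axis and the [:, :, 0] slice:

In [ ]:
T[indices, 0]   # the same (2, 3) array of class labels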

To pick the most common class label in each row, we can use the scipy.stats mode function.

In [17]:
ss.mode(T[indices,:][:,:,0], axis=1)
Out[17]:
ModeResult(mode=array([[1],
       [2]]), count=array([[2],
       [2]]))

The documentation for mode tells us that this function returns the most common element and also the number of times it appeared. We only want the most common element of each row.

In [19]:
ss.mode(T[indices,:][:,:,0], axis=1)[0]
Out[19]:
array([[1],
       [2]])
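One caveat, not from the original notebook: the behavior of scipy.stats.mode has changed across SciPy releases (newer versions add a keepdims argument and may drop the singleton dimension by default), so the shapes shown above may differ on your installation. Accessing the result by attribute name is a bit more robust than indexing with [0]:

In [ ]:
result = ss.mode(T[indices, :][:, :, 0], axis=1)
result.mode    # the most common class label in each row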

That's it. Now wrap these statements up in a KNN class.

In [44]:
import numpy as np
import scipy.stats as ss

######################################################################
### class KNN
######################################################################

class KNN(object):
    
    def __init__(self):
        self.X = None  # data will be stored here
        self.T = None  # class labels will be stored here
    
    def train(self, X, T):
        self.X = X
        self.T = T

    def use(self, Xnew, k = 1):
        self.k = k
        # Calc squared distance from all samples in Xnew with all stored in self.X
        distsSquared = np.sum( (Xnew[:, np.newaxis, :] - self.X)**2, axis=-1 )
        indices = np.argsort(distsSquared, axis=1)[:,:k]
        classes = ss.mode(self.T[indices, :][:, :, 0], axis=1)[0]  # majority vote among the k nearest
        return classes
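A quick check (not a cell in the original notebook) that the class reproduces the result we worked out by hand, using the toy X, T, and Xnew still defined above:

In [ ]:
knn = KNN()
knn.train(X, T)
knn.use(Xnew, k=3)   # should give the [[1], [2]] we computed above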

Now let's generate some two-dimensional data to test it on.

In [45]:
n = 20
X = np.random.multivariate_normal([5,7], [[0.8, -0.5], [-0.5, 0.8]], n)
X = np.vstack((X, np.random.multivariate_normal([6,3], [[0.6, 0.5], [0.5, 0.8]], n)))
T = np.vstack((np.ones((n, 1)), 2*np.ones((n,1))))

s = plt.scatter(X[:, 0], X[:, 1], c=['C{:1d}'.format(int(t[0])) for t in T - 1], s=80);

We can apply our knn algorithm to any point and get a predicted class label. Let's apply knn to a grid of points covering this two-dimensional plane, and fill each region with the color of its predicted class (C0 for class 1, C1 for class 2, matching the scatter plot). This is easy to do with matplotlib's contourf function.

In [51]:
plt.figure(figsize=(10, 10))

n = 20
X = np.random.multivariate_normal([5, 7], [[0.8, -0.5], [-0.5, 0.8]], n)
X = np.vstack((X, np.random.multivariate_normal([6, 3], [[0.6, 0.5], [0.5, 0.8]], n)))
T = np.vstack((np.ones((n, 1)), 2 * np.ones((n, 1))))


m = 100
xs = np.linspace(0, 10, m)
ys = xs
Xs, Ys = np.meshgrid(xs, ys)
samples = np.vstack((Xs.ravel(), Ys.ravel())).T

knn = KNN()
knn.train(X, T)

classes = knn.use(samples, k=1)

levels = [0, 1, 2]
plt.contourf(Xs, Ys, classes.reshape(Xs.shape), levels, alpha=0.2, colors=('C0', 'C1'), antialiased=True)
s = plt.scatter(X[:, 0], X[:, 1], c=['C{:1d}'.format(int(t[0])) for t in T - 1], s=80);

How does this compare to our other classifiers: LDA, QDA, and linear and nonlinear logistic regression (neural network classifiers)?

In [52]:
def plotResult(X, T, Xs, Ys, classes, nClasses):
    levels = [0.9, 1.9, 2.9, 3.9, 4.9][:nClasses+1]
    plt.contourf(Xs, Ys, classes.reshape(Xs.shape), levels, alpha=0.2, colors=('C0', 'C1', 'C2'), antialiased=True)
    plt.scatter(X[:, 0], X[:, 1], c=['C{:1d}'.format(int(t[0])) for t in T-1], s=60);
In [53]:
def multipleClassifiers(X, T, Xtest, Xs, Ys):
    nClasses = len(np.unique(T))
    plt.figure(figsize=(20, 20))
    for i in range(3 * 3):
        plt.subplot(3, 3, i + 1)

        if i == 0:
            K = 1
            knn = KNN()
            knn.train(X, T)
            classes = knn.use(Xtest, K)
            plotResult(X, T, Xs, Ys, classes, nClasses)
            plt.title('KNN K=1')

        elif i == 1:
            K = 2
            knn = KNN()
            knn.train(X, T)
            classes = knn.use(Xtest, K)
            plotResult(X, T, Xs, Ys, classes, nClasses)
            plt.title('KNN K=2')

        elif i == 2:
            K = 5
            knn = KNN()
            knn.train(X, T)
            classes = knn.use(Xtest, K)
            plotResult(X, T, Xs, Ys, classes, nClasses)
            plt.title('KNN K=5')

        elif i == 3:
            lda = ql.LDA()
            lda.train(X, T)
            classes, _, _ = lda.use(Xtest)
            plotResult(X, T, Xs, Ys, classes, nClasses)
            plt.title('LDA')

        elif i == 4:
            qda = ql.QDA()
            qda.train(X, T)
            classes, _, _ = qda.use(Xtest)
            plotResult(X, T, Xs, Ys, classes, nClasses)
            plt.title('QDA')

        elif i == 5:
            llr = nn.NeuralNetworkClassifier(2, 0, nClasses)
            llr.train(X, T, nIterations=100)
            classes = llr.use(Xtest)
            plotResult(X, T, Xs, Ys, classes, nClasses)
            plt.title('Linear LR')

        elif i == 6:
            nnlr = nn.NeuralNetworkClassifier(2, 1, nClasses)
            nnlr.train(X, T, nIterations=100)
            classes = nnlr.use(Xtest)
            plotResult(X, T, Xs, Ys, classes, nClasses)
            plt.title('Nonlinear LR, H=[1]')

        elif i == 7:
            nnlr = nn.NeuralNetworkClassifier(2, 10, nClasses)
            nnlr.train(X, T, nIterations=100)
            classes = nnlr.use(Xtest)
            plotResult(X, T, Xs, Ys, classes, nClasses)
            plt.title('Nonlinear LR, H=[10]')

        elif i == 8:
            nnlr = nn.NeuralNetworkClassifier(2, [10, 10, 10], nClasses)
            nnlr.train(X, T, nIterations=100)
            classes = nnlr.use(Xtest)
            plotResult(X, T, Xs, Ys, classes, nClasses)
            plt.title('Nonlinear LR, H=[10,10,10]')
In [55]:
n = 40
X = np.random.multivariate_normal([5,6], [[0.9, -0.2], [-0.2, 0.9]], n)
X = np.vstack((X, np.random.multivariate_normal([6,3], [[2, 0.4], [0.4, 2]], n)))
T = np.vstack((np.ones((n,1)), 2*np.ones((n,1))))


m = 200
xs = np.linspace(X.min(), X.max(), m)
ys = xs
Xs, Ys = np.meshgrid(xs, ys)
Xtest = np.vstack((Xs.ravel(), Ys.ravel())).T

multipleClassifiers(X, T, Xtest, Xs, Ys)
In [56]:
n = 500
x1 = np.linspace(5, 20, n) + np.random.uniform(-2, 2, n)
y1 = ((20 - 12.5)**2 - (x1 - 12.5)**2) / (20 - 12.5)**2 * 10 + 14 + np.random.uniform(-2, 2, n)
x2 = np.linspace(10, 25, n) + np.random.uniform(-2, 2, n)
y2 = ((x2 - 17.5)**2) / (25 - 17.5)**2 * 10 + 5.5 + np.random.uniform(-2, 2, n)
angles = np.linspace(0, 2*np.pi, n)
x3 = np.cos(angles) * 15 + 15 + np.random.uniform(-2, 2, n)
y3 = np.sin(angles) * 15 + 15 + np.random.uniform(-2, 2, n)
X = np.vstack((np.hstack((x1, x2, x3)), np.hstack((y1, y2, y3)))).T
T = np.repeat(range(1, 4), n).reshape((-1, 1))
plt.figure(figsize=(6, 6))
for c in range(1, 4):
    mask = (T == c).flatten()
    plt.plot(X[mask, 0], X[mask, 1], 'o', markersize=6, alpha=0.5)
In [57]:
xs = np.linspace(-5, 35, 100)
Xs, Ys = np.meshgrid(xs, xs)
Xtest = np.vstack((Xs.flat, Ys.flat)).T

multipleClassifiers(X, T, Xtest, Xs, Ys)