
k-means clustering example demonstration

k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. The problem is computationally difficult (NP-hard).
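Stated as an optimization problem, this is the standard within-cluster sum-of-squares objective. The notation is introduced here for illustration: $S_1, \dots, S_k$ are the clusters and $\mu_i$ is the mean of the points in $S_i$.

$$\underset{S_1,\dots,S_k}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2, \qquad \mu_i = \frac{1}{|S_i|} \sum_{x \in S_i} x$$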

I have made a very simple example involving 20 points and 4 means here using basic techniques. It's interesting to see that after only a few iterations (about 4 or 5), the 4 means in this example settle into fixed positions. Note that since the algorithm is initialized randomly, the output may vary depending on where the centroids start.
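To make the iteration count concrete, here is a minimal plain-Ruby sketch of the usual assign-then-update loop (Lloyd's algorithm). It is not the slearn implementation: the names `lloyd_kmeans` and `squared_distance`, the random-point initialization, and the stopping rule are all assumptions chosen for illustration.

```ruby
# Plain-Ruby sketch of Lloyd's algorithm (hypothetical helper names,
# not the code slearn runs internally).
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

def lloyd_kmeans(points, k, max_iterations: 20, rng: Random.new)
  # Assumed initialization: k distinct input points picked at random.
  centroids = points.sample(k, random: rng).map { |p| p.map(&:to_f) }
  iterations = 0

  max_iterations.times do
    iterations += 1

    # Assignment step: group points by the index of their nearest centroid.
    clusters = points.group_by do |p|
      (0...k).min_by { |i| squared_distance(p, centroids[i]) }
    end

    # Update step: move each centroid to the mean of its cluster;
    # a centroid with no assigned points stays where it is.
    updated = centroids.each_with_index.map do |c, i|
      members = clusters[i]
      next c if members.nil?
      members.transpose.map { |coords| coords.sum / members.size.to_f }
    end

    # Stop once a full pass leaves every centroid unchanged.
    break if updated == centroids
    centroids = updated
  end

  [centroids, iterations]
end

sample_points = [[3.0, 4.0], [89.0, 31.0], [23, 144], [80, 1], [6.0, 15.0],
                 [21.0, 10.0], [100.0, 89.0], [90, 124], [80, 93], [80, 123],
                 [91, 110], [120, 14], [70, 2], [90, 1], [1.0, 2.0],
                 [10.0, 11.0], [21.0, 121.0], [1, 100], [30, 90], [31, 111]]

sketch_centroids, sketch_iterations =
  lloyd_kmeans(sample_points, 4, rng: Random.new(42))
puts "stopped after #{sketch_iterations} iterations"
sketch_centroids.each { |c| p c }
```

The reported count includes the final pass that confirms nothing moved. Changing the seed changes the starting centroids, so both the count and the final centroids can differ between runs.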

In [1]:

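require 'slearn'

# 20 two-dimensional sample points, one [x, y] pair per row.
points = N[[3.0,4.0],[89.0,31.0],[23,144],[80,1],[6.0,15.0],[21.0,10.0],
           [100.0,89.0],[90,124],[80,93],[80,123],[91,110],
           [120,14],[70,2],[90,1],[1.0,2.0],[10.0,11.0],
           [21.0,121.0],[1,100],[30,90],[31,111]]

# Columns 0 and 1 of the matrix hold the x and y coordinates.
v1 = Daru::Vector.new(points[0..points.shape[0]-1,0])
v2 = Daru::Vector.new(points[0..points.shape[0]-1,1])
# The centroid columns must be as long as the point columns, so pad them
# with -1000, which falls outside the plotted range and stays invisible.
v3 = Array.new(points.shape[0], -1000)
v4 = Array.new(points.shape[0], -1000)

# Cluster into k = 4 groups; the result has one centroid per row.
means = points.kmeans(4,20)
0.upto(means.shape[0] - 1) do |i|
  v3[i]=means[i,0]
  v4[i]=means[i,1]
end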

plotter=Daru::DataFrame.new({v1: v1, v2: v2,v3: v3,v4: v4})

plotter.plot type: :scatter, x1: :v1, y1: :v2, x2: :v3, y2: :v4 do |plot, diagrams|
  # Use distinct names here: assigning to `points` or `means` inside the
  # block would overwrite the outer variables of the same name.
  points_diagram = diagrams[0]
  means_diagram  = diagrams[1]

  points_diagram.title "Points"
  points_diagram.color "#00FF00"

  means_diagram.title "K means"
  means_diagram.color "#FFFF00"

  plot.legend true
  plot.xrange [-40,190]
  plot.yrange [-40,180]
  plot.x_label "x axis"
  plot.y_label "y axis"
end

The coordinates of the resulting centroids are:

In [21]:

means

Out[21]:

$$\left(\begin{array}{cc} 88.2&107.8\\ 8.2&8.4\\ 21.2&113.2\\ 89.8&9.8\\ \end{array}\right)$$
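As a sanity check on this output, the plain-Ruby sketch below assigns every input point to its nearest reported centroid and recomputes each cluster mean. It uses ordinary arrays rather than the matrix type from the notebook, and the names `reported`, `clusters`, `recomputed`, and `stable` are introduced here for illustration only. If the centroids have really stopped moving, the recomputed means should match the reported ones.

```ruby
points = [[3.0, 4.0], [89.0, 31.0], [23, 144], [80, 1], [6.0, 15.0],
          [21.0, 10.0], [100.0, 89.0], [90, 124], [80, 93], [80, 123],
          [91, 110], [120, 14], [70, 2], [90, 1], [1.0, 2.0],
          [10.0, 11.0], [21.0, 121.0], [1, 100], [30, 90], [31, 111]]

# Centroids copied from the output above.
reported = [[88.2, 107.8], [8.2, 8.4], [21.2, 113.2], [89.8, 9.8]]

# Assign each point to the index of its nearest reported centroid.
clusters = points.group_by do |p|
  reported.each_index.min_by do |i|
    (p[0] - reported[i][0])**2 + (p[1] - reported[i][1])**2
  end
end

# Recompute the mean of every cluster.
recomputed = reported.each_index.map do |i|
  members = clusters.fetch(i)
  members.transpose.map { |coords| coords.sum / members.size.to_f }
end

# Compare with a small tolerance to sidestep floating-point noise.
stable = recomputed.flatten.zip(reported.flatten).all? do |a, b|
  (a - b).abs < 1e-9
end

puts "cluster sizes: #{reported.each_index.map { |i| clusters[i].size }}"
puts "reported centroids are a fixed point: #{stable}"
```

For the values shown here, each centroid ends up with five points and the check passes, so one more assign-and-update pass would leave the centroids unchanged.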