# k-means clustering example demonstration

k-means clustering aims to partition n observations into k clusters so that each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. Finding the optimal partition is computationally difficult (NP-hard).
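
As a sketch in standard notation (the symbols $S_i$ for the i-th cluster and $\mu_i$ for its mean are introduced here for illustration and are not used elsewhere in this notebook), the goal can be written as minimizing the within-cluster sum of squared distances:

$$\underset{S_1,\dots,S_k}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \left\lVert x - \mu_i \right\rVert^2, \qquad \mu_i = \frac{1}{|S_i|} \sum_{x \in S_i} x$$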

I have made a very simple example involving 20 points and 4 means here using basic techniques. It's interesting to see that the 4 means in this example settle into fixed positions after only about 4-5 iterations. Note that because the initial centroids are chosen at random, the output may vary from run to run.
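
To make the iteration and its dependence on initialization concrete, here is a minimal plain-Ruby sketch of the usual assign-then-update loop (Lloyd's algorithm). It is an illustration under my own assumptions, not slearn's implementation: the names `POINTS`, `nearest_index`, `lloyd` and the `seed` parameter are hypothetical, and the initial centroids are simply sampled from the data points.

```ruby
# Illustrative sketch only; this is not how slearn implements kmeans.
POINTS = [[3.0, 4.0], [89.0, 31.0], [23, 144], [80, 1], [6.0, 15.0], [21.0, 10.0],
          [100.0, 89.0], [90, 124], [80, 93], [80, 123], [91, 110],
          [120, 14], [70, 2], [90, 1], [1.0, 2.0], [10.0, 11.0],
          [21.0, 121.0], [1, 100], [30, 90], [31, 111]].freeze

# Index of the centroid closest to `point` (squared Euclidean distance).
def nearest_index(point, centroids)
  centroids.each_index.min_by do |j|
    (point[0] - centroids[j][0])**2 + (point[1] - centroids[j][1])**2
  end
end

# Returns [centroids, iterations]. `seed` (hypothetical parameter) fixes the
# random choice of initial centroids so that a run can be reproduced.
def lloyd(points, k, max_iter, seed)
  centroids = points.sample(k, random: Random.new(seed))
  iterations = 0
  max_iter.times do
    iterations += 1
    # Assignment step: group the points by their nearest centroid.
    clusters = points.group_by { |p| nearest_index(p, centroids) }
    # Update step: move each centroid to the mean of its cluster.
    updated = centroids.each_with_index.map do |c, j|
      members = clusters[j]
      next c if members.nil? # keep a centroid that attracted no points
      [members.sum { |p| p[0] } / members.size.to_f,
       members.sum { |p| p[1] } / members.size.to_f]
    end
    break if updated == centroids # nothing moved, so the means are fixed
    centroids = updated
  end
  [centroids, iterations]
end

# Different seeds may need a different number of iterations and may end
# at different centroids.
[1, 2, 3].each do |seed|
  centroids, iterations = lloyd(POINTS, 4, 20, seed)
  puts "seed #{seed}: #{iterations} iterations -> #{centroids.inspect}"
end
```

Running it with a few seeds is a quick way to see how many iterations each start needs and whether different starts end at the same centroids.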

In [1]:
require 'slearn'
require 'daru' # Daru::Vector and Daru::DataFrame are used below
points = N[[3.0,4.0],[89.0,31.0],[23,144],[80,1],[6.0,15.0],[21.0,10.0], \
[100.0,89.0],[90,124],[80,93],[80,123],[91,110],  \
[120,14],[70,2],[90,1],[1.0,2.0],[10.0,11.0], \
[21.0,121.0],[1,100],[30,90],[31,111]]

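# Column 0 holds the x coordinates of the points, column 1 the y coordinates.
v1 = Daru::Vector.new(points[0..points.shape[0]-1,0])
v2 = Daru::Vector.new(points[0..points.shape[0]-1,1])
# -1000 is a placeholder that lies outside the plotted axis ranges,
# so rows that do not hold a centroid stay out of view.
v3 = Array.new(points.shape[0], -1000)
v4 = Array.new(points.shape[0], -1000)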

means = points.kmeans(4,20)
0.upto(means.shape[0] - 1) do |i|
  v3[i] = means[i,0]
  v4[i] = means[i,1]
end

ploter=Daru::DataFrame.new({v1: v1, v2: v2,v3: v3,v4: v4})

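ploter.plot type: :scatter, x1: :v1, y1: :v2, x2: :v3, y2: :v4 do |plot, diagrams|
  # Use distinct names here: assigning to `points` or `means` inside the block
  # would overwrite the outer variables of the same name.
  points_diagram = diagrams[0]
  means_diagram = diagrams[1]

  points_diagram.title "Points"
  points_diagram.color "#00FF00"

  means_diagram.title "K means"
  means_diagram.color "#FFFF00"

  plot.legend true
  plot.xrange [-40,190]
  plot.yrange [-40,180]
  plot.x_label "x axis"
  plot.y_label "y axis"
end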


The coordinates of the resulting centroids are:

In [21]:
means

Out[21]:
$$\left(\begin{array}{cc} 88.2&107.8\\ 8.2&8.4\\ 21.2&113.2\\ 89.8&9.8\\ \end{array}\right)$$
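
One way to sanity-check this output is to confirm that the reported centroids are a fixed point of the algorithm: assigning every point to its nearest centroid and recomputing the cluster means should give the same values back. The sketch below does this in plain Ruby with the points copied from the first cell; the variable names are my own, and no slearn or NMatrix calls are assumed.

```ruby
# Points from the first cell, as plain Ruby arrays (no NMatrix needed).
points = [[3.0, 4.0], [89.0, 31.0], [23, 144], [80, 1], [6.0, 15.0], [21.0, 10.0],
          [100.0, 89.0], [90, 124], [80, 93], [80, 123], [91, 110],
          [120, 14], [70, 2], [90, 1], [1.0, 2.0], [10.0, 11.0],
          [21.0, 121.0], [1, 100], [30, 90], [31, 111]]

# Centroids as reported in Out[21].
centroids = [[88.2, 107.8], [8.2, 8.4], [21.2, 113.2], [89.8, 9.8]]

# Assignment step: group points by the index of their nearest centroid
# (squared Euclidean distance).
clusters = points.group_by do |x, y|
  centroids.each_index.min_by { |j| (x - centroids[j][0])**2 + (y - centroids[j][1])**2 }
end

# Update step: mean of each cluster, rounded to one decimal place to match
# the precision of the displayed output.
recomputed = centroids.each_index.map do |j|
  members = clusters.fetch(j)
  [0, 1].map { |d| (members.sum { |p| p[d] } / members.size.to_f).round(1) }
end

# Print the cluster sizes and whether the means came back unchanged.
puts clusters.transform_values(&:size).inspect
puts recomputed == centroids
```

If the last line prints `true`, another iteration would leave the means where they are, which is what "taking their fixed places" means for this data set.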