k-means clustering aims to partition n observations into k clusters so that each observation belongs to the cluster with the nearest mean, with that mean serving as a prototype of the cluster. Finding the optimal partition is computationally difficult (NP-hard).
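For reference, the objective described above is usually written as a within-cluster sum of squares. The symbols below are notation introduced here for illustration (they do not appear elsewhere in this text): S_1, ..., S_k are the clusters and mu_i is the mean of the points in S_i.

$$
\underset{S_1,\dots,S_k}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{|S_i|} \sum_{x \in S_i} x
$$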
I have made a very simple example involving 20 points and 4 means here using basic techniques. It's interesting to see that after only a few iterations (around 4 or 5), the 4 means in this example settle into fixed positions. Note that since this is a randomized algorithm, the output may vary depending on where the centroids were initialized.
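To make the iterate-until-fixed behaviour concrete, here is a minimal plain-Ruby sketch of the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not the slearn implementation, which may differ; the names `squared_distance`, `lloyd_kmeans`, `rng`, and the toy data are all assumptions made up for this illustration.

```ruby
# Illustrative sketch only; not how slearn implements kmeans.

# Squared Euclidean distance between two points given as arrays.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Hypothetical helper: returns k centroids as arrays of coordinates.
def lloyd_kmeans(points, k, max_iterations, rng: Random.new)
  # Random initialization: use k of the input points as starting centroids.
  centroids = points.sample(k, random: rng)

  max_iterations.times do
    # Assignment step: group points by the index of their nearest centroid.
    clusters = points.group_by do |p|
      (0...k).min_by { |j| squared_distance(p, centroids[j]) }
    end

    # Update step: move each centroid to the mean of its cluster.
    # A centroid with no assigned points stays where it is.
    updated = centroids.each_with_index.map do |c, j|
      members = clusters[j]
      next c if members.nil?
      members.transpose.map { |coords| coords.sum.to_f / members.size }
    end

    # Stop early once the centroids no longer move.
    break if updated == centroids
    centroids = updated
  end

  centroids
end

# Toy data: two well-separated groups of three points each.
sample = [[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
          [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]]
centroids = lloyd_kmeans(sample, 2, 20, rng: Random.new(42))
```

Passing a seeded `Random` makes a run repeatable. On data this well separated the loop should stop after a handful of passes; on harder data, different seeds can end at different centroids.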
require 'nmatrix'
require 'daru'
require 'slearn'

# 20 two-dimensional sample points, one [x, y] pair per row.
points = N[[3.0,4.0],[89.0,31.0],[23,144],[80,1],[6.0,15.0],[21.0,10.0],
           [100.0,89.0],[90,124],[80,93],[80,123],[91,110],
           [120,14],[70,2],[90,1],[1.0,2.0],[10.0,11.0],
           [21.0,121.0],[1,100],[30,90],[31,111]]
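# x and y coordinates of the input points.
v1 = Daru::Vector.new(points[0..points.shape[0]-1,0])
v2 = Daru::Vector.new(points[0..points.shape[0]-1,1])

# x and y coordinates of the centroids. The arrays must be as long as v1 and v2
# to fit in one data frame, so unused slots hold -1000, which falls outside the
# plotted range and therefore stays invisible.
v3 = Array.new(points.shape[0], -1000)
v4 = Array.new(points.shape[0], -1000)

# Compute the 4 centroids with slearn's kmeans.
means = points.kmeans(4,20)

# Copy the centroid coordinates into the first slots of v3 and v4.
0.upto(means.shape[0] - 1) do |i|
  v3[i] = means[i,0]
  v4[i] = means[i,1]
end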
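plotter = Daru::DataFrame.new({v1: v1, v2: v2, v3: v3, v4: v4})
plotter.plot type: :scatter, x1: :v1, y1: :v2, x2: :v3, y2: :v4 do |plot, diagrams|
  # Use names distinct from the outer `points` and `means`: assigning to those
  # names inside the block would overwrite the data and the centroids.
  points_diagram = diagrams[0]
  means_diagram = diagrams[1]
  points_diagram.title "Points"
  points_diagram.color "#00FF00"
  means_diagram.title "K means"
  means_diagram.color "#FFFF00"
  plot.legend true
  plot.xrange [-40,190]
  plot.yrange [-40,180]
  plot.x_label "x axis"
  plot.y_label "y axis"
end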
The coordinates of the resulting centroids are:
means
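Because the centroids depend on the random initialization, it can help to score a result so that different runs can be compared. One common score is the within-cluster sum of squares: the sum, over all points, of the squared distance to the nearest centroid (lower is better). The sketch below is plain Ruby on made-up toy values, not the output of the run above; `wcss` is a hypothetical helper and not part of slearn.

```ruby
# Hypothetical helper: within-cluster sum of squares for a set of centroids.
# Both arguments are arrays of [x, y] arrays.
def wcss(points, centroids)
  points.sum do |p|
    # Each point contributes the squared distance to its nearest centroid.
    centroids.map { |c| p.zip(c).sum { |x, y| (x - y)**2 } }.min
  end
end

# Toy data: two pairs of points, far apart along the x axis (illustrative values).
toy_points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]

# Two centroid sets that are both fixed points of the k-means iteration.
good = [[0.0, 1.0], [10.0, 1.0]]  # one centroid per pair
poor = [[5.0, 0.0], [5.0, 2.0]]   # each centroid sits between the two pairs

good_score = wcss(toy_points, good)  # 4.0
poor_score = wcss(toy_points, poor)  # 100.0
```

Both centroid sets are stable under further iterations, yet their scores differ widely, which is why running the algorithm from several initializations and keeping the lowest-scoring result is a common precaution.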