A self-organizing map (SOM) is a type of neural network that is trained using unsupervised learning; it is one of the Kohonen neural networks. The idea of creating such networks belongs to the Finnish scientist Teuvo Kohonen. These networks mainly perform clustering and data visualization tasks. They also allow multidimensional data to be reduced to a lower-dimensional space, and they are used to search for patterns in the data.
In this article, we will see how to solve a clustering problem with the help of a Kohonen network and build a self-organizing map.
The main element of the network is the Kohonen layer. It consists of a number of linear elements (neurons), each with $n$ inputs. Each neuron receives the vector $x = (x_1, x_2, \dots, x_n)$ from the input data. The output of neuron $j$ is $$y_j = \sum_{i=1}^{n} w_{ij}x_i$$
After the output $y_j$ of each neuron is calculated, the winning neuron is determined by the "winner takes all" rule. The maximum output $$y_{max} = \max_j y_j$$ is found among all neurons. The neuron that reaches it outputs $1$, and all other outputs are set to $0$. If the maximum is reached at several neurons:
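The two formulas above can be checked with a few lines of NumPy. This is a minimal sketch. It assumes the weights are stored as a matrix `W` of shape `(n, k)`, so that `W[i, j]` is $w_{ij}$. The function name `kohonen_layer_output` is hypothetical. The article's list of tie-breaking options is not preserved, so this sketch simply takes the first maximum, which is what `np.argmax` does.

```python
import numpy as np

def kohonen_layer_output(W, x):
    """Winner-takes-all output of a Kohonen layer (illustrative sketch).

    W has shape (n, k): W[i, j] is the weight w_ij from input i to neuron j.
    x has shape (n,). Returns the linear outputs y and the 0/1 output vector.
    """
    y = x @ W                   # y_j = sum_i w_ij * x_i
    winner = int(np.argmax(y))  # index of the maximum; on ties, the first one
    out = np.zeros_like(y)
    out[winner] = 1.0           # the winner outputs 1, all others output 0
    return y, out

W = np.array([[0.2, 0.9, 0.1],
              [0.7, 0.1, 0.3]])  # n = 2 inputs, k = 3 neurons
x = np.array([1.0, 0.5])
y, out = kohonen_layer_output(W, x)
print(y, out)
```

Here neuron 1 has the largest output, so only its position in `out` is 1.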
Initialization
The most popular ways to set the initial node weights are:
As a result, we get $M_k$, the map of $k$ neurons; the number of neurons $k$ is set by the analyst.
$N$ is the number of input vectors.
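The list of initialization methods above is not preserved in this copy. Two common choices can be sketched as follows: drawing random weights inside the range of the data, and copying randomly chosen input vectors. Both function names are hypothetical, and the random generator and seed are arbitrary assumptions.

```python
import numpy as np

def init_random(data, k, rng):
    """Random initialization: each weight is drawn uniformly within
    the per-feature [min, max] range of the data."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    return lo + (hi - lo) * rng.random((k, data.shape[1]))

def init_from_samples(data, k, rng):
    """Sample initialization: k distinct input vectors become the weights."""
    idx = rng.choice(len(data), size=k, replace=False)
    return data[idx].copy()

rng = np.random.default_rng(17)
data = rng.random((200, 4))  # N = 200 input vectors with 4 features
M_random = init_random(data, k=9, rng=rng)
M_samples = init_from_samples(data, k=9, rng=rng)
print(M_random.shape, M_samples.shape)  # (9, 4) (9, 4)
```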
Training
Initialize the iteration counter $t=0$ and shuffle the input data.
Among all the neurons, find the one closest to the incoming vector $x(t)$. To do this, compute the distance $d_i$ (usually Euclidean) between $x(t)$ and each weight vector $m_i$, then take $d_{min} = \min_i d_i$. The neuron at which $d_{min}$ is reached is the winner. If $d_{min}$ is reached at several neurons, the winner is chosen randomly. We denote the winner by $m_w$.
Unlike the Kohonen network described above, Kohonen maps use the "winner takes most" rule during training. In this way, the weights are updated not only for the winning neuron but also for the neurons that are topologically close to it.
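The winner search, including random tie-breaking, can be sketched in NumPy. This assumes the weight vectors $m_i$ are stored row by row in an array `M` of shape `(k, n)` and uses Euclidean distance. The name `find_winner` is hypothetical.

```python
import numpy as np

def find_winner(M, x, rng):
    """Return the index w of the neuron whose weight vector is closest to x.

    M has shape (k, n), one weight vector m_i per row; distance is Euclidean.
    If several neurons reach the minimal distance, one is chosen at random.
    """
    d = np.linalg.norm(M - x, axis=1)          # d_i = ||x - m_i||
    candidates = np.flatnonzero(d == d.min())  # all neurons reaching d_min
    return int(rng.choice(candidates))

rng = np.random.default_rng(0)
M = np.array([[-1.0, 0.0],
              [1.0, 1.0],
              [1.0, -1.0]])
x = np.array([1.0, 0.0])  # equally close to neurons 1 and 2
w = find_winner(M, x, rng)
print(w)  # 1 or 2, chosen at random
```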
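Update the weights.
Calculate $m_i(t) = m_i(t-1) + h_i(t) (x(t) - m_i(t-1)), i = 1,2,..., k$, where $h_i(t)$ is the neighborhood function. It is largest for the winner $m_w$, decreases as the distance on the grid between neuron $i$ and the winner grows, and usually also shrinks as $t$ increases.
In this way, the weights of the winner and of its neighbors move toward $x(t)$. Increase $t$ and repeat the training step.
Training continues until $t$ reaches $N$ or until the error becomes small enough.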
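The training steps above can be put together into a minimal, self-contained sketch. The following choices are assumptions, not taken from the article:

- a rectangular grid;
- random initialization inside the data range;
- Euclidean distance;
- a Gaussian neighborhood;
- a learning rate and radius that decay linearly with $t$.

`train_som` and its parameters are hypothetical names. For simplicity, ties in the winner search go to the first neuron rather than a random one.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=None, alpha0=0.5, sigma0=None, seed=0):
    """Minimal online SOM on a rows x cols rectangular grid (sketch).

    Assumptions: Euclidean distance, Gaussian neighborhood, learning rate
    and neighborhood radius that decay linearly with t.
    """
    rng = np.random.default_rng(seed)
    N, n = data.shape
    k = rows * cols
    if n_iter is None:
        n_iter = N  # stop when t reaches N
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Grid position of each neuron, used for topological (on-map) distances.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Random initialization inside the per-feature range of the data.
    lo, hi = data.min(axis=0), data.max(axis=0)
    M = lo + (hi - lo) * rng.random((k, n))
    order = rng.permutation(N)  # shuffle the input data
    for t in range(n_iter):
        x = data[order[t % N]]
        # Winner m_w: smallest Euclidean distance (argmin keeps the first on ties).
        w = np.argmin(np.linalg.norm(M - x, axis=1))
        frac = t / n_iter
        alpha = alpha0 * (1.0 - frac)         # learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3  # neighborhood radius
        grid_d2 = np.sum((grid - grid[w]) ** 2, axis=1)
        h = alpha * np.exp(-grid_d2 / (2.0 * sigma ** 2))  # h_i(t)
        # m_i(t) = m_i(t-1) + h_i(t) (x(t) - m_i(t-1)) for all neurons at once
        M += h[:, None] * (x - M)
    return M, grid

rng = np.random.default_rng(17)
# Two well-separated groups of 2-D points.
data = np.vstack([rng.random((100, 2)), rng.random((100, 2)) + 3.0])
M, grid = train_som(data, rows=2, cols=2, n_iter=2000)
print(M.round(2))
```

Because every update moves a weight vector only part of the way toward an input point ($h_i(t) \le 0.5$ here), the weights stay inside the range of the data.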
Self-organizing maps are used in data mining, for example in text analysis, financial statement analysis and image analysis.
The advantages of self-organizing maps:
- Dimensionality reduction.
- Topological modeling of the training set.
- Robustness to outliers and missing data.
- Simple visualization.
Visualization of how a self-organizing map works.
Let's look at a small example.
First, we need to install the SOMPY library. SOMPY does not have official documentation.
!pip3 install git+https://github.com/compmonks/SOMPY.git
You may also need to install ipdb.
!pip3 install ipdb
Import all the necessary libraries.
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
from time import time
import numpy as np
import pandas as pd
import sompy
np.random.seed(17)
Creating a "toy" dataset.
data_len = 200
# Four groups of 4-D points shifted by 0, 1, 2 and 3.5; column 1 of each group
# is then distorted differently so that the groups differ in shape.
# Assigning through the column label (not .values) also works under pandas Copy-on-Write.
data_frame_1 = pd.DataFrame(data=np.random.rand(data_len, 4))
data_frame_1[1] = data_frame_1[1] + 0.42 * np.random.rand(data_len)
data_frame_2 = pd.DataFrame(data=np.random.rand(data_len, 4) + 1)
data_frame_2[1] = -1 * data_frame_2[1] + 0.62 * np.random.rand(data_len)
data_frame_3 = pd.DataFrame(data=np.random.rand(data_len, 4) + 2)
data_frame_3[1] = 0.5 * data_frame_3[1] + 1 * np.random.rand(data_len)
data_frame_4 = pd.DataFrame(data=np.random.rand(data_len, 4) + 3.5)
data_frame_4[1] = -0.1 * data_frame_4[1] + 0.5 * np.random.rand(data_len)
data_full = np.concatenate((data_frame_1, data_frame_2, data_frame_3, data_frame_4))
fig = plt.figure()
plt.plot(data_full[:, 0], data_full[:, 1], "ob", alpha=0.2, markersize=4)
fig.set_size_inches(7, 7)
data_full
First, we need to set the size of the map. The toy dataset is small, so we start with a small map.
mapsize = [2, 2]
The build method of SOMFactory creates a self-organizing map; it takes the data and the size of the map.
initialization='random' sets how the initial node weights are chosen: here all weights get random values.
som = sompy.SOMFactory.build(data_full, mapsize, initialization="random")
som.train(n_job=1, verbose="info")
For visualization, we use mapview.View2DPacked.
v = sompy.mapview.View2DPacked(10, 10, "example", text_size=8)
v.show(som)
The SOM recognized four clusters, although the cluster boundaries are far from ideal.
The "cluster" method uses sklearn's KMeans to cluster the nodes of the map. It works on the node weight vectors (the codebook), rescaled back to the scale of the raw data.
v = sompy.mapview.View2DPacked(5, 5, "test", text_size=8)
som.cluster(n_clusters=4)
som.cluster_labels
v.show(som, what="cluster");
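The idea of clustering a trained map can be sketched without SOMPY. First, cluster the node weight vectors with scikit-learn's `KMeans`. Then give each data point the label of its best-matching node. This is an illustration, not SOMPY's code. The codebook here is a hypothetical stand-in: 25 data points sampled at random, rather than a trained map.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(17)
# Hypothetical data: two well-separated groups of 2-D points.
data = np.vstack([rng.normal(0.0, 0.3, (100, 2)), rng.normal(3.0, 0.3, (100, 2))])
# Hypothetical stand-in for a trained 5x5 map's codebook (25 node weight vectors).
codebook = data[rng.choice(len(data), size=25, replace=False)]

# 1) Cluster the node weight vectors, not the raw points.
node_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(codebook)

# 2) Each data point inherits the label of its best-matching node.
dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
bmu = dists.argmin(axis=1)
point_labels = node_labels[bmu]
print(np.bincount(point_labels))
```

Clustering the nodes instead of the raw points keeps the KMeans step small: it works on one vector per node, whatever the size of the dataset.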
Let's look at the clusters on the grid. For this, we use HitMapView.
h = sompy.hitmap.HitMapView(8, 8, "hitmap", text_size=8, show_text=True)
h.show(som);
A self-organizing map grid can be of two types:
- square (rectangular) grid
- hexagonal grid
Now we will create a new SOM with additional arguments to get a better result.
Increase the map size.
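The difference between the two grid types can be sketched by computing node positions. On a hexagonal grid, every other row is shifted by half a step and rows are $\sqrt{3}/2$ apart, so an inner node has six equidistant neighbors instead of four. The function names and the `"hex"` label are hypothetical; they are not SOMPY parameters.

```python
import numpy as np

def grid_positions(rows, cols, lattice="rect"):
    """Return an array of shape (rows*cols, 2) with the (x, y) position of each node.

    "rect": nodes on a unit square grid.
    "hex" (hypothetical label): odd rows shifted by 0.5 and rows sqrt(3)/2 apart.
    """
    pos = []
    for r in range(rows):
        for c in range(cols):
            if lattice == "rect":
                pos.append((c, r))
            else:
                pos.append((c + 0.5 * (r % 2), r * np.sqrt(3) / 2))
    return np.array(pos, dtype=float)

def count_neighbors(pos, i, tol=1e-9):
    """Number of nodes at distance 1 from node i (its immediate neighbors)."""
    d = np.linalg.norm(pos - pos[i], axis=1)
    return int(np.sum(np.abs(d - 1.0) < tol))

rect = grid_positions(5, 5, "rect")
hexa = grid_positions(5, 5, "hex")
center = 2 * 5 + 2                    # node in row 2, column 2
print(count_neighbors(rect, center))  # 4
print(count_neighbors(hexa, center))  # 6
```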
mapsize = [20, 20]
lattice='rect' selects a square (rectangular) grid for the SOM.
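normalization='var' sets the type of normalization applied to the input data. 'var' standardizes each feature (a z-score): it subtracts the mean and divides by the standard deviation.
$$\frac{X-\bar{X}}{s}$$
where $\bar{X}$ is the mean and $s$ is the standard deviation of the feature.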
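The formula above can be applied column by column in NumPy. This is a sketch, not SOMPY's implementation. The article does not say whether the library uses the sample or the population standard deviation; this sketch assumes the sample one (`ddof=1`). The helper names are hypothetical. The second function shows why the mean and standard deviation are kept: they map trained weights back to the original scale.

```python
import numpy as np

def var_normalize(X):
    """Standardize each column: (X - mean) / s, with s the sample std (ddof=1)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std, mean, std

def denormalize(Z, mean, std):
    """Map normalized values (e.g. trained node weights) back to the data scale."""
    return Z * std + mean

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z, mean, std = var_normalize(X)
print(Z)  # both columns become [-1, 0, 1]
```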
initialization='pca' sets the initial node weights with principal component (PCA) initialization. Note that the code below still uses initialization='random'.
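One common form of PCA ("linear") initialization can be sketched as follows. The nodes are spread evenly over the plane of the first two principal components, centered at the data mean. Each axis is scaled by the standard deviation along that component. This is an assumption about the general technique, not SOMPY's exact code. The function name is hypothetical.

```python
import numpy as np

def pca_linear_init(data, rows, cols):
    """Spread rows*cols weight vectors over the plane of the first two
    principal components, centered at the data mean (sketch)."""
    mean = data.mean(axis=0)
    # Eigen-decomposition of the covariance matrix (eigh returns ascending eigenvalues).
    eigvals, eigvecs = np.linalg.eigh(np.cov(data - mean, rowvar=False))
    top = np.argsort(eigvals)[::-1][:2]             # two largest components
    pc = eigvecs[:, top] * np.sqrt(eigvals[top])    # scale each axis by its std
    # Grid coefficients in [-1, 1] along each of the two principal axes.
    a = np.linspace(-1.0, 1.0, rows)
    b = np.linspace(-1.0, 1.0, cols)
    coeffs = np.array([(ai, bj) for ai in a for bj in b])  # (rows*cols, 2)
    return mean + coeffs @ pc.T                            # (rows*cols, n_features)

rng = np.random.default_rng(17)
data = rng.normal(size=(200, 4)) * np.array([3.0, 1.0, 0.5, 0.1])
M = pca_linear_init(data, rows=4, cols=5)
print(M.shape)  # (20, 4)
```

Unlike random initialization, this starts the map already ordered along the main directions of the data, which usually shortens training.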
neighborhood='gaussian' uses a Gaussian function as the "measure of neighborhood", i.e. the neighborhood function $h_i(t)$ of the training step.
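A common textbook form of the Gaussian neighborhood function is shown below as a sketch. The symbols are assumptions, not defined in the article: $r_i$ is the grid position of neuron $i$, $r_w$ is the grid position of the winner, $\alpha(t)$ is the learning rate, and $\sigma(t)$ is the neighborhood radius. Both $\alpha(t)$ and $\sigma(t)$ shrink as $t$ grows. SOMPY's exact parameterization may differ.

$$h_i(t) = \alpha(t)\exp\left(-\frac{\|r_i - r_w\|^2}{2\sigma^2(t)}\right)$$

For comparison, the simpler "bubble" neighborhood sets $h_i(t) = \alpha(t)$ when $\|r_i - r_w\| \le \sigma(t)$ and $h_i(t) = 0$ otherwise.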
som = sompy.SOMFactory.build(
data_full,
mapsize,
lattice="rect",
normalization="var",
initialization="random",
neighborhood="gaussian",
)
som.train(n_job=1, verbose="info")
v = sompy.mapview.View2DPacked(10, 10, "example", text_size=8)
v.show(som)
v = sompy.mapview.View2DPacked(5, 5, "test", text_size=8)
som.cluster(n_clusters=4)
som.cluster_labels
h = sompy.hitmap.HitMapView(8, 8, "hitmap", text_size=8, show_text=True)
h.show(som);
Now let's apply a SOM to the Iris dataset.
from sklearn import datasets
iris = datasets.load_iris()
iris.target_names
mapsize = [20, 20]
iris.target
%%time
som = sompy.SOMFactory.build(
iris.data,
mapsize,
lattice="rect",
normalization="var",
initialization="random",
neighborhood="gaussian",
)
som.train(n_job=1, verbose=False)
v = sompy.mapview.View2DPacked(10, 10, "iris", text_size=8)
v.show(som, which_dim=[0, 1, 2])
The trained map in the scale of the raw data (desnormalize=True).
view2D = sompy.mapview.View2D(10, 10, "Iris_raw_data", text_size=8)
view2D.show(som, col_sz=4, which_dim="all", desnormalize=True)
After training, the SOM separates three distinct clusters, which matches the three Iris species.
iris.data.shape
Visualization of the clusters on the grid.
v = sompy.mapview.View2DPacked(5, 5, "test", text_size=8)
som.cluster(n_clusters=3)
som.cluster_labels
h = sompy.hitmap.HitMapView(8, 8, "hitmap_iris", text_size=8, show_text=True)
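h.show(som);
We can also build the U-matrix, which shows the distances between the weight vectors of neighboring nodes; large values mark boundaries between clusters. We use umatrix.UMatrixView to visualize it.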
u = sompy.umatrix.UMatrixView(20, 20, "umatrix")
UMAT = u.build_u_matrix(som)
UMAT = u.show(som)
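The idea behind the U-matrix can be sketched for a rectangular grid. For each node, take the mean Euclidean distance between its weight vector and the weight vectors of its four direct neighbors. This is an illustrative assumption. SOMPY's UMatrixView may compute it differently, for example with a wider neighborhood or another averaging rule. The function name and the toy codebook are hypothetical.

```python
import numpy as np

def u_matrix(M, rows, cols):
    """U-matrix for a rectangular grid: for each node, the mean Euclidean
    distance to the weight vectors of its 4-connected neighbors.

    M has shape (rows*cols, n_features), with nodes stored row by row.
    """
    W = M.reshape(rows, cols, -1)
    U = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(np.linalg.norm(W[r, c] - W[rr, cc]))
            U[r, c] = np.mean(dists)
    return U

# Hypothetical 3x3 codebook: the left two columns are close, the right column is far away.
M = np.array([[0.0], [0.1], [5.0],
              [0.0], [0.1], [5.0],
              [0.0], [0.1], [5.0]])
print(u_matrix(M, 3, 3).round(2))
```

The large values in the right column reveal the boundary between the two groups of nodes.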
Unfortunately, we cannot show an example with a hexagonal grid because the library does not implement it. Likewise, normalization='var' is the only normalization the library implements.
Kohonen self-organizing maps solve many problems and are a powerful tool for data analysis. In this article, we covered the principle of the SOM and looked at small examples of clustering and data visualization. However, SOMs are currently losing popularity in favor of other algorithms.