Self-Organizing Map (SOM)

This notebook is based on the Udemy course Deep Learning A-Z™: Hands-On Artificial Neural Networks. See the course.

Notebook Information

  • notebook name: taruma_udemy_som
  • notebook version/date: 1.0.1/20190729
  • notebook server: Google Colab
  • python version: 3.6
In [2]:
#### NOTEBOOK DESCRIPTION

from datetime import datetime

NOTEBOOK_TITLE = 'taruma_udemy_som'
NOTEBOOK_VERSION = '1.0.0'
NOTEBOOK_DATE = 1 # Set to 1 to append a UTC timestamp to the project name

NOTEBOOK_NAME = "{}_{}".format(
    NOTEBOOK_TITLE, 
    NOTEBOOK_VERSION.replace('.','_')
)
PROJECT_NAME = "{}_{}{}".format(
    NOTEBOOK_TITLE, 
    NOTEBOOK_VERSION.replace('.','_'), 
    "_" + datetime.utcnow().strftime("%Y%m%d_%H%M") if NOTEBOOK_DATE else ""
)

print(f"Notebook name: {NOTEBOOK_NAME}")
print(f"Project name: {PROJECT_NAME}")
Notebook name: taruma_udemy_som_1_0_0
Project name: taruma_udemy_som_1_0_0_20190729_1028
In [3]:
#### System Version
import sys
print("Python version: {}".format(sys.version))
Python version: 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
In [0]:
#### Load Notebook Extensions
%load_ext google.colab.data_table
In [6]:
#### Download dataset
!wget -O SOM.zip "https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/P16-Self-Organizing-Maps.zip"
!unzip SOM.zip
--2019-07-29 10:28:35--  https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/P16-Self-Organizing-Maps.zip
Resolving sds-platform-private.s3-us-east-2.amazonaws.com (sds-platform-private.s3-us-east-2.amazonaws.com)... 52.219.96.64
Connecting to sds-platform-private.s3-us-east-2.amazonaws.com (sds-platform-private.s3-us-east-2.amazonaws.com)|52.219.96.64|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17004 (17K) [application/zip]
Saving to: ‘SOM.zip’

SOM.zip             100%[===================>]  16.61K  --.-KB/s    in 0s      

2019-07-29 10:28:36 (124 MB/s) - ‘SOM.zip’ saved [17004/17004]

Archive:  SOM.zip
   creating: Self_Organizing_Maps/
  inflating: Self_Organizing_Maps/Credit_Card_Applications.csv  
   creating: __MACOSX/
   creating: __MACOSX/Self_Organizing_Maps/
  inflating: __MACOSX/Self_Organizing_Maps/._Credit_Card_Applications.csv  
  inflating: Self_Organizing_Maps/minisom.py  
  inflating: Self_Organizing_Maps/som.py  
  inflating: __MACOSX/Self_Organizing_Maps/._som.py  
In [0]:
#### Set dataset path
DATASET_DIRECTORY = 'Self_Organizing_Maps/'

STEP 1

In [0]:
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
In [9]:
# Importing the dataset
# Dataset http://archive.ics.uci.edu/ml/datasets/statlog+(australian+credit+approval)
dataset = pd.read_csv(DATASET_DIRECTORY + 'Credit_Card_Applications.csv')
dataset
Out[9]:
CustomerID A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 Class
0 15776156 1 22.08 11.460 2 4 4 1.585 0 0 0 1 2 100 1213 0
1 15739548 0 22.67 7.000 2 8 4 0.165 0 0 0 0 2 160 1 0
2 15662854 0 29.58 1.750 1 4 4 1.250 0 0 0 1 2 280 1 0
3 15687688 0 21.67 11.500 1 5 3 0.000 1 1 11 1 2 0 1 1
4 15715750 1 20.17 8.170 2 6 4 1.960 1 1 14 0 2 60 159 1
5 15571121 0 15.83 0.585 2 8 8 1.500 1 1 2 0 2 100 1 1
6 15726466 1 17.42 6.500 2 3 4 0.125 0 0 0 0 2 60 101 0
7 15660390 0 58.67 4.460 2 11 8 3.040 1 1 6 0 2 43 561 1
8 15663942 1 27.83 1.000 1 2 8 3.000 0 0 0 0 2 176 538 0
9 15638610 0 55.75 7.080 2 4 8 6.750 1 1 3 1 2 100 51 0
10 15644446 1 33.50 1.750 2 14 8 4.500 1 1 4 1 2 253 858 1
11 15585892 1 41.42 5.000 2 11 8 5.000 1 1 6 1 2 470 1 1
12 15609356 1 20.67 1.250 1 8 8 1.375 1 1 3 1 2 140 211 0
13 15803378 1 34.92 5.000 2 14 8 7.500 1 1 6 1 2 0 1001 1
14 15599440 1 58.58 2.710 2 8 4 2.415 0 0 0 1 2 320 1 0
15 15692408 1 48.08 6.040 2 4 4 0.040 0 0 0 0 2 0 2691 1
16 15683168 1 29.58 4.500 2 9 4 7.500 1 1 2 1 2 330 1 1
17 15790254 0 18.92 9.000 2 6 4 0.750 1 1 2 0 2 88 592 1
18 15767729 1 20.00 1.250 1 4 4 0.125 0 0 0 0 2 140 5 0
19 15768600 0 22.42 5.665 2 11 4 2.585 1 1 7 0 2 129 3258 1
20 15699839 0 28.17 0.585 2 6 4 0.040 0 0 0 0 2 260 1005 0
21 15786237 0 19.17 0.585 1 6 4 0.585 1 0 0 1 2 160 1 0
22 15694530 1 41.17 1.335 2 2 4 0.165 0 0 0 0 2 168 1 0
23 15796813 1 41.58 1.750 2 4 4 0.210 1 0 0 0 2 160 1 0
24 15605791 1 19.50 9.585 2 6 4 0.790 0 0 0 0 2 80 351 0
25 15714087 1 32.75 1.500 2 13 8 5.500 1 1 3 1 2 0 1 1
26 15711446 1 22.50 0.125 1 4 4 0.125 0 0 0 0 2 200 71 0
27 15588123 1 33.17 3.040 1 8 8 2.040 1 1 1 1 2 180 18028 1
28 15748552 0 30.67 12.000 2 8 4 2.000 1 1 1 0 2 220 20 1
29 15618410 1 23.08 2.500 2 8 4 1.085 1 1 11 1 2 60 2185 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
660 15598586 1 26.67 2.710 1 13 4 5.250 1 1 1 0 2 211 1 1
661 15665014 0 22.50 0.415 2 3 4 0.335 0 0 0 1 1 144 1 0
662 15701738 1 39.92 0.540 1 6 4 0.500 1 1 3 0 2 200 1001 1
663 15650591 0 26.08 8.665 2 6 4 1.415 1 0 0 0 2 160 151 1
664 15652667 1 20.00 0.000 2 2 4 0.500 0 0 0 0 2 144 1 0
665 15679622 1 31.57 4.000 1 3 4 0.085 0 0 0 1 2 411 1 0
666 15730150 1 26.75 4.500 1 8 5 2.500 0 0 0 0 2 200 1211 0
667 15813192 0 24.92 1.250 2 1 1 0.000 1 0 0 0 2 80 1 0
668 15606554 0 32.25 1.500 2 8 4 0.250 0 0 0 1 2 372 123 0
669 15611794 1 17.67 4.460 2 8 4 0.250 0 0 0 0 1 80 1 0
670 15672357 0 37.75 5.500 2 11 4 0.125 1 0 0 1 2 228 1 1
671 15711759 1 22.67 2.540 1 8 8 2.585 1 0 0 0 2 0 1 1
672 15615296 0 17.92 10.210 2 1 1 0.000 0 0 0 0 2 0 51 0
673 15699294 1 24.42 12.335 2 11 8 1.585 1 0 0 1 2 120 1 1
674 15788634 0 25.75 0.500 2 8 8 0.875 1 0 0 1 2 491 1 1
675 15660871 1 26.17 12.500 1 4 8 1.250 0 0 0 1 2 0 18 0
676 15618258 0 22.75 6.165 2 6 4 0.165 0 0 0 0 2 220 1001 0
677 15722535 1 23.00 0.750 2 7 4 0.500 1 0 0 1 1 320 1 0
678 15711977 1 25.67 0.290 1 8 4 1.500 0 0 0 1 2 160 1 0
679 15690169 1 48.58 0.205 1 4 4 0.250 1 1 11 0 2 380 2733 1
680 15790689 1 21.17 0.000 2 8 4 0.500 0 0 0 1 1 0 1 0
681 15665181 1 35.25 16.500 1 8 4 4.000 1 0 0 0 2 80 1 0
682 15633608 0 22.92 11.585 2 13 4 0.040 1 0 0 0 2 80 1350 1
683 15805261 0 48.17 1.335 2 3 7 0.335 0 0 0 0 2 0 121 0
684 15740356 1 43.00 0.290 1 13 8 1.750 1 1 8 0 2 100 376 1
685 15808223 1 31.57 10.500 2 14 4 6.500 1 0 0 0 2 0 1 1
686 15769980 1 20.67 0.415 2 8 4 0.125 0 0 0 0 2 0 45 0
687 15675450 0 18.83 9.540 2 6 4 0.085 1 0 0 0 2 100 1 1
688 15776494 0 27.42 14.500 2 14 8 3.085 1 1 1 0 2 120 12 1
689 15592412 1 41.00 0.040 2 10 4 0.040 0 1 1 0 1 560 1 1

690 rows × 16 columns

In [17]:
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
print(f"X Dimension = {X.shape}")
print(f"y Dimension = {y.shape}")
X Dimension = (690, 15)
y Dimension = (690,)
In [0]:
# Feature Scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range=(0, 1))
X = sc.fit_transform(X)
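
The scaling cell above relies on scikit-learn's `MinMaxScaler`. As a rough illustration of the arithmetic involved, here is a minimal NumPy sketch that rescales each column into a target range. It is not scikit-learn's implementation: `min_max_scale` is a hypothetical helper, the zero-span guard is an assumption of this sketch, and the toy values are copied from the first three rows of the table above (CustomerID and A2).

```python
import numpy as np

def min_max_scale(data, feature_range=(0.0, 1.0)):
    """Rescale each column of `data` linearly into `feature_range`."""
    data = np.asarray(data, dtype=float)
    lo, hi = feature_range
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    # Assumption of this sketch: constant columns get a span of 1
    # so the division below never hits zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return lo + (data - col_min) / span * (hi - lo)

# CustomerID and A2 from the first three rows of the dataset.
toy = np.array([[15776156, 22.08],
                [15739548, 22.67],
                [15662854, 29.58]])
scaled = min_max_scale(toy)
print(scaled)
```

Without a step like this, a column with values around 1.5e7 (CustomerID) would dominate any Euclidean distance computed across the features.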

STEP 2

In [0]:
# Training the SOM

# # Import module "minisom.py" from DATASET_DIRECTORY
# # ref: https://stackoverflow.com/a/67692/4886384
# MODULE_PATH = DATASET_DIRECTORY + 'minisom.py'
# import importlib.util
# spec = importlib.util.spec_from_file_location('minisom', MODULE_PATH)
# minisom = importlib.util.module_from_spec(spec)
# spec.loader.exec_module(minisom)
In [13]:
# Alternatively, install the latest minisom release with pip
!pip install minisom
Collecting minisom
  Downloading https://files.pythonhosted.org/packages/1a/87/97de0d9ad8b89b809bd00e83a95a50883a400a9d84ff4d696a68ec9f8010/MiniSom-2.1.8.tar.gz
Building wheels for collected packages: minisom
  Building wheel for minisom (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/5c/46/22/d2aaf936c144c0ca6cdda2014ce9a763baff5ff8bf158b67fd
Successfully built minisom
Installing collected packages: minisom
Successfully installed minisom-2.1.8
In [0]:
# Training the SOM
from minisom import MiniSom
som = MiniSom(x=10, y=10, input_len=15, sigma=1.0, learning_rate=0.5)
In [0]:
som.random_weights_init(X)
som.train_random(data=X, num_iteration=100)
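
For intuition about what training does, the following is a minimal NumPy sketch of a single online SOM update: find the best matching unit (BMU), then pull every node toward the sample with a Gaussian weighting on grid distance. It is not MiniSom's implementation. `som_step` is a hypothetical function, the data is synthetic, and `learning_rate` and `sigma` are held constant here, whereas practical implementations usually decay them over the iterations.

```python
import numpy as np

def som_step(weights, sample, learning_rate=0.5, sigma=1.0):
    """Apply one SOM update in place; return the BMU grid position."""
    rows, cols, _ = weights.shape
    # Best matching unit: the node whose weight vector is closest to the sample.
    dists = np.linalg.norm(weights - sample, axis=2)
    bmu = tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape))
    # Gaussian neighbourhood centred on the BMU, measured on the grid.
    gx, gy = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    grid_dist_sq = (gx - bmu[0]) ** 2 + (gy - bmu[1]) ** 2
    influence = np.exp(-grid_dist_sq / (2.0 * sigma ** 2))
    # Move every node toward the sample, scaled by its neighbourhood influence.
    weights += learning_rate * influence[..., np.newaxis] * (sample - weights)
    return bmu

rng = np.random.default_rng(0)
weights = rng.random((10, 10, 15))  # same grid size and input length as above
sample = rng.random(15)             # synthetic stand-in for one scaled row of X

before = np.linalg.norm(weights - sample, axis=2).min()
bmu = som_step(weights, sample)
after = np.linalg.norm(weights[bmu] - sample)
print(f"BMU {bmu}: distance {before:.3f} -> {after:.3f}")
```

Because the BMU has an influence of 1, a learning rate of 0.5 halves its distance to the sample; nodes further away on the grid move less.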

STEP 3

In [0]:
# Visualizing the results
from pylab import bone, pcolor, colorbar, plot, show
In [31]:
from pylab import rcParams
rcParams['figure.figsize'] = 15, 10
bone()
pcolor(som.distance_map().T)
colorbar()
markers = ['o', 's']
colors = ['r', 'g']
for i, x in enumerate(X):
    w = som.winner(x)
    plot(w[0] + 0.5,
         w[1] + 0.5,
         markers[y[i]],
         markeredgecolor=colors[y[i]],
         markerfacecolor='None',
         markersize=10,
         markeredgewidth=2)
show()

STEP 4

In [0]:
# Finding the frauds
mappings = som.win_map(X)
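
`win_map` groups the scaled rows by winning node, keyed by `(i, j)` grid position. Which nodes count as suspicious is a judgment call made from the distance map, and the map changes with each random initialization, so one option is to select nodes by threshold rather than by reading coordinates off the plot. The sketch below assumes a distance map normalized to [0, 1]; `outlier_nodes`, `collect_rows`, the 0.9 threshold, and the toy inputs are all hypothetical.

```python
import numpy as np

def outlier_nodes(distance_map, threshold=0.9):
    """Return (i, j) grid positions whose distance-map value exceeds `threshold`."""
    distance_map = np.asarray(distance_map, dtype=float)
    return [tuple(int(v) for v in idx)
            for idx in np.argwhere(distance_map > threshold)]

def collect_rows(mappings, nodes):
    """Stack the rows mapped to `nodes`, skipping nodes that won no rows."""
    groups = [np.asarray(mappings[n]) for n in nodes
              if len(mappings.get(n, [])) > 0]
    return np.concatenate(groups, axis=0) if groups else np.empty((0, 0))

# Toy stand-ins for a distance map and a win map (made-up values).
toy_map = np.array([[0.2, 0.95, 0.3],
                    [0.4, 0.5, 0.1],
                    [0.1, 0.3, 1.0]])
toy_mappings = {(0, 1): [np.array([0.1, 0.2])],
                (2, 2): [np.array([0.3, 0.4]), np.array([0.5, 0.6])]}

nodes = outlier_nodes(toy_map, threshold=0.9)
rows = collect_rows(toy_mappings, nodes)
print(nodes, rows.shape)
```

With the objects from this notebook, the call would look like `collect_rows(mappings, outlier_nodes(som.distance_map()))`. Skipping empty nodes also avoids concatenating an empty list with a 2-D array.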
In [37]:
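frauds = np.concatenate((mappings[(3, 3)], mappings[(8, 8)]), axis=0)
# Undo the min-max scaling so the rows show the original values
frauds = sc.inverse_transform(frauds)
pd.DataFrame(frauds)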
Out[37]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 15683168.0 1.0 29.58 4.500 2.0 9.0 4.0 7.500 1.0 1.0 2.0 1.0 2.0 330.0 1.0
1 15682860.0 1.0 27.83 1.500 2.0 9.0 4.0 2.000 1.0 1.0 11.0 1.0 2.0 434.0 36.0
2 15684512.0 1.0 42.50 4.915 1.0 9.0 4.0 3.165 1.0 0.0 0.0 1.0 2.0 52.0 1443.0
3 15682576.0 0.0 32.17 1.460 2.0 9.0 4.0 1.085 1.0 1.0 16.0 0.0 2.0 120.0 2080.0
4 15682686.0 0.0 31.25 3.750 2.0 13.0 8.0 0.625 1.0 1.0 9.0 1.0 2.0 181.0 1.0
5 15684722.0 0.0 27.67 1.500 2.0 7.0 4.0 2.000 1.0 0.0 0.0 0.0 1.0 368.0 1.0
6 15682540.0 1.0 62.50 12.750 1.0 8.0 8.0 5.000 1.0 0.0 0.0 0.0 2.0 112.0 1.0
7 15684440.0 1.0 33.67 2.165 2.0 8.0 4.0 1.500 0.0 0.0 0.0 0.0 3.0 120.0 1.0
8 15683993.0 1.0 16.00 3.125 2.0 9.0 4.0 0.085 0.0 1.0 1.0 0.0 2.0 0.0 7.0
9 15683276.0 1.0 29.67 0.750 1.0 8.0 4.0 0.040 0.0 0.0 0.0 0.0 2.0 240.0 1.0
10 15661412.0 1.0 48.75 8.500 2.0 8.0 8.0 12.500 1.0 1.0 9.0 0.0 2.0 181.0 1656.0
11 15662152.0 0.0 29.75 0.665 2.0 9.0 4.0 0.250 0.0 0.0 0.0 1.0 2.0 300.0 1.0
12 15662189.0 0.0 28.58 3.750 2.0 8.0 4.0 0.250 0.0 1.0 1.0 1.0 2.0 40.0 155.0