Self-Organizing Map (SOM) - Mega Case Study

This notebook is based on the Udemy course Deep Learning A-Z™: Hands-On Artificial Neural Networks. See Course.

Notebook Information

  • notebook name: taruma_udemy_som_megacasestudy
  • notebook version/date: 1.0.0/20190729
  • notebook server: Google Colab
  • python version: 3.6
  • keras version: 2.2.4
In [1]:
#### NOTEBOOK DESCRIPTION

from datetime import datetime

NOTEBOOK_TITLE = 'taruma_udemy_som_megacasestudy'
NOTEBOOK_VERSION = '1.0.0'
NOTEBOOK_DATE = 1 # Set to 1 to append a UTC date/time identifier to the project name

NOTEBOOK_NAME = "{}_{}".format(
    NOTEBOOK_TITLE, 
    NOTEBOOK_VERSION.replace('.','_')
)
PROJECT_NAME = "{}_{}{}".format(
    NOTEBOOK_TITLE, 
    NOTEBOOK_VERSION.replace('.','_'), 
    "_" + datetime.utcnow().strftime("%Y%m%d_%H%M") if NOTEBOOK_DATE else ""
)

print(f"Nama Notebook: {NOTEBOOK_NAME}")
print(f"Nama Proyek: {PROJECT_NAME}")
Nama Notebook: taruma_udemy_som_megacasestudy_1_0_0
Nama Proyek: taruma_udemy_som_megacasestudy_1_0_0_20190729_1110
In [41]:
#### System Version
import sys
import keras
print("versi python: {}".format(sys.version))
print("versi keras: {}".format(keras.__version__))
versi python: 3.6.8 (default, Jan 14 2019, 11:02:34) 
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
versi keras: 2.2.4
In [0]:
#### Load Notebook Extensions
%load_ext google.colab.data_table
In [3]:
#### Download dataset
!wget -O SOM_megacase.zip "https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/P16-Mega-Case-Study.zip"
!unzip SOM_megacase.zip
--2019-07-29 11:33:46--  https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/P16-Mega-Case-Study.zip
Resolving sds-platform-private.s3-us-east-2.amazonaws.com (sds-platform-private.s3-us-east-2.amazonaws.com)... 52.219.100.144
Connecting to sds-platform-private.s3-us-east-2.amazonaws.com (sds-platform-private.s3-us-east-2.amazonaws.com)|52.219.100.144|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20790 (20K) [application/zip]
Saving to: ‘SOM_megacase.zip’

SOM_megacase.zip    100%[===================>]  20.30K  --.-KB/s    in 0.02s   

2019-07-29 11:33:46 (897 KB/s) - ‘SOM_megacase.zip’ saved [20790/20790]

Archive:  SOM_megacase.zip
   creating: Mega_Case_Study/
  inflating: Mega_Case_Study/.DS_Store  
   creating: __MACOSX/
   creating: __MACOSX/Mega_Case_Study/
  inflating: __MACOSX/Mega_Case_Study/._.DS_Store  
  inflating: Mega_Case_Study/ann.py  
  inflating: __MACOSX/Mega_Case_Study/._ann.py  
  inflating: Mega_Case_Study/Credit_Card_Applications.csv  
  inflating: __MACOSX/Mega_Case_Study/._Credit_Card_Applications.csv  
  inflating: Mega_Case_Study/mega_case_study.py  
  inflating: __MACOSX/Mega_Case_Study/._mega_case_study.py  
  inflating: Mega_Case_Study/minisom.py  
  inflating: Mega_Case_Study/som.py  
  inflating: __MACOSX/Mega_Case_Study/._som.py  
In [0]:
#### Set dataset path
DATASET_DIRECTORY = 'Mega_Case_Study/'

STEP 1-2

In [0]:
# Mega Case Study - Make a Hybrid Deep Learning Model
# Part 1 - Identifying the Frauds with the Self-Organizing Map
In [12]:
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Importing the dataset
# Dataset http://archive.ics.uci.edu/ml/datasets/statlog+(australian+credit+approval)
dataset = pd.read_csv(DATASET_DIRECTORY + 'Credit_Card_Applications.csv')
dataset
Out[12]:
CustomerID A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 Class
0 15776156 1 22.08 11.460 2 4 4 1.585 0 0 0 1 2 100 1213 0
1 15739548 0 22.67 7.000 2 8 4 0.165 0 0 0 0 2 160 1 0
2 15662854 0 29.58 1.750 1 4 4 1.250 0 0 0 1 2 280 1 0
3 15687688 0 21.67 11.500 1 5 3 0.000 1 1 11 1 2 0 1 1
4 15715750 1 20.17 8.170 2 6 4 1.960 1 1 14 0 2 60 159 1
5 15571121 0 15.83 0.585 2 8 8 1.500 1 1 2 0 2 100 1 1
6 15726466 1 17.42 6.500 2 3 4 0.125 0 0 0 0 2 60 101 0
7 15660390 0 58.67 4.460 2 11 8 3.040 1 1 6 0 2 43 561 1
8 15663942 1 27.83 1.000 1 2 8 3.000 0 0 0 0 2 176 538 0
9 15638610 0 55.75 7.080 2 4 8 6.750 1 1 3 1 2 100 51 0
10 15644446 1 33.50 1.750 2 14 8 4.500 1 1 4 1 2 253 858 1
11 15585892 1 41.42 5.000 2 11 8 5.000 1 1 6 1 2 470 1 1
12 15609356 1 20.67 1.250 1 8 8 1.375 1 1 3 1 2 140 211 0
13 15803378 1 34.92 5.000 2 14 8 7.500 1 1 6 1 2 0 1001 1
14 15599440 1 58.58 2.710 2 8 4 2.415 0 0 0 1 2 320 1 0
15 15692408 1 48.08 6.040 2 4 4 0.040 0 0 0 0 2 0 2691 1
16 15683168 1 29.58 4.500 2 9 4 7.500 1 1 2 1 2 330 1 1
17 15790254 0 18.92 9.000 2 6 4 0.750 1 1 2 0 2 88 592 1
18 15767729 1 20.00 1.250 1 4 4 0.125 0 0 0 0 2 140 5 0
19 15768600 0 22.42 5.665 2 11 4 2.585 1 1 7 0 2 129 3258 1
20 15699839 0 28.17 0.585 2 6 4 0.040 0 0 0 0 2 260 1005 0
21 15786237 0 19.17 0.585 1 6 4 0.585 1 0 0 1 2 160 1 0
22 15694530 1 41.17 1.335 2 2 4 0.165 0 0 0 0 2 168 1 0
23 15796813 1 41.58 1.750 2 4 4 0.210 1 0 0 0 2 160 1 0
24 15605791 1 19.50 9.585 2 6 4 0.790 0 0 0 0 2 80 351 0
25 15714087 1 32.75 1.500 2 13 8 5.500 1 1 3 1 2 0 1 1
26 15711446 1 22.50 0.125 1 4 4 0.125 0 0 0 0 2 200 71 0
27 15588123 1 33.17 3.040 1 8 8 2.040 1 1 1 1 2 180 18028 1
28 15748552 0 30.67 12.000 2 8 4 2.000 1 1 1 0 2 220 20 1
29 15618410 1 23.08 2.500 2 8 4 1.085 1 1 11 1 2 60 2185 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
660 15598586 1 26.67 2.710 1 13 4 5.250 1 1 1 0 2 211 1 1
661 15665014 0 22.50 0.415 2 3 4 0.335 0 0 0 1 1 144 1 0
662 15701738 1 39.92 0.540 1 6 4 0.500 1 1 3 0 2 200 1001 1
663 15650591 0 26.08 8.665 2 6 4 1.415 1 0 0 0 2 160 151 1
664 15652667 1 20.00 0.000 2 2 4 0.500 0 0 0 0 2 144 1 0
665 15679622 1 31.57 4.000 1 3 4 0.085 0 0 0 1 2 411 1 0
666 15730150 1 26.75 4.500 1 8 5 2.500 0 0 0 0 2 200 1211 0
667 15813192 0 24.92 1.250 2 1 1 0.000 1 0 0 0 2 80 1 0
668 15606554 0 32.25 1.500 2 8 4 0.250 0 0 0 1 2 372 123 0
669 15611794 1 17.67 4.460 2 8 4 0.250 0 0 0 0 1 80 1 0
670 15672357 0 37.75 5.500 2 11 4 0.125 1 0 0 1 2 228 1 1
671 15711759 1 22.67 2.540 1 8 8 2.585 1 0 0 0 2 0 1 1
672 15615296 0 17.92 10.210 2 1 1 0.000 0 0 0 0 2 0 51 0
673 15699294 1 24.42 12.335 2 11 8 1.585 1 0 0 1 2 120 1 1
674 15788634 0 25.75 0.500 2 8 8 0.875 1 0 0 1 2 491 1 1
675 15660871 1 26.17 12.500 1 4 8 1.250 0 0 0 1 2 0 18 0
676 15618258 0 22.75 6.165 2 6 4 0.165 0 0 0 0 2 220 1001 0
677 15722535 1 23.00 0.750 2 7 4 0.500 1 0 0 1 1 320 1 0
678 15711977 1 25.67 0.290 1 8 4 1.500 0 0 0 1 2 160 1 0
679 15690169 1 48.58 0.205 1 4 4 0.250 1 1 11 0 2 380 2733 1
680 15790689 1 21.17 0.000 2 8 4 0.500 0 0 0 1 1 0 1 0
681 15665181 1 35.25 16.500 1 8 4 4.000 1 0 0 0 2 80 1 0
682 15633608 0 22.92 11.585 2 13 4 0.040 1 0 0 0 2 80 1350 1
683 15805261 0 48.17 1.335 2 3 7 0.335 0 0 0 0 2 0 121 0
684 15740356 1 43.00 0.290 1 13 8 1.750 1 1 8 0 2 100 376 1
685 15808223 1 31.57 10.500 2 14 4 6.500 1 0 0 0 2 0 1 1
686 15769980 1 20.67 0.415 2 8 4 0.125 0 0 0 0 2 0 45 0
687 15675450 0 18.83 9.540 2 6 4 0.085 1 0 0 0 2 100 1 1
688 15776494 0 27.42 14.500 2 14 8 3.085 1 1 1 0 2 120 12 1
689 15592412 1 41.00 0.040 2 10 4 0.040 0 1 1 0 1 560 1 1

690 rows × 16 columns

In [13]:
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
print(f"X Dimension = {X.shape}")
print(f"y Dimension = {y.shape}")
X Dimension = (690, 15)
y Dimension = (690,)
In [0]:
# Feature Scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range=(0, 1))
X = sc.fit_transform(X)
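
As a quick sanity check (a sketch, not part of the original script): after MinMaxScaler, every feature should lie in [0, 1].

print(X.min(axis=0).round(2))  # each feature minimum should be 0.0
print(X.max(axis=0).round(2))  # each feature maximum should be 1.0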
In [15]:
# Install MiniSom via pip (a recent release; the downloaded zip also bundles a local minisom.py copy)
!pip install minisom
Requirement already satisfied: minisom in /usr/local/lib/python3.6/dist-packages (2.1.8)
In [16]:
# Training the SOM
from minisom import MiniSom
som = MiniSom(x=10, y=10, input_len=15, sigma=1.0, learning_rate=0.5)
som.random_weights_init(X)
som.train_random(data=X, num_iteration=100)

# Visualizing the results
from pylab import bone, pcolor, colorbar, plot, show
from pylab import rcParams
rcParams['figure.figsize'] = 15, 10
bone()
pcolor(som.distance_map().T)
colorbar()
markers = ['o', 's']
colors = ['r', 'g']
for i, x in enumerate(X):
    w = som.winner(x)
    plot(w[0] + 0.5,
         w[1] + 0.5,
         markers[y[i]],
         markeredgecolor=colors[y[i]],
         markerfacecolor='None',
         markersize=10,
         markeredgewidth=2)
show()
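
The fraud candidates are the customers whose winning node has a high mean inter-neuron distance (MID), shown as the white cells on the map. Rather than reading the coordinates off the plot by eye, a small sketch (reusing the som object trained above; the 0.9 cutoff is an arbitrary assumption) can list them programmatically:

distance_map = som.distance_map()  # 10x10 grid of normalized MIDs in [0, 1]
suspect_nodes = [tuple(idx) for idx in np.argwhere(distance_map > 0.9)]
print(suspect_nodes)  # coordinates usable as keys into som.win_map(X)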
In [19]:
# Finding the frauds
mappings = som.win_map(X)
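# NOTE: (3, 1) and (5, 3) are the high-MID (white) nodes read off the map
# above; they are specific to this training run and will differ on a re-run.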
frauds = np.concatenate((mappings[(3, 1)], mappings[(5, 3)]), axis=0)
frauds = sc.inverse_transform(frauds)
pd.DataFrame(frauds)
Out[19]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 15790254.0 0.0 18.92 9.000 2.0 6.0 4.0 0.750 1.0 1.0 2.0 0.0 2.0 88.0 592.0
1 15768600.0 0.0 22.42 5.665 2.0 11.0 4.0 2.585 1.0 1.0 7.0 0.0 2.0 129.0 3258.0
2 15748552.0 0.0 30.67 12.000 2.0 8.0 4.0 2.000 1.0 1.0 1.0 0.0 2.0 220.0 20.0
3 15757467.0 0.0 52.83 15.000 2.0 8.0 4.0 5.500 1.0 1.0 14.0 0.0 2.0 0.0 2201.0
4 15682576.0 0.0 32.17 1.460 2.0 9.0 4.0 1.085 1.0 1.0 16.0 0.0 2.0 120.0 2080.0
5 15801441.0 0.0 35.75 0.915 2.0 6.0 4.0 0.750 1.0 1.0 4.0 0.0 2.0 0.0 1584.0
6 15815443.0 0.0 57.08 19.500 2.0 8.0 4.0 5.500 1.0 1.0 7.0 0.0 2.0 0.0 3001.0
7 15748432.0 0.0 58.33 10.000 2.0 11.0 4.0 4.000 1.0 1.0 14.0 0.0 2.0 0.0 1603.0
8 15708714.0 0.0 18.75 7.500 2.0 11.0 4.0 2.710 1.0 1.0 5.0 0.0 2.0 184.0 26727.0
9 15788131.0 0.0 29.50 0.460 2.0 4.0 4.0 0.540 1.0 1.0 4.0 0.0 2.0 380.0 501.0
10 15771856.0 0.0 24.50 12.750 2.0 8.0 5.0 4.750 1.0 1.0 2.0 0.0 2.0 73.0 445.0
11 15679394.0 0.0 36.00 1.000 2.0 8.0 4.0 2.000 1.0 1.0 11.0 0.0 2.0 0.0 457.0
12 15720644.0 0.0 40.33 7.540 1.0 11.0 8.0 8.000 1.0 1.0 14.0 0.0 2.0 0.0 2301.0
13 15808023.0 0.0 28.17 0.375 2.0 11.0 4.0 0.585 1.0 1.0 4.0 0.0 2.0 80.0 1.0
14 15795079.0 0.0 19.17 8.585 2.0 13.0 8.0 0.750 1.0 1.0 7.0 0.0 2.0 96.0 1.0
15 15808386.0 0.0 22.50 8.500 2.0 11.0 4.0 1.750 1.0 1.0 10.0 0.0 2.0 80.0 991.0
16 15746258.0 0.0 24.08 0.500 2.0 11.0 8.0 1.250 1.0 1.0 1.0 0.0 2.0 0.0 679.0
17 15764841.0 0.0 20.42 7.500 2.0 4.0 4.0 1.500 1.0 1.0 1.0 0.0 2.0 160.0 235.0
18 15748649.0 0.0 21.25 2.335 2.0 3.0 5.0 0.500 1.0 1.0 4.0 0.0 1.0 80.0 1.0
19 15729718.0 0.0 21.50 6.000 2.0 6.0 4.0 2.500 1.0 1.0 3.0 0.0 2.0 80.0 919.0
20 15786539.0 0.0 20.67 1.835 2.0 11.0 4.0 2.085 1.0 1.0 5.0 0.0 2.0 220.0 2504.0
21 15773776.0 0.0 35.42 12.000 2.0 11.0 8.0 14.000 1.0 1.0 8.0 0.0 2.0 0.0 6591.0
22 15778345.0 0.0 17.83 11.000 2.0 14.0 8.0 1.000 1.0 1.0 11.0 0.0 2.0 0.0 3001.0
23 15700511.0 0.0 28.50 3.040 1.0 14.0 8.0 2.540 1.0 1.0 1.0 0.0 2.0 70.0 1.0
24 15791769.0 0.0 26.92 13.500 2.0 11.0 8.0 5.000 1.0 1.0 2.0 0.0 2.0 0.0 5001.0
25 15776494.0 0.0 27.42 14.500 2.0 14.0 8.0 3.085 1.0 1.0 1.0 0.0 2.0 120.0 12.0
26 15780088.0 1.0 34.50 4.040 1.0 3.0 5.0 8.500 1.0 1.0 7.0 1.0 2.0 195.0 1.0
27 15774262.0 1.0 29.25 14.790 2.0 6.0 4.0 5.040 1.0 1.0 5.0 1.0 2.0 168.0 1.0
28 15799785.0 1.0 56.42 28.000 1.0 8.0 4.0 28.500 1.0 1.0 40.0 0.0 2.0 0.0 16.0
29 15750921.0 1.0 49.50 7.585 2.0 3.0 5.0 7.585 1.0 1.0 15.0 1.0 2.0 0.0 5001.0
30 15728010.0 1.0 60.08 14.500 2.0 1.0 1.0 18.000 1.0 1.0 15.0 1.0 2.0 0.0 1001.0
31 15689268.0 1.0 54.58 9.415 2.0 1.0 1.0 14.415 1.0 1.0 11.0 1.0 2.0 30.0 301.0
32 15744423.0 1.0 57.83 7.040 2.0 7.0 4.0 14.000 1.0 1.0 6.0 1.0 2.0 360.0 1333.0
33 15706394.0 1.0 56.58 18.500 2.0 2.0 5.0 15.000 1.0 1.0 17.0 1.0 2.0 0.0 1.0

STEP 3

In [22]:
# Part 2 - Going from Unsupervised to Supervised Deep Learning
# Creating the matrix of features
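# Note: iloc[:, 1:] drops only CustomerID, so the 15 features here
# include the original Class column as well.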
customers = dataset.iloc[:, 1:].values
print(f"customers.shape = {customers.shape}")
customers.shape = (690, 15)
In [24]:
# Creating the dependent variable
is_fraud = np.zeros(len(dataset))
is_fraud.shape
Out[24]:
(690,)
In [0]:
for i in range(len(dataset)):
    if dataset.iloc[i, 0] in frauds:
        is_fraud[i] = 1
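
The loop above tests membership against the full 2-D frauds array, so each ID is compared with every column. An equivalent, slightly stricter construction (a sketch) matches only the CustomerID column:

is_fraud = np.isin(dataset.iloc[:, 0].values, frauds[:, 0]).astype(float)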
In [0]:
# Artificial Neural Networks
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
customers = sc.fit_transform(customers)
In [30]:
# Importing Keras
from keras.models import Sequential
from keras.layers import Dense

classifier = Sequential()
classifier.add(Dense(
    units=2, kernel_initializer='uniform', activation='relu', input_dim=15)
)
classifier.add(Dense(
    units=1, kernel_initializer='uniform', activation='sigmoid')
)
classifier.compile(
    optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']
)
classifier.fit(customers, is_fraud, batch_size=1, epochs=2)

Epoch 1/2
690/690 [==============================] - 1s 2ms/step - loss: 0.4658 - acc: 0.9507
Epoch 2/2
690/690 [==============================] - 1s 1ms/step - loss: 0.1886 - acc: 0.9507
Out[30]:
<keras.callbacks.History at 0x7f5732d1cdd8>
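
With batch_size=1 and only 2 epochs, training is deliberately short. An optional check (a sketch reusing the trained classifier) re-evaluates it on the same data:

loss, acc = classifier.evaluate(customers, is_fraud, verbose=0)
print(f"loss = {loss:.4f}, accuracy = {acc:.4f}")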

STEP 4

In [40]:
y_pred = classifier.predict(customers)
y_pred = np.concatenate((dataset.iloc[:, 0:1].values, y_pred), axis=1)
y_pred = y_pred[y_pred[:, 1].argsort()]
pd.DataFrame(y_pred[:, :])
Out[40]:
0 1
0 15643056.0 0.007181
1 15679622.0 0.007770
2 15787229.0 0.007988
3 15600975.0 0.008000
4 15713983.0 0.008211
5 15676156.0 0.008303
6 15680901.0 0.009114
7 15711635.0 0.009177
8 15788215.0 0.009282
9 15701885.0 0.009657
10 15567860.0 0.009781
11 15604130.0 0.009802
12 15611409.0 0.009839
13 15648681.0 0.009874
14 15655658.0 0.009913
15 15759133.0 0.009932
16 15625311.0 0.009938
17 15569917.0 0.010062
18 15623210.0 0.010250
19 15713160.0 0.010322
20 15613673.0 0.010366
21 15670646.0 0.010382
22 15673747.0 0.010507
23 15711446.0 0.010557
24 15636767.0 0.010607
25 15597709.0 0.010625
26 15596165.0 0.010626
27 15713250.0 0.010711
28 15604196.0 0.010758
29 15724851.0 0.010953
... ... ...
660 15788131.0 0.234058
661 15815443.0 0.234516
662 15629133.0 0.234938
663 15720644.0 0.235818
664 15772941.0 0.236182
665 15621244.0 0.236320
666 15604963.0 0.236398
667 15748552.0 0.238563
668 15593694.0 0.238949
669 15762716.0 0.239106
670 15757467.0 0.240627
671 15786539.0 0.241843
672 15771856.0 0.243872
673 15679394.0 0.244407
674 15768600.0 0.245352
675 15682576.0 0.247701
676 15801441.0 0.255754
677 15687688.0 0.264728
678 15790254.0 0.266385
679 15647191.0 0.266743
680 15698749.0 0.269456
681 15729718.0 0.269790
682 15808386.0 0.271603
683 15664793.0 0.271966
684 15748649.0 0.283745
685 15696287.0 0.285063
686 15772329.0 0.288725
687 15708714.0 0.345585
688 15790113.0 0.346872
689 15598802.0 0.346872

690 rows × 2 columns
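
Since y_pred is sorted by ascending fraud probability, the customers most worth auditing sit at the bottom of the table. A short sketch pulling the ten riskiest:

top10 = pd.DataFrame(y_pred[-10:][::-1], columns=['CustomerID', 'fraud_probability'])
print(top10)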