I could use scikit-learn for all of this work; for a problem like this there is hardly any difference between machine learning and deep learning. First, let's load the libraries. We will be working with deep learning, but we start by doing the preprocessing with scikit-learn.
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
# preprocessing is done with scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
Do you remember the book শূন্য থেকে পাইথন মেশিন লার্নিং (Python Machine Learning from Scratch)?
df = pd.read_csv('https://raw.githubusercontent.com/raqueeb/TensorFlow2/master/datasets/geoloc_elev.csv')
# we use only two features here
X = df[['lat', 'lon']].values
y = df['target'].values
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3, random_state=0)
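The text above mentions doing the preprocessing with scikit-learn. The lat/lon features in this dataset are already on a small scale, so no scaling is applied here; but if you did want to standardize features, the usual pattern is to fit the scaler on the training split only and reuse its statistics on the test split. A minimal sketch (the `feat_train`/`feat_test` arrays are synthetic stand-ins, not the notebook's data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# synthetic stand-in for a two-column feature matrix
rng = np.random.default_rng(0)
feat_train = rng.normal(loc=23.0, scale=2.0, size=(100, 2))
feat_test = rng.normal(loc=23.0, scale=2.0, size=(40, 2))

scaler = StandardScaler()
feat_train_s = scaler.fit_transform(feat_train)  # fit on the train split only
feat_test_s = scaler.transform(feat_test)        # reuse the train statistics

print(feat_train_s.mean(axis=0).round(6))  # each column centred near 0
```

Fitting on the full data before splitting would leak test-set statistics into training, which is why `fit_transform` and `transform` are kept separate.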
What do you think — will basic machine learning handle this? You guessed right: this is non-linear data.
df.plot(kind='scatter',
        x='lat',
        y='lon',
        c='target',
        cmap='bwr');
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
model = Sequential([
    Dense(1, input_dim=2, activation='sigmoid')
])
# `lr` was renamed to `learning_rate` in newer TensorFlow/Keras releases
model.compile(SGD(learning_rate=0.5), 'binary_crossentropy', metrics=['accuracy'])
h = model.fit(X_train, y_train, epochs=10, validation_split=0.1)
Train on 945 samples, validate on 105 samples
Epoch 1/10 - loss: 0.6640 - acc: 0.5608 - val_loss: 0.6553 - val_acc: 0.6381
Epoch 2/10 - loss: 0.6448 - acc: 0.6582 - val_loss: 0.6549 - val_acc: 0.6381
Epoch 3/10 - loss: 0.6443 - acc: 0.6582 - val_loss: 0.6567 - val_acc: 0.6381
Epoch 4/10 - loss: 0.6436 - acc: 0.6582 - val_loss: 0.6539 - val_acc: 0.6381
Epoch 5/10 - loss: 0.6435 - acc: 0.6582 - val_loss: 0.6550 - val_acc: 0.6381
Epoch 6/10 - loss: 0.6442 - acc: 0.6582 - val_loss: 0.6544 - val_acc: 0.6381
Epoch 7/10 - loss: 0.6442 - acc: 0.6582 - val_loss: 0.6538 - val_acc: 0.6381
Epoch 8/10 - loss: 0.6449 - acc: 0.6582 - val_loss: 0.6539 - val_acc: 0.6381
Epoch 9/10 - loss: 0.6438 - acc: 0.6582 - val_loss: 0.6543 - val_acc: 0.6381
Epoch 10/10 - loss: 0.6433 - acc: 0.6582 - val_loss: 0.6577 - val_acc: 0.6381
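Notice that the accuracy stalls around 0.66 — roughly the majority-class share of this dataset. A single sigmoid unit is essentially logistic regression, so it can only draw one straight line through the plane, which is hopeless on non-linear data. The same ceiling shows up with scikit-learn's `LogisticRegression` on a synthetic non-linear dataset (`make_circles` stands in for the geoloc data here):

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# concentric circles: no straight line separates the two classes
X_c, y_c = make_circles(n_samples=1000, noise=0.05, factor=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_c, y_c, test_size=0.3, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # hovers near chance level
```

Whatever line the optimizer picks, roughly half of each circle lands on the wrong side, so accuracy cannot climb much past the class balance — the same plateau we see in the Keras log above.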
pd.DataFrame(h.history).plot(ylim=(-0.05, 1.05))
model = Sequential([
    Dense(4, input_dim=2, activation='tanh'),
    Dense(4, activation='tanh'),
    Dense(1, activation='sigmoid')
])
model.compile(SGD(learning_rate=0.5), 'binary_crossentropy', metrics=['accuracy'])
h = model.fit(X_train, y_train, epochs=20, validation_split=0.1)
Train on 945 samples, validate on 105 samples
Epoch 1/20 - loss: 0.6381 - acc: 0.6180 - val_loss: 0.6042 - val_acc: 0.6381
Epoch 2/20 - loss: 0.4979 - acc: 0.8159 - val_loss: 0.3797 - val_acc: 0.9524
Epoch 3/20 - loss: 0.2458 - acc: 0.9979 - val_loss: 0.1585 - val_acc: 1.0000
Epoch 4/20 - loss: 0.1184 - acc: 0.9989 - val_loss: 0.0942 - val_acc: 1.0000
Epoch 5/20 - loss: 0.0731 - acc: 1.0000 - val_loss: 0.0651 - val_acc: 1.0000
Epoch 6/20 - loss: 0.0513 - acc: 1.0000 - val_loss: 0.0454 - val_acc: 1.0000
Epoch 7/20 - loss: 0.0379 - acc: 1.0000 - val_loss: 0.0339 - val_acc: 1.0000
Epoch 8/20 - loss: 0.0292 - acc: 1.0000 - val_loss: 0.0273 - val_acc: 1.0000
Epoch 9/20 - loss: 0.0226 - acc: 1.0000 - val_loss: 0.0217 - val_acc: 1.0000
Epoch 10/20 - loss: 0.0180 - acc: 1.0000 - val_loss: 0.0181 - val_acc: 1.0000
Epoch 11/20 - loss: 0.0148 - acc: 1.0000 - val_loss: 0.0156 - val_acc: 1.0000
Epoch 12/20 - loss: 0.0126 - acc: 1.0000 - val_loss: 0.0131 - val_acc: 1.0000
Epoch 13/20 - loss: 0.0108 - acc: 1.0000 - val_loss: 0.0120 - val_acc: 1.0000
Epoch 14/20 - loss: 0.0095 - acc: 1.0000 - val_loss: 0.0100 - val_acc: 1.0000
Epoch 15/20 - loss: 0.0083 - acc: 1.0000 - val_loss: 0.0094 - val_acc: 1.0000
Epoch 16/20 - loss: 0.0075 - acc: 1.0000 - val_loss: 0.0082 - val_acc: 1.0000
Epoch 17/20 - loss: 0.0069 - acc: 1.0000 - val_loss: 0.0076 - val_acc: 1.0000
Epoch 18/20 - loss: 0.0062 - acc: 1.0000 - val_loss: 0.0069 - val_acc: 1.0000
Epoch 19/20 - loss: 0.0058 - acc: 1.0000 - val_loss: 0.0067 - val_acc: 1.0000
Epoch 20/20 - loss: 0.0053 - acc: 1.0000 - val_loss: 0.0061 - val_acc: 1.0000
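The jump to 100% accuracy comes from the two tanh hidden layers, which let the network bend its decision boundary instead of drawing one straight line. The same idea expressed with scikit-learn's `MLPClassifier` cracks the toy non-linear problem just as easily (again a sketch on synthetic `make_circles` data, not the notebook's dataset):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X_c, y_c = make_circles(n_samples=1000, noise=0.05, factor=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_c, y_c, test_size=0.3, random_state=0)

# two hidden layers of 4 tanh units each, mirroring the Keras model above
mlp = MLPClassifier(hidden_layer_sizes=(4, 4), activation='tanh',
                    max_iter=2000, random_state=0).fit(X_tr, y_tr)
print(mlp.score(X_te, y_te))  # far above the linear model's chance-level score
```

The hidden layers warp the input space until the two circles become linearly separable for the final sigmoid unit — which is exactly what the Keras log above shows happening between epochs 1 and 3.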
pd.DataFrame(h.history).plot(ylim=(-0.05, 1.05))
bm_score = pd.Series(y).value_counts()[0] / len(y)
train_score = model.evaluate(X_train, y_train, verbose=0)[1]
test_score = model.evaluate(X_test, y_test, verbose=0)[1]
print("""Accuracy scores:
Benchmark:\t{:0.3}
Train:\t{:0.3}
Test:\t{:0.3}""".format(bm_score, train_score, test_score))
Accuracy scores:
Benchmark:	0.667
Train:	1.0
Test:	1.0
# predict_classes was removed in newer TensorFlow; threshold the sigmoid output instead
y_pred = (model.predict(X_test) > 0.5).astype(int)
cm = confusion_matrix(y_test, y_pred)
pd.DataFrame(cm,
             index=["Miss", "Hit"],
             columns=['pred_Miss', 'pred_Hit'])
 | pred_Miss | pred_Hit |
---|---|---|
Miss | 311 | 0 |
Hit | 0 | 139 |
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       311
           1       1.00      1.00      1.00       139

    accuracy                           1.00       450
   macro avg       1.00      1.00      1.00       450
weighted avg       1.00      1.00      1.00       450
# build a 101x101 grid covering the feature space
hticks = np.linspace(-2, 2, 101)
vticks = np.linspace(-2, 2, 101)
aa, bb = np.meshgrid(hticks, vticks)
# flatten the grid into (lat, lon) pairs and predict on every point
ab = np.c_[aa.ravel(), bb.ravel()]
c = model.predict(ab)
cc = c.reshape(aa.shape)
# overlay the model's decision surface on the scatter plot
ax = df.plot(kind='scatter', c='target', x='lat', y='lon', cmap='bwr')
ax.contourf(aa, bb, cc, cmap='bwr', alpha=0.5)