A convolution takes one input image and, through the layer's filter weights, produces multiple output images. Taking the first Convolution layer above as an example:
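As an illustration only (not the Keras implementation), a naive 'same'-padded 2-D sliding-window operation — Conv2D actually computes cross-correlation, i.e. the kernel is applied without flipping — can be sketched in plain NumPy:

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive 'same'-padded 2-D cross-correlation, for illustration only."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))  # zero padding keeps the output size
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_filter = np.array([[-1., 0., 1.],
                        [-1., 0., 1.],
                        [-1., 0., 1.]])
result = conv2d_same(image, edge_filter)
print(result.shape)  # (4, 4) -- 'same' padding preserves the spatial size
```

A Conv2D layer with `filters=16` simply learns 16 such kernels, producing 16 output images from one input.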
The Max-Pool operation downsamples an image, as shown in the figure below: the original image is 4x4, and after Max-Pool it becomes 2x2:
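The 4x4 → 2x2 reduction can be sketched in NumPy (a minimal illustration, not Keras's implementation):

```python
import numpy as np

def max_pool_2x2(image):
    """2x2 max pooling with stride 2: each output pixel is the max of a 2x2 block."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.array([[1, 3, 2, 1],
                  [4, 6, 5, 2],
                  [7, 2, 9, 8],
                  [1, 0, 3, 4]])
pooled = max_pool_2x2(image)
print(pooled)  # [[6 5]
               #  [7 9]]
```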
Downsampling has the following benefits:
A CNN (Convolutional Neural Network) preprocesses its data differently from an MLP, as explained below:
from keras.datasets import mnist
from keras.utils import np_utils
import numpy as np
np.random.seed(10)
# Read MNIST data
(X_Train, y_Train), (X_Test, y_Test) = mnist.load_data()
# Reshape features into 4-D tensors: (samples, height, width, channels)
X_Train4D = X_Train.reshape(X_Train.shape[0], 28, 28, 1).astype('float32')
X_Test4D = X_Test.reshape(X_Test.shape[0], 28, 28, 1).astype('float32')
# Normalize feature data to the range [0, 1]
X_Train4D_norm = X_Train4D / 255
X_Test4D_norm = X_Test4D / 255
# Label Onehot-encoding
y_TrainOneHot = np_utils.to_categorical(y_Train)
y_TestOneHot = np_utils.to_categorical(y_Test)
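`np_utils.to_categorical` simply turns each integer label into a vector with a single 1 at the label's index. A minimal NumPy equivalent, for illustration only:

```python
import numpy as np

def to_one_hot(labels, num_classes=10):
    """Minimal re-implementation of np_utils.to_categorical, for illustration."""
    one_hot = np.zeros((len(labels), num_classes))
    one_hot[np.arange(len(labels)), labels] = 1  # set one position per row
    return one_hot

# Label 5 becomes a 10-element vector with a 1 at index 5
print(to_one_hot(np.array([5, 0])))
```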
Next, build the model following the flow chart below:
from keras.models import Sequential
from keras.layers import Dense,Dropout,Flatten,Conv2D,MaxPool2D
model = Sequential()
# Create CN layer 1
model.add(Conv2D(filters=16,
                 kernel_size=(5, 5),
                 padding='same',
                 input_shape=(28, 28, 1),
                 activation='relu',
                 name='conv2d_1'))
# Create Max-Pool 1
model.add(MaxPool2D(pool_size=(2,2), name='max_pooling2d_1'))
# Create CN layer 2
model.add(Conv2D(filters=36,
                 kernel_size=(5, 5),
                 padding='same',
                 activation='relu',
                 name='conv2d_2'))
# Create Max-Pool 2
model.add(MaxPool2D(pool_size=(2,2), name='max_pooling2d_2'))
# Add Dropout layer
model.add(Dropout(0.25, name='dropout_1'))
The code below adds a flatten layer, which converts the 36 7x7 feature maps produced by pooling layer 2 into a 1-D vector of length 36 x 7 x 7 = 1764, i.e. 1764 neurons:
model.add(Flatten(name='flatten_1'))
model.add(Dense(128, activation='relu', name='dense_1'))
model.add(Dropout(0.5, name='dropout_2'))
Finally, build the output layer with 10 neurons, corresponding to the 10 digits 0-9, using the softmax activation function (softmax converts the neurons' outputs into a probability for each digit):
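For reference, a minimal NumPy sketch of softmax (an illustration, not Keras's code; the max is subtracted for numerical stability and does not change the result):

```python
import numpy as np

def softmax(z):
    """Exponentiate, then normalize so the outputs sum to 1 (a probability distribution)."""
    e = np.exp(z - z.max())  # subtracting the max avoids overflow
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs.argmax())  # 0 -- the largest logit gets the highest probability
# probs sums to ~1.0, so each entry can be read as a class probability
```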
model.add(Dense(10, activation='softmax', name='dense_2'))
model.summary()
print("")
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 28, 28, 16)        416
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 16)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 36)        14436
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 7, 7, 36)          0
_________________________________________________________________
dropout_1 (Dropout)          (None, 7, 7, 36)          0
_________________________________________________________________
flatten_1 (Flatten)          (None, 1764)              0
_________________________________________________________________
dense_1 (Dense)              (None, 128)               225920
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1290
=================================================================
Total params: 242,062
Trainable params: 242,062
Non-trainable params: 0
_________________________________________________________________
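The parameter counts in the summary can be verified by hand: a Conv2D layer has (kernel_height x kernel_width x input_channels + 1) x filters parameters (the +1 is the per-filter bias), and a Dense layer has (inputs + 1) x units:

```python
def conv_params(kh, kw, in_ch, filters):
    # Each filter has kh*kw weights per input channel, plus one bias
    return (kh * kw * in_ch + 1) * filters

def dense_params(n_in, n_out):
    # Each output unit has one weight per input, plus one bias
    return (n_in + 1) * n_out

print(conv_params(5, 5, 1, 16))    # 416    (conv2d_1)
print(conv_params(5, 5, 16, 36))   # 14436  (conv2d_2)
print(dense_params(1764, 128))     # 225920 (dense_1)
print(dense_params(128, 10))       # 1290   (dense_2)
```

These four counts sum to the 242,062 total parameters reported by model.summary().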
Next we train the model using back propagation.
# Define the training configuration
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Start training
train_history = model.fit(x=X_Train4D_norm,
                          y=y_TrainOneHot, validation_split=0.2,
                          epochs=10, batch_size=300, verbose=1)
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
48000/48000 [==============================] - 98s - loss: 0.4874 - acc: 0.8477 - val_loss: 0.0964 - val_acc: 0.9722
Epoch 2/10
48000/48000 [==============================] - 105s - loss: 0.1409 - acc: 0.9587 - val_loss: 0.0634 - val_acc: 0.9799
Epoch 3/10
48000/48000 [==============================] - 105s - loss: 0.1029 - acc: 0.9690 - val_loss: 0.0517 - val_acc: 0.9837
Epoch 4/10
48000/48000 [==============================] - 107s - loss: 0.0851 - acc: 0.9747 - val_loss: 0.0456 - val_acc: 0.9862
Epoch 5/10
48000/48000 [==============================] - 103s - loss: 0.0717 - acc: 0.9785 - val_loss: 0.0397 - val_acc: 0.9868
Epoch 6/10
48000/48000 [==============================] - 101s - loss: 0.0648 - acc: 0.9807 - val_loss: 0.0394 - val_acc: 0.9884
Epoch 7/10
48000/48000 [==============================] - 104s - loss: 0.0566 - acc: 0.9829 - val_loss: 0.0418 - val_acc: 0.9873
Epoch 8/10
48000/48000 [==============================] - 100s - loss: 0.0513 - acc: 0.9844 - val_loss: 0.0341 - val_acc: 0.9903
Epoch 9/10
48000/48000 [==============================] - 103s - loss: 0.0451 - acc: 0.9864 - val_loss: 0.0341 - val_acc: 0.9904
Epoch 10/10
48000/48000 [==============================] - 105s - loss: 0.0430 - acc: 0.9870 - val_loss: 0.0342 - val_acc: 0.9901
In the compile method we specify the loss function (categorical_crossentropy), the optimizer (adam), and the evaluation metric (accuracy).
The accuracy and loss values produced by the training steps above are all recorded in the train_history variable.
import matplotlib.pyplot as plt

def plot_image(image):
    fig = plt.gcf()
    fig.set_size_inches(2, 2)
    plt.imshow(image, cmap='binary')
    plt.show()

def plot_images_labels_predict(images, labels, prediction, idx, num=10):
    fig = plt.gcf()
    fig.set_size_inches(12, 14)
    if num > 25:
        num = 25
    for i in range(0, num):
        ax = plt.subplot(5, 5, 1 + i)
        ax.imshow(images[idx], cmap='binary')
        if len(prediction) > 0:
            title = "l={},p={}".format(str(labels[idx]), str(prediction[idx]))
        else:
            title = "l={}".format(str(labels[idx]))
        ax.set_title(title, fontsize=10)
        ax.set_xticks([])
        ax.set_yticks([])
        idx += 1
    plt.show()

def show_train_history(train_history, train, validation):
    plt.plot(train_history.history[train])
    plt.plot(train_history.history[validation])
    plt.title('Train History')
    plt.ylabel(train)
    plt.xlabel('Epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show()
Use the show_train_history function to plot the gap between training and validation accuracy, and between training and validation loss:
show_train_history(train_history, 'acc', 'val_acc')
show_train_history(train_history, 'loss', 'val_loss')
Training is now complete; next we use the test dataset to evaluate the model's accuracy.
scores = model.evaluate(X_Test4D_norm, y_TestOneHot)
print()
print("\t[Info] Accuracy of testing data = {:2.1f}%".format(scores[1]*100.0))
9984/10000 [============================>.] - ETA: 0s
[Info] Accuracy of testing data = 99.1%
print("\t[Info] Making prediction of X_Test4D_norm")
prediction = model.predict_classes(X_Test4D_norm) # Making prediction and save result to prediction
print()
print("\t[Info] Show 10 prediction result (From 240):")
print("%s\n" % (prediction[240:250]))
[Info] Making prediction of X_Test4D_norm
9984/10000 [============================>.] - ETA: 0s
[Info] Show 10 prediction result (From 240):
[5 9 8 7 2 3 0 4 4 2]
plot_images_labels_predict(X_Test, y_Test, prediction, idx=240)
Some interesting points I learned while writing this article: