Even after reading an explanation of how RNNs or LSTMs work, many people find it hard to grasp at first how RNNs learn from, and are applied to, sequence data. In this article we will build and compare several different LSTM network models.
We will use deep learning to learn the order of the 26 letters of the English alphabet. That is, given one letter of the alphabet, the network should predict the letter most likely to come next.
ABCDEFGHIJKLMNOPQRSTUVWXYZ
For example:
given J -> predict K
given X -> predict Y
This is a simple sequence prediction problem which, once understood, can be generalized to other sequence prediction problems such as time series forecasting and sequence classification.
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
# Fix the random seed so that everyone gets the same results
numpy.random.seed(7)
Using TensorFlow backend.
We can now define our dataset, the alphabet. For readability, we define the alphabet in uppercase letters.
We need to map each letter of the alphabet to an integer so that an artificial neural network can train on it. This is easy to do by building a dictionary that maps each character to its index in the alphabet. We also build the reverse lookup, so that predictions can be converted back into characters later.
# Define the sequence dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Build the character-to-integer (0-25) mapping and its reverse lookup
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# Print them out
print("Letter to integer index: \n", char_to_int)
print("\n")
print("Integer index to letter: \n", int_to_char)
Letter to integer index:
 {'O': 14, 'T': 19, 'M': 12, 'Y': 24, 'E': 4, 'I': 8, 'P': 15, 'X': 23, 'R': 17, 'V': 21, 'W': 22, 'S': 18, 'H': 7, 'Z': 25, 'D': 3, 'J': 9, 'B': 1, 'A': 0, 'Q': 16, 'C': 2, 'G': 6, 'L': 11, 'K': 10, 'F': 5, 'U': 20, 'N': 13}

Integer index to letter:
 {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F', 6: 'G', 7: 'H', 8: 'I', 9: 'J', 10: 'K', 11: 'L', 12: 'M', 13: 'N', 14: 'O', 15: 'P', 16: 'Q', 17: 'R', 18: 'S', 19: 'T', 20: 'U', 21: 'V', 22: 'W', 23: 'X', 24: 'Y', 25: 'Z'}
Now we need to create the input (X) and output (y) used to train the network. We do this by defining an input sequence length and then reading sequences off the alphabet. For example, with an input length of 1, starting from the beginning of the input data, we read the first letter "A" and use the next letter, "B", as the prediction target. We then slide forward one character and repeat, until we reach the prediction of "Z".
# Prepare the input dataset
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
A -> B B -> C C -> D D -> E E -> F F -> G G -> H H -> I I -> J J -> K K -> L L -> M M -> N N -> O O -> P P -> Q Q -> R R -> S S -> T T -> U U -> V V -> W W -> X X -> Y Y -> Z
We need to reshape the NumPy array into the format an LSTM network expects, namely (samples, time_steps, features). At the same time we normalize the data so that the values fall between 0 and 1, and one-hot encode the labels.
ABCDEFGHIJKLMNOPQRSTUVWXYZ
For example:
given J -> predict K
given X -> predict Y
Target training tensor shape: (samples, time_steps, features) -> (n, 1, 1)
Note in particular that here each single character becomes a 1-element "feature" vector within 1 time step.
# Reshape X to the (samples, time_steps, features) layout
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# Normalize
X = X / float(len(alphabet))
# One-hot encode the output variable
y = np_utils.to_categorical(dataY)
print("X shape: ", X.shape) # (25 samples, 1 time step, 1 feature)
print("y shape: ", y.shape)
X shape: (25, 1, 1) y shape: (25, 26)
# Create the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 32)                4352
_________________________________________________________________
dense_1 (Dense)              (None, 26)                858
=================================================================
Total params: 5,210
Trainable params: 5,210
Non-trainable params: 0
_________________________________________________________________
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
Epoch 1/500 - 2s - loss: 3.2660 - acc: 0.0000e+00
Epoch 2/500 - 0s - loss: 3.2582 - acc: 0.0000e+00
Epoch 3/500 - 0s - loss: 3.2551 - acc: 0.0400
Epoch 4/500 - 0s - loss: 3.2524 - acc: 0.0400
Epoch 5/500 - 0s - loss: 3.2495 - acc: 0.0400
Epoch 6/500 - 0s - loss: 3.2470 - acc: 0.0400
Epoch 7/500 - 0s - loss: 3.2440 - acc: 0.0400
Epoch 8/500 - 0s - loss: 3.2411 - acc: 0.0400
Epoch 9/500 - 0s - loss: 3.2378 - acc: 0.0400
Epoch 10/500 - 0s - loss: 3.2348 - acc: 0.0400
...
Epoch 490/500 - 0s - loss: 1.7054 - acc: 0.7200
Epoch 491/500 - 0s - loss: 1.7041 - acc: 0.7600
Epoch 492/500 - 0s - loss: 1.7029 - acc: 0.8800
Epoch 493/500 - 0s - loss: 1.7021 - acc: 0.7600
Epoch 494/500 - 0s - loss: 1.7024 - acc: 0.8800
Epoch 495/500 - 0s - loss: 1.6992 - acc: 0.7600
Epoch 496/500 - 0s - loss: 1.7001 - acc: 0.8000
Epoch 497/500 - 0s - loss: 1.6995 - acc: 0.6800
Epoch 498/500 - 0s - loss: 1.6994 - acc: 0.7600
Epoch 499/500 - 0s - loss: 1.7001 - acc: 0.8000
Epoch 500/500 - 0s - loss: 1.6963 - acc: 0.8400
<keras.callbacks.History at 0x2145c550>
# Evaluate the model's performance
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
Model Accuracy: 92.00%
# Demonstrate the model's predictions
for pattern in dataX:
    # feed each of the 26 letters into the model to predict the letter that follows
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction) # index with the highest probability
    result = int_to_char[index] # look up which letter was predicted
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result) # print the result
['A'] -> B ['B'] -> C ['C'] -> D ['D'] -> E ['E'] -> F ['F'] -> G ['G'] -> H ['H'] -> I ['I'] -> J ['J'] -> K ['K'] -> L ['L'] -> M ['M'] -> N ['N'] -> O ['O'] -> P ['P'] -> Q ['Q'] -> R ['R'] -> S ['S'] -> T ['T'] -> U ['U'] -> V ['V'] -> W ['W'] -> Y ['X'] -> Z ['Y'] -> Z
We can see that this sequence prediction problem really is difficult for the network to learn. The reason is that the LSTM units in the example above have no contextual knowledge to work with (the time step length is only 1). Each input/output pattern is presented to the network in random (shuffled) order, and the internal state of a Keras LSTM is reset after every batch of training.
Next, let's try to feed the LSTM more ordering information to learn from.
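One way to see why recurrence buys nothing here (an illustrative sketch, not part of the original code): with seq_length=1, next-letter prediction is a memoryless function from one letter to the next, so even a plain Python dictionary "learns" it perfectly. There is no sequential context for the LSTM to exploit.

```python
# Hypothetical illustration: with a window of 1, next-letter prediction
# is just a lookup table -- no memory of earlier inputs is needed.
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Build the "model": map each letter to its successor.
next_letter = {alphabet[i]: alphabet[i + 1] for i in range(len(alphabet) - 1)}

# It is trivially 100% accurate on the same 25 input/output pairs.
correct = sum(next_letter[alphabet[i]] == alphabet[i + 1]
              for i in range(len(alphabet) - 1))
print(correct, "/", len(alphabet) - 1)  # 25 / 25
```

This is of course not a neural network; it just shows that the one-step version of the task carries no sequence structure for a recurrent model to learn.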
# Prepare the input dataset
seq_length = 3 # this time we prepare windows of 3 characters
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length] # 3 characters
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
ABC -> D BCD -> E CDE -> F DEF -> G EFG -> H FGH -> I GHI -> J HIJ -> K IJK -> L JKL -> M KLM -> N LMN -> O MNO -> P NOP -> Q OPQ -> R PQR -> S QRS -> T RST -> U STU -> V TUV -> W UVW -> X VWX -> Y WXY -> Z
ABCDEFGHIJKLMNOPQRSTUVWXYZ
For example:
given HIJ -> predict K
given EFG -> predict H
Target training tensor shape: (samples, time_steps, features) -> (n, 1, 3)
Note in particular that here the three characters become a single "feature" vector with 3 elements. So when preparing the training set, each training sample has only 1 time step, holding a "features" vector of 3 characters.
# Reshape X to the (samples, time_steps, features) layout
X = numpy.reshape(dataX, (len(dataX), 1, seq_length)) # <-- note this line
# Normalize
X = X / float(len(alphabet))
# One-hot encode the y values
y = np_utils.to_categorical(dataY)
print("X shape: ", X.shape)
print("y shape: ", y.shape)
X shape: (23, 1, 3) y shape: (23, 26)
# Create the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2]))) # <-- note this line
model.add(Dense(y.shape[1], activation='softmax'))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_2 (LSTM)                (None, 32)                4608
_________________________________________________________________
dense_2 (Dense)              (None, 26)                858
=================================================================
Total params: 5,466
Trainable params: 5,466
Non-trainable params: 0
_________________________________________________________________
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
Epoch 1/500 - 2s - loss: 3.2753 - acc: 0.0435
Epoch 2/500 - 0s - loss: 3.2629 - acc: 0.0000e+00
Epoch 3/500 - 0s - loss: 3.2556 - acc: 0.0000e+00
Epoch 4/500 - 0s - loss: 3.2487 - acc: 0.0435
Epoch 5/500 - 0s - loss: 3.2420 - acc: 0.0435
Epoch 6/500 - 0s - loss: 3.2355 - acc: 0.0435
Epoch 7/500 - 0s - loss: 3.2294 - acc: 0.0435
Epoch 8/500 - 0s - loss: 3.2214 - acc: 0.0435
Epoch 9/500 - 0s - loss: 3.2142 - acc: 0.0435
Epoch 10/500 - 0s - loss: 3.2056 - acc: 0.0435
...
Epoch 490/500 - 0s - loss: 1.6114 - acc: 0.7826
Epoch 491/500 - 0s - loss: 1.6089 - acc: 0.7826
Epoch 492/500 - 0s - loss: 1.6108 - acc: 0.8261
Epoch 493/500 - 0s - loss: 1.6091 - acc: 0.7826
Epoch 494/500 - 0s - loss: 1.6057 - acc: 0.7826
Epoch 495/500 - 0s - loss: 1.6060 - acc: 0.7826
Epoch 496/500 - 0s - loss: 1.6058 - acc: 0.8261
Epoch 497/500 - 0s - loss: 1.6045 - acc: 0.7826
Epoch 498/500 - 0s - loss: 1.6042 - acc: 0.7826
Epoch 499/500 - 0s - loss: 1.6006 - acc: 0.8696
Epoch 500/500 - 0s - loss: 1.6011 - acc: 0.7826
<keras.callbacks.History at 0x235b8d68>
# Evaluate the model's performance
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
Model Accuracy: 82.61%
# Show some of the model's predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, 1, len(pattern)))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
['A', 'B', 'C'] -> D ['B', 'C', 'D'] -> E ['C', 'D', 'E'] -> F ['D', 'E', 'F'] -> G ['E', 'F', 'G'] -> H ['F', 'G', 'H'] -> I ['G', 'H', 'I'] -> J ['H', 'I', 'J'] -> K ['I', 'J', 'K'] -> L ['J', 'K', 'L'] -> M ['K', 'L', 'M'] -> N ['L', 'M', 'N'] -> O ['M', 'N', 'O'] -> P ['N', 'O', 'P'] -> Q ['O', 'P', 'Q'] -> R ['P', 'Q', 'R'] -> S ['Q', 'R', 'S'] -> T ['R', 'S', 'T'] -> U ['S', 'T', 'U'] -> W ['T', 'U', 'V'] -> X ['U', 'V', 'W'] -> Z ['V', 'W', 'X'] -> Z ['W', 'X', 'Y'] -> Z
We can see that "Model #2" is only a small improvement over "Model #1". Even with this window method, we still cannot get the LSTM to learn the correct letter ordering on this simple problem.
The example above is also a poor tensor layout that misuses the LSTM network. In fact, the letter sequence really is several time steps of one feature, not one time step of several separate features. We have given the network more context, but not more sequential context.
In the next example, we will give the network more context in the form of time steps.
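Before moving on, the difference between the two layouts can be sketched with plain NumPy (an illustration, not part of the original code): the same flat window data can be reshaped either as one time step of 3 features, or as 3 time steps of 1 feature. Only the latter lets the LSTM iterate over the characters in order.

```python
import numpy

# Three example windows of integer-encoded characters (ABC, BCD, CDE).
dataX = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]

# Layout of Model #2: 1 time step, 3 features -- the LSTM sees each
# window as a single input vector and never steps through it.
X_window = numpy.reshape(dataX, (len(dataX), 1, 3))

# Layout of Model #3: 3 time steps, 1 feature -- the LSTM processes
# the characters one at a time, in order.
X_timesteps = numpy.reshape(dataX, (len(dataX), 3, 1))

print(X_window.shape)     # (3, 1, 3)
print(X_timesteps.shape)  # (3, 3, 1)
```

The underlying numbers are identical in both arrays; only the interpretation the LSTM layer gives them changes.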
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
ABC -> D BCD -> E CDE -> F DEF -> G EFG -> H FGH -> I GHI -> J HIJ -> K IJK -> L JKL -> M KLM -> N LMN -> O MNO -> P NOP -> Q OPQ -> R PQR -> S QRS -> T RST -> U STU -> V TUV -> W UVW -> X VWX -> Y WXY -> Z
ABCDEFGHIJKLMNOPQRSTUVWXYZ
For example:
given HIJ -> predict K
given EFG -> predict H
Target training tensor shape: (samples, time_steps, features) -> (n, 3, 1)
When preparing the training set, the tensor must be arranged so that each training sample has 3 time steps, each holding a "features" vector of 1 character.
# Reshape X to the (samples, time_steps, features) layout
X = numpy.reshape(dataX, (len(dataX), seq_length, 1)) # <-- note this line
# Normalize
X = X / float(len(alphabet))
# One-hot encode the y values
y = np_utils.to_categorical(dataY)
# Create the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2]))) # <-- note this line
model.add(Dense(y.shape[1], activation='softmax'))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_3 (LSTM)                (None, 32)                4352
_________________________________________________________________
dense_3 (Dense)              (None, 26)                858
=================================================================
Total params: 5,210
Trainable params: 5,210
Non-trainable params: 0
_________________________________________________________________
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
Epoch 1/500 - 2s - loss: 3.2632 - acc: 0.0000e+00
Epoch 2/500 - 0s - loss: 3.2500 - acc: 0.0000e+00
Epoch 3/500 - 0s - loss: 3.2425 - acc: 0.0435
Epoch 4/500 - 0s - loss: 3.2357 - acc: 0.0000e+00
Epoch 5/500 - 0s - loss: 3.2285 - acc: 0.0000e+00
Epoch 6/500 - 0s - loss: 3.2207 - acc: 0.0435
Epoch 7/500 - 0s - loss: 3.2125 - acc: 0.0435
Epoch 8/500 - 0s - loss: 3.2036 - acc: 0.0435
Epoch 9/500 - 0s - loss: 3.1919 - acc: 0.0435
Epoch 10/500 - 0s - loss: 3.1821 - acc: 0.0435
...
Epoch 490/500 - 0s - loss: 0.2457 - acc: 1.0000
Epoch 491/500 - 0s - loss: 0.2387 - acc: 1.0000
Epoch 492/500 - 0s - loss: 0.2394 - acc: 1.0000
Epoch 493/500 - 0s - loss: 0.2384 - acc: 1.0000
Epoch 494/500 - 0s - loss: 0.2416 - acc: 1.0000
Epoch 495/500 - 0s - loss: 0.2385 - acc: 1.0000
Epoch 496/500 - 0s - loss: 0.2380 - acc: 1.0000
Epoch 497/500 - 0s - loss: 0.2331 - acc: 1.0000
Epoch 498/500 - 0s - loss: 0.2341 - acc: 1.0000
Epoch 499/500 - 0s - loss: 0.2371 - acc: 1.0000
Epoch 500/500 - 0s - loss: 0.2325 - acc: 1.0000
<keras.callbacks.History at 0x261e7ac8>
# Evaluate the model's performance
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
Model Accuracy: 100.00%
# Take 3 characters, reshape them into a tensor of shape (1, 3, 1), and run inference
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
['A', 'B', 'C'] -> D ['B', 'C', 'D'] -> E ['C', 'D', 'E'] -> F ['D', 'E', 'F'] -> G ['E', 'F', 'G'] -> H ['F', 'G', 'H'] -> I ['G', 'H', 'I'] -> J ['H', 'I', 'J'] -> K ['I', 'J', 'K'] -> L ['J', 'K', 'L'] -> M ['K', 'L', 'M'] -> N ['L', 'M', 'N'] -> O ['M', 'N', 'O'] -> P ['N', 'O', 'P'] -> Q ['O', 'P', 'Q'] -> R ['P', 'Q', 'R'] -> S ['Q', 'R', 'S'] -> T ['R', 'S', 'T'] -> U ['S', 'T', 'U'] -> V ['T', 'U', 'V'] -> W ['U', 'V', 'W'] -> X ['V', 'W', 'X'] -> Y ['W', 'X', 'Y'] -> Z
Judging by the performance of "Model #3", once we train the LSTM with more context in the form of time steps, the recurrent network can finally bring its strength in sequence learning to bear.
"Model #3" reaches 100% prediction accuracy on the validation data (on this very simple 26-letter ordering task)!
Let's now build a model that accepts variable-length letter sequences as input and predicts the next letter.
To keep things simple, we define a maximum input sequence length (say 5, meaning an input sequence can be 1 to 5 characters long) to speed up training.
# Prepare the training data
num_inputs = 1000
max_len = 5 # maximum sequence length
dataX = []
dataY = []
for i in range(num_inputs):
    start = numpy.random.randint(len(alphabet)-2)
    end = numpy.random.randint(start, min(start+max_len,len(alphabet)-1))
    sequence_in = alphabet[start:end+1]
    sequence_out = alphabet[end + 1]
    dataX.append([char_to_int[char] for char in sequence_in])
    dataY.append(char_to_int[sequence_out])
    print(sequence_in, '->', sequence_out)
UVWXY -> Z UVWXY -> Z EFGH -> I GHIJ -> K EFGH -> I DEF -> G CDEFG -> H OP -> Q X -> Y TU -> V IJK -> L LMNOP -> Q T -> U UVWX -> Y X -> Y H -> I EFGHI -> J QRSTU -> V EFG -> H RSTU -> V QRST -> U QR -> S JK -> L GHI -> J KL -> M BCDE -> F AB -> C KLMNO -> P UVWXY -> Z EFGH -> I FG -> H DEF -> G STU -> V FGHI -> J OP -> Q FGHIJ -> K LMNOP -> Q DEF -> G W -> X KLMN -> O WXY -> Z PQRST -> U LMNOP -> Q PQ -> R FGHI -> J QRS -> T CDEFG -> H VW -> X DEF -> G ...
Because the input sequences vary in length from 1 to max_len, they need to be zero-padded. Here we use Keras's built-in pad_sequences() function, configured to pad on the left (prefix) side.
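What pre-padding produces can be sketched in plain Python (an illustration of the idea, not the Keras implementation): shorter sequences are filled with zeros on the left until every input reaches the same length.

```python
# A hypothetical stand-in for Keras's pad_sequences with padding='pre':
# prepend zeros until each sequence reaches maxlen.
def pre_pad(sequences, maxlen, value=0.0):
    return [[value] * (maxlen - len(seq)) + list(seq) for seq in sequences]

# e.g. the integer-encoded windows for "KL" and "ABCDE"
padded = pre_pad([[10, 11], [0, 1, 2, 3, 4]], maxlen=5)
print(padded)  # [[0.0, 0.0, 0.0, 10, 11], [0, 1, 2, 3, 4]]
```

Padding on the left keeps the most recent characters closest to the prediction point, which is the convention pad_sequences uses by default.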
# Convert the training data to arrays and pad the sequences (where needed)
X = pad_sequences(dataX, maxlen=max_len, dtype='float32') # <-- note this line
# Reshape X to the (samples, time_steps, features) layout
X = numpy.reshape(X, (X.shape[0], max_len, 1)) # <-- note this line
# Normalize
X = X / float(len(alphabet))
# One-hot encode the y values
y = np_utils.to_categorical(dataY)
# Create the model
batch_size = 1
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], 1))) # <-- note this line
model.add(Dense(y.shape[1], activation='softmax'))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_4 (LSTM)                (None, 32)                4352
_________________________________________________________________
dense_4 (Dense)              (None, 26)                858
=================================================================
Total params: 5,210
Trainable params: 5,210
Non-trainable params: 0
_________________________________________________________________
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=batch_size, verbose=2)
Epoch 1/500 - 7s - loss: 3.0984 - acc: 0.0770
Epoch 2/500 - 6s - loss: 2.8499 - acc: 0.1180
Epoch 3/500 - 7s - loss: 2.5159 - acc: 0.1990
Epoch 4/500 - 6s - loss: 2.2225 - acc: 0.2440
Epoch 5/500 - 6s - loss: 2.0490 - acc: 0.2880
Epoch 6/500 - 5s - loss: 1.9284 - acc: 0.3050
Epoch 7/500 - 5s - loss: 1.8142 - acc: 0.3530
Epoch 8/500 - 6s - loss: 1.7206 - acc: 0.3940
Epoch 9/500 - 6s - loss: 1.6397 - acc: 0.4070
Epoch 10/500 - 6s - loss: 1.5607 - acc: 0.4340
...
Epoch 490/500 - 5s - loss: 0.2087 - acc: 0.9550
Epoch 491/500 - 5s - loss: 0.0767 - acc: 0.9910
Epoch 492/500 - 5s - loss: 0.0779 - acc: 0.9900
Epoch 493/500 - 5s - loss: 0.0790 - acc: 0.9920
Epoch 494/500 - 5s - loss: 0.0812 - acc: 0.9860
Epoch 495/500 - 5s - loss: 0.0815 - acc: 0.9870
Epoch 496/500 - 5s - loss: 0.0811 - acc: 0.9840
Epoch 497/500 - 6s - loss: 0.1914 - acc: 0.9610
Epoch 498/500 - 5s - loss: 0.1258 - acc: 0.9730
Epoch 499/500 - 5s - loss: 0.0752 - acc: 0.9940
Epoch 500/500 - 5s - loss: 0.0786 - acc: 0.9890
<keras.callbacks.History at 0x261918d0>
# Evaluate the model's performance
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
Model Accuracy: 98.50%
# Take 1 to 5 characters, reshape them into a tensor of shape (1, 5, 1), and run inference
for i in range(20):
    pattern_index = numpy.random.randint(len(dataX))
    pattern = dataX[pattern_index]
    x = pad_sequences([pattern], maxlen=max_len, dtype='float32')
    x = numpy.reshape(x, (1, max_len, 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
['T', 'U', 'V', 'W', 'X'] -> Y ['B', 'C'] -> D ['G', 'H', 'I'] -> J ['Q', 'R', 'S', 'T'] -> U ['D', 'E', 'F'] -> G ['I', 'J', 'K'] -> L ['G', 'H', 'I'] -> J ['K', 'L', 'M'] -> N ['N'] -> O ['A', 'B', 'C', 'D', 'E'] -> F ['X'] -> Y ['A', 'B', 'C', 'D'] -> E ['V'] -> W ['Q', 'R', 'S', 'T', 'U'] -> V ['B', 'C', 'D', 'E', 'F'] -> G ['R'] -> S ['W'] -> X ['A'] -> B ['E', 'F', 'G'] -> H ['T'] -> U
We can see that although this network model has not perfectly learned the alphabet ordering from the generated sequence data, it performs quite well. If needed, the model could be further tuned and optimized, for example with more training epochs, a larger network, or both.
Jason Brownlee - "Understanding Stateful LSTM Recurrent Neural Networks in Python with Keras"
Keras documentation - Recurrent Layers