alt text

mlcourse.ai Open Machine learning course

Author:Natalia Domozhirova, slack: @ndomozhirova

Tutorial

KERAS : easy way to construct the Neural Networks

alt text

Introduction

Keras is a high-level neural networks API, written in Python.

Major Keras features:

  • its capable of running on top of TensorFlow, CNTK, or Theano.
  • Keras allows for easy and fast prototyping and supports both Perceptrons, Convolutional networks and Recurrent networks (including LSTM), as well as their combinations.
  • Keras is compatible with: Python 2.7-3.6.

To make the process more interesting let's consider the classification example from the real life.

Example description

Let's take the task from one Hakaton, organized by some polypropylene producer this year. So, let’s consider the production of the polypropylene granules by the extruder. Extruder is a kind of “meat grinder” which has the knives at the end of the process which are cutting the output product onto the granules.
The problem is that sometimes the production mass has an irregular consistency and sticks to the knives. When there is a lot of stuck mass - knives can no longer function. In this case it is necessary to stop production process, which is very expensive. If we would catch the very beginning of such sticking process - there is a method to very quickly and painless clean the knives and continue production without stopping. So, the task is to send the stop signal to operator a bit in advance (let’s say not later then 15 minutes before such event) – so that he would have a time for necessary manipulations.

Now we have already preprocessed normalized dataset with the vectors of the system sensors' values (5,160 features) and 0/1 targets. It is already devided into the train and test. Let's download and prepare to work our datasets. In the datasets there are targets in zero column and the timestamps -in the 1st column.So, let's extract our train and test matrix as well as our targets. Also we'll transform the targets to categorical -so to have as a result our targets as 2-dimentional vectors, i.e. the vectors of probabilities of 0 and 1.

In [2]:
import pandas as pd
import numpy as np
from keras.utils import np_utils

df_train = pd.read_csv('train2.tsv', sep='\t', header=None)
df_test = pd.read_csv('test2.tsv', sep='\t', header=None)

Y_train = np.array(df_train[0].values.astype(int))
Y_test = np.array(df_test[0].values.astype(int))
X_train = np.array(df_train.iloc[:,2:].values.astype(float))
X_test = np.array(df_test.iloc[:,2:].values.astype(float))

Y_train = Y_train.astype(np.int)
Y_test = Y_test.astype(np.int)

Y_train = np_utils.to_categorical(Y_train, 2)
Y_test = np_utils.to_categorical(Y_test, 2)

print (X_train.shape)
print (Y_train.shape)
print (X_test.shape)
print (Y_test.shape)
(9498, 5160)
(9498, 2)
(2574, 5160)
(2574, 2)

The Neural Network construction

Let's consider how the simple Newral Network(NN), Multilayer Perceptron (MLP), with 3 hidden layers (as a baseline), constructed by Keras, could help us to solve this problem.

As we have hidden layers - this would be a Deep Neural Network. Also, we can see, that we need to have 5160 neurons in the input layer, as this is the size of our vector X and 2 neurons in the last layer - as this is the size of our target (vs. the picture below, where there are 4 neurons on the output layer). You can read, for example, here or here some more information about MLP structure.

The core data structure of Keras is a model - a way to organize layers. The simplest type of model is the Sequential model, a linear stack of layers, which is appropriate for MLP construction (for more complex architectures, you should use the Keras functional API, which allows to build arbitrary graphs of layers).

In [3]:
import keras
from keras import Sequential
model1 = Sequential() 

After the type of model is defined, we need to consistently add layers as Dense. Stacking layers is as easy as .add().

While adding the layer we need to define the number of neurons and Activation functions which we can tune afterwards. For the fist layer we also need to add the dimention of X vectors (input_dim). In our case this is 5,160. The last layer consists on 2 neurons exactly as our target vestors Y_train and Y_test do.

The number of layers can also be tuned.

In [4]:
from keras.layers import Activation, Dense

model1.add(Dense(64, input_dim=5160))
model1.add(Activation('relu'))

model1.add(Dense(64))
model1.add(Activation('sigmoid'))

model1.add(Dense(128))
model1.add(Activation('tanh'))

model1.add(Dense(2))
model1.add(Activation('softmax'))

Once our model looks good, we need to configure its learning process with .compile().

Here we should also describe the loss function and metrics we want to use as well as optimizer (the type of the Gradient descent to be used) which seem appropriate in each particular case.

In [5]:
model1.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

Now we can iterate on our training data in batches with the batch_size we want, where X_train and y_train are Numpy arrays just like in the Scikit-Learn API. Also we can define the number of epochs (i.e. the max number of the full cycles of model's training). Verbose=1 just lets us see the summary of the current stage of calcualtions.

We can also printing our model parameters using model.summary(). It is also can be useful to see the shapes of X_train, y_train,X_test,y_test

Also, we can save the best model version during the trainig process via the callback_save option.

And there is a callback_early stop option to stop the training process when we don't have significant improvement(defined by the min_delta) during the certain number of epochs (patience).

Now our first model is ready:

In [12]:
from keras.callbacks import ModelCheckpoint, EarlyStopping

model1.summary()

print (X_train.shape)
print (Y_train.shape)
print (X_test.shape)
print (Y_test.shape)

callback_save       = ModelCheckpoint('best_model1.model1', monitor='val_acc', verbose=1, save_best_only=True, mode='auto')
callback_earlystop  = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1, mode='auto')


model1.fit(  
    X_train,   
    Y_train,   
    batch_size=20,   
    epochs=10000,  
    verbose=1,
    validation_data=(X_test, Y_test), 
    callbacks=[callback_save, callback_earlystop]
)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 64)                330304    
_________________________________________________________________
activation_1 (Activation)    (None, 64)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 64)                4160      
_________________________________________________________________
activation_2 (Activation)    (None, 64)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 128)               8320      
_________________________________________________________________
activation_3 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 258       
_________________________________________________________________
activation_4 (Activation)    (None, 2)                 0         
=================================================================
Total params: 343,042
Trainable params: 343,042
Non-trainable params: 0
_________________________________________________________________
(9498, 5160)
(9498, 2)
(2574, 5160)
(2574, 2)
Train on 9498 samples, validate on 2574 samples
Epoch 1/10000
9498/9498 [==============================] - 5s 565us/step - loss: 0.0158 - acc: 0.9949 - val_loss: 3.6564 - val_acc: 0.6235

Epoch 00001: val_acc improved from -inf to 0.62354, saving model to best_model1.model1
Epoch 2/10000
9498/9498 [==============================] - 5s 551us/step - loss: 0.0216 - acc: 0.9941 - val_loss: 2.3287 - val_acc: 0.6200

Epoch 00002: val_acc did not improve
Epoch 3/10000
9498/9498 [==============================] - 5s 557us/step - loss: 0.0188 - acc: 0.9937 - val_loss: 1.7905 - val_acc: 0.6185

Epoch 00003: val_acc did not improve
Epoch 4/10000
9498/9498 [==============================] - 5s 563us/step - loss: 0.0037 - acc: 0.9993 - val_loss: 3.1222 - val_acc: 0.6224

Epoch 00004: val_acc did not improve
Epoch 5/10000
9498/9498 [==============================] - 5s 535us/step - loss: 0.0219 - acc: 0.9929 - val_loss: 2.6019 - val_acc: 0.6224

Epoch 00005: val_acc did not improve
Epoch 6/10000
9498/9498 [==============================] - 5s 531us/step - loss: 0.0087 - acc: 0.9973 - val_loss: 3.9070 - val_acc: 0.6080

Epoch 00006: val_acc did not improve
Epoch 7/10000
9498/9498 [==============================] - 5s 555us/step - loss: 0.0164 - acc: 0.9952 - val_loss: 2.4076 - val_acc: 0.6290

Epoch 00007: val_acc improved from 0.62354 to 0.62898, saving model to best_model1.model1
Epoch 8/10000
9498/9498 [==============================] - 5s 527us/step - loss: 0.0100 - acc: 0.9966 - val_loss: 3.2498 - val_acc: 0.6208

Epoch 00008: val_acc did not improve
Epoch 9/10000
9498/9498 [==============================] - 5s 527us/step - loss: 0.0015 - acc: 0.9997 - val_loss: 4.3270 - val_acc: 0.6232

Epoch 00009: val_acc did not improve
Epoch 10/10000
9498/9498 [==============================] - 5s 557us/step - loss: 0.0036 - acc: 0.9985 - val_loss: 4.4523 - val_acc: 0.6220

Epoch 00010: val_acc did not improve
Epoch 11/10000
9498/9498 [==============================] - 5s 527us/step - loss: 0.0475 - acc: 0.9833 - val_loss: 1.6544 - val_acc: 0.6884

Epoch 00011: val_acc improved from 0.62898 to 0.68842, saving model to best_model1.model1
Epoch 12/10000
9498/9498 [==============================] - 5s 533us/step - loss: 0.0270 - acc: 0.9901 - val_loss: 3.0276 - val_acc: 0.6282

Epoch 00012: val_acc did not improve
Epoch 13/10000
9498/9498 [==============================] - 5s 557us/step - loss: 0.0283 - acc: 0.9907 - val_loss: 2.0502 - val_acc: 0.6329

Epoch 00013: val_acc did not improve
Epoch 14/10000
9498/9498 [==============================] - 5s 549us/step - loss: 0.0124 - acc: 0.9960 - val_loss: 3.2015 - val_acc: 0.6290

Epoch 00014: val_acc did not improve
Epoch 15/10000
9498/9498 [==============================] - 5s 525us/step - loss: 0.0112 - acc: 0.9964 - val_loss: 3.4261 - val_acc: 0.6216

Epoch 00015: val_acc did not improve
Epoch 16/10000
9498/9498 [==============================] - 5s 551us/step - loss: 0.0311 - acc: 0.9906 - val_loss: 1.0938 - val_acc: 0.7906

Epoch 00016: val_acc improved from 0.68842 to 0.79060, saving model to best_model1.model1
Epoch 17/10000
9498/9498 [==============================] - 5s 541us/step - loss: 0.0150 - acc: 0.9954 - val_loss: 1.7193 - val_acc: 0.6760

Epoch 00017: val_acc did not improve
Epoch 18/10000
9498/9498 [==============================] - 5s 545us/step - loss: 0.0096 - acc: 0.9975 - val_loss: 3.3131 - val_acc: 0.6445

Epoch 00018: val_acc did not improve
Epoch 19/10000
9498/9498 [==============================] - 5s 560us/step - loss: 0.0167 - acc: 0.9956 - val_loss: 2.9231 - val_acc: 0.6228

Epoch 00019: val_acc did not improve
Epoch 20/10000
9498/9498 [==============================] - 5s 543us/step - loss: 0.0067 - acc: 0.9977 - val_loss: 3.5476 - val_acc: 0.6674

Epoch 00020: val_acc did not improve
Epoch 21/10000
9498/9498 [==============================] - 5s 545us/step - loss: 0.0278 - acc: 0.9914 - val_loss: 3.1976 - val_acc: 0.6235

Epoch 00021: val_acc did not improve
Epoch 22/10000
9498/9498 [==============================] - 5s 557us/step - loss: 0.0146 - acc: 0.9955 - val_loss: 2.7617 - val_acc: 0.6422

Epoch 00022: val_acc did not improve
Epoch 23/10000
9498/9498 [==============================] - 5s 544us/step - loss: 0.0084 - acc: 0.9964 - val_loss: 3.7218 - val_acc: 0.6189

Epoch 00023: val_acc did not improve
Epoch 24/10000
9498/9498 [==============================] - 5s 539us/step - loss: 0.0040 - acc: 0.9989 - val_loss: 4.4816 - val_acc: 0.6224

Epoch 00024: val_acc did not improve
Epoch 25/10000
9498/9498 [==============================] - 5s 548us/step - loss: 0.0017 - acc: 0.9997 - val_loss: 4.6206 - val_acc: 0.6228

Epoch 00025: val_acc did not improve
Epoch 26/10000
9498/9498 [==============================] - 5s 557us/step - loss: 0.0311 - acc: 0.9906 - val_loss: 2.2944 - val_acc: 0.6593

Epoch 00026: val_acc did not improve
Epoch 00026: early stopping
Out[12]:
<keras.callbacks.History at 0x7f60c7c4efd0>

So, we got a baseline with Accuracy= 0.79. It looks cool, as we even didn't tune anything yet!

Let's try to improve this result. For example, we can introduce Dropout - this is a kind of regularization for the Neral Networks. The level of drop out (in the brackets, along with a seed) is a probability of the exclusion from the certain layer the random choice neuron during the current calculations. So, drop outs help to prevent the NN overfitting.
Let's create the new model:

In [14]:
from keras.layers import Dropout

model2 = Sequential() 

model2.add(Dense(64, input_dim=5160))
model2.add(Activation('relu'))
model2.add(Dropout(0.3, seed=123))

model2.add(Dense(64))
model2.add(Activation('sigmoid'))
model2.add(Dropout(0.4, seed=123))

model2.add(Dense(128))
model2.add(Activation('tanh'))
model2.add(Dropout(0.5, seed=123))

model2.add(Dense(2))
model2.add(Activation('softmax'))


model2.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model2.summary()

print (X_train.shape)
print (Y_train.shape)
print (X_test.shape)
print (Y_test.shape)

callback_save       = ModelCheckpoint('best_model2.model2', monitor='val_acc', verbose=1, save_best_only=True, mode='auto')
callback_earlystop  = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1, mode='auto')


model2.fit(  
    X_train,   
    Y_train,   
    batch_size=20,   
    epochs=10000,  
    verbose=1,
    validation_data=(X_test, Y_test), 
    callbacks=[callback_save, callback_earlystop]
)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_9 (Dense)              (None, 64)                330304    
_________________________________________________________________
activation_9 (Activation)    (None, 64)                0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_10 (Dense)             (None, 64)                4160      
_________________________________________________________________
activation_10 (Activation)   (None, 64)                0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_11 (Dense)             (None, 128)               8320      
_________________________________________________________________
activation_11 (Activation)   (None, 128)               0         
_________________________________________________________________
dropout_6 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_12 (Dense)             (None, 2)                 258       
_________________________________________________________________
activation_12 (Activation)   (None, 2)                 0         
=================================================================
Total params: 343,042
Trainable params: 343,042
Non-trainable params: 0
_________________________________________________________________
(9498, 5160)
(9498, 2)
(2574, 5160)
(2574, 2)
Train on 9498 samples, validate on 2574 samples
Epoch 1/10000
9498/9498 [==============================] - 7s 736us/step - loss: 0.4266 - acc: 0.8060 - val_loss: 0.3445 - val_acc: 0.8500

Epoch 00001: val_acc improved from -inf to 0.85004, saving model to best_model2.model2
Epoch 2/10000
9498/9498 [==============================] - 6s 597us/step - loss: 0.3446 - acc: 0.8523 - val_loss: 0.3082 - val_acc: 0.8664

Epoch 00002: val_acc improved from 0.85004 to 0.86636, saving model to best_model2.model2
Epoch 3/10000
9498/9498 [==============================] - 6s 579us/step - loss: 0.3083 - acc: 0.8744 - val_loss: 0.3595 - val_acc: 0.8357

Epoch 00003: val_acc did not improve
Epoch 4/10000
9498/9498 [==============================] - 6s 628us/step - loss: 0.3032 - acc: 0.8801 - val_loss: 0.3751 - val_acc: 0.8458

Epoch 00004: val_acc did not improve
Epoch 5/10000
9498/9498 [==============================] - 6s 591us/step - loss: 0.3061 - acc: 0.8696 - val_loss: 0.3138 - val_acc: 0.8683

Epoch 00005: val_acc improved from 0.86636 to 0.86830, saving model to best_model2.model2
Epoch 6/10000
9498/9498 [==============================] - 6s 592us/step - loss: 0.3001 - acc: 0.8766 - val_loss: 0.3418 - val_acc: 0.8465

Epoch 00006: val_acc did not improve
Epoch 7/10000
9498/9498 [==============================] - 6s 613us/step - loss: 0.2810 - acc: 0.8823 - val_loss: 0.4190 - val_acc: 0.8252

Epoch 00007: val_acc did not improve
Epoch 8/10000
9498/9498 [==============================] - 6s 604us/step - loss: 0.2707 - acc: 0.8907 - val_loss: 0.3932 - val_acc: 0.8489

Epoch 00008: val_acc did not improve
Epoch 9/10000
9498/9498 [==============================] - 6s 593us/step - loss: 0.2727 - acc: 0.8906 - val_loss: 0.3117 - val_acc: 0.8442

Epoch 00009: val_acc did not improve
Epoch 10/10000
9498/9498 [==============================] - 6s 633us/step - loss: 0.2603 - acc: 0.8991 - val_loss: 0.4604 - val_acc: 0.8500

Epoch 00010: val_acc did not improve
Epoch 11/10000
9498/9498 [==============================] - 6s 617us/step - loss: 0.2487 - acc: 0.9023 - val_loss: 0.3615 - val_acc: 0.8248

Epoch 00011: val_acc did not improve
Epoch 12/10000
9498/9498 [==============================] - 7s 711us/step - loss: 0.2302 - acc: 0.9129 - val_loss: 0.5051 - val_acc: 0.8543

Epoch 00012: val_acc did not improve
Epoch 00012: early stopping
Out[14]:
<keras.callbacks.History at 0x7f60c40c2550>

Thus, adding the drop-outs we've increased Accuracy on the test up to 0.86830

We can also tune all gyper-parameters like the number of layers, the levels of drop-outs, activation functions, optimizer, the number of neurons etc. For this purposes we can use, for example, another very friendly and easy-to-apply - Hyperas library. The description with examples you can find here. As a result of such tuning we've got the following model configuration:

In [16]:
model3 = Sequential()

model3.add(Dense(64, input_dim=5160))
model3.add(Activation('relu'))
model3.add(Dropout(0.11729755246044238, seed=123))
    
model3.add(Dense(256))
model3.add(Activation('relu'))
model3.add(Dropout(0.8444244099007299,seed=123))

model3.add(Dense(1024))
model3.add(Activation('linear'))
model3.add(Dropout(0.41266207281071243,seed=123))

model3.add(Dense(256))
model3.add(Activation('relu'))
model3.add(Dropout(0.4844455237320119,seed=123))

model3.add(Dense(2))
model3.add(Activation('softmax'))

model3.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model3.summary()

print (X_train.shape)
print (Y_train.shape)
print (X_test.shape)
print (Y_test.shape)

callback_save       = ModelCheckpoint('best_model3.model3', monitor='val_acc', verbose=1, save_best_only=True, mode='auto')
callback_earlystop  = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1, mode='auto')

model3.fit(  
    X_train,   
    Y_train,   
    batch_size=60,   
    epochs=10000,  
    verbose=1,
    validation_data=(X_test, Y_test), 
    callbacks=[callback_save, callback_earlystop]
)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_18 (Dense)             (None, 64)                330304    
_________________________________________________________________
activation_18 (Activation)   (None, 64)                0         
_________________________________________________________________
dropout_11 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_19 (Dense)             (None, 256)               16640     
_________________________________________________________________
activation_19 (Activation)   (None, 256)               0         
_________________________________________________________________
dropout_12 (Dropout)         (None, 256)               0         
_________________________________________________________________
dense_20 (Dense)             (None, 1024)              263168    
_________________________________________________________________
activation_20 (Activation)   (None, 1024)              0         
_________________________________________________________________
dropout_13 (Dropout)         (None, 1024)              0         
_________________________________________________________________
dense_21 (Dense)             (None, 256)               262400    
_________________________________________________________________
activation_21 (Activation)   (None, 256)               0         
_________________________________________________________________
dropout_14 (Dropout)         (None, 256)               0         
_________________________________________________________________
dense_22 (Dense)             (None, 2)                 514       
_________________________________________________________________
activation_22 (Activation)   (None, 2)                 0         
=================================================================
Total params: 873,026
Trainable params: 873,026
Non-trainable params: 0
_________________________________________________________________
(9498, 5160)
(9498, 2)
(2574, 5160)
(2574, 2)
Train on 9498 samples, validate on 2574 samples
Epoch 1/10000
9498/9498 [==============================] - 5s 480us/step - loss: 2.3813 - acc: 0.7626 - val_loss: 0.3410 - val_acc: 0.8481

Epoch 00001: val_acc improved from -inf to 0.84810, saving model to best_model3.model3
Epoch 2/10000
9498/9498 [==============================] - 4s 443us/step - loss: 0.3752 - acc: 0.8513 - val_loss: 0.3382 - val_acc: 0.8427

Epoch 00002: val_acc did not improve
Epoch 3/10000
9498/9498 [==============================] - 5s 512us/step - loss: 0.3520 - acc: 0.8597 - val_loss: 0.3412 - val_acc: 0.8283

Epoch 00003: val_acc did not improve
Epoch 4/10000
9498/9498 [==============================] - 4s 448us/step - loss: 0.3322 - acc: 0.8708 - val_loss: 0.3417 - val_acc: 0.8605

Epoch 00004: val_acc improved from 0.84810 to 0.86053, saving model to best_model3.model3
Epoch 5/10000
9498/9498 [==============================] - 4s 441us/step - loss: 0.3250 - acc: 0.8752 - val_loss: 0.3340 - val_acc: 0.8411

Epoch 00005: val_acc did not improve
Epoch 6/10000
9498/9498 [==============================] - 4s 444us/step - loss: 0.3077 - acc: 0.8847 - val_loss: 0.3714 - val_acc: 0.8644

Epoch 00006: val_acc improved from 0.86053 to 0.86441, saving model to best_model3.model3
Epoch 7/10000
9498/9498 [==============================] - 4s 468us/step - loss: 0.3092 - acc: 0.8872 - val_loss: 0.4350 - val_acc: 0.7786

Epoch 00007: val_acc did not improve
Epoch 8/10000
9498/9498 [==============================] - 4s 446us/step - loss: 0.3018 - acc: 0.8911 - val_loss: 0.7279 - val_acc: 0.7883

Epoch 00008: val_acc did not improve
Epoch 9/10000
9498/9498 [==============================] - 5s 531us/step - loss: 0.2974 - acc: 0.8905 - val_loss: 0.3203 - val_acc: 0.8807

Epoch 00009: val_acc improved from 0.86441 to 0.88073, saving model to best_model3.model3
Epoch 10/10000
9498/9498 [==============================] - 5s 520us/step - loss: 0.2952 - acc: 0.8946 - val_loss: 0.3449 - val_acc: 0.8139

Epoch 00010: val_acc did not improve
Epoch 11/10000
9498/9498 [==============================] - 5s 569us/step - loss: 0.2830 - acc: 0.8973 - val_loss: 0.3052 - val_acc: 0.8497

Epoch 00011: val_acc did not improve
Epoch 12/10000
9498/9498 [==============================] - 5s 553us/step - loss: 0.2646 - acc: 0.9071 - val_loss: 0.5467 - val_acc: 0.8193

Epoch 00012: val_acc did not improve
Epoch 13/10000
9498/9498 [==============================] - 5s 531us/step - loss: 0.2561 - acc: 0.9088 - val_loss: 0.4907 - val_acc: 0.8275

Epoch 00013: val_acc did not improve
Epoch 14/10000
9498/9498 [==============================] - 5s 510us/step - loss: 0.2569 - acc: 0.9142 - val_loss: 0.7922 - val_acc: 0.7883

Epoch 00014: val_acc did not improve
Epoch 15/10000
9498/9498 [==============================] - 5s 476us/step - loss: 0.2509 - acc: 0.9197 - val_loss: 0.3610 - val_acc: 0.8225

Epoch 00015: val_acc did not improve
Epoch 16/10000
9498/9498 [==============================] - 6s 610us/step - loss: 0.2453 - acc: 0.9205 - val_loss: 0.5702 - val_acc: 0.7898

Epoch 00016: val_acc did not improve
Epoch 17/10000
9498/9498 [==============================] - 5s 568us/step - loss: 0.2307 - acc: 0.9244 - val_loss: 0.5710 - val_acc: 0.7906

Epoch 00017: val_acc did not improve
Epoch 18/10000
9498/9498 [==============================] - 6s 583us/step - loss: 0.2225 - acc: 0.9280 - val_loss: 0.7569 - val_acc: 0.7890

Epoch 00018: val_acc did not improve
Epoch 19/10000
9498/9498 [==============================] - 6s 586us/step - loss: 0.2184 - acc: 0.9297 - val_loss: 0.6690 - val_acc: 0.7991

Epoch 00019: val_acc did not improve
Epoch 20/10000
9498/9498 [==============================] - 5s 558us/step - loss: 0.2156 - acc: 0.9345 - val_loss: 1.5746 - val_acc: 0.7257

Epoch 00020: val_acc did not improve
Epoch 21/10000
9498/9498 [==============================] - 5s 568us/step - loss: 0.2186 - acc: 0.9337 - val_loss: 0.4353 - val_acc: 0.8221

Epoch 00021: val_acc did not improve
Epoch 00021: early stopping
Out[16]:
<keras.callbacks.History at 0x7f607efaa518>

Now, with tunned parameters, we've managed to imporove Accuracy up to 0.88073

With Keras it is also possible to use L1/L2 weight regularizations which allow to apply penalties on layer parameters or layer activity during optimization. These penalties are incorporated in the loss function that the network optimizers.Let's add some regularization on to the 1st layer.

In [17]:
from keras import regularizers 

model4 = Sequential()
model4.add(Dense(64, input_dim=5160,kernel_regularizer=regularizers.l2(0.0015), 
                 activity_regularizer=regularizers.l1(0.0015)))
model3.add(Activation('relu'))
model4.add(Dropout(0.11729755246044238, seed=123))
    
model4.add(Dense(256))
model4.add(Activation('relu'))
model4.add(Dropout(0.8444244099007299,seed=123))

model4.add(Dense(1024))
model4.add(Activation('linear'))
model4.add(Dropout(0.41266207281071243,seed=123))

model4.add(Dense(256))
model4.add(Activation('relu'))
model4.add(Dropout(0.4844455237320119,seed=123))

model4.add(Dense(2))
model4.add(Activation('softmax'))

model4.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model4.summary()

print (X_train.shape)
print (Y_train.shape)
print (X_test.shape)
print (Y_test.shape)

callback_save       = ModelCheckpoint('best_model4.model4', monitor='val_acc', verbose=1, save_best_only=True, mode='auto')
callback_earlystop  = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1, mode='auto')

model4.fit(  
    X_train,   
    Y_train,   
    batch_size=60,   
    epochs=10000,  
    verbose=1,
    validation_data=(X_test, Y_test), 
    callbacks=[callback_save, callback_earlystop]
)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_23 (Dense)             (None, 64)                330304    
_________________________________________________________________
dropout_15 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_24 (Dense)             (None, 256)               16640     
_________________________________________________________________
activation_24 (Activation)   (None, 256)               0         
_________________________________________________________________
dropout_16 (Dropout)         (None, 256)               0         
_________________________________________________________________
dense_25 (Dense)             (None, 1024)              263168    
_________________________________________________________________
activation_25 (Activation)   (None, 1024)              0         
_________________________________________________________________
dropout_17 (Dropout)         (None, 1024)              0         
_________________________________________________________________
dense_26 (Dense)             (None, 256)               262400    
_________________________________________________________________
activation_26 (Activation)   (None, 256)               0         
_________________________________________________________________
dropout_18 (Dropout)         (None, 256)               0         
_________________________________________________________________
dense_27 (Dense)             (None, 2)                 514       
_________________________________________________________________
activation_27 (Activation)   (None, 2)                 0         
=================================================================
Total params: 873,026
Trainable params: 873,026
Non-trainable params: 0
_________________________________________________________________
(9498, 5160)
(9498, 2)
(2574, 5160)
(2574, 2)
Train on 9498 samples, validate on 2574 samples
Epoch 1/10000
9498/9498 [==============================] - 5s 565us/step - loss: 12.8759 - acc: 0.6247 - val_loss: 6.9343 - val_acc: 0.8228

Epoch 00001: val_acc improved from -inf to 0.82284, saving model to best_model4.model4
Epoch 2/10000
9498/9498 [==============================] - 5s 562us/step - loss: 11.5926 - acc: 0.6753 - val_loss: 5.5722 - val_acc: 0.8283

Epoch 00002: val_acc improved from 0.82284 to 0.82828, saving model to best_model4.model4
Epoch 3/10000
9498/9498 [==============================] - 5s 484us/step - loss: 11.5830 - acc: 0.7046 - val_loss: 8.1338 - val_acc: 0.7308

Epoch 00003: val_acc did not improve
Epoch 4/10000
9498/9498 [==============================] - 6s 587us/step - loss: 11.4830 - acc: 0.6897 - val_loss: 7.9208 - val_acc: 0.5859

Epoch 00004: val_acc did not improve
Epoch 5/10000
9498/9498 [==============================] - 5s 488us/step - loss: 11.6809 - acc: 0.6848 - val_loss: 6.3156 - val_acc: 0.8310

Epoch 00005: val_acc improved from 0.82828 to 0.83100, saving model to best_model4.model4
Epoch 6/10000
9498/9498 [==============================] - 5s 567us/step - loss: 11.4760 - acc: 0.7123 - val_loss: 5.4389 - val_acc: 0.8213

Epoch 00006: val_acc did not improve
Epoch 7/10000
9498/9498 [==============================] - 5s 549us/step - loss: 11.5868 - acc: 0.7083 - val_loss: 5.6791 - val_acc: 0.8442

Epoch 00007: val_acc improved from 0.83100 to 0.84421, saving model to best_model4.model4
Epoch 8/10000
9498/9498 [==============================] - 5s 504us/step - loss: 11.5519 - acc: 0.7047 - val_loss: 8.5052 - val_acc: 0.6465

Epoch 00008: val_acc did not improve
Epoch 9/10000
9498/9498 [==============================] - 5s 532us/step - loss: 11.5746 - acc: 0.7115 - val_loss: 6.3268 - val_acc: 0.8186

Epoch 00009: val_acc did not improve
Epoch 10/10000
9498/9498 [==============================] - 5s 519us/step - loss: 11.6225 - acc: 0.7179 - val_loss: 13.0844 - val_acc: 0.5455

Epoch 00010: val_acc did not improve
Epoch 11/10000
9498/9498 [==============================] - 5s 511us/step - loss: 11.4641 - acc: 0.7258 - val_loss: 5.4576 - val_acc: 0.8046

Epoch 00011: val_acc did not improve
Epoch 12/10000
9498/9498 [==============================] - 5s 498us/step - loss: 11.5625 - acc: 0.7007 - val_loss: 8.5399 - val_acc: 0.6414

Epoch 00012: val_acc did not improve
Epoch 13/10000
9498/9498 [==============================] - 5s 502us/step - loss: 11.4623 - acc: 0.6997 - val_loss: 5.5184 - val_acc: 0.8353

Epoch 00013: val_acc did not improve
Epoch 14/10000
9498/9498 [==============================] - 5s 531us/step - loss: 11.4803 - acc: 0.6990 - val_loss: 7.8942 - val_acc: 0.5482

Epoch 00014: val_acc did not improve
Epoch 15/10000
9498/9498 [==============================] - 4s 469us/step - loss: 11.4089 - acc: 0.7131 - val_loss: 12.4437 - val_acc: 0.5287

Epoch 00015: val_acc did not improve
Epoch 16/10000
9498/9498 [==============================] - 5s 540us/step - loss: 11.4006 - acc: 0.7220 - val_loss: 9.4164 - val_acc: 0.6570

Epoch 00016: val_acc did not improve
Epoch 00016: early stopping
Out[17]:
<keras.callbacks.History at 0x7f607e8b6ba8>

So, we can see, that adding regualrization with the current coeffitients to the firs layer we've got just Accuracy of 0.84421 which didn't improve the result. This means, that, as usual, they should be carefully tuned :)

When we want to use the best trained model we got, we can just download previously (automatically) saved the best one (via load_model) and apply to the data needed. Let's see what we'll get on the test set:

In [18]:
from keras.models import load_model

model=load_model( 'best_model3.model3')
result = model.predict_on_batch(X_test)
result[:5]
Out[18]:
array([[1.0000000e+00, 6.0317307e-10],
       [9.9969280e-01, 3.0722743e-04],
       [8.6105287e-02, 9.1389477e-01],
       [1.0000000e+00, 0.0000000e+00],
       [8.6105287e-02, 9.1389477e-01]], dtype=float32)

You may also be interested to get a list of all weight tensors of the model, as Numpy arrays via get_weights:

In [19]:
weights =model.get_weights()
weights[:1]
Out[19]:
[array([[ 0.01994331, -0.02098   ,  0.00295136, ..., -0.04022035,
         -0.01529405, -0.02186417],
        [-0.01854505, -0.02727724, -0.02775731, ...,  0.03096288,
         -0.01255704, -0.06090376],
        [-0.00875254, -0.01949397,  0.00224739, ...,  0.02502301,
         -0.00746644,  0.03597088],
        ...,
        [-0.01869415, -0.0278235 , -0.02591862, ..., -0.00605036,
         -0.02817096, -0.05687099],
        [-0.03822353, -0.01258963,  0.01902066, ..., -0.00430657,
         -0.00082583,  0.0209065 ],
        [-0.01998828, -0.00215639,  0.00639338, ...,  0.02406462,
         -0.0115008 ,  0.02471913]], dtype=float32)]

Besides, you would propbably like to get the model config to re-use it in the future. This can be done via get_config:

In [20]:
config= model.get_config()
config
Out[20]:
[{'class_name': 'Dense',
  'config': {'activation': 'linear',
   'activity_regularizer': None,
   'batch_input_shape': (None, 5160),
   'bias_constraint': None,
   'bias_initializer': {'class_name': 'Zeros', 'config': {}},
   'bias_regularizer': None,
   'dtype': 'float32',
   'kernel_constraint': None,
   'kernel_initializer': {'class_name': 'VarianceScaling',
    'config': {'distribution': 'uniform',
     'mode': 'fan_avg',
     'scale': 1.0,
     'seed': None}},
   'kernel_regularizer': None,
   'name': 'dense_18',
   'trainable': True,
   'units': 64,
   'use_bias': True}},
 {'class_name': 'Activation',
  'config': {'activation': 'relu',
   'name': 'activation_18',
   'trainable': True}},
 {'class_name': 'Dropout',
  'config': {'name': 'dropout_11',
   'noise_shape': None,
   'rate': 0.11729755246044238,
   'seed': 123,
   'trainable': True}},
 {'class_name': 'Dense',
  'config': {'activation': 'linear',
   'activity_regularizer': None,
   'bias_constraint': None,
   'bias_initializer': {'class_name': 'Zeros', 'config': {}},
   'bias_regularizer': None,
   'kernel_constraint': None,
   'kernel_initializer': {'class_name': 'VarianceScaling',
    'config': {'distribution': 'uniform',
     'mode': 'fan_avg',
     'scale': 1.0,
     'seed': None}},
   'kernel_regularizer': None,
   'name': 'dense_19',
   'trainable': True,
   'units': 256,
   'use_bias': True}},
 {'class_name': 'Activation',
  'config': {'activation': 'relu',
   'name': 'activation_19',
   'trainable': True}},
 {'class_name': 'Dropout',
  'config': {'name': 'dropout_12',
   'noise_shape': None,
   'rate': 0.8444244099007299,
   'seed': 123,
   'trainable': True}},
 {'class_name': 'Dense',
  'config': {'activation': 'linear',
   'activity_regularizer': None,
   'bias_constraint': None,
   'bias_initializer': {'class_name': 'Zeros', 'config': {}},
   'bias_regularizer': None,
   'kernel_constraint': None,
   'kernel_initializer': {'class_name': 'VarianceScaling',
    'config': {'distribution': 'uniform',
     'mode': 'fan_avg',
     'scale': 1.0,
     'seed': None}},
   'kernel_regularizer': None,
   'name': 'dense_20',
   'trainable': True,
   'units': 1024,
   'use_bias': True}},
 {'class_name': 'Activation',
  'config': {'activation': 'linear',
   'name': 'activation_20',
   'trainable': True}},
 {'class_name': 'Dropout',
  'config': {'name': 'dropout_13',
   'noise_shape': None,
   'rate': 0.41266207281071243,
   'seed': 123,
   'trainable': True}},
 {'class_name': 'Dense',
  'config': {'activation': 'linear',
   'activity_regularizer': None,
   'bias_constraint': None,
   'bias_initializer': {'class_name': 'Zeros', 'config': {}},
   'bias_regularizer': None,
   'kernel_constraint': None,
   'kernel_initializer': {'class_name': 'VarianceScaling',
    'config': {'distribution': 'uniform',
     'mode': 'fan_avg',
     'scale': 1.0,
     'seed': None}},
   'kernel_regularizer': None,
   'name': 'dense_21',
   'trainable': True,
   'units': 256,
   'use_bias': True}},
 {'class_name': 'Activation',
  'config': {'activation': 'relu',
   'name': 'activation_21',
   'trainable': True}},
 {'class_name': 'Dropout',
  'config': {'name': 'dropout_14',
   'noise_shape': None,
   'rate': 0.4844455237320119,
   'seed': 123,
   'trainable': True}},
 {'class_name': 'Dense',
  'config': {'activation': 'linear',
   'activity_regularizer': None,
   'bias_constraint': None,
   'bias_initializer': {'class_name': 'Zeros', 'config': {}},
   'bias_regularizer': None,
   'kernel_constraint': None,
   'kernel_initializer': {'class_name': 'VarianceScaling',
    'config': {'distribution': 'uniform',
     'mode': 'fan_avg',
     'scale': 1.0,
     'seed': None}},
   'kernel_regularizer': None,
   'name': 'dense_22',
   'trainable': True,
   'units': 2,
   'use_bias': True}},
 {'class_name': 'Activation',
  'config': {'activation': 'softmax',
   'name': 'activation_22',
   'trainable': True}}]

So, the model can be reinstantiated from its config via from_config:

In [21]:
model3 = Sequential.from_config(config)

For more model tuning options proposed by Keras pls see here

What about the other types of the Neural Networks?

Yes, you can use the similar approach re the layers' construction principles for LSTM, CNN and some other types of the Deep Neural Networks. For more details pls see here.