In this lesson you are going to learn how to train your NN in batches.
What do you mean by batches?
So far when training our model, we have been feeding it all of our training data on every iteration. Sometimes it makes sense to feed the model small batches of, say, 10 or 100 samples at a time instead. This lets the model update its weights more often and can give you better results. In addition, you may run into a situation where you simply do not have enough memory to feed in the entire training set at once. Training in batches solves that problem too.
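For example, here is a quick NumPy sketch (with made-up numbers) of what splitting a dataset into batches looks like:

```python
import numpy as np

# A toy dataset of 1000 samples (hypothetical values)
data = np.arange(1000)
batch_size = 100

# Instead of feeding all 1000 samples at once, feed 100 at a time;
# the model then gets 10 weight updates per pass over the data instead of 1
batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
print(len(batches))      # 10 batches
print(batches[0].shape)  # each batch holds 100 samples
```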
# import libraries
import tensorflow as tf
import pandas as pd
import numpy as np
import sys
import datetime
import matplotlib.pyplot as plt
plt.style.use('ggplot') # use this plot style
%matplotlib inline
print('Python version ' + sys.version)
print('Tensorflow version ' + tf.VERSION)
print('Pandas version ' + pd.__version__)
print('Numpy version ' + np.__version__)
Python version 3.5.1 |Anaconda custom (64-bit)| (default, Feb 16 2016, 09:49:46) [MSC v.1900 64 bit (AMD64)]
Tensorflow version 0.12.0-rc0
Pandas version 0.19.0
Numpy version 1.11.0
y = a * x^4 + b
TIP: Recommended percentages for splitting your data: 70% training, 15% validation, 15% test.
# Let's generate 1000 random samples
pool = np.random.rand(1000,1).astype(np.float32)
# Shuffle the samples
np.random.shuffle(pool)
# sample size of 15%
sample = int(1000 * 0.15)
# 15% test
test_x = pool[0:sample]
# 15% validation
valid_x = pool[sample:sample*2]
# 70% training
train_x = pool[sample*2:]
print('Testing data points: ' + str(test_x.shape))
print('Validation data points: ' + str(valid_x.shape))
print('Training data points: ' + str(train_x.shape))
# Let's compute the output using 2 for a and 5 for b
test_y = 2.0 * test_x**4 + 5
valid_y = 2.0 * valid_x**4 + 5
train_y = 2.0 * train_x**4 + 5
Testing data points: (150, 1)
Validation data points: (150, 1)
Training data points: (700, 1)
df = pd.DataFrame({'x':train_x[:,0],
'y':train_y[:,0]})
df.head()
| | x | y |
|---|---|---|
| 0 | 0.072982 | 5.000057 |
| 1 | 0.627874 | 5.310827 |
| 2 | 0.751243 | 5.637018 |
| 3 | 0.291485 | 5.014438 |
| 4 | 0.559812 | 5.196426 |
df.describe()
| | x | y |
|---|---|---|
| count | 700.000000 | 700.000000 |
| mean | 0.475430 | 5.353024 |
| std | 0.286284 | 0.491342 |
| min | 0.000471 | 5.000000 |
| 25% | 0.228471 | 5.005450 |
| 50% | 0.482200 | 5.108128 |
| 75% | 0.718817 | 5.533954 |
| max | 0.999141 | 6.993135 |
df.plot.scatter(x='x', y='y', figsize=(15,5));
Make a function that will help you create layers easily
def add_layer(inputs, in_size, out_size, activation_function=None):
    # Weights shape: [size of input layer, size of output layer]
    Weights = tf.Variable(tf.truncated_normal([in_size, out_size], mean=0.1, stddev=0.1))
    # biases shape: [size of output layer]
    biases = tf.Variable(tf.truncated_normal([out_size], mean=0.1, stddev=0.1))
    # shape of pred = [size of your batches, size of output layer]
    pred = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        outputs = pred
    else:
        outputs = activation_function(pred)
    return outputs
Start to use *Weights* (W) and *biases* (b) when setting up your variables. Aside from adding your ReLU activation function, it is a good idea to use Tensorflow's *matrix multiplication function (matmul)* as shown above.
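To see what each layer actually computes, here is the same matmul-plus-bias step sketched in plain NumPy (the sample values and sizes are made up):

```python
import numpy as np

# A toy batch of 3 samples with 1 feature each (hypothetical values)
inputs = np.array([[0.1], [0.5], [0.9]], dtype=np.float32)

# Weights have shape [in_size, out_size]; biases have shape [out_size]
W = np.full((1, 4), 0.1, dtype=np.float32)  # in_size=1, out_size=4
b = np.full((4,), 0.1, dtype=np.float32)

# pred = inputs @ W + b, with shape [size of your batch, out_size]
pred = inputs @ W + b
relu = np.maximum(pred, 0)  # ReLU keeps positives, zeroes out negatives
print(pred.shape)  # (3, 4)
```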
The ? in the shape output just means that dimension can be of any size; it is determined by how many samples you feed in.
# larger batch sizes give smoother gradient estimates per step, at the cost of more memory and compute
# The strategy is to use batch_size when you cannot fit the entire dataset into memory
# In practice, small to moderate mini-batch sizes (10-500) are generally used
batch_size = 10
# you can adjust the number of neurons in the hidden layers here
hidden_size = 10
# placeholders
# shape=[number of samples (None means any number), number of input neurons]
x = tf.placeholder(tf.float32, shape=[None, 1], name="01_x")
y = tf.placeholder(tf.float32, shape=[None, 1], name="01_y")
print("shape of x and y:")
print(x.get_shape(),y.get_shape())
shape of x and y: (?, 1) (?, 1)
We will be feeding in the fraction of neurons to keep (the dropout keep probability) on every training step.
# drop out
keep_prob = tf.placeholder(tf.float32)
Note that the output of one layer becomes the input of the next layer.
# create your hidden layers!
h1 = add_layer(x, 1, hidden_size, tf.nn.relu)
# here is where we shoot down some of the neurons
h1_drop = tf.nn.dropout(h1, keep_prob)
# add a second layer
h2 = add_layer(h1_drop, hidden_size, hidden_size, tf.nn.relu)
h2_drop = tf.nn.dropout(h2, keep_prob)
# add a third layer
h3 = add_layer(h2_drop, hidden_size, hidden_size, tf.nn.relu)
h3_drop = tf.nn.dropout(h3, keep_prob)
# add a fourth layer
h4 = add_layer(h3_drop, hidden_size, hidden_size, tf.nn.relu)
h4_drop = tf.nn.dropout(h4, keep_prob)
print("shape of hidden layers:")
print(h1_drop.get_shape(), h2_drop.get_shape(), h3_drop.get_shape(), h4_drop.get_shape())
shape of hidden layers: (?, 10) (?, 10) (?, 10) (?, 10)
# Output Layers
pred = add_layer(h4_drop, hidden_size, 1)
print("shape of output layer:")
print(pred.get_shape())
shape of output layer: (?, 1)
# minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(pred - y))
# pick optimizer
optimizer = tf.train.GradientDescentOptimizer(0.0099)
train = optimizer.minimize(loss)
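The loss above is the mean squared error. As a quick sanity check, here is the same computation in plain NumPy with made-up predictions and targets:

```python
import numpy as np

# Hypothetical predictions and targets
pred = np.array([5.1, 5.5, 6.0])
y = np.array([5.0, 5.6, 6.2])

# mean squared error: the average of the squared differences
loss = np.mean(np.square(pred - y))
print(loss)  # (0.01 + 0.01 + 0.04) / 3 = 0.02
```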
Set up the following variables to calculate the accuracy rate of your model. You will do that shortly.
# check accuracy of model
correct_prediction = tf.equal(tf.round(pred), tf.round(y))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
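To see how this rounding-based accuracy behaves, here is a small NumPy sketch with hypothetical predictions and targets:

```python
import numpy as np

# Hypothetical model outputs and true targets
pred = np.array([5.04, 5.62, 6.91, 5.49])
y = np.array([5.00, 5.60, 6.99, 5.51])

# A prediction counts as "correct" when it rounds
# to the same integer as its target
correct = np.round(pred) == np.round(y)  # [True, True, True, False]
accuracy = np.mean(correct.astype(np.float32))
print(accuracy)  # 0.75
```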
Code borrowed from this great Tensorflow Jupyter Notebook.
# Best validation accuracy seen so far.
best_valid_acc = 0.0
# Iteration-number for last improvement to validation accuracy.
last_improvement = 0
# Stop optimization if no improvement found in this many iterations.
require_improvement = 1500
On every training iteration, variable *i* will hold a random sample of *batch_size* indices. Take a look at how the variable *train_data* was modified.
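Here is a standalone NumPy sketch of that batching trick (the seed and data are made up for the demo):

```python
import numpy as np

np.random.seed(0)  # hypothetical seed, just to make the demo repeatable
train_x = np.random.rand(700, 1).astype(np.float32)
train_y = 2.0 * train_x**4 + 5
batch_size = 10

# pick batch_size random row indices, then slice
# both x and y with the SAME indices so pairs stay aligned
i = np.random.permutation(train_x.shape[0])[:batch_size]
batch_x, batch_y = train_x[i, :], train_y[i, :]
print(batch_x.shape, batch_y.shape)  # (10, 1) (10, 1)
```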
# initialize the variables
init = tf.global_variables_initializer()
# hold step and error values
t = []
# Run your graph
with tf.Session() as sess:
    # initialize variables
    sess.run(init)
    # Fit the function.
    for step in range(6000):
        # pull batches at random
        i = np.random.permutation(train_x.shape[0])[:batch_size]
        # get your data
        train_data = {x: train_x[i, :], y: train_y[i, :], keep_prob: 0.975}
        valid_data = {x: valid_x, y: valid_y, keep_prob: 1.0}
        test_data = {x: test_x, y: test_y, keep_prob: 1.0}
        # training in progress... (the train op returns None, so discard that result)
        train_loss, _ = sess.run([loss, train], feed_dict=train_data)
        # print every n iterations
        if step % 100 == 0:
            # capture the step and error for analysis
            valid_loss = sess.run(loss, feed_dict=valid_data)
            t.append((step, train_loss, valid_loss))
            # get snapshot of current training and validation accuracy
            train_acc = accuracy.eval(train_data)
            valid_acc = accuracy.eval(valid_data)
            # If validation accuracy is an improvement over best-known.
            if valid_acc > best_valid_acc:
                # Update the best-known validation accuracy.
                best_valid_acc = valid_acc
                # Set the iteration for the last improvement to current.
                last_improvement = step
                # Flag whenever an improvement is found
                improved_str = '*'
            else:
                # An empty string to be printed below.
                # Shows that no improvement was found.
                improved_str = ''
            print("Training loss at step %d: %f %s" % (step, train_loss, improved_str))
            print("Validation %f" % (valid_loss))
            # If no improvement found in the required number of iterations.
            if step - last_improvement > require_improvement:
                print("No improvement found in a while, stopping optimization.")
                # Break out from the for-loop.
                break
    # here is where you see how good of a Data Scientist you are
    print("Accuracy on the Training Set:", accuracy.eval(train_data))
    print("Accuracy on the Validation Set:", accuracy.eval(valid_data))
    print("Accuracy on the Test Set:", accuracy.eval(test_data))
    # capture predictions on test data
    test_results = sess.run(pred, feed_dict={x: test_x, keep_prob: 1.0})

df_final = pd.DataFrame({'test_x': test_x[:, 0],
                         'pred': test_results[:, 0]})
# capture training and validation loss
df_loss = pd.DataFrame(t, columns=['step', 'train_loss', 'valid_loss'])
Training loss at step 0: 23.402122  Validation 20.980297
Training loss at step 100: 0.187115 *  Validation 0.143108
Training loss at step 200: 0.250679  Validation 0.077954
Training loss at step 300: 0.152195 *  Validation 0.087788
Training loss at step 400: 0.225003 *  Validation 0.100714
Training loss at step 500: 0.161775  Validation 0.093067
Training loss at step 600: 0.254302  Validation 0.102409
Training loss at step 700: 0.103872 *  Validation 0.109399
Training loss at step 800: 0.274327  Validation 0.077394
Training loss at step 900: 0.148338  Validation 0.103547
Training loss at step 1000: 0.048606  Validation 0.067885
Training loss at step 1100: 0.096235  Validation 0.063774
Training loss at step 1200: 0.072514  Validation 0.064017
Training loss at step 1300: 0.211790  Validation 0.053174
Training loss at step 1400: 0.091291  Validation 0.044657
Training loss at step 1500: 0.081252  Validation 0.037878
Training loss at step 1600: 0.190677  Validation 0.030551
Training loss at step 1700: 0.025660  Validation 0.031784
Training loss at step 1800: 0.077246  Validation 0.021971
Training loss at step 1900: 0.196296  Validation 0.018666
Training loss at step 2000: 0.056428  Validation 0.020353
Training loss at step 2100: 0.043981 *  Validation 0.018860
Training loss at step 2200: 0.043882  Validation 0.015705
Training loss at step 2300: 0.033422  Validation 0.010563
Training loss at step 2400: 0.018767 *  Validation 0.009835
Training loss at step 2500: 0.040599 *  Validation 0.006505
Training loss at step 2600: 0.053013  Validation 0.005265
Training loss at step 2700: 0.040391  Validation 0.004556
Training loss at step 2800: 0.091411 *  Validation 0.005383
Training loss at step 2900: 0.104919 *  Validation 0.004556
Training loss at step 3000: 0.031472  Validation 0.006872
Training loss at step 3100: 0.019333  Validation 0.004266
Training loss at step 3200: 0.098784  Validation 0.009901
Training loss at step 3300: 0.113221  Validation 0.005738
Training loss at step 3400: 0.050050  Validation 0.003017
Training loss at step 3500: 0.063836  Validation 0.002980
Training loss at step 3600: 0.037302  Validation 0.005684
Training loss at step 3700: 0.027242  Validation 0.003440
Training loss at step 3800: 0.045567  Validation 0.001505
Training loss at step 3900: 0.030083  Validation 0.001552
Training loss at step 4000: 0.019632 *  Validation 0.001199
Training loss at step 4100: 0.015194  Validation 0.001870
Training loss at step 4200: 0.047915  Validation 0.002910
Training loss at step 4300: 0.021621  Validation 0.001246
Training loss at step 4400: 0.033902  Validation 0.003228
Training loss at step 4500: 0.014637  Validation 0.004395
Training loss at step 4600: 0.053357  Validation 0.001634
Training loss at step 4700: 0.039563  Validation 0.001245
Training loss at step 4800: 0.317875  Validation 0.007926
Training loss at step 4900: 0.060057  Validation 0.001299
Training loss at step 5000: 0.008178  Validation 0.004327
Training loss at step 5100: 0.038080  Validation 0.001113
Training loss at step 5200: 0.028573  Validation 0.002215
Training loss at step 5300: 0.019867  Validation 0.000795
Training loss at step 5400: 0.021241  Validation 0.005972
Training loss at step 5500: 0.010574  Validation 0.003042
Training loss at step 5600: 0.020798  Validation 0.004027
No improvement found in a while, stopping optimization.
Accuracy on the Training Set: 0.9
Accuracy on the Validation Set: 0.953333
Accuracy on the Test Set: 0.98
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(15, 5))
# Chart 1 - Shows the line we are trying to model
df.plot.scatter(x='x', y='y', ax=axes, color='red')
# Chart 2 - Shows the line our trained model came up with
df_final.plot.scatter(x='test_x', y='pred', ax=axes, alpha=0.3)
# add a little sugar
axes.set_title('target vs pred', fontsize=20)
axes.set_ylabel('y', fontsize=15)
axes.set_xlabel('x', fontsize=15)
axes.legend(["target", "pred"], loc='best');
If the *valid_loss* is increasing while your *train_loss* is decreasing, your model is overfitting. Since you have implemented early stopping, your model will stop training before this issue gets out of control.
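The early-stopping bookkeeping from the training loop can be sketched on its own in plain Python, driven here by a made-up list of validation accuracies:

```python
# Hypothetical validation accuracies, one per check-in step
valid_accs = [0.50, 0.62, 0.71, 0.70, 0.69, 0.68, 0.67]

best_valid_acc = 0.0
last_improvement = 0
require_improvement = 3  # stop after this many steps with no improvement

for step, valid_acc in enumerate(valid_accs):
    if valid_acc > best_valid_acc:
        best_valid_acc = valid_acc  # new best: record it
        last_improvement = step     # ...and remember when it happened
    if step - last_improvement > require_improvement:
        print("No improvement found in a while, stopping optimization.")
        break

print(best_valid_acc)  # 0.71
```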
df_loss.set_index('step').plot(logy=True, figsize=(15,5));
Experiment with the batch size and the size of each layer.
This tutorial was created by HEDARO