In this lesson you are going to learn about *dropout* and how it helps with the problem of overfitting.
What is dropout?
Dropout is a technique that helps your model generalize and prevents overfitting. So how does it work? While training, on every iteration you literally *shoot down* a randomly chosen percentage of your neurons. This keeps your model from getting too cozy with the training data and ultimately helps you overcome overfitting.
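To make the idea concrete, here is a minimal NumPy sketch (my illustration, not part of the lesson's model) of *inverted* dropout, which is what tf.nn.dropout implements: each neuron survives with probability keep_prob, and the survivors are scaled by 1/keep_prob so the expected activation stays the same.
import numpy as np

def dropout_sketch(activations, keep_prob):
    # random binary mask: 1 = keep the neuron, 0 = shoot it down
    mask = (np.random.rand(*activations.shape) < keep_prob).astype(activations.dtype)
    # scale the survivors by 1/keep_prob so the expected sum stays the same
    return activations * mask / keep_prob

h = np.ones((1, 10), dtype=np.float32)   # pretend this is a layer's output
print(dropout_sketch(h, keep_prob=0.5))  # roughly half the values are zeroed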
Keep reading and you will see how easy it is to implement this technique.
# import libraries
import tensorflow as tf
import pandas as pd
import numpy as np
import sys
import datetime
import matplotlib.pyplot as plt
plt.style.use('ggplot') # use this plot style
%matplotlib inline
print('Python version ' + sys.version)
print('Tensorflow version ' + tf.VERSION)
print('Pandas version ' + pd.__version__)
print('Numpy version ' + np.__version__)
Python version 3.5.1 |Anaconda custom (64-bit)| (default, Feb 16 2016, 09:49:46) [MSC v.1900 64 bit (AMD64)]
Tensorflow version 0.12.0-rc0
Pandas version 0.19.0
Numpy version 1.11.0
The function we are going to model is y = a * x^4 + b.
TIP: A commonly recommended split is 70% of your samples for training, 15% for validation, and 15% for testing, which is exactly what we do below.
# Let's generate 1000 random samples
pool = np.random.rand(1000,1).astype(np.float32)
# Shuffle the samples
np.random.shuffle(pool)
# sample size of 15%
sample = int(1000 * 0.15)
# 15% test
test_x = pool[0:sample]
# 15% validation
valid_x = pool[sample:sample*2]
# 70% training
train_x = pool[sample*2:]
print('Testing data points: ' + str(test_x.shape))
print('Validation data points: ' + str(valid_x.shape))
print('Training data points: ' + str(train_x.shape))
# Let's compute the output using 2 for a and 5 for b
test_y = 2.0 * test_x**4 + 5
valid_y = 2.0 * valid_x**4 + 5
train_y = 2.0 * train_x**4 + 5
Testing data points: (150, 1)
Validation data points: (150, 1)
Training data points: (700, 1)
df = pd.DataFrame({'x':train_x[:,0],
'y':train_y[:,0]})
df.head()
  | x | y |
---|---|---|
0 | 0.277916 | 5.011931 |
1 | 0.850299 | 6.045484 |
2 | 0.744988 | 5.616065 |
3 | 0.750673 | 5.635088 |
4 | 0.639758 | 5.335038 |
df.describe()
  | x | y |
---|---|---|
count | 700.000000 | 700.000000 |
mean | 0.514211 | 5.412032 |
std | 0.282943 | 0.530214 |
min | 0.000243 | 5.000000 |
25% | 0.268031 | 5.010322 |
50% | 0.525482 | 5.152497 |
75% | 0.759363 | 5.665013 |
max | 0.999421 | 6.995372 |
df.plot.scatter(x='x', y='y', figsize=(15,5));
Make a function that will help you create layers easily
def add_layer(inputs, in_size, out_size, activation_function=None):
    # Weights shape = [size of input layer, size of output layer]
    Weights = tf.Variable(tf.truncated_normal([in_size, out_size], mean=0.1, stddev=0.1))
    # biases shape = [size of output layer]
    biases = tf.Variable(tf.truncated_normal([out_size], mean=0.1, stddev=0.1))
    # shape of pred = [size of your batch, size of output layer]
    pred = tf.matmul(inputs, Weights) + biases
    # no activation function means a plain linear layer
    if activation_function is None:
        outputs = pred
    else:
        outputs = activation_function(pred)
    return outputs
It is conventional to use W (for weight) and b (for bias) when naming your variables. Aside from adding your ReLU activation function, notice that the layer uses TensorFlow's *matrix multiplication function (tf.matmul)*, as shown in add_layer above.
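As a quick sanity check on the shapes involved, here is a standalone NumPy sketch (my illustration, not part of the model): inputs of shape [batch, in_size] multiplied by weights of shape [in_size, out_size] produce outputs of shape [batch, out_size].
import numpy as np

batch, in_size, out_size = 5, 1, 10
inputs  = np.random.rand(batch, in_size).astype(np.float32)
Weights = np.random.rand(in_size, out_size).astype(np.float32)
biases  = np.random.rand(out_size).astype(np.float32)
# same math as tf.matmul(inputs, Weights) + biases
pred = inputs.dot(Weights) + biases
print(pred.shape)  # (5, 10) = [size of your batch, size of output layer]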
The ? in the shape output just means that dimension can be of any size.
For the shape parameter, you can think of it like this...
shape = [how many data points do you have, how many features does each data point have]
For this lesson, since we are doing a simple regression, we only have one feature (x). We use the *None* keyword so that we are not restricted in the number of samples we can feed our model. This will become more important when you learn about training with batches in a future lesson.
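For instance, here is a tiny standalone sketch (separate from the model we are building in this lesson) showing that a placeholder with shape=[None, 1] happily accepts batches of different sizes:
import numpy as np
import tensorflow as tf

p = tf.placeholder(tf.float32, shape=[None, 1])
doubled = p * 2.0
with tf.Session() as sess:
    # the same placeholder accepts 3 samples...
    print(sess.run(doubled, feed_dict={p: np.ones((3, 1), np.float32)}).shape)    # (3, 1)
    # ...or 700 samples, because its first dimension is None
    print(sess.run(doubled, feed_dict={p: np.ones((700, 1), np.float32)}).shape)  # (700, 1)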
# you can adjust the number of neurons in the hidden layers here
hidden_size = 10
# placeholders
# shape=[how many samples do you have, how many input neurons]
x = tf.placeholder(tf.float32, shape=[None, 1], name="01_x")
y = tf.placeholder(tf.float32, shape=[None, 1], name="01_y")
print("shape of x and y:")
print(x.get_shape(),y.get_shape())
shape of x and y:
(?, 1) (?, 1)
We will be feeding in keep_prob, the fraction of neurons to keep, on every training iteration. Note that we feed a value below 1.0 only while training; when evaluating on the validation or test set we feed 1.0 so every neuron participates.
# drop out
keep_prob = tf.placeholder(tf.float32)
Note that the output of one layer becomes the input of the next layer.
# create your hidden layers!
h1 = add_layer(x, 1, hidden_size, tf.nn.relu)
# here is where we shoot down some of the neurons
h1_drop = tf.nn.dropout(h1, keep_prob)
# add a second layer
h2 = add_layer(h1_drop, hidden_size, hidden_size, tf.nn.relu)
h2_drop = tf.nn.dropout(h2, keep_prob)
# add a third layer
h3 = add_layer(h2_drop, hidden_size, hidden_size, tf.nn.relu)
h3_drop = tf.nn.dropout(h3, keep_prob)
# add a fourth layer
h4 = add_layer(h3_drop, hidden_size, hidden_size, tf.nn.relu)
h4_drop = tf.nn.dropout(h4, keep_prob)
print("shape of hidden layers:")
print(h1_drop.get_shape(), h2_drop.get_shape(), h3_drop.get_shape(), h4_drop.get_shape())
shape of hidden layers:
(?, 10) (?, 10) (?, 10) (?, 10)
# Output layer (linear, since no activation function is passed)
pred = add_layer(h4_drop, hidden_size, 1)
print("shape of output layer:")
print(pred.get_shape())
shape of output layer:
(?, 1)
# minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(pred - y))
# pick optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
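If plain gradient descent converges too slowly for your taste, one experiment (my suggestion, not part of the original lesson) is to swap in a different optimizer; the rest of the graph stays unchanged:
# hypothetical alternative: Adam often converges faster on problems like this
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train = optimizer.minimize(loss)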
Set up the following ops to calculate the accuracy of your model. You will use them shortly.
# check accuracy of model
correct_prediction = tf.equal(tf.round(pred), tf.round(y))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
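To see what this metric means for a regression problem, here is the same computation in plain NumPy (an illustrative sketch with made-up numbers): a prediction counts as correct when it rounds to the same integer as its target.
import numpy as np

pred_np   = np.array([5.1, 5.6, 6.9])
target_np = np.array([5.0, 5.0, 7.0])
# 5.1 -> 5 matches 5, 5.6 -> 6 misses 5, 6.9 -> 7 matches 7
print(np.mean(np.round(pred_np) == np.round(target_np)))  # 0.666...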
The early-stopping code below is borrowed from a great TensorFlow Jupyter notebook.
# Best validation accuracy seen so far.
best_valid_acc = 0.0
# Iteration-number for last improvement to validation accuracy.
last_improvement = 0
# Stop optimization if no improvement found in this many iterations.
require_improvement = 1500
Notice how we pass in our dropout keep probability (keep_prob) below.
# initialize the variables
init = tf.global_variables_initializer()

# hold step and error values
t = []

# Run your graph
with tf.Session() as sess:
    # initialize variables
    sess.run(init)
    # Fit the function.
    for step in range(6000):
        # get your data
        train_data = {x: train_x, y: train_y, keep_prob: 0.975}
        valid_data = {x: valid_x, y: valid_y, keep_prob: 1.0}
        test_data = {x: test_x, y: test_y, keep_prob: 1.0}
        # training in progress... (the train op returns None, hence the underscore)
        train_loss, _ = sess.run([loss, train], feed_dict=train_data)
        # print every n iterations
        if step % 100 == 0:
            # capture the step and error for analysis
            valid_loss = sess.run(loss, feed_dict=valid_data)
            t.append((step, train_loss, valid_loss))
            # get a snapshot of the current training and validation accuracy
            train_acc = accuracy.eval(train_data)
            valid_acc = accuracy.eval(valid_data)
            # If validation accuracy is an improvement over best-known.
            if valid_acc > best_valid_acc:
                # Update the best-known validation accuracy.
                best_valid_acc = valid_acc
                # Set the iteration for the last improvement to current.
                last_improvement = step
                # Flag whenever an improvement is found
                improved_str = '*'
            else:
                # An empty string to be printed below.
                # Shows that no improvement was found.
                improved_str = ''
            print("Training loss at step %d: %f %s" % (step, train_loss, improved_str))
            print("Validation %f" % (valid_loss))
            # If no improvement found in the required number of iterations.
            if step - last_improvement > require_improvement:
                print("No improvement found in a while, stopping optimization.")
                # Break out from the for-loop.
                break
    # here is where you see how good of a Data Scientist you are
    print("Accuracy on the Training Set:", accuracy.eval(train_data))
    print("Accuracy on the Validation Set:", accuracy.eval(valid_data))
    print("Accuracy on the Test Set:", accuracy.eval(test_data))
    # capture predictions on the test data
    test_results = sess.run(pred, feed_dict={x: test_x, keep_prob: 1.0})

df_final = pd.DataFrame({'test_x': test_x[:, 0],
                         'pred': test_results[:, 0]})
# capture training and validation loss
df_loss = pd.DataFrame(t, columns=['step', 'train_loss', 'valid_loss'])
Training loss at step 0: 22.021866
Validation 16.066334
Training loss at step 100: 0.226587 *
Validation 0.069365
Training loss at step 200: 0.230825 *
Validation 0.069963
Training loss at step 300: 0.214508
Validation 0.068721
Training loss at step 400: 0.263160
Validation 0.067941
Training loss at step 500: 0.216182
Validation 0.068156
Training loss at step 600: 0.191902
Validation 0.068224
Training loss at step 700: 0.199810
Validation 0.068060
Training loss at step 800: 0.186926
Validation 0.068247
Training loss at step 900: 0.174795
Validation 0.068134
Training loss at step 1000: 0.174818
Validation 0.068273
Training loss at step 1100: 0.165233
Validation 0.068123
Training loss at step 1200: 0.160242
Validation 0.064995
Training loss at step 1300: 0.139545
Validation 0.059115
Training loss at step 1400: 0.161141
Validation 0.056825
Training loss at step 1500: 0.140487
Validation 0.055171
Training loss at step 1600: 0.126061
Validation 0.050796
Training loss at step 1700: 0.122908
Validation 0.042352
Training loss at step 1800: 0.098600 *
Validation 0.035082
Training loss at step 1900: 0.102557 *
Validation 0.026878
Training loss at step 2000: 0.088191 *
Validation 0.020376
Training loss at step 2100: 0.089116 *
Validation 0.015061
Training loss at step 2200: 0.070579 *
Validation 0.011380
Training loss at step 2300: 0.067470 *
Validation 0.008556
Training loss at step 2400: 0.066721 *
Validation 0.006749
Training loss at step 2500: 0.056150 *
Validation 0.005794
Training loss at step 2600: 0.071035 *
Validation 0.004586
Training loss at step 2700: 0.062030 *
Validation 0.003829
Training loss at step 2800: 0.055772 *
Validation 0.003224
Training loss at step 2900: 0.063225
Validation 0.002700
Training loss at step 3000: 0.054979
Validation 0.002494
Training loss at step 3100: 0.057061
Validation 0.002265
Training loss at step 3200: 0.049995 *
Validation 0.001760
Training loss at step 3300: 0.049652
Validation 0.001851
Training loss at step 3400: 0.042552
Validation 0.001784
Training loss at step 3500: 0.051730
Validation 0.001640
Training loss at step 3600: 0.045550
Validation 0.001239
Training loss at step 3700: 0.053862
Validation 0.001314
Training loss at step 3800: 0.042645
Validation 0.001305
Training loss at step 3900: 0.047567
Validation 0.000907
Training loss at step 4000: 0.040443
Validation 0.001296
Training loss at step 4100: 0.042049
Validation 0.001107
Training loss at step 4200: 0.046952
Validation 0.000997
Training loss at step 4300: 0.040871
Validation 0.001238
Training loss at step 4400: 0.039199
Validation 0.001149
Training loss at step 4500: 0.043638
Validation 0.000661
Training loss at step 4600: 0.031260
Validation 0.001100
Training loss at step 4700: 0.044857
Validation 0.000827
Training loss at step 4800: 0.044116
Validation 0.000598
No improvement found in a while, stopping optimization.
Accuracy on the Training Set: 0.894286
Accuracy on the Validation Set: 1.0
Accuracy on the Test Set: 0.993333
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(15, 5))
# Chart 1 - Shows the line we are trying to model
df.plot.scatter(x='x', y='y', ax=axes, color='red')
# Chart 2 - Shows the line our trained model came up with
df_final.plot.scatter(x='test_x', y='pred', ax=axes, alpha=0.3)
# add a little sugar
axes.set_title('target vs pred', fontsize=20)
axes.set_ylabel('y', fontsize=15)
axes.set_xlabel('x', fontsize=15)
axes.legend(["target", "pred"], loc='best');
If your *valid_loss* is increasing while your *train_loss* is decreasing, you have a problem: the model is starting to overfit. Since you have implemented early stopping, your model will stop training before this issue gets out of control.
df_loss.set_index('step').plot(logy=True, figsize=(15,5));
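If you want a quick programmatic check in addition to the chart above, here is a small sketch (using the df_loss frame built earlier) that compares the trend of the last few recorded losses:
# look at the last 5 checkpoints captured during training
recent = df_loss.tail(5)
valid_rising  = (recent['valid_loss'].diff().dropna() > 0).all()
train_falling = (recent['train_loss'].diff().dropna() < 0).all()
if valid_rising and train_falling:
    print('Warning: validation loss is rising while training loss falls, possible overfitting.')
else:
    print('No clear overfitting signal in the last few checkpoints.')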
I reduced the number of neurons but increased the number of hidden layers. Try different combinations of layers and neurons and see how your model behaves; a small helper sketch for doing that follows below.
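Here is a hypothetical helper (my sketch, built on the add_layer function from this lesson) that stacks an arbitrary number of hidden layers, each followed by dropout, so you can experiment with depth and width:
def build_network(x, keep_prob, n_hidden_layers=4, hidden_size=10):
    # first hidden layer maps the single input feature to hidden_size neurons
    h = tf.nn.dropout(add_layer(x, 1, hidden_size, tf.nn.relu), keep_prob)
    # remaining hidden layers keep the same width
    for _ in range(n_hidden_layers - 1):
        h = tf.nn.dropout(add_layer(h, hidden_size, hidden_size, tf.nn.relu), keep_prob)
    # linear output layer for the regression target
    return add_layer(h, hidden_size, 1)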
This tutorial was created by HEDARO