In this lesson you are going to learn about *dropout* and how it helps with the problem of overfitting.
What is dropout?
Dropout is a technique that helps your model generalize and prevents overfitting. So how does it work? While training, on every iteration you literally *shoot down* a randomly chosen percentage of your neurons. This keeps your model from getting too cozy with the training data and ultimately helps you overcome overfitting.
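To make the idea concrete, here is a minimal NumPy sketch (my illustration, not part of the lesson's model) of *inverted* dropout, which is what tf.nn.dropout implements: each neuron survives with probability keep_prob, and the survivors are scaled by 1/keep_prob so the expected activation stays the same.
import numpy as np

def dropout_sketch(activations, keep_prob):
    # random binary mask: 1 = keep the neuron, 0 = shoot it down
    mask = (np.random.rand(*activations.shape) < keep_prob).astype(activations.dtype)
    # scale the survivors by 1/keep_prob so the expected sum stays the same
    return activations * mask / keep_prob

h = np.ones((1, 10), dtype=np.float32)   # pretend this is a layer's output
print(dropout_sketch(h, keep_prob=0.5))  # roughly half the values are zeroed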
Keep reading and you will see how easy it is to implement this technique.
# import libraries
import tensorflow as tf
import pandas as pd
import numpy as np
import sys
import datetime
import matplotlib.pyplot as plt
plt.style.use('ggplot') # use this plot style
%matplotlib inline
print('Python version ' + sys.version)
print('Tensorflow version ' + tf.VERSION)
print('Pandas version ' + pd.__version__)
print('Numpy version ' + np.__version__)
Python version 3.5.1 |Anaconda custom (64-bit)| (default, Feb 16 2016, 09:49:46) [MSC v.1900 64 bit (AMD64)]
Tensorflow version 0.12.0-rc0
Pandas version 0.19.0
Numpy version 1.11.0
The function we are going to model is y = a * x^4 + b.
TIP: A commonly recommended split is 70% of your samples for training, 15% for validation, and 15% for testing, which is exactly what we do below.
# Let's generate 1000 random samples
pool = np.random.rand(1000,1).astype(np.float32)
# Shuffle the samples
np.random.shuffle(pool)
# sample size of 15%
sample = int(1000 * 0.15)
# 15% test
test_x = pool[0:sample]
# 15% validation
valid_x = pool[sample:sample*2]
# 70% training
train_x = pool[sample*2:]
print('Testing data points: ' + str(test_x.shape))
print('Validation data points: ' + str(valid_x.shape))
print('Training data points: ' + str(train_x.shape))
# Let's compute the output using 2 for a and 5 for b
test_y = 2.0 * test_x**4 + 5
valid_y = 2.0 * valid_x**4 + 5
train_y = 2.0 * train_x**4 + 5
Testing data points: (150, 1)
Validation data points: (150, 1)
Training data points: (700, 1)
df = pd.DataFrame({'x':train_x[:,0],
'y':train_y[:,0]})
df.head()
  | x | y |
---|---|---|
0 | 0.277916 | 5.011931 |
1 | 0.850299 | 6.045484 |
2 | 0.744988 | 5.616065 |
3 | 0.750673 | 5.635088 |
4 | 0.639758 | 5.335038 |
df.describe()
  | x | y |
---|---|---|
count | 700.000000 | 700.000000 |
mean | 0.514211 | 5.412032 |
std | 0.282943 | 0.530214 |
min | 0.000243 | 5.000000 |
25% | 0.268031 | 5.010322 |
50% | 0.525482 | 5.152497 |
75% | 0.759363 | 5.665013 |
max | 0.999421 | 6.995372 |
df.plot.scatter(x='x', y='y', figsize=(15,5));
Make a function that will help you create layers easily
def add_layer(inputs, in_size, out_size, activation_function=None):
    # Weights shape = [size of input layer, size of output layer]
    Weights = tf.Variable(tf.truncated_normal([in_size, out_size], mean=0.1, stddev=0.1))
    # biases shape = [size of output layer]
    biases = tf.Variable(tf.truncated_normal([out_size], mean=0.1, stddev=0.1))
    # shape of pred = [size of your batch, size of output layer]
    pred = tf.matmul(inputs, Weights) + biases
    # no activation function means a plain linear layer
    if activation_function is None:
        outputs = pred
    else:
        outputs = activation_function(pred)
    return outputs
It is conventional to use W (for weight) and b (for bias) when naming your variables. Aside from adding your ReLU activation function, notice that the layer uses TensorFlow's *matrix multiplication function (tf.matmul)*, as shown in add_layer above.
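As a quick sanity check on the shapes involved, here is a standalone NumPy sketch (my illustration, not part of the model): inputs of shape [batch, in_size] multiplied by weights of shape [in_size, out_size] produce outputs of shape [batch, out_size].
import numpy as np

batch, in_size, out_size = 5, 1, 10
inputs  = np.random.rand(batch, in_size).astype(np.float32)
Weights = np.random.rand(in_size, out_size).astype(np.float32)
biases  = np.random.rand(out_size).astype(np.float32)
# same math as tf.matmul(inputs, Weights) + biases
pred = inputs.dot(Weights) + biases
print(pred.shape)  # (5, 10) = [size of your batch, size of output layer]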
The ? in the shape output just means that dimension can be of any size.
For the shape parameter, you can think of it like this...
shape = [how many data points do you have, how many features does each data point have]
For this lesson, since we are doing a simple regression, we only have one feature (x). We use the *None* keyword so that we are not restricted in the number of samples we can feed our model. This will become more important when you learn about training with batches in a future lesson.
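For instance, here is a tiny standalone sketch (separate from the model we are building in this lesson) showing that a placeholder with shape=[None, 1] happily accepts batches of different sizes:
import numpy as np
import tensorflow as tf

p = tf.placeholder(tf.float32, shape=[None, 1])
doubled = p * 2.0
with tf.Session() as sess:
    # the same placeholder accepts 3 samples...
    print(sess.run(doubled, feed_dict={p: np.ones((3, 1), np.float32)}).shape)    # (3, 1)
    # ...or 700 samples, because its first dimension is None
    print(sess.run(doubled, feed_dict={p: np.ones((700, 1), np.float32)}).shape)  # (700, 1)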
# you can adjust the number of neurons in the hidden layers here
hidden_size = 10
# placeholders
# shape=[how many samples do you have, how many input neurons]
x = tf.placeholder(tf.float32, shape=[None, 1], name="01_x")
y = tf.placeholder(tf.float32, shape=[None, 1], name="01_y")
print("shape of x and y:")
print(x.get_shape(),y.get_shape())
shape of x and y:
(?, 1) (?, 1)
We will be feeding in keep_prob, the fraction of neurons to keep, on every training iteration. Note that we feed a value below 1.0 only while training; when evaluating on the validation or test set we feed 1.0 so every neuron participates.
# drop out
keep_prob = tf.placeholder(tf.float32)
Note that the output of one layer becomes the input of the next layer.
# create your hidden layers!
h1 = add_layer(x, 1, hidden_size, tf.nn.relu)
# here is where we shoot down some of the neurons
h1_drop = tf.nn.dropout(h1, keep_prob)
# add a second layer
h2 = add_layer(h1_drop, hidden_size, hidden_size, tf.nn.relu)
h2_drop = tf.nn.dropout(h2, keep_prob)
# add a third layer
h3 = add_layer(h2_drop, hidden_size, hidden_size, tf.nn.relu)
h3_drop = tf.nn.dropout(h3, keep_prob)
# add a fourth layer
h4 = add_layer(h3_drop, hidden_size, hidden_size, tf.nn.relu)
h4_drop = tf.nn.dropout(h4, keep_prob)
print("shape of hidden layers:")
print(h1_drop.get_shape(), h2_drop.get_shape(), h3_drop.get_shape(), h4_drop.get_shape())
shape of hidden layers:
(?, 10) (?, 10) (?, 10) (?, 10)
# Output layer (linear, since no activation function is passed)
pred = add_layer(h4_drop, hidden_size, 1)
print("shape of output layer:")
print(pred.get_shape())
shape of output layer:
(?, 1)
# minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(pred - y))
# pick optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
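If plain gradient descent converges too slowly for your taste, one experiment (my suggestion, not part of the original lesson) is to swap in a different optimizer; the rest of the graph stays unchanged:
# hypothetical alternative: Adam often converges faster on problems like this
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train = optimizer.minimize(loss)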
Set up the following ops to calculate the accuracy of your model. You will use them shortly.
# check accuracy of model
correct_prediction = tf.equal(tf.round(pred), tf.round(y))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
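To see what this metric means for a regression problem, here is the same computation in plain NumPy (an illustrative sketch with made-up numbers): a prediction counts as correct when it rounds to the same integer as its target.
import numpy as np

pred_np   = np.array([5.1, 5.6, 6.9])
target_np = np.array([5.0, 5.0, 7.0])
# 5.1 -> 5 matches 5, 5.6 -> 6 misses 5, 6.9 -> 7 matches 7
print(np.mean(np.round(pred_np) == np.round(target_np)))  # 0.666...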
The early-stopping code below is borrowed from a great TensorFlow Jupyter notebook.
# Best validation accuracy seen so far.
best_valid_acc = 0.0
# Iteration-number for last improvement to validation accuracy.
last_improvement = 0
# Stop optimization if no improvement found in this many iterations.
require_improvement = 1500
Notice how we pass in our dropout keep probability (keep_prob) below.
# initialize the variables
init = tf.global_variables_initializer()

# hold step and error values
t = []

# Run your graph
with tf.Session() as sess:
    # initialize variables
    sess.run(init)
    # Fit the function.
    for step in range(6000):
        # get your data
        train_data = {x: train_x, y: train_y, keep_prob: 0.975}
        valid_data = {x: valid_x, y: valid_y, keep_prob: 1.0}
        test_data = {x: test_x, y: test_y, keep_prob: 1.0}
        # training in progress... (the train op returns None, hence the underscore)
        train_loss, _ = sess.run([loss, train], feed_dict=train_data)
        # print every n iterations
        if step % 100 == 0:
            # capture the step and error for analysis
            valid_loss = sess.run(loss, feed_dict=valid_data)
            t.append((step, train_loss, valid_loss))
            # get a snapshot of the current training and validation accuracy
            train_acc = accuracy.eval(train_data)
            valid_acc = accuracy.eval(valid_data)
            # If validation accuracy is an improvement over best-known.
            if valid_acc > best_valid_acc:
                # Update the best-known validation accuracy.
                best_valid_acc = valid_acc
                # Set the iteration for the last improvement to current.
                last_improvement = step
                # Flag whenever an improvement is found
                improved_str = '*'
            else:
                # An empty string to be printed below.
                # Shows that no improvement was found.
                improved_str = ''
            print("Training loss at step %d: %f %s" % (step, train_loss, improved_str))
            print("Validation %f" % (valid_loss))
            # If no improvement found in the required number of iterations.
            if step - last_improvement > require_improvement:
                print("No improvement found in a while, stopping optimization.")
                # Break out from the for-loop.
                break
    # here is where you see how good of a Data Scientist you are
    print("Accuracy on the Training Set:", accuracy.eval(train_data))
    print("Accuracy on the Validation Set:", accuracy.eval(valid_data))
    print("Accuracy on the Test Set:", accuracy.eval(test_data))
    # capture predictions on the test data
    test_results = sess.run(pred, feed_dict={x: test_x, keep_prob: 1.0})

df_final = pd.DataFrame({'test_x': test_x[:, 0],
                         'pred': test_results[:, 0]})
# capture training and validation loss
df_loss = pd.DataFrame(t, columns=['step', 'train_loss', 'valid_loss'])
Training loss at step 0: 22.021866
Validation 16.066334
Training loss at step 100: 0.226587 *
Validation 0.069365
Training loss at step 200: 0.230825 *
Validation 0.069963
Training loss at step 300: 0.214508
Validation 0.068721
Training loss at step 400: 0.263160
Validation 0.067941
Training loss at step 500: 0.216182
Validation 0.068156
Training loss at step 600: 0.191902
Validation 0.068224
Training loss at step 700: 0.199810
Validation 0.068060
Training loss at step 800: 0.186926
Validation 0.068247
Training loss at step 900: 0.174795
Validation 0.068134
Training loss at step 1000: 0.174818
Validation 0.068273
Training loss at step 1100: 0.165233
Validation 0.068123
Training loss at step 1200: 0.160242
Validation 0.064995
Training loss at step 1300: 0.139545
Validation 0.059115
Training loss at step 1400: 0.161141
Validation 0.056825
Training loss at step 1500: 0.140487
Validation 0.055171
Training loss at step 1600: 0.126061
Validation 0.050796
Training loss at step 1700: 0.122908
Validation 0.042352
Training loss at step 1800: 0.098600 *
Validation 0.035082
Training loss at step 1900: 0.102557 *
Validation 0.026878
Training loss at step 2000: 0.088191 *
Validation 0.020376
Training loss at step 2100: 0.089116 *
Validation 0.015061
Training loss at step 2200: 0.070579 *
Validation 0.011380
Training loss at step 2300: 0.067470 *
Validation 0.008556
Training loss at step 2400: 0.066721 *
Validation 0.006749
Training loss at step 2500: 0.056150 *
Validation 0.005794
Training loss at step 2600: 0.071035 *
Validation 0.004586
Training loss at step 2700: 0.062030 *
Validation 0.003829
Training loss at step 2800: 0.055772 *
Validation 0.003224
Training loss at step 2900: 0.063225
Validation 0.002700
Training loss at step 3000: 0.054979
Validation 0.002494
Training loss at step 3100: 0.057061
Validation 0.002265
Training loss at step 3200: 0.049995 *
Validation 0.001760
Training loss at step 3300: 0.049652
Validation 0.001851
Training loss at step 3400: 0.042552
Validation 0.001784
Training loss at step 3500: 0.051730
Validation 0.001640
Training loss at step 3600: 0.045550
Validation 0.001239
Training loss at step 3700: 0.053862
Validation 0.001314
Training loss at step 3800: 0.042645
Validation 0.001305
Training loss at step 3900: 0.047567
Validation 0.000907
Training loss at step 4000: 0.040443
Validation 0.001296
Training loss at step 4100: 0.042049
Validation 0.001107
Training loss at step 4200: 0.046952
Validation 0.000997
Training loss at step 4300: 0.040871
Validation 0.001238
Training loss at step 4400: 0.039199
Validation 0.001149
Training loss at step 4500: 0.043638
Validation 0.000661
Training loss at step 4600: 0.031260
Validation 0.001100
Training loss at step 4700: 0.044857
Validation 0.000827
Training loss at step 4800: 0.044116
Validation 0.000598
No improvement found in a while, stopping optimization.
Accuracy on the Training Set: 0.894286
Accuracy on the Validation Set: 1.0
Accuracy on the Test Set: 0.993333
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(15, 5))
# Chart 1 - Shows the line we are trying to model
df.plot.scatter(x='x', y='y', ax=axes, color='red')
# Chart 2 - Shows the line our trained model came up with
df_final.plot.scatter(x='test_x', y='pred', ax=axes, alpha=0.3)
# add a little sugar
axes.set_title('target vs pred', fontsize=20)
axes.set_ylabel('y', fontsize=15)
axes.set_xlabel('x', fontsize=15)
axes.legend(["target", "pred"], loc='best');
If your *valid_loss* is increasing while your *train_loss* is decreasing, you have a problem: the model is starting to overfit. Since you have implemented early stopping, your model will stop training before this issue gets out of control.
df_loss.set_index('step').plot(logy=True, figsize=(15,5));
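If you want a quick programmatic check in addition to the chart above, here is a small sketch (using the df_loss frame built earlier) that compares the trend of the last few recorded losses:
# look at the last 5 checkpoints captured during training
recent = df_loss.tail(5)
valid_rising  = (recent['valid_loss'].diff().dropna() > 0).all()
train_falling = (recent['train_loss'].diff().dropna() < 0).all()
if valid_rising and train_falling:
    print('Warning: validation loss is rising while training loss falls, possible overfitting.')
else:
    print('No clear overfitting signal in the last few checkpoints.')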
I reduced the number of neurons but increased the number of hidden layers. Try different combinations of layers and neurons and see how your model behaves; a small helper sketch for doing that follows below.
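Here is a hypothetical helper (my sketch, built on the add_layer function from this lesson) that stacks an arbitrary number of hidden layers, each followed by dropout, so you can experiment with depth and width:
def build_network(x, keep_prob, n_hidden_layers=4, hidden_size=10):
    # first hidden layer maps the single input feature to hidden_size neurons
    h = tf.nn.dropout(add_layer(x, 1, hidden_size, tf.nn.relu), keep_prob)
    # remaining hidden layers keep the same width
    for _ in range(n_hidden_layers - 1):
        h = tf.nn.dropout(add_layer(h, hidden_size, hidden_size, tf.nn.relu), keep_prob)
    # linear output layer for the regression target
    return add_layer(h, hidden_size, 1)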
This tutorial was created by HEDARO