The TensorFlow Estimator API

High-level API's are extremely important in all software development because they provide simple abstractions for doing very complicated tasks. This makes it easier to write and understand your source-code, and it lowers the risk of errors.

Previously, we saw how to use various builder API's for creating Neural Networks in TensorFlow. However, there was a lot of additional code required for training the models and using them on new data. The Estimator is another high-level API that implements most of this, although it can be debated how simple it really is.

Using the Estimator API consists of several steps:

  1. Define functions for inputting data to the Estimator.
  2. Either use an existing Estimator (e.g. a Deep Neural Network), which is also called a pre-made or Canned Estimator. Or create your own Estimator, in which case you also need to define the optimizer, performance metrics, etc.
  3. Train the Estimator using the training-set defined in step 1.
  4. Evaluate the performance of the Estimator on the test-set defined in step 1.
  5. Use the trained Estimator to make predictions on other data.

Imports

In [ ]:
%matplotlib inline
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

Load Data

The MNIST data-set is about 12 MB and will be downloaded automatically if it is not located in the given dir.

In [ ]:
from tensorflow.keras.datasets import mnist

# Fetch and format the mnist data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
y_train = y_train.astype('int32')
y_test = y_test.astype('int32')
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

Copy some of the data-dimensions for convenience.

In [ ]:
sample_image = x_train[0]

# The number of pixels in each dimension of an image.
img_size = sample_image.shape[0]

# The images are stored in one-dimensional arrays of this length.
img_size_flat = sample_image.ravel().shape[0]

# Tuple with height and width of images used to reshape arrays.
img_shape = sample_image.shape

# Number of classes, one class for each of 10 digits.
num_classes = len(set(y_train))

# Number of colour channels for the images: 1 channel for gray-scale.
num_channels = 1

Helper-function for plotting images

Function used to plot 9 images in a 3x3 grid, and writing the true and predicted classes below each image.

In [ ]:
def plot_images(images, cls_true, cls_pred=None):
    assert len(images) == len(cls_true) == 9
    
    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(images[i].reshape(img_shape), cmap='binary')

        # Show true and predicted classes.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true[i])
        else:
            xlabel = "True: {0}, Pred: {1}".format(cls_true[i], cls_pred[i])

        # Show the classes as the label on the x-axis.
        ax.set_xlabel(xlabel)
        
        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])
    
    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Plot a few images to see if data is correct

In [ ]:
# Get the first images from the test-set.
images = x_test[0:9]

# Get the true classes for those images.
cls_true = y_test[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true)

Input Functions for the Estimator

Rather than providing raw data directly to the Estimator, we must provide functions that return the data. This allows for more flexibility in data-sources and how the data is randomly shuffled and iterated.

Note that we will create an Estimator using the DNNClassifier which assumes the class-numbers are integers so we use data.y_train_cls instead of data.y_train which are one-hot encoded arrays.

The function also has parameters for batch_size, queue_capacity and num_threads for finer control of the data reading. In our case we take the data directly from a numpy array in memory, so it is not needed.

In [ ]:
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_train},
    y=y_train,
    num_epochs=None,
    shuffle=True)

This actually returns a function:

In [ ]:
train_input_fn

Calling this function returns a tuple with TensorFlow ops for returning the input and output data:

In [ ]:
train_input_fn()

Similarly we need to create a function for reading the data for the test-set. Note that we only want to process these images once so num_epochs=1 and we do not want the images shuffled so shuffle=False.

In [ ]:
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_test},
    y=y_test,
    num_epochs=1,
    shuffle=False)

An input-function is also needed for predicting the class of new data. As an example we just use a few images from the test-set.

In [ ]:
some_images = x_test[0:9]
In [ ]:
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": some_images},
    num_epochs=1,
    shuffle=False)

The class-numbers are actually not used in the input-function as it is not needed for prediction. However, the true class-number is needed when we plot the images further below.

In [ ]:
some_images_cls = y_test[0:9]

Pre-Made Estimator

When using a pre-made Estimator, we need to specify the input features for the data. In this case we want to input images from our data-set which are numeric arrays of the given shape.

In [ ]:
feature_x = tf.feature_column.numeric_column("x", shape=img_shape)

You can have several input features which would then be combined in a list:

In [ ]:
feature_columns = [feature_x]

In this example we want to use a 3-layer DNN with 512, 256 and 128 units respectively.

In [ ]:
num_hidden_units = [512, 256, 128]

The DNNClassifier then constructs the neural network for us. We can also specify the activation function and various other parameters (see the docs). Here we just specify the number of classes and the directory where the checkpoints will be saved.

In [ ]:
model = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                   hidden_units=num_hidden_units,
                                   activation_fn=tf.nn.relu,
                                   n_classes=num_classes,
                                   model_dir="./checkpoints_tutorial17-1/")

Training

We can now train the model for a given number of iterations. This automatically loads and saves checkpoints so we can continue the training later.

Note that the text INFO:tensorflow: is printed on every line and makes it harder to quickly read the actual progress. It should have been printed on a single line instead.

In [ ]:
model.train(input_fn=train_input_fn, steps=2000)

Evaluation

Once the model has been trained, we can evaluate its performance on the test-set.

In [ ]:
result = model.evaluate(input_fn=test_input_fn)
In [ ]:
result
In [ ]:
print("Classification accuracy: {0:.2%}".format(result["accuracy"]))

Predictions

The trained model can also be used to make predictions on new data.

Note that the TensorFlow graph is recreated and the checkpoint is reloaded every time we make predictions on new data. If the model is very large then this could add a significant overhead.

It is unclear why the Estimator is designed this way, possibly because it will always use the latest checkpoint and it can also be distributed easily for use on multiple computers.

In [ ]:
predictions = model.predict(input_fn=predict_input_fn)
In [ ]:
cls = [p['classes'] for p in predictions]
In [ ]:
cls_pred = np.array(cls, dtype='int').squeeze()
cls_pred
In [ ]:
plot_images(images=some_images,
            cls_true=some_images_cls,
            cls_pred=cls_pred)

New Estimator

If you cannot use one of the built-in Estimators, then you can create an arbitrary TensorFlow model yourself. To do this, you first need to create a function which defines the following:

  1. The TensorFlow model, e.g. a Convolutional Neural Network.
  2. The output of the model.
  3. The loss-function used to improve the model during optimization.
  4. The optimization method.
  5. Performance metrics.

The Estimator can be run in three modes: Training, Evaluation, or Prediction. The code is mostly the same, but in Prediction-mode we do not need to setup the loss-function and optimizer.

This is another aspect of the Estimator API that is poorly designed and resembles how we did ANSI C programming using structs in the old days. It would probably have been more elegant to split this into several functions and sub-classed the Estimator-class.

In [ ]:
def model_fn(features, labels, mode, params):
    # Args:
    #
    # features: This is the x-arg from the input_fn.
    # labels:   This is the y-arg from the input_fn,
    #           see e.g. train_input_fn for these two.
    # mode:     Either TRAIN, EVAL, or PREDICT
    # params:   User-defined hyper-parameters, e.g. learning-rate.
    
    # Reference to the tensor named "x" in the input-function.
    x = features["x"]

    # The convolutional layers expect 4-rank tensors
    # but x is a 2-rank tensor, so reshape it.
    net = tf.reshape(x, [-1, img_size, img_size, num_channels])    

    # First convolutional layer.
    net = tf.layers.conv2d(inputs=net, name='layer_conv1',
                           filters=16, kernel_size=5,
                           padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)

    # Second convolutional layer.
    net = tf.layers.conv2d(inputs=net, name='layer_conv2',
                           filters=36, kernel_size=5,
                           padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)    

    # Flatten to a 2-rank tensor.
    net = tf.contrib.layers.flatten(net)
    # Eventually this should be replaced with:
    # net = tf.layers.flatten(net)

    # First fully-connected / dense layer.
    # This uses the ReLU activation function.
    net = tf.layers.dense(inputs=net, name='layer_fc1',
                          units=128, activation=tf.nn.relu)    

    # Second fully-connected / dense layer.
    # This is the last layer so it does not use an activation function.
    net = tf.layers.dense(inputs=net, name='layer_fc2',
                          units=10)

    # Logits output of the neural network.
    logits = net

    # Softmax output of the neural network.
    y_pred = tf.nn.softmax(logits=logits)
    
    # Classification output of the neural network.
    y_pred_cls = tf.argmax(y_pred, axis=1)

    if mode == tf.estimator.ModeKeys.PREDICT:
        # If the estimator is supposed to be in prediction-mode
        # then use the predicted class-number that is output by
        # the neural network. Optimization etc. is not needed.
        spec = tf.estimator.EstimatorSpec(mode=mode,
                                          predictions=y_pred_cls)
    else:
        # Otherwise the estimator is supposed to be in either
        # training or evaluation-mode. Note that the loss-function
        # is also required in Evaluation mode.
        
        # Define the loss-function to be optimized, by first
        # calculating the cross-entropy between the output of
        # the neural network and the true labels for the input data.
        # This gives the cross-entropy for each image in the batch.
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                                       logits=logits)

        # Reduce the cross-entropy batch-tensor to a single number
        # which can be used in optimization of the neural network.
        loss = tf.reduce_mean(cross_entropy)

        # Define the optimizer for improving the neural network.
        optimizer = tf.train.AdamOptimizer(learning_rate=params["learning_rate"])

        # Get the TensorFlow op for doing a single optimization step.
        train_op = optimizer.minimize(
            loss=loss, global_step=tf.train.get_global_step())

        # Define the evaluation metrics,
        # in this case the classification accuracy.
        metrics = \
        {
            "accuracy": tf.metrics.accuracy(labels, y_pred_cls)
        }

        # Wrap all of this in an EstimatorSpec.
        spec = tf.estimator.EstimatorSpec(
            mode=mode,
            loss=loss,
            train_op=train_op,
            eval_metric_ops=metrics)
        
    return spec

Create an Instance of the Estimator

We can specify hyper-parameters e.g. for the learning-rate of the optimizer.

In [ ]:
params = {"learning_rate": 1e-4}

We can then create an instance of the new Estimator.

Note that we don't provide feature-columns here as it is inferred automatically from the data-functions when model_fn() is called.

It is unclear from the TensorFlow documentation why it is necessary to specify the feature-columns when using DNNClassifier in the example above, when it is not needed here.

In [ ]:
model = tf.estimator.Estimator(model_fn=model_fn,
                               params=params,
                               model_dir="./checkpoints_tutorial17-2/")

Training

Now that our new Estimator has been created, we can train it.

In [ ]:
model.train(input_fn=train_input_fn, steps=1000)

Evaluation

Once the model has been trained, we can evaluate its performance on the test-set.

In [ ]:
result = model.evaluate(input_fn=test_input_fn)
In [ ]:
result
In [ ]:
print("Classification accuracy: {0:.2%}".format(result["accuracy"]))

Predictions

The model can also be used to make predictions on new data.

In [ ]:
predictions = model.predict(input_fn=predict_input_fn)
In [ ]:
cls_pred = np.array(list(predictions))
cls_pred
In [ ]:
plot_images(images=some_images,
            cls_true=some_images_cls,
            cls_pred=cls_pred)

License (MIT)

Copyright (c) 2016-2017 by Magnus Erik Hvass Pedersen

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.