# The TensorFlow Estimator API

High-level APIs are extremely important in all software development because they provide simple abstractions for doing very complicated tasks. This makes it easier to write and understand your source code, and it lowers the risk of errors.

Previously, we saw how to use various builder APIs for creating Neural Networks in TensorFlow. However, there was a lot of additional code required for training the models and using them on new data. The Estimator is another high-level API that implements most of this, although it can be debated how simple it really is.

Using the Estimator API consists of several steps:

1. Define functions for inputting data to the Estimator.
2. Either use an existing Estimator (e.g. a Deep Neural Network), which is also called a pre-made or Canned Estimator, or create your own Estimator, in which case you also need to define the optimizer, performance metrics, etc.
3. Train the Estimator using the training-set defined in step 1.
4. Evaluate the performance of the Estimator on the test-set defined in step 1.
5. Use the trained Estimator to make predictions on other data.
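
The five steps above can be sketched without TensorFlow. The `ToyEstimator` class and the `make_input_fn` helper below are hypothetical stand-ins, not part of the Estimator API. They only mirror the train / evaluate / predict pattern, using a trivial "model" that always predicts the most common training label.

```python
from collections import Counter


def make_input_fn(features, labels=None):
    """Step 1: wrap the data in a function instead of passing it directly."""
    def input_fn():
        return features, labels
    return input_fn


class ToyEstimator:
    """Hypothetical stand-in for an Estimator: predicts the majority class."""

    def __init__(self):
        self.majority = None

    def train(self, input_fn):
        _, labels = input_fn()
        self.majority = Counter(labels).most_common(1)[0][0]
        return self

    def evaluate(self, input_fn):
        _, labels = input_fn()
        correct = sum(1 for y in labels if y == self.majority)
        return {"accuracy": correct / len(labels)}

    def predict(self, input_fn):
        features, _ = input_fn()
        for _ in features:
            yield self.majority


train_fn = make_input_fn(features=[0.1, 0.2, 0.9, 0.3], labels=[0, 0, 1, 0])
test_fn = make_input_fn(features=[0.2, 0.8], labels=[0, 1])
new_fn = make_input_fn(features=[0.5, 0.6, 0.7])

toy = ToyEstimator()                    # Step 2: pick or create an estimator.
toy.train(train_fn)                     # Step 3: train on the training-set.
toy_result = toy.evaluate(test_fn)      # Step 4: evaluate on the test-set.
toy_preds = list(toy.predict(new_fn))   # Step 5: predict on other data.
```

The real Estimator follows the same call pattern, but builds a TensorFlow graph and stores checkpoints behind these three methods.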

## Imports

In [ ]:
%matplotlib inline
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np


The MNIST data-set is about 12 MB and will be downloaded automatically if it is not already cached locally.

In [ ]:
from tensorflow.keras.datasets import mnist

# Fetch and format the mnist data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
y_train = y_train.astype('int32')
y_test = y_test.astype('int32')
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')


Copy some of the data-dimensions for convenience.

In [ ]:
sample_image = x_train[0]

# The number of pixels in each dimension of an image.
img_size = sample_image.shape[0]

# Number of pixels in an image when flattened to a one-dimensional array.
img_size_flat = sample_image.ravel().shape[0]

# Tuple with height and width of images used to reshape arrays.
img_shape = sample_image.shape

# Number of classes, one class for each of 10 digits.
num_classes = len(set(y_train))

# Number of colour channels for the images: 1 channel for gray-scale.
num_channels = 1
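
As a quick sanity check, the same expressions can be applied to a dummy array. This is only a sketch: it assumes the images are 28 x 28 arrays, as is the case for MNIST loaded through Keras, and the `dummy_*` names are made up for illustration.

```python
import numpy as np

# Stand-in for x_train[0]; assumed to be a 28 x 28 array.
dummy_image = np.zeros((28, 28), dtype="float32")

dummy_size = dummy_image.shape[0]                # Pixels per dimension.
dummy_size_flat = dummy_image.ravel().shape[0]   # Pixels when flattened.
dummy_shape = dummy_image.shape                  # (height, width)

# Made-up integer labels covering all ten digits.
dummy_labels = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8, 6, 9])
dummy_num_classes = len(set(dummy_labels))

print(dummy_size, dummy_size_flat, dummy_shape, dummy_num_classes)
```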


### Helper-function for plotting images

This function plots 9 images in a 3x3 grid and writes the true and predicted classes below each image.

In [ ]:
def plot_images(images, cls_true, cls_pred=None):
    assert len(images) == len(cls_true) == 9

    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)

    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(images[i].reshape(img_shape), cmap='binary')

        # Show true and predicted classes.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true[i])
        else:
            xlabel = "True: {0}, Pred: {1}".format(cls_true[i], cls_pred[i])

        # Show the classes as the label on the x-axis.
        ax.set_xlabel(xlabel)

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()


### Plot a few images to see if data is correct

In [ ]:
# Get the first images from the test-set.
images = x_test[0:9]

# Get the true classes for those images.
cls_true = y_test[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true)


## Input Functions for the Estimator

Rather than providing raw data directly to the Estimator, we must provide functions that return the data. This allows for more flexibility in data-sources and how the data is randomly shuffled and iterated.

Note that we will create an Estimator using the DNNClassifier, which assumes the class-numbers are integers. The labels in y_train and y_test loaded above are already integer class-numbers rather than one-hot encoded arrays, so they can be used directly.

The function also has parameters for batch_size, queue_capacity and num_threads for finer control of the data reading. In our case we take the data directly from a numpy array in memory, so they are not needed.

In [ ]:
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_train},
    y=y_train,
    num_epochs=None,
    shuffle=True)


This actually returns a function:

In [ ]:
train_input_fn


Calling this function returns a tuple with TensorFlow ops for returning the input and output data:

In [ ]:
train_input_fn()
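
The idea of a function that returns a data-reading function can be sketched in plain NumPy. The helper `toy_numpy_input_fn` below is a hypothetical, simplified imitation: the real input-function returns TensorFlow ops, whereas this sketch yields batches from a Python generator. It only illustrates the roles of `batch_size`, `num_epochs` and `shuffle`.

```python
import numpy as np


def toy_numpy_input_fn(x, y=None, batch_size=128, num_epochs=1,
                       shuffle=True, seed=0):
    """Return a function that yields (features, labels) batches when called.

    num_epochs=None means the data is repeated indefinitely.
    """
    num_examples = len(next(iter(x.values())))

    def input_fn():
        rng = np.random.RandomState(seed)
        epoch = 0
        while num_epochs is None or epoch < num_epochs:
            if shuffle:
                order = rng.permutation(num_examples)
            else:
                order = np.arange(num_examples)
            for start in range(0, num_examples, batch_size):
                idx = order[start:start + batch_size]
                features = {key: value[idx] for key, value in x.items()}
                labels = None if y is None else y[idx]
                yield features, labels
            epoch += 1

    return input_fn


toy_x = np.arange(10, dtype="float32").reshape(5, 2)
toy_y = np.array([0, 1, 0, 1, 0], dtype="int32")

# One pass over the data, in order, two examples at a time.
toy_input_fn = toy_numpy_input_fn({"x": toy_x}, toy_y, batch_size=2,
                                  num_epochs=1, shuffle=False)
toy_batches = list(toy_input_fn())
print(len(toy_batches))
```

With `num_epochs=None` the generator never stops by itself, which is why the number of training steps has to be given separately when training.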


Similarly we need to create a function for reading the data for the test-set. Note that we only want to process these images once so num_epochs=1 and we do not want the images shuffled so shuffle=False.

In [ ]:
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_test},
    y=y_test,
    num_epochs=1,
    shuffle=False)


An input-function is also needed for predicting the class of new data. As an example we just use a few images from the test-set.

In [ ]:
some_images = x_test[0:9]

In [ ]:
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": some_images},
    num_epochs=1,
    shuffle=False)


The class-numbers are not used in the input-function as they are not needed for prediction. However, the true class-numbers are needed when we plot the images further below.

In [ ]:
some_images_cls = y_test[0:9]


When using a pre-made Estimator, we need to specify the input features for the data. In this case we want to input images from our data-set which are numeric arrays of the given shape.

In [ ]:
feature_x = tf.feature_column.numeric_column("x", shape=img_shape)


You can have several input features which would then be combined in a list:

In [ ]:
feature_columns = [feature_x]
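
Conceptually, a list of numeric feature columns tells the model which entries of the features-dict to read and how to turn them into one dense input matrix. The NumPy sketch below is an assumption about that idea, not TensorFlow's implementation: `toy_input_layer`, the `(key, shape)` pairs and the second feature `"extra"` are all hypothetical.

```python
import numpy as np


def toy_input_layer(features, columns):
    """Flatten each named feature and concatenate them along axis 1.

    `columns` is a list of (key, shape) pairs, a hypothetical stand-in
    for numeric feature columns.
    """
    parts = []
    for key, shape in columns:
        values = np.asarray(features[key])
        batch = values.shape[0]
        assert values.shape[1:] == tuple(shape), "unexpected shape for " + key
        parts.append(values.reshape(batch, -1))
    return np.concatenate(parts, axis=1)


toy_features = {
    "x": np.zeros((4, 28, 28), dtype="float32"),   # 4 images.
    "extra": np.ones((4, 3), dtype="float32"),     # Hypothetical 2nd feature.
}
toy_columns = [("x", (28, 28)), ("extra", (3,))]

toy_dense = toy_input_layer(toy_features, toy_columns)
print(toy_dense.shape)
```

Each image contributes 784 values and the extra feature 3 values, so every row of the resulting matrix has 787 entries.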


In this example we want to use a 3-layer DNN with 512, 256 and 128 units respectively.

In [ ]:
num_hidden_units = [512, 256, 128]
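
To get a feeling for the size of such a network, the number of trainable parameters can be counted by hand. The sketch below assumes plain fully-connected layers with one bias per unit, 28 x 28 = 784 inputs, and a final output layer with 10 units. The helper name `count_dense_parameters` is made up for this illustration.

```python
def count_dense_parameters(num_inputs, hidden_units, num_outputs):
    """Count weights and biases in a stack of fully-connected layers."""
    sizes = [num_inputs] + list(hidden_units) + [num_outputs]
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # One weight per input-output pair plus one bias per output.
        total += fan_in * fan_out + fan_out
    return total


toy_param_count = count_dense_parameters(28 * 28, [512, 256, 128], 10)
print(toy_param_count)  # 567434
```

Most of the parameters sit in the first layer, because each of its 512 units is connected to all 784 input pixels.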


The DNNClassifier then constructs the neural network for us. We can also specify the activation function and various other parameters (see the docs). Here we just specify the number of classes and the directory where the checkpoints will be saved.

In [ ]:
model = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                   hidden_units=num_hidden_units,
                                   activation_fn=tf.nn.relu,
                                   n_classes=num_classes,
                                   model_dir="./checkpoints_tutorial17-1/")


### Training

We can now train the model for a given number of iterations. This automatically loads and saves checkpoints so we can continue the training later.

Note that the text INFO:tensorflow: is printed on every line and makes it harder to quickly read the actual progress. It should have been printed on a single line instead.

In [ ]:
model.train(input_fn=train_input_fn, steps=2000)
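
The effect of loading and saving checkpoints can be illustrated with a toy training function that stores only a step counter. This is a sketch of the general resume-from-disk idea: the file name `toy_checkpoint.json` and the function `toy_train` are hypothetical and unrelated to TensorFlow's actual checkpoint format.

```python
import json
import os
import tempfile


def toy_train(model_dir, steps):
    """Run `steps` more iterations, resuming from a saved state if present."""
    path = os.path.join(model_dir, "toy_checkpoint.json")

    # Restore the previous state if a checkpoint exists.
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    else:
        state = {"global_step": 0}

    # The "training" here only advances the step counter.
    state["global_step"] += steps

    # Save the new state so a later call can continue from it.
    with open(path, "w") as f:
        json.dump(state, f)

    return state["global_step"]


toy_dir = tempfile.mkdtemp()
first_call = toy_train(toy_dir, steps=2000)
second_call = toy_train(toy_dir, steps=2000)
print(first_call, second_call)
```

The second call continues where the first one stopped, which is the behaviour to expect when training is run twice with the same model directory.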


### Evaluation

Once the model has been trained, we can evaluate its performance on the test-set.

In [ ]:
result = model.evaluate(input_fn=test_input_fn)

In [ ]:
result

In [ ]:
print("Classification accuracy: {0:.2%}".format(result["accuracy"]))
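
The classification accuracy is the fraction of images whose predicted class equals the true class. A minimal sketch with made-up labels, using the same format string as above:

```python
# Made-up true and predicted class-numbers for 8 images.
toy_true = [7, 2, 1, 0, 4, 1, 4, 9]
toy_pred = [7, 2, 1, 0, 4, 1, 9, 9]

# Count the matches and divide by the number of images.
toy_correct = sum(1 for t, p in zip(toy_true, toy_pred) if t == p)
toy_accuracy = toy_correct / len(toy_true)

toy_message = "Classification accuracy: {0:.2%}".format(toy_accuracy)
print(toy_message)  # Classification accuracy: 87.50%
```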


### Predictions

The trained model can also be used to make predictions on new data.

Note that the TensorFlow graph is recreated and the checkpoint is reloaded every time we make predictions on new data. If the model is very large then this could add a significant overhead.

It is unclear why the Estimator is designed this way, possibly because it will always use the latest checkpoint and it can also be distributed easily for use on multiple computers.

In [ ]:
predictions = model.predict(input_fn=predict_input_fn)

In [ ]:
cls = [p['classes'] for p in predictions]

In [ ]:
cls_pred = np.array(cls, dtype='int').squeeze()
cls_pred
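
The conversion above can be followed on mock data. The sketch assumes that each prediction is a dict in which `'classes'` holds a one-element array with the class as a byte string. That layout is an assumption made here for illustration, and the generator `toy_classifier_predictions` is hypothetical.

```python
def toy_classifier_predictions():
    """Yield hypothetical prediction dicts, one per input image."""
    for class_id in [7, 2, 1]:
        yield {
            "class_ids": [class_id],
            # Assumption: the class is also reported as a byte string.
            "classes": [str(class_id).encode("ascii")],
        }


# Collect the 'classes' entry of every prediction.
toy_cls = [p["classes"] for p in toy_classifier_predictions()]

# Convert each byte string to an integer and drop the inner dimension.
toy_cls_pred = [int(c[0]) for c in toy_cls]
print(toy_cls_pred)  # [7, 2, 1]
```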

In [ ]:
plot_images(images=some_images,
            cls_true=some_images_cls,
            cls_pred=cls_pred)


# New Estimator

If you cannot use one of the built-in Estimators, then you can create an arbitrary TensorFlow model yourself. To do this, you first need to create a function which defines the following:

1. The TensorFlow model, e.g. a Convolutional Neural Network.
2. The output of the model.
3. The loss-function used to improve the model during optimization.
4. The optimization method.
5. Performance metrics.

The Estimator can be run in three modes: Training, Evaluation, or Prediction. The code is mostly the same, but in Prediction-mode we do not need to set up the loss-function and optimizer.

This is another aspect of the Estimator API that is poorly designed and resembles how we did ANSI C programming using structs in the old days. It would probably have been more elegant to split this into several functions and sub-class the Estimator-class.
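
The single-function, three-mode pattern can be sketched in plain Python. Everything below is hypothetical: the mode constants, the dict used in place of a spec object, and the trivial "model" are only meant to show which parts are shared between the modes and which are mode-specific.

```python
# Hypothetical mode constants for this sketch.
TRAIN, EVAL, PREDICT = "train", "eval", "predict"


def toy_model_fn(features, labels, mode):
    # Shared part: a trivial "model" predicting 1 for positive inputs.
    predictions = [1 if x > 0 else 0 for x in features]

    if mode == PREDICT:
        # No labels are available, so no loss or optimizer is needed.
        return {"mode": mode, "predictions": predictions}

    # Training and evaluation both need the labels and a loss.
    errors = [int(p != y) for p, y in zip(predictions, labels)]
    loss = sum(errors) / len(errors)
    spec = {"mode": mode, "loss": loss}

    if mode == TRAIN:
        spec["train_op"] = "placeholder for an optimization step"
    else:
        spec["eval_metrics"] = {"accuracy": 1.0 - loss}
    return spec


predict_spec = toy_model_fn([1.0, -2.0], None, PREDICT)
eval_spec = toy_model_fn([1.0, -2.0], [1, 1], EVAL)
print(predict_spec["predictions"], eval_spec["loss"])
```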

In [ ]:
def model_fn(features, labels, mode, params):
    # Args:
    #
    # features: This is the x-arg from the input_fn.
    # labels:   This is the y-arg from the input_fn,
    #           see e.g. train_input_fn for these two.
    # mode:     Either TRAIN, EVAL, or PREDICT
    # params:   User-defined hyper-parameters, e.g. learning-rate.

    # Reference to the tensor named "x" in the input-function.
    x = features["x"]

    # The convolutional layers expect 4-rank tensors of shape
    # [batch, height, width, channels], so reshape x accordingly.
    net = tf.reshape(x, [-1, img_size, img_size, num_channels])

    # First convolutional layer.
    # NOTE: The last arguments of this call were cut off in the source.
    # 'same' padding and ReLU activation are assumed here.
    net = tf.layers.conv2d(inputs=net, name='layer_conv1',
                           filters=16, kernel_size=5,
                           padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)

    # Second convolutional layer.
    # NOTE: Same assumption as for the first convolutional layer.
    net = tf.layers.conv2d(inputs=net, name='layer_conv2',
                           filters=36, kernel_size=5,
                           padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)

    # Flatten to a 2-rank tensor.
    net = tf.contrib.layers.flatten(net)
    # Eventually this should be replaced with:
    # net = tf.layers.flatten(net)

    # First fully-connected / dense layer.
    # This uses the ReLU activation function.
    net = tf.layers.dense(inputs=net, name='layer_fc1',
                          units=128, activation=tf.nn.relu)

    # Second fully-connected / dense layer.
    # This is the last layer so it does not use an activation function.
    net = tf.layers.dense(inputs=net, name='layer_fc2',
                          units=10)

    # Logits output of the neural network.
    logits = net

    # Softmax output of the neural network.
    y_pred = tf.nn.softmax(logits=logits)

    # Classification output of the neural network.
    y_pred_cls = tf.argmax(y_pred, axis=1)

    if mode == tf.estimator.ModeKeys.PREDICT:
        # If the estimator is supposed to be in prediction-mode
        # then use the predicted class-number that is output by
        # the neural network. Optimization etc. is not needed.
        spec = tf.estimator.EstimatorSpec(mode=mode,
                                          predictions=y_pred_cls)
    else:
        # Otherwise the estimator is supposed to be in either
        # training or evaluation-mode. Note that the loss-function
        # is also required in Evaluation mode.

        # Define the loss-function to be optimized, by first
        # calculating the cross-entropy between the output of
        # the neural network and the true labels for the input data.
        # This gives the cross-entropy for each image in the batch.
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                                       logits=logits)

        # Reduce the cross-entropy batch-tensor to a single number
        # which can be used in optimization of the neural network.
        loss = tf.reduce_mean(cross_entropy)

        # Define the optimizer for improving the neural network.
        # NOTE: The optimizer definition was missing in the source.
        # The Adam optimizer is assumed here.
        optimizer = tf.train.AdamOptimizer(learning_rate=params["learning_rate"])

        # Get the TensorFlow op for doing a single optimization step.
        train_op = optimizer.minimize(
            loss=loss, global_step=tf.train.get_global_step())

        # Define the evaluation metrics,
        # in this case the classification accuracy.
        metrics = \
        {
            "accuracy": tf.metrics.accuracy(labels, y_pred_cls)
        }

        # Wrap all of this in an EstimatorSpec.
        spec = tf.estimator.EstimatorSpec(
            mode=mode,
            loss=loss,
            train_op=train_op,
            eval_metric_ops=metrics)

    return spec
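
It can help to trace the tensor shapes through the network by hand. The sketch below assumes 28 x 28 gray-scale input images, convolutions with stride 1 and 'same' padding (so height and width are preserved), and 2x2 max-pooling with stride 2. The helper functions are made up for this illustration.

```python
def conv_same(height, width, filters):
    # With 'same' padding and stride 1, height and width are unchanged,
    # and the number of channels becomes the number of filters.
    return height, width, filters


def max_pool_2x2(height, width, channels):
    # Pooling with pool_size=2 and strides=2 halves height and width.
    return height // 2, width // 2, channels


toy_shape = (28, 28, 1)                                      # Input image.
toy_shape = conv_same(toy_shape[0], toy_shape[1], filters=16)
toy_shape = max_pool_2x2(*toy_shape)
after_first_block = toy_shape                                # (14, 14, 16)

toy_shape = conv_same(toy_shape[0], toy_shape[1], filters=36)
toy_shape = max_pool_2x2(*toy_shape)
after_second_block = toy_shape                               # (7, 7, 36)

# Number of values per image after flattening.
toy_num_flat = toy_shape[0] * toy_shape[1] * toy_shape[2]
print(after_first_block, after_second_block, toy_num_flat)
```

Under these assumptions the first dense layer receives 1764 values per image, which it maps to 128 units before the final layer outputs 10 logits.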


### Create an Instance of the Estimator

We can specify hyper-parameters e.g. for the learning-rate of the optimizer.

In [ ]:
params = {"learning_rate": 1e-4}


We can then create an instance of the new Estimator.

Note that we don't provide feature-columns here, as the features are taken directly from the input-functions when model_fn() is called.

It is unclear from the TensorFlow documentation why it is necessary to specify the feature-columns when using DNNClassifier in the example above, when it is not needed here.

In [ ]:
model = tf.estimator.Estimator(model_fn=model_fn,
                               params=params,
                               model_dir="./checkpoints_tutorial17-2/")


### Training

Now that our new Estimator has been created, we can train it.

In [ ]:
model.train(input_fn=train_input_fn, steps=1000)


### Evaluation

Once the model has been trained, we can evaluate its performance on the test-set.

In [ ]:
result = model.evaluate(input_fn=test_input_fn)

In [ ]:
result

In [ ]:
print("Classification accuracy: {0:.2%}".format(result["accuracy"]))
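
Besides the accuracy, the evaluation result typically also reports the loss, which for this model is the mean cross-entropy defined in model_fn(). The NumPy sketch below shows how such a loss and the predicted classes can be computed from logits and integer labels. It is a hand-written illustration with made-up numbers, not TensorFlow's implementation.

```python
import numpy as np


def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between softmax(logits) and integer labels."""
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each example.
    return -log_probs[np.arange(len(labels)), labels]


# Two made-up examples with three classes each.
toy_logits = np.array([[2.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0]])
toy_labels = np.array([0, 2])

per_example = sparse_softmax_cross_entropy(toy_logits, toy_labels)
toy_loss = per_example.mean()            # Single number used for optimization.
toy_classes = toy_logits.argmax(axis=1)  # Predicted class-numbers.
print(per_example, toy_loss, toy_classes)
```

The first example has a large logit for its true class and therefore a small cross-entropy. The second has equal logits for all three classes, so its cross-entropy equals log(3).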


### Predictions

The model can also be used to make predictions on new data.

In [ ]:
predictions = model.predict(input_fn=predict_input_fn)

In [ ]:
cls_pred = np.array(list(predictions))
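
The predictions are produced lazily by a generator, which is why they are converted with `list()` before being put into an array. A plain-Python sketch with a made-up generator shows that such a stream can only be consumed once:

```python
def toy_prediction_stream():
    """Hypothetical generator yielding one class-number per image."""
    for class_id in [7, 2, 1]:
        yield class_id


toy_stream = toy_prediction_stream()
first_pass = list(toy_stream)    # Consumes all predictions.
second_pass = list(toy_stream)   # The generator is already exhausted.
print(first_pass, second_pass)   # [7, 2, 1] []
```

If the predictions are needed more than once, it is therefore safer to store the converted list or array than the generator itself.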
cls_pred

In [ ]:
plot_images(images=some_images,
            cls_true=some_images_cls,
            cls_pred=cls_pred)