The architecture of TensorFlow is designed to make it easy to fit machine learning models on a variety of hardware platforms, automatically optimizing how computations are allocated to the available resources.

The TensorFlow user is responsible for specifying their model in the Python client layer. When this is done well, it results in a very performant model. However, as we have seen, constructing a static model is not particularly intuitive (and certainly not *Pythonic*). In order to help alleviate this, some changes to the API have been implemented in newer versions of TensorFlow. Most important among them are:

- Eager execution model
- Integration of the Keras library

We will introduce both of these here.

TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately, without building graphs. Operations return concrete values instead of constructing a computational graph to run later. This aligns users' expectations about the programming model more closely with TensorFlow's behavior, making it easier to learn and apply.

Eager execution is a flexible machine learning platform for research and experimentation, providing:

- *An intuitive interface*: Structure your code naturally and use Python data structures. Quickly iterate on small models and small data.
- *Easier debugging*: Call ops directly to inspect running models and test changes. Use standard Python debugging tools for immediate error reporting.
- *Natural control flow*: Use Python control flow instead of graph control flow, simplifying the specification of dynamic models.

The tradeoff inherent in eager execution is that models run with increased overhead, typically resulting in slower performance (though this is continually being improved).

In an interactive computing environment like Jupyter, eager execution must be enabled before TensorFlow is used, by calling `tf.enable_eager_execution()` near the top of the notebook.

In [ ]:

```
import tensorflow as tf
tf.enable_eager_execution()
```

Enabling eager execution changes how TensorFlow operations behave: they immediately evaluate and return their values to Python. `tf.Tensor` objects reference concrete values instead of symbolic handles to nodes in a computational graph. Since there isn't a computational graph to build and run later in a session, it's easy to inspect results using `print()` or a debugger. Evaluating, printing, and checking tensor values does not break the flow for computing gradients.

In [ ]:

```
import numpy as np
np.random.seed(42)
x = np.random.random((2,2))
m = tf.matmul(x, x)
print("matrix multiplication result: {}".format(m))
```

In [ ]:

```
a = tf.constant([[1, 2],
                 [3, 4]])
# Broadcasting support
b = tf.add(a, 1)
print(b)
```

In [ ]:

```
# Operator overloading is supported
print(a * b)
```

Eager execution works nicely with NumPy. NumPy operations accept `tf.Tensor` arguments, and TensorFlow math operations convert Python objects and NumPy arrays to `tf.Tensor` objects. The `tf.Tensor.numpy` method returns the object's value as a NumPy `ndarray`.

In [ ]:

```
a.numpy()
```

The `tf.contrib.eager` module contains symbols available to both eager and graph execution environments and is useful for writing code that works with graphs:

In [ ]:

```
tfe = tf.contrib.eager
```

A major benefit of eager execution is that all the functionality of the host language is available while your model is executing. For example, we can use Python control flow statements such as `for` loops or conditionals:

In [ ]:

```
def fibonacci(n):
    n = tf.convert_to_tensor(n)
    if n < 2:
        return n
    a, b = tf.constant(0), tf.constant(1)
    for _ in range(n.numpy() - 1):
        a, b = b, a + b
    return b
```

In [ ]:

```
print(fibonacci(2))
```

Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks. During eager execution, use `tf.GradientTape` to trace operations for computing gradients later.

`tf.GradientTape` is an opt-in feature, to provide maximal performance when not tracing. Since different operations can occur during each call, all forward-pass operations get recorded to a "tape". To compute the gradient, the tape is played backwards and then discarded. A particular `tf.GradientTape` can only compute one gradient; subsequent calls throw a runtime error.

In [ ]:

```
w = tf.Variable([[1.0]])

with tf.GradientTape() as tape:
    loss = w * w

tape.gradient(loss, w)
```

In much the same way that PyMC3 allows Bayesian models to be specified in Theano at a high level, Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow. Keras is a modular, extensible library that allows for easy construction of deep learning models. It includes classes for building both convolutional and recurrent networks, and supports CPU and GPU computation.

Keras is used for fast prototyping, advanced research, and production, with three key advantages:

- **User friendly**: Keras has a simple, consistent interface optimized for common use cases. It provides clear and actionable feedback for user errors.
- **Modular and composable**: Keras models are made by connecting configurable building blocks together, with few restrictions.
- **Easy to extend**: Write custom building blocks to express new ideas for research. Create new layers, loss functions, and develop state-of-the-art models.

Keras was recently integrated into the TensorFlow project, so it does not have to be downloaded separately, but is available as a sub-module.

In [ ]:

```
%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='ticks')
```

In [ ]:

```
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
```

To learn how deep neural networks are constructed in Keras, we will use a famous benchmarking dataset, MNIST. The MNIST database of handwritten digits includes a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST.

The original black and white images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field. This results in a vector of 784 values for each image.
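The flattening described above can be sketched in NumPy, using a synthetic array as a stand-in for one MNIST image:

```python
import numpy as np

# A synthetic stand-in for a single 28x28 grayscale MNIST image
image = np.zeros((28, 28), dtype=np.uint8)
image[10:18, 10:18] = 255  # a crude white square in the middle

# Flatten to the 784-value vector the network will consume
vector = image.reshape(784)
print(vector.shape)  # (784,)
```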

In [ ]:

```
from tensorflow.keras.datasets import mnist
# Fetch and format the mnist data
(mnist_images, mnist_labels), _ = mnist.load_data()
```

In [ ]:

```
plt.imshow(mnist_images[1].reshape(28,28), cmap='gray');
```

We can convert the raw data for use in Keras.

A more elegant way to feed data into your model (compared to feeding NumPy arrays into a session) is to set up an **input pipeline**, using the TensorFlow Dataset API. A Dataset can be used to represent an input pipeline as a nested structure of tensors and an associated set of transformations that act on those tensors.

This is what it looks like. Consider some arbitrary input data, in the form of a NumPy array:

In [ ]:

```
fake_data = np.random.normal(size=(100, 5))
```

The `from_tensor_slices` function creates a `Dataset` whose elements are slices of the given tensors:

In [ ]:

```
a_dataset = tf.data.Dataset.from_tensor_slices(fake_data)
```

The `make_one_shot_iterator` method creates an `Iterator` for enumerating the elements of this dataset. As we will see, this facilitates mini-batch processing.

In [ ]:

```
data = a_dataset.make_one_shot_iterator().get_next()
data
```

In the case of our MNIST image data, we first flatten the image data, convert it to floats, and scale it before feeding it into a `Dataset`.

In [ ]:

```
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.cast(mnist_images.reshape(mnist_images.shape[0], 784)/255, tf.float32),
     tf.cast(mnist_labels, tf.int64)))
```

Finally, the dataset is shuffled and configured to batch-update.

In [ ]:

```
dataset = dataset.shuffle(1000).batch(32)
```
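The effect of shuffling and batching can be sketched in plain NumPy (an analogy, not the `tf.data` implementation):

```python
import numpy as np

data = np.arange(100)
rng = np.random.default_rng(42)
rng.shuffle(data)  # analogous to dataset.shuffle(...)

# Split into batches of 32; analogous to .batch(32)
batches = [data[i:i + 32] for i in range(0, len(data), 32)]
print([len(b) for b in batches])  # [32, 32, 32, 4] -- the last batch is smaller
```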

The simplest model class in Keras is the `Sequential` class. It allows networks to be constructed layer by layer, beginning with the data input and terminating with an output layer. Only the input layer requires explicit dimensions to be passed (via the keyword argument `input_shape`); the rest are inferred from the size of the preceding layer.

Between layers, we also define an **activation** function for the outputs from the previous layer.

Here is a simple network with two hidden layers. The output layer will be of size 10, corresponding to the number of classes in the dataset.

In [ ]:

```
mnist_model = Sequential()
mnist_model.add(Dense(512, input_shape=(784,)))
mnist_model.add(Activation('relu'))
mnist_model.add(Dense(512))
mnist_model.add(Activation('relu'))
mnist_model.add(Dense(10))
mnist_model.add(Activation('softmax'))
mnist_model.summary()
```

The following example creates a multi-layer model that classifies the standard MNIST handwritten digits. It demonstrates the optimizer and layer APIs to build trainable graphs in an eager execution environment.

Activations can either be used through an `Activation` layer, as we have done here, or through the `activation` argument supported by all forward layers:

```
model.add(Dense(64))
model.add(Activation('tanh'))
```

This is equivalent to:

```
model.add(Dense(64, activation='tanh'))
```

Thus, a more concise way of specifying the same model is:

In [ ]:

```
mnist_model = Sequential([
    Dense(512, activation='relu', input_shape=(784,)),
    Dense(512, activation='relu'),
    Dense(10, activation='softmax')
])
mnist_model.summary()
```

For the hidden layers, we have used a **rectified linear unit (ReLU)**. This is the simple function f(x) = max(0, x).

This activation has been shown to perform well in the training of deep neural networks for supervised learning. It is a sparse activation, and allows efficient gradient propagation.

We use the **softmax** activation for the output layer because, like the logistic function, it maps its inputs to the unit interval; its outputs also sum to one, so they can be interpreted as class probabilities.
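As a quick sketch of both activations in plain NumPy (illustrative definitions, not TensorFlow's implementations):

```python
import numpy as np

def relu(x):
    # ReLU: pass positive values through, zero out the rest
    return np.maximum(0, x)

def softmax(x):
    # Subtract the max before exponentiating, for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())  # non-negative values summing to 1
```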

There are many `tf.keras.layers` available, with some common constructor parameters:

- `activation`: Set the activation function for the layer. This parameter is specified by the name of a built-in function or as a callable object. By default, no activation is applied.
- `kernel_initializer` and `bias_initializer`: The initialization schemes that create the layer's weights (kernel and bias). This parameter is a name or a callable object. This defaults to the "Glorot uniform" initializer.
- `kernel_regularizer` and `bias_regularizer`: The regularization schemes that apply to the layer's weights (kernel and bias), such as L1 or L2 regularization. By default, no regularization is applied.

The following instantiates `tf.keras.layers.Dense` layers using constructor arguments:

In [ ]:

```
from tensorflow.keras import regularizers, initializers
# Create a sigmoid layer:
Dense(64, activation='sigmoid')
# Or:
Dense(64, activation=tf.sigmoid)
# A linear layer with L1 regularization of factor 0.01 applied to the kernel matrix:
Dense(64, kernel_regularizer=regularizers.l1(0.01))
# A linear layer with L2 regularization of factor 0.01 applied to the bias vector:
Dense(64, bias_regularizer=regularizers.l2(0.01))
# A linear layer with a kernel initialized to a random orthogonal matrix:
Dense(64, kernel_initializer='orthogonal')
# A linear layer with a bias vector initialized to 2.0s:
Dense(64, bias_initializer=initializers.constant(2.0))
```
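To make the regularization arguments concrete, here is a hand-rolled NumPy sketch of the penalty terms they add to the loss (the weight matrix is a hypothetical stand-in, not a real layer's kernel):

```python
import numpy as np

np.random.seed(42)
kernel = np.random.randn(784, 64)  # hypothetical layer weights

# L1 penalty with factor 0.01: encourages sparse weights
l1_penalty = 0.01 * np.abs(kernel).sum()

# L2 penalty with factor 0.01: discourages large weights
l2_penalty = 0.01 * np.square(kernel).sum()

print(l1_penalty, l2_penalty)  # scalar terms added to the training loss
```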

Fitting the model first requires a compilation step, for which we specify three arguments:

- an `optimizer`. This could be the string identifier of an existing optimizer (such as `rmsprop` or `adagrad`), or an instance of the `Optimizer` class.
- a `loss` function. This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as `categorical_crossentropy` or `mse`), or it can be an objective function.
- a list of `metrics`. For any classification problem you will want to set this to `metrics=['accuracy']`. A metric can be the string identifier of an existing metric (such as `accuracy`), or a custom metric function.

Here, we will use the `sparse_softmax_cross_entropy` loss function, which computes sparse softmax cross entropy between logits and labels, measuring the probability error in discrete classification tasks in which the classes are mutually exclusive.
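A minimal NumPy sketch of what this loss computes for a single example (an illustration of the standard definition, not TensorFlow's implementation):

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, label):
    # Softmax over the logits, then the negative log-probability of the true class
    e = np.exp(logits - np.max(logits))
    probs = e / e.sum()
    return -np.log(probs[label])

logits = np.array([2.0, 0.5, -1.0])
print(sparse_softmax_cross_entropy(logits, label=0))  # small loss: class 0 is favored
print(sparse_softmax_cross_entropy(logits, label=2))  # large loss: class 2 is unlikely
```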

Even without training, call the model and inspect the output in eager execution:

In [ ]:

```
for images, labels in dataset.take(1):
    print("Logits: ", mnist_model(images[0:1]).numpy())
```

While Keras models have a built-in training loop (via the `fit` method), sometimes you need more customization. Here is an example of a training loop implemented with eager execution:

In [ ]:

```
optimizer = tf.train.AdamOptimizer()
loss_history = []
```

In [ ]:

```
for (batch, (images, labels)) in enumerate(dataset.take(400)):
    if batch % 80 == 0:
        print()
    print('.', end='')
    with tf.GradientTape() as tape:
        logits = mnist_model(images, training=True)
        loss_value = tf.losses.sparse_softmax_cross_entropy(labels, logits)
    loss_history.append(loss_value.numpy())
    grads = tape.gradient(loss_value, mnist_model.variables)
    optimizer.apply_gradients(zip(grads, mnist_model.variables),
                              global_step=tf.train.get_or_create_global_step())
```

In [ ]:

```
import matplotlib.pyplot as plt
plt.plot(loss_history)
plt.xlabel('Batch #')
plt.ylabel('Loss [entropy]')
```

Many machine learning models are represented by composing layers. When using TensorFlow with eager execution you can either write your own layers or use a layer provided in the `tf.keras.layers` package.

As we have seen, when composing layers into models you can use `tf.keras.Sequential` to represent models which are a linear stack of layers. It is easy to use for basic models:

In [ ]:

```
model = Sequential([
    Dense(512, activation='relu', input_shape=(784,)),
    Dense(512, activation='relu'),
    Dense(10, activation='softmax')
])
```

Alternatively, organize models in classes by inheriting from `tf.keras.Model`. This is a container for layers that is itself a layer, allowing `tf.keras.Model` objects to contain other `tf.keras.Model` objects.

In [ ]:

```
class MNISTModel(tf.keras.Model):

    def __init__(self):
        super(MNISTModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(units=512, activation='relu')
        self.dense2 = tf.keras.layers.Dense(units=512, activation='relu')
        self.dense_out = tf.keras.layers.Dense(units=10, activation='softmax')

    def call(self, input):
        """Run the model."""
        result = self.dense1(input)
        result = self.dense2(result)
        result = self.dense_out(result)
        return result

model = MNISTModel()
```

It's not required to set an input shape for the `tf.keras.Model` class, since the parameters are set the first time input is passed to the layer.

`tf.keras.layers` classes create and contain their own model variables that are tied to the lifetime of their layer objects. To share layer variables, share their objects.

In [ ]:

```
optimizer = tf.train.AdamOptimizer()
loss_history = []

for (batch, (images, labels)) in enumerate(dataset.take(400)):
    if batch % 80 == 0:
        print()
    print('.', end='')
    with tf.GradientTape() as tape:
        logits = model(images)
        loss_value = tf.losses.sparse_softmax_cross_entropy(labels, logits)
    loss_history.append(loss_value.numpy())
    grads = tape.gradient(loss_value, model.variables)
    optimizer.apply_gradients(zip(grads, model.variables),
                              global_step=tf.train.get_or_create_global_step())
```

Since `tf.gradients` does not work under eager execution, we use the `tf.GradientTape` class, which records operations within its context manager and constructs a computation graph from them. The resulting gradients are then applied with the optimizer's `apply_gradients` method for backpropagation.

In [ ]:

```
plt.plot(loss_history)
plt.xlabel('Batch #')
plt.ylabel('Loss [entropy]')
```

Recall the iris morphometric dataset, which includes measurements from three species:

- Iris setosa
- Iris virginica
- Iris versicolor

Figure 1. Iris setosa (by Radomil, CC BY-SA 3.0), Iris versicolor (by Dlanglois, CC BY-SA 3.0), and Iris virginica (by Frank Mayfield, CC BY-SA 2.0).

Let's create a custom neural network classifier using Keras in eager execution mode.

Download the dataset file using the `tf.keras.utils.get_file` function, which returns the file path of the downloaded file.

In [ ]:

```
train_dataset_url = "http://download.tensorflow.org/data/iris_training.csv"

train_dataset_fp = tf.keras.utils.get_file(fname=os.path.basename(train_dataset_url),
                                           origin=train_dataset_url)

print("Local copy of the dataset file: {}".format(train_dataset_fp))
```

In [ ]:

```
train_dataset_fp
```

In [ ]:

```
# column order in CSV file
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
feature_names = column_names[:-1]
label_name = column_names[-1]
print("Features: {}".format(feature_names))
print("Label: {}".format(label_name))
```

In [ ]:

```
class_names = ['Iris setosa', 'Iris versicolor', 'Iris virginica']
```

Since the dataset is a CSV-formatted text file, we can use the `make_csv_dataset` function to parse the data into a `Dataset`. Since this function generates data for training models, its default behavior is to shuffle the data (`shuffle=True, shuffle_buffer_size=10000`) and to repeat the dataset forever (`num_epochs=None`). We can also set the `batch_size` parameter.

In [ ]:

```
batch_size = 32

train_dataset = tf.data.experimental.make_csv_dataset(
    train_dataset_fp,
    batch_size,
    column_names=column_names,
    label_name=label_name,
    num_epochs=1)
```

The `make_csv_dataset` function returns a `tf.data.Dataset` of `(features, label)` pairs, where `features` is a dictionary: `{'feature_name': value}`.

With eager execution enabled, these `Dataset` objects are iterable. Let's look at a batch of features:

In [ ]:

```
features, labels = next(iter(train_dataset))
features
```

To simplify the model-building step, create a function to repackage the features dictionary into a single array with shape `(batch_size, num_features)`.

This function uses the `tf.stack` function, which takes values from a list of tensors and creates a combined tensor along the specified dimension.

In [ ]:

```
def pack_features_vector(features, labels):
    """Pack the features into a single array."""
    features = tf.stack(list(features.values()), axis=1)
    return features, labels
```
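The stacking behavior can be sketched with NumPy, whose `np.stack` behaves analogously here (the feature values below are made up):

```python
import numpy as np

# A batch of 3 examples, with features held in a dict of per-feature arrays
features = {
    'sepal_length': np.array([5.1, 4.9, 4.7]),
    'sepal_width':  np.array([3.5, 3.0, 3.2]),
}

# Stacking along axis=1 yields a (batch_size, num_features) array
packed = np.stack(list(features.values()), axis=1)
print(packed.shape)  # (3, 2)
```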

Then use the `Dataset.map` method to pack the `features` of each `(features, label)` pair into the training dataset:

In [ ]:

```
train_dataset = train_dataset.map(pack_features_vector)
```

The features element of the `Dataset` is now an array with shape `(batch_size, num_features)`. Let's look at the first few examples:

In [ ]:

```
features, labels = next(iter(train_dataset))
print(features[:5])
```

We will construct a simple network of two `Dense` layers with 10 nodes each, and an output layer with 3 nodes representing our label predictions. The first layer's `input_shape` parameter corresponds to the number of features in the dataset, and is required.

In [ ]:

```
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(4,)),  # input shape required
    tf.keras.layers.Dense(10, activation=tf.nn.relu),
    tf.keras.layers.Dense(3)
])
```

Let's have a quick look at what this model does to a batch of features:

In [ ]:

```
predictions = model(features)
predictions[:5]
```

Here, the model returns a logit for each class, for each example.

The `softmax` function transforms these logits into a probability for each class.

In [ ]:

```
tf.nn.softmax(predictions[:5])
```

Taking the `tf.argmax` across classes gives us the predicted class index. But the model hasn't been trained yet, so these aren't good predictions.

In [ ]:

```
print("Prediction: {}".format(tf.argmax(predictions, axis=1)))
print(" Labels: {}".format(labels))
```

We will again use the `sparse_softmax_cross_entropy` loss function, which takes the model's class logits and the true label, and returns the average loss across the examples.

In [ ]:

```
def loss(model, x, y):
    y_ = model(x)
    return tf.losses.sparse_softmax_cross_entropy(labels=y, logits=y_)

l = loss(model, features, labels)
print("Loss test: {}".format(l))
```

Since we are operating in eager mode, we will use a `GradientTape` context to record the operations needed to compute the gradients of the loss:

In [ ]:

```
def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)
```

We will use the `GradientDescentOptimizer`, which implements the *stochastic gradient descent* (SGD) algorithm. The `learning_rate` parameter sets the step size to take for each iteration down the hill.
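A single SGD update can be sketched in NumPy, using a toy quadratic loss rather than the iris model:

```python
import numpy as np

learning_rate = 0.01
w = np.array([3.0])  # current parameter value

# Toy loss: L(w) = w**2, whose gradient is 2*w
grad = 2 * w

# One gradient-descent step: move against the gradient
w = w - learning_rate * grad
print(w)  # slightly closer to the minimum at 0
```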

Let's set up the optimizer and the `global_step` counter:

In [ ]:

```
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
global_step = tf.train.get_or_create_global_step()
```

We'll use this to calculate a single optimization step:

In [ ]:

```
loss_value, grads = grad(model, features, labels)

print("Step: {}, Initial Loss: {}".format(global_step.numpy(),
                                          loss_value.numpy()))

optimizer.apply_gradients(zip(grads, model.variables), global_step)

print("Step: {}, Loss: {}".format(global_step.numpy(),
                                  loss(model, features, labels).numpy()))
```

With all the pieces in place, the model is ready for training! A training loop feeds the dataset examples into the model to help it make better predictions. The following code block sets up these training steps:

- Iterate over each *epoch*. An epoch is one pass through the dataset.
- Within an epoch, iterate over each example in the training `Dataset`, grabbing its *features* (`x`) and *label* (`y`).
- Using the example's features, make a prediction and compare it with the label. Measure the inaccuracy of the prediction and use that to calculate the model's loss and gradients.
- Use an `optimizer` to update the model's variables.
- Keep track of some stats for visualization.
- Repeat for each epoch.

The `num_epochs` variable is the number of times to loop over the dataset collection. Counter-intuitively, training a model longer does not guarantee a better model. Choosing the right number usually requires both experience and experimentation.

In [ ]:

```
# keep results for plotting
train_loss_results = []
train_accuracy_results = []

num_epochs = 201

for epoch in range(num_epochs):
    epoch_loss_avg = tfe.metrics.Mean()
    epoch_accuracy = tfe.metrics.Accuracy()

    # Training loop - using batches of 32
    for x, y in train_dataset:
        # Optimize the model
        loss_value, grads = grad(model, x, y)
        optimizer.apply_gradients(zip(grads, model.variables),
                                  global_step)

        # Track progress
        epoch_loss_avg(loss_value)  # add current batch loss
        # compare predicted label to actual label
        epoch_accuracy(tf.argmax(model(x), axis=1, output_type=tf.int32), y)

    # end epoch
    train_loss_results.append(epoch_loss_avg.result())
    train_accuracy_results.append(epoch_accuracy.result())

    if epoch % 50 == 0:
        print("Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(epoch,
                                                                    epoch_loss_avg.result(),
                                                                    epoch_accuracy.result()))
```

While it's helpful to print out the model's training progress, it's often *more* helpful to visualize it. We want to ensure that the *loss* goes down and the *accuracy* goes up.

In [ ]:

```
fig, axes = plt.subplots(2, sharex=True, figsize=(12, 8))
fig.suptitle('Training Metrics')
axes[0].set_ylabel("Loss", fontsize=14)
axes[0].plot(train_loss_results)
axes[1].set_ylabel("Accuracy", fontsize=14)
axes[1].set_xlabel("Epoch", fontsize=14)
axes[1].plot(train_accuracy_results);
```

The setup for the test `Dataset` is similar to that for the training `Dataset`: download the CSV text file and parse the values. This time we read the data for a single epoch, without shuffling:

In [ ]:

```
test_url = "http://download.tensorflow.org/data/iris_test.csv"
test_fp = tf.keras.utils.get_file(fname=os.path.basename(test_url),
                                  origin=test_url)
```

In [ ]:

```
test_dataset = tf.data.experimental.make_csv_dataset(
    test_fp,
    batch_size,
    column_names=column_names,
    label_name='species',
    num_epochs=1,
    shuffle=False)

test_dataset = test_dataset.map(pack_features_vector)
```

Unlike the training stage, the model only evaluates a single epoch of the test data. In the following code cell, we iterate over each example in the test set and compare the model's prediction against the actual label.

In [ ]:

```
test_accuracy = tfe.metrics.Accuracy()

for (x, y) in test_dataset:
    logits = model(x)
    prediction = tf.argmax(logits, axis=1, output_type=tf.int32)
    test_accuracy(prediction, y)

print("Test set accuracy: {:.3%}".format(test_accuracy.result()))
```

We can see that on the last batch, for example, the model is usually correct:

In [ ]:

```
tf.stack([y, prediction], axis=1)
```

Build a multi-layer neural network to predict wine varietals using the wine chemistry dataset.

In [ ]:

```
import pandas as pd

wine = pd.read_table("../data/wine.dat", sep=r'\s+')

attributes = ['Alcohol',
              'Malic acid',
              'Ash',
              'Alcalinity of ash',
              'Magnesium',
              'Total phenols',
              'Flavanoids',
              'Nonflavanoid phenols',
              'Proanthocyanins',
              'Color intensity',
              'Hue',
              'OD280/OD315 of diluted wines',
              'Proline']

grape = wine.pop('region')
y = grape.values - 1
X = wine.values
```

In [ ]:

```
# Write your answer here
```