This notebook demonstrates how to fit a simple linear model using TensorFlow's eager execution.
# Import TensorFlow.
import tensorflow as tf
# Import TensorFlow eager execution support (subject to future changes).
import tensorflow.contrib.eager as tfe
# Enable eager execution.
tfe.enable_eager_execution()
To demonstrate fitting a model with TensorFlow's eager execution, we'll fit a linear model to some synthesized data (which includes some noise).
In the code, we use the variable names w and b to represent the single weight and bias we'll use to fit our model.
# The constants we'll try to fit our variables to:
true_w = 3
true_b = 2
NUM_EXAMPLES = 1000
# Our inputs:
inputs = tf.random_normal(shape=[NUM_EXAMPLES, 1])
# Our labels, with noise:
noise = tf.random_normal(shape=[NUM_EXAMPLES, 1])
labels = inputs * true_w + true_b + noise
# Plot the Data (Optional)
import matplotlib.pyplot as plt
plt.scatter(inputs.numpy(), labels.numpy())
plt.show()
We'll use Keras's object-oriented Dense layer to create our variables. In this case, we'll create a Dense layer with a single weight and bias.
(Note: We're using the implementation of Dense found in tf.layers.Dense, though the documentation link is for tf.contrib.keras.layers.Dense. When TensorFlow 1.4 is released, the documentation will also be in tf.layers.Dense.)
# Create TensorFlow Variables using Keras's Dense layer.
wb = tf.layers.Dense(units=1, use_bias=True)
# We can access the underlying TensorFlow variables using wb.variables.
# However, the variables won't exist until the dimensions of the input
# tensors are known. Once the dimensions of the input tensors are known,
# Keras can create and initialize the variables. Until then, Keras will
# report the variables as an empty list: [].
wb.variables
[]
Our loss function is the standard L2 loss (where we reduce the loss to its mean across its inputs).
def loss_fn(inputs, labels, wb):
  """Calculates the mean L2 loss for our linear model."""
  predictions = wb(inputs)
  return tf.reduce_mean(tf.square(predictions - labels))
# Test loss function (optional).
loss_fn(inputs, labels, wb)
<tf.Tensor: id=40, shape=(), dtype=float32, numpy=7.3549819>
# At this point, the variables exist, and can now be queried:
w, b = wb.variables
print("w: " + str(w.read_value()))
print("b: " + str(b.read_value()))
w: tf.Tensor([[ 1.56891453]], shape=(1, 1), dtype=float32)
b: tf.Tensor([ 0.], shape=(1,), dtype=float32)
implicit_value_and_gradients()
With a loss function defined, we can calculate gradients and apply them to our variables to update them.
To calculate the gradients, we wrap our loss function using the implicit_value_and_gradients() function. implicit_value_and_gradients() returns a function that accepts the same inputs as the function passed in, and returns a tuple consisting of:
1. the value returned by the wrapped function (in our case, the loss returned by loss_fn()), and
2. a list of (gradient, variable) pairs, where each gradient (a tf.Tensor) is taken with respect to a given variable (a tf.Variable).
Test it out below to get a feel for what it does. Notice how the first value of the returned tuple (the loss) is the same as the value returned in the cell above that tests our loss function.
# Produce our gradients function. See description above for details about
# the returned function's signature.
value_and_gradients_fn = tfe.implicit_value_and_gradients(loss_fn)
# Show outputs of value_and_gradients_fn.
print("Outputs of value_and_gradients_fn:")
value, grads_and_vars = value_and_gradients_fn(inputs, labels, wb)
print('Loss: {}'.format(value))
for (grad, var) in grads_and_vars:
  print("")
  print('Gradient: {}\nVariable: {}'.format(grad, var))
Outputs of value_and_gradients_fn:
Loss: tf.Tensor(7.35498, shape=(), dtype=float32)

Gradient: tf.Tensor([[-3.00773573]], shape=(1, 1), dtype=float32)
Variable: <tf.Variable 'dense/kernel:0' shape=(1, 1) dtype=float32>

Gradient: tf.Tensor([-4.06519032], shape=(1,), dtype=float32)
Variable: <tf.Variable 'dense/bias:0' shape=(1,) dtype=float32>
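As a cross-check on what these gradients are, the mean L2 loss of a linear model has simple closed-form partial derivatives. Here's a hedged NumPy sketch (our own names and freshly generated data, not part of the original notebook) computing the same quantities analytically:

```python
import numpy as np

# Synthesize data the same way as above: y = 3*x + 2 + noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
y = x * 3.0 + 2.0 + rng.normal(size=(1000, 1))

w, b = 1.5, 0.0  # arbitrary current parameter values

preds = x * w + b
loss = np.mean((preds - y) ** 2)

# Analytic partial derivatives of mean((w*x + b - y)^2):
grad_w = np.mean(2.0 * (preds - y) * x)
grad_b = np.mean(2.0 * (preds - y))
```

These are the same per-variable gradients that implicit_value_and_gradients() pairs with 'dense/kernel:0' and 'dense/bias:0' above, just written out by hand.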
We'll use a GradientDescentOptimizer
to fit our model.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
Now we have everything needed to start fitting our variables to the data!
In the next cell, we'll demo these capabilities. We'll calculate the loss and gradients, apply the gradients to w and b using the optimizer, and then print the updated values of w and b.
You can run the cell multiple times. Each time, you should see the values of w and b get closer to their true values of 3 and 2.
# Test the optimizer.
print("Values of w, b, BEFORE applying gradients:")
w, b = wb.variables
print(w.read_value().numpy(), b.read_value().numpy())
print()
# Calculate the gradients:
empirical_loss, gradients_and_variables = value_and_gradients_fn(
inputs, labels, wb)
optimizer.apply_gradients(gradients_and_variables)
print("Values of w, b, AFTER applying gradients:")
print(w.read_value().numpy(), b.read_value().numpy())
Values of w, b, BEFORE applying gradients:
(array([[ 1.56891453]], dtype=float32), array([ 0.], dtype=float32))
()
Values of w, b, AFTER applying gradients:
(array([[ 1.86968815]], dtype=float32), array([ 0.40651903], dtype=float32))
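The optimizer step above is plain gradient descent: each variable is updated as variable − learning_rate × gradient. We can reproduce the printed numbers by hand (the before-values and gradients below are copied from the outputs above):

```python
learning_rate = 0.1
w_before, b_before = 1.56891453, 0.0        # values printed BEFORE the step
grad_w, grad_b = -3.00773573, -4.06519032   # gradients from value_and_gradients_fn

w_after = w_before - learning_rate * grad_w
b_after = b_before - learning_rate * grad_b
# w_after is about 1.8696881 and b_after about 0.4065190,
# matching the AFTER values printed above (up to float32 rounding).
```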
Of course, now we can simply turn all of this code into a self-standing training loop. We'll also capture our loss and approximations of w and b and plot them over time.
# Train our variables.
# numpy is used for its asscalar() function.
import numpy as np
num_training_steps = 10
def train_model(inputs, labels, wb, optimizer, num_training_steps):
  loss_at_step = []
  w_at_step = []
  b_at_step = []
  for step_num in range(num_training_steps):
    loss, gradients_and_variables = value_and_gradients_fn(inputs, labels, wb)
    loss_at_step.append(np.asscalar(loss.numpy()))
    optimizer.apply_gradients(gradients_and_variables)
    w, b = wb.variables
    w_at_step.append(np.asscalar(w.read_value().numpy()))
    b_at_step.append(np.asscalar(b.read_value().numpy()))
  print(w_at_step)
  t = range(0, num_training_steps)
  plt.plot(t, loss_at_step, 'k',
           t, w_at_step, 'r',
           t, [true_w] * num_training_steps, 'r--',
           t, b_at_step, 'b',
           t, [true_b] * num_training_steps, 'b--')
  plt.legend(['loss', 'w estimate', 'w true', 'b estimate', 'b true'])
  plt.show()
train_model(inputs, labels, wb, optimizer, num_training_steps)
[2.111051321029663, 2.3047544956207275, 2.4602210521698, 2.5850086212158203, 2.6851789951324463, 2.7655951976776123, 2.830157995223999, 2.8819968700408936, 2.9236228466033936, 2.9570505619049072]
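The same convergence can be seen without any TensorFlow machinery. Here's a hedged pure-NumPy sketch of the training loop (our own names and data): repeated gradient-descent steps pull w and b toward their true values of 3 and 2.

```python
import numpy as np

# Synthesize data: y = 3*x + 2 + noise, as in the notebook.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
y = x * 3.0 + 2.0 + rng.normal(size=(1000, 1))

w, b, lr = 0.0, 0.0, 0.1
for _ in range(100):
    preds = x * w + b
    # Analytic gradients of the mean L2 loss, then a gradient-descent step.
    w -= lr * np.mean(2.0 * (preds - y) * x)
    b -= lr * np.mean(2.0 * (preds - y))
# After 100 steps, w is close to 3 and b close to 2 (up to the noise floor).
```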
Using our loss function as an example (loss_fn()), there are several other ways we could compute gradients:
- tfe.implicit_gradients()
- tfe.gradients_function()
- tfe.implicit_value_and_gradients()
- tfe.value_and_gradients_function()
Each of these functions wraps the function you pass in and returns a new function that computes gradients for it. They differ only in what information they return.
The following two functions return a function that returns only the variables' gradients:
- tfe.gradients_function(): Returns the partial derivatives of the function f() with respect to the parameters of f().
- tfe.implicit_gradients(): Returns the partial derivatives of the function f() with respect to the trainable parameters (tf.Variable) used by f().
In our example above, the tf.layers.Dense object encapsulates the trainable parameters.
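To make the distinction concrete, here's a hedged plain-Python illustration (not TensorFlow, and with hand-written derivatives rather than autodiff) for f(x) = w*x + b, where w and b are captured from the enclosing scope:

```python
# f's explicit parameter is x; w and b are captured "implicit" parameters.
w, b = 3.0, 2.0

def f(x):
    return w * x + b

# gradients_function-style: differentiate with respect to f's argument x.
def df_dx(x):
    return w          # d(w*x + b)/dx = w

# implicit_gradients-style: differentiate with respect to the captured
# parameters w and b that f uses internally.
def df_dw(x):
    return x          # d(w*x + b)/dw = x

def df_db(x):
    return 1.0        # d(w*x + b)/db = 1
```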
The following two functions are identical to their counterparts above, except that they also return the value of the wrapped function.
tfe.implicit_value_and_gradients()
tfe.value_and_gradients_function()
In the demos below, we show examples for the implicit_* functions, since our existing loss function works seamlessly with these versions. (The other versions require that your parameters are tensors and tensors only; in our example, we're using a Dense layer.)
# tfe.implicit_gradients() demo
gradients_fn = tfe.implicit_gradients(loss_fn)
# Returns only gradients and variables:
gradients_fn(inputs, labels, wb)
[(<tf.Tensor: id=673, shape=(1, 1), dtype=float32, numpy=array([[-0.26846504]], dtype=float32)>, <tf.Variable 'dense/kernel:0' shape=(1, 1) dtype=float32>), (<tf.Tensor: id=671, shape=(1,), dtype=float32, numpy=array([-0.32890949], dtype=float32)>, <tf.Variable 'dense/bias:0' shape=(1,) dtype=float32>)]
# tfe.implicit_value_and_gradients() demo
value_gradients_fn = tfe.implicit_value_and_gradients(loss_fn)
# Returns the value returned by the function passed in, gradients, and variables:
value_gradients_fn(inputs, labels, wb)
(<tf.Tensor: id=688, shape=(), dtype=float32, numpy=1.0623235>, [(<tf.Tensor: id=720, shape=(1, 1), dtype=float32, numpy=array([[-0.26846504]], dtype=float32)>, <tf.Variable 'dense/kernel:0' shape=(1, 1) dtype=float32>), (<tf.Tensor: id=718, shape=(1,), dtype=float32, numpy=array([-0.32890949], dtype=float32)>, <tf.Variable 'dense/bias:0' shape=(1,) dtype=float32>)])