Eager Execution: Working with Gradients

This notebook demonstrates:

  • How to get gradients using TensorFlow's eager execution capabilities
  • How to apply the gradients so you can update your variables

Setup: Import the TensorFlow eager execution module and enable eager execution.

In [0]:
# Import TensorFlow.
import tensorflow as tf

# Import TensorFlow eager execution support (subject to future changes).
import tensorflow.contrib.eager as tfe

# Enable eager execution.
tfe.enable_eager_execution()

Fitting a Simple Linear Model

Step 1: Synthesize some data

To demonstrate fitting a model with TensorFlow's eager execution, we'll fit a linear model to some synthesized data (which includes some noise).

In the code, we use the variable names w and b to represent the single weight and bias we'll use to fit our model.

In [0]:
# The constants we'll try to fit our variables to:
true_w = 3
true_b = 2

NUM_EXAMPLES = 1000

# Our inputs:
inputs = tf.random_normal(shape=[NUM_EXAMPLES, 1])

# Our labels, with noise:
noise = tf.random_normal(shape=[NUM_EXAMPLES, 1])
labels = inputs * true_w + true_b + noise
In [3]:
# Plot the Data (Optional)

import matplotlib.pyplot as plt

plt.scatter(inputs.numpy(), labels.numpy())
plt.show()

Step 2: Define our TensorFlow variables

We'll use Keras's object-oriented Dense layer to create our variables. In this case, we'll create a Dense layer with a single weight and bias.

(Note: We're using the implementation of Dense found in tf.layers.Dense, though the linked documentation is for tf.contrib.keras.layers.Dense. Once TensorFlow 1.4 is released, the documentation will also be available at tf.layers.Dense.)

In [4]:
# Create TensorFlow Variables using Keras's Dense layer.

wb = tf.layers.Dense(units=1, use_bias=True)

# We can access the underlying TensorFlow variables using wb.variables.
# However, the variables won't exist until the dimensions of the input
# tensors are known. Once the dimensions of the input tensors are known,
# Keras can create and initialize the variables. Until then, Keras will
# report the variables as an empty list: [].

wb.variables
Out[4]:
[]

Step 3: Define our loss function

Our loss function is the standard L2 loss, reduced to its mean across the examples.
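In symbols (a restatement of the code in the next cell, with N = NUM_EXAMPLES, inputs x_i, and labels y_i):

$$\mathcal{L}(w, b) = \frac{1}{N}\sum_{i=1}^{N}\bigl(w\,x_i + b - y_i\bigr)^2$$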

In [0]:
def loss_fn(inputs, labels, wb):
  """Calculates the mean L2 loss for our linear model."""
  predictions = wb(inputs)
  return tf.reduce_mean(tf.square(predictions - labels))
In [6]:
# Test loss function (optional).

loss_fn(inputs, labels, wb)
Out[6]:
<tf.Tensor: id=40, shape=(), dtype=float32, numpy=7.3549819>
In [7]:
# At this point, the variables exist, and can now be queried:

w, b = wb.variables
print("w: " + str(w.read_value()))
print("b: " + str(b.read_value()))
w: tf.Tensor([[ 1.56891453]], shape=(1, 1), dtype=float32)
b: tf.Tensor([ 0.], shape=(1,), dtype=float32)

Step 4: Create our gradients function using implicit_value_and_gradients()

With a loss function defined, we can calculate gradients and apply them to our variables to update them.

To calculate the gradients, we wrap our loss function using the implicit_value_and_gradients() function.

implicit_value_and_gradients() returns a function that accepts the same inputs as the wrapped function and that returns a tuple consisting of:

  1. The value returned by the wrapped function (in this case, the loss calculated by loss_fn()), and
  2. A list of (gradient, variable) tuples, each containing:
    1. The gradient (a tf.Tensor) of the loss with respect to a given variable
    2. The corresponding variable (a tf.Variable)

Test it out below to get a feel for what it does. Notice how the first value of the returned tuple (the loss) is the same as the value returned in the cell above that tests our loss function.

In [0]:
# Produce our gradients function. See description above for details about
# the returned function's signature.

value_and_gradients_fn = tfe.implicit_value_and_gradients(loss_fn)
In [9]:
# Show outputs of value_and_gradients_fn.

print("Outputs of value_and_gradients_fn:")

value, grads_and_vars = value_and_gradients_fn(inputs, labels, wb)

print('Loss: {}'.format(value))
for (grad, var) in grads_and_vars:
  print("")
  print('Gradient: {}\nVariable: {}'.format(grad, var))
Outputs of value_and_gradients_fn:
Loss: tf.Tensor(7.35498, shape=(), dtype=float32)

Gradient: tf.Tensor([[-3.00773573]], shape=(1, 1), dtype=float32)
Variable: <tf.Variable 'dense/kernel:0' shape=(1, 1) dtype=float32>

Gradient: tf.Tensor([-4.06519032], shape=(1,), dtype=float32)
Variable: <tf.Variable 'dense/bias:0' shape=(1,) dtype=float32>

Step 5: Create an optimizer

We'll use a GradientDescentOptimizer to fit our model.

In [0]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

Step 5a: Test our optimizer

Now we have everything needed to start fitting our variables to the data!

In the next cell, we'll demo these capabilities. We'll:

  1. Print the current values of w and b
  2. Calculate the loss and gradients
  3. Apply the gradients
  4. Print out the new values of w and b

You can run the cell multiple times. Each time, you should see the values of w and b get closer to their true values of 3 and 2.

In [11]:
# Test the optimizer.

print("Values of w, b, BEFORE applying gradients:")
w, b = wb.variables
print(w.read_value().numpy(), b.read_value().numpy())
print()

# Calculate the gradients:
empirical_loss, gradients_and_variables = value_and_gradients_fn(
    inputs, labels, wb)
optimizer.apply_gradients(gradients_and_variables)

print("Values of w, b, AFTER applying gradients:")
print(w.read_value().numpy(), b.read_value().numpy())
Values of w, b, BEFORE applying gradients:
(array([[ 1.56891453]], dtype=float32), array([ 0.], dtype=float32))
()
Values of w, b, AFTER applying gradients:
(array([[ 1.86968815]], dtype=float32), array([ 0.40651903], dtype=float32))

Step 6: Create a training loop

Now we can turn all of this code into a self-contained training loop. We'll also capture our loss and our estimates of w and b at each step and plot them over time.

In [12]:
# Train our variables.

# numpy is used for its asscalar() function.
import numpy as np

num_training_steps = 10

def train_model(inputs, labels, wb, optimizer, num_training_steps):
  loss_at_step = []
  w_at_step = []
  b_at_step = []
  for step_num in range(num_training_steps):
    loss, gradients_and_variables = value_and_gradients_fn(inputs, labels, wb)
    loss_at_step.append(np.asscalar(loss.numpy()))
    
    optimizer.apply_gradients(gradients_and_variables)
    w, b = wb.variables
    w_at_step.append(np.asscalar(w.read_value().numpy()))
    b_at_step.append(np.asscalar(b.read_value().numpy()))

  print(w_at_step)
  t = range(0, num_training_steps)
  plt.plot(t, loss_at_step, 'k',
           t, w_at_step, 'r',
           t, [true_w] * num_training_steps, 'r--',
           t, b_at_step, 'b',
           t, [true_b] * num_training_steps, 'b--')
  plt.legend(['loss', 'w estimate', 'w true', 'b estimate', 'b true'])
  plt.show()

train_model(inputs, labels, wb, optimizer, num_training_steps)
[2.111051321029663, 2.3047544956207275, 2.4602210521698, 2.5850086212158203, 2.6851789951324463, 2.7655951976776123, 2.830157995223999, 2.8819968700408936, 2.9236228466033936, 2.9570505619049072]

Other Ways to Compute Gradients

Using our loss function as an example (loss_fn()), there are several other ways we could compute gradients:

  1. tfe.implicit_gradients()
  2. tfe.gradients_function()
  3. tfe.implicit_value_and_gradients()
  4. tfe.value_and_gradients_function()

Each of these functions does the following:

  • Wraps a function.
  • Returns a function with the same input signature as the wrapped function.

They differ only in what information they return.

Gradients-only functions

The following two functions each return a wrapped function that returns only gradients (without the loss value):

  1. tfe.gradients_function(): Returns the partial derivatives of the function f() with respect to the parameters of f().
  2. tfe.implicit_gradients(): Returns the partial derivatives of the function f() with respect to the trainable parameters (tf.Variable) used by f().

In our example above, the tf.layers.Dense object encapsulates the trainable parameters.

Value and gradients functions

The following two functions are identical to their counterparts above, except that they also return the value of the wrapped function.

  1. tfe.implicit_value_and_gradients()
  2. tfe.value_and_gradients_function()

Gradient demos

In the demos below, we show examples for the implicit_* functions, since our existing loss function works seamlessly with these versions. (The other versions require that the parameters be plain tensors; in our example, the parameters are encapsulated in a Dense layer. A sketch of the non-implicit variants follows the demos.)

In [13]:
# tfe.implicit_gradients() demo
gradients_fn = tfe.implicit_gradients(loss_fn)

# Returns only gradients and variables:
gradients_fn(inputs, labels, wb)
Out[13]:
[(<tf.Tensor: id=673, shape=(1, 1), dtype=float32, numpy=array([[-0.26846504]], dtype=float32)>,
  <tf.Variable 'dense/kernel:0' shape=(1, 1) dtype=float32>),
 (<tf.Tensor: id=671, shape=(1,), dtype=float32, numpy=array([-0.32890949], dtype=float32)>,
  <tf.Variable 'dense/bias:0' shape=(1,) dtype=float32>)]
In [14]:
# tfe.implicit_value_and_gradients() demo
value_gradients_fn = tfe.implicit_value_and_gradients(loss_fn)

# Returns the value returned by the function passed in, gradients, and variables:
value_gradients_fn(inputs, labels, wb)
Out[14]:
(<tf.Tensor: id=688, shape=(), dtype=float32, numpy=1.0623235>,
 [(<tf.Tensor: id=720, shape=(1, 1), dtype=float32, numpy=array([[-0.26846504]], dtype=float32)>,
   <tf.Variable 'dense/kernel:0' shape=(1, 1) dtype=float32>),
  (<tf.Tensor: id=718, shape=(1,), dtype=float32, numpy=array([-0.32890949], dtype=float32)>,
   <tf.Variable 'dense/bias:0' shape=(1,) dtype=float32>)])
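
For completeness, here is a minimal, hypothetical sketch (not an executed cell from this notebook) of the non-implicit variants. It assumes the inputs and labels tensors defined earlier and introduces a helper, tensor_loss_fn(), that takes w and b directly as tensors, since these variants differentiate with respect to the wrapped function's own arguments:

# Hypothetical sketch: the non-implicit variants differentiate with respect
# to the arguments of the wrapped function, so w and b are passed in as
# plain tensors instead of living inside a Dense layer.
def tensor_loss_fn(w, b):
  """Same mean L2 loss as loss_fn(), but with w and b as explicit tensors."""
  predictions = inputs * w + b
  return tf.reduce_mean(tf.square(predictions - labels))

# tfe.gradients_function(): returns only the gradients [d_loss/d_w, d_loss/d_b].
gradients_only_fn = tfe.gradients_function(tensor_loss_fn)
gradients_only_fn(tf.constant(1.0), tf.constant(0.0))

# tfe.value_and_gradients_function(): returns (loss, [d_loss/d_w, d_loss/d_b]).
value_and_gradients_only_fn = tfe.value_and_gradients_function(tensor_loss_fn)
value_and_gradients_only_fn(tf.constant(1.0), tf.constant(0.0))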