from __future__ import absolute_import, division, print_function
import numpy as np
import tensorflow as tf
tf.enable_eager_execution()
print(tf.__version__)
1.12.0
TensorFlow provides the tf.GradientTape API for automatic differentiation: computing the gradient of a computation with respect to its input variables. TensorFlow "records" all operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape and the gradients associated with each recorded operation to compute the gradients of the "recorded" computation using reverse mode differentiation.
# Trainable variables (created by `tf.Variable` or `tf.get_variable`, where
# `trainable=True` is the default in both cases) are automatically watched.
# Tensors can be manually watched by invoking the `watch` method on the tape
# context manager.
x = tf.constant(1, dtype = tf.float32)
# z = y^2, y = 2x, z = (2x)^2
with tf.GradientTape() as tape:
  tape.watch(x)
  y = tf.add(x, x)
  z = tf.multiply(y, y)
# Derivative of z with respect to the original input tensor x
dz_dx = tape.gradient(target = z, sources = x)
print(dz_dx)
tf.Tensor(8.0, shape=(), dtype=float32)
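As the comment above notes, trainable variables are watched automatically, so tape.watch() is only needed for plain tensors. A minimal sketch of that behavior (the variable v and its value are illustrative, not from the original):
v = tf.Variable(3.0)  # trainable by default, so the tape watches it automatically
with tf.GradientTape() as tape:
  u = v * v  # u = v^2
du_dv = tape.gradient(u, v)  # d(v^2)/dv = 2v
print(du_dv)  # should print tf.Tensor(6.0, shape=(), dtype=float32)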
You can also request gradients of the output with respect to intermediate values computed during a "recorded" tf.GradientTape context.
x = tf.constant(1, dtype = tf.float32)
# z = y^2, y = 2x, z = (2x)^2
with tf.GradientTape() as tape:
  tape.watch(x)
  y = tf.add(x, x)
  z = tf.multiply(y, y)
# Use the tape to compute the derivative of z with respect to the
# intermediate value y.
dz_dy = tape.gradient(target = z, sources = y)
print(dz_dy)
tf.Tensor(4.0, shape=(), dtype=float32)
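The sources argument also accepts a list, in which case gradient() returns one gradient per source. A minimal sketch, assuming the same y = 2x, z = (2x)^2 computation as above:
x = tf.constant(1, dtype = tf.float32)
with tf.GradientTape() as tape:
  tape.watch(x)
  y = tf.add(x, x)
  z = tf.multiply(y, y)
# Passing a list of sources returns a list of gradients: [dz/dx, dz/dy]
dz_dx, dz_dy = tape.gradient(target = z, sources = [x, y])
print(dz_dx, dz_dy)  # expected: 8.0 and 4.0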
By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient() method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the gradient() method, as resources are released only when the tape object is garbage collected. For example:
x = tf.constant(1, dtype = tf.float32)
# z = y^2, y = 2x, z = (2x)^2
with tf.GradientTape(persistent = True) as tape:
  tape.watch(x)
  y = tf.add(x, x)
  z = tf.multiply(y, y)
dz_dy = tape.gradient(target = z, sources = y)
dy_dx = tape.gradient(target = y, sources = x)
dz_dx = tape.gradient(target = z, sources = x)
print(dz_dy, dy_dx, dz_dx)
tf.Tensor(4.0, shape=(), dtype=float32) tf.Tensor(2.0, shape=(), dtype=float32) tf.Tensor(8.0, shape=(), dtype=float32)
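By contrast, calling gradient() a second time on a non-persistent tape raises an error, since its resources were already released. A minimal sketch of that failure mode (the exact error message may vary by version):
x = tf.constant(1, dtype = tf.float32)
with tf.GradientTape() as tape:  # persistent defaults to False
  tape.watch(x)
  y = tf.add(x, x)
  z = tf.multiply(y, y)
print(tape.gradient(z, x))  # first call works
try:
  tape.gradient(z, y)  # second call on a non-persistent tape should fail
except RuntimeError as e:
  print(e)
With a persistent tape, you can instead drop the reference when you are done (for example with del tape) so the held resources can be garbage collected.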
Because tapes record operations as they are executed, Python control flow (using ifs and whiles, for example) is naturally handled:
def f(x, y):
  output = 1.0
  for i in range(y):
    if i > 1 and i < 5:
      output = tf.multiply(output, x)
  return output
def grad(x, y):
  with tf.GradientTape() as tape:
    tape.watch(x)
    out = f(x, y)
  return tape.gradient(out, x)
x = tf.convert_to_tensor(2.0)
print(grad(x, 6)) # out = x^3
print(grad(x, 5)) # out = x^3
print(grad(x, 4)) # out = x^2
tf.Tensor(12.0, shape=(), dtype=float32) tf.Tensor(12.0, shape=(), dtype=float32) tf.Tensor(4.0, shape=(), dtype=float32)
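The same holds for while loops; a minimal sketch (the helper names pow_by_while and grad_pow are illustrative, not part of the original):
def pow_by_while(x, n):
  # Multiply n copies of x together using an ordinary Python while loop.
  output = tf.constant(1.0)
  i = 0
  while i < n:
    output = tf.multiply(output, x)
    i += 1
  return output

def grad_pow(x, n):
  with tf.GradientTape() as tape:
    tape.watch(x)
    out = pow_by_while(x, n)
  return tape.gradient(out, x)

x = tf.convert_to_tensor(2.0)
print(grad_pow(x, 3))  # out = x^3, so the gradient should be 3x^2 = 12.0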
Operations inside of the GradientTape context manager are recorded for automatic differentiation. If gradients are computed in that context, then the gradient computation is recorded as well. As a result, the exact same API works for higher-order gradients too. For example:
x = tf.Variable(1.0)  # Create a TensorFlow variable initialized to 1.0
with tf.GradientTape() as t:
  with tf.GradientTape() as t2:
    y = x * x * x
  # Compute the gradient inside the 't' context manager
  # which means the gradient computation is differentiable as well.
  dy_dx = t2.gradient(y, x)
d2y_dx2 = t.gradient(dy_dx, x)
print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32) tf.Tensor(6.0, shape=(), dtype=float32)
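Nesting a third tape extends this to third-order derivatives in the same way; a minimal sketch (the function y = x^4 is illustrative, not from the original):
x = tf.Variable(1.0)
with tf.GradientTape() as t0:
  with tf.GradientTape() as t1:
    with tf.GradientTape() as t2:
      y = x * x * x * x  # y = x^4
    dy_dx = t2.gradient(y, x)  # 4x^3
  d2y_dx2 = t1.gradient(dy_dx, x)  # 12x^2
d3y_dx3 = t0.gradient(d2y_dx2, x)  # 24x
print(d3y_dx3)  # expected: tf.Tensor(24.0, shape=(), dtype=float32)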