import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
def simpleaxis(ax):
    """Strip the top and right spines for a cleaner plot."""
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.get_xaxis().tick_bottom()
    ax.get_yaxis().tick_left()
    ax.tick_params(labelsize=14)
x = np.linspace(-1, 1, 100)
y1 = x ** 2
y2 = x ** 6
plt.figure(figsize=(7,6))
ax = plt.subplot(111)
simpleaxis(ax)
plt.plot(x, y1, 'k', alpha=0.5, lw=2)
plt.plot(x, y2, 'g', alpha=0.5, lw=2)
plt.plot(-0.25, 0.25 ** 2, 'ko', alpha=0.5, lw=2)
plt.plot(-0.25, 0.25 ** 6, 'go', alpha=0.5, lw=2)
plt.ylim([-0.1, 0.4])
plt.legend(['$y = x^2$', '$y = x^6$'], frameon=False, fontsize=16, loc='upper center')
plt.show()
Let's consider $x_0 = -0.25$.
For $y = x^2$, the gradient $y'(x_0) = 2x_0 = -0.5$.
For $y = x^6$, the gradient $y'(x_0) = 6x_0^5 \approx -0.0059$.
For a learning rate of $\zeta = 0.1$, the corresponding gradient descent steps $-\zeta \, y'(x_0)$ are:
$0.05$ for $y = x^2$ and $\approx 0.0006$ for $y = x^6$.
Thus, even though $x_0$ is the same distance away from the minimum $x^* = 0$ in both functions, we take an extremely small step for $y = x^6$.
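These step sizes are easy to verify numerically. A minimal sketch (using the same $x_0$ and $\zeta$ as above; the variable names are illustrative):

```python
# Gradient descent step: delta_x = -zeta * y'(x0)
zeta = 0.1
x0 = -0.25

grad_y1 = 2 * x0          # y = x^2  ->  y' = 2x
grad_y2 = 6 * x0 ** 5     # y = x^6  ->  y' = 6x^5

step_y1 = -zeta * grad_y1
step_y2 = -zeta * grad_y2

print(step_y1)  # 0.05
print(step_y2)  # ~0.0006 (0.0005859375 exactly)
```

Despite starting the same distance from the minimum, the step for $y = x^6$ is nearly two orders of magnitude smaller.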
Now consider Newton's method instead, where the step is $-y'(x_0) / y''(x_0)$.
The corresponding steps are:
$0.25$ for $y = x^2$ and $0.05$ for $y = x^6$.
Thus, the Newton step takes the local curvature into account. It proposes larger steps for regions that are relatively flat and smaller steps for regions where the curvature is high.
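The Newton steps can be checked the same way. A sketch, plugging the closed-form first and second derivatives into $-y'(x_0)/y''(x_0)$:

```python
x0 = -0.25

# Newton step: delta_x = -y'(x0) / y''(x0)
# y = x^2: y' = 2x,   y'' = 2
# y = x^6: y' = 6x^5, y'' = 30x^4
step_y1 = -(2 * x0) / 2.0                    # = -x0 = 0.25, jumps straight to x* = 0
step_y2 = -(6 * x0 ** 5) / (30 * x0 ** 4)    # = -x0 / 5 = 0.05

print(step_y1)  # 0.25
print(step_y2)  # 0.05
```

Note that for the quadratic, a single Newton step lands exactly on the minimum, since the second-order model is exact; for $y = x^6$ the step is still far larger than the plain gradient step.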