import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
def simpleaxis(ax):
    """Strip the top and right spines for a cleaner plot."""
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.get_xaxis().tick_bottom()
    ax.get_yaxis().tick_left()
    ax.tick_params(labelsize=14)
x = np.linspace(-1, 1, 100)
y1 = x ** 2
y2 = x ** 6
plt.figure(figsize=(7,6))
ax = plt.subplot(111)
simpleaxis(ax)
plt.plot(x, y1, 'k', alpha=0.5, lw=2)
plt.plot(x, y2, 'g', alpha=0.5, lw=2)
plt.plot(-0.25, 0.25 ** 2, 'ko', alpha=0.5, lw=2)
plt.plot(-0.25, 0.25 ** 6, 'go', alpha=0.5, lw=2)
plt.ylim([-0.1, 0.4])
plt.legend(['$y = x^2$', '$y = x^6$'], frameon=False, fontsize=16, loc='upper center')
plt.show()
Let's consider $x_0 = -0.25$.
For $y = x^2$, the gradient $y'(x_0) = 2x_0 = -0.5$.
For $y = x^6$, the gradient $y'(x_0) = 6x_0^5 \approx -0.0059$.
For a learning rate of $\zeta = 0.1$, the corresponding gradient descent steps $-\zeta \, y'(x_0)$ are:
$0.05$ for $y = x^2$ and $\approx 0.0006$ for $y = x^6$.
Thus, even though $x_0$ is the same distance away from the minimum $x^* = 0$ in both functions, we take an extremely small step for $y = x^6$.
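These step sizes are easy to verify numerically. A minimal sketch (using the same $x_0$ and $\zeta$ as above; the variable names are illustrative):

```python
# Gradient descent step: delta_x = -zeta * y'(x0)
zeta = 0.1
x0 = -0.25

grad_y1 = 2 * x0          # y = x^2  ->  y' = 2x
grad_y2 = 6 * x0 ** 5     # y = x^6  ->  y' = 6x^5

step_y1 = -zeta * grad_y1
step_y2 = -zeta * grad_y2

print(step_y1)  # 0.05
print(step_y2)  # ~0.0006 (0.0005859375 exactly)
```

Despite starting the same distance from the minimum, the step for $y = x^6$ is nearly two orders of magnitude smaller.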
Now consider Newton's method instead, where the step is $-y'(x_0) / y''(x_0)$.
The corresponding steps are:
$0.25$ for $y = x^2$ and $0.05$ for $y = x^6$.
Thus, the Newton step takes the local curvature into account. It proposes larger steps for regions that are relatively flat and smaller steps for regions where the curvature is high.
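The Newton steps can be checked the same way. A sketch, plugging the closed-form first and second derivatives into $-y'(x_0)/y''(x_0)$:

```python
x0 = -0.25

# Newton step: delta_x = -y'(x0) / y''(x0)
# y = x^2: y' = 2x,   y'' = 2
# y = x^6: y' = 6x^5, y'' = 30x^4
step_y1 = -(2 * x0) / 2.0                    # = -x0 = 0.25, jumps straight to x* = 0
step_y2 = -(6 * x0 ** 5) / (30 * x0 ** 4)    # = -x0 / 5 = 0.05

print(step_y1)  # 0.25
print(step_y2)  # 0.05
```

Note that for the quadratic, a single Newton step lands exactly on the minimum, since the second-order model is exact; for $y = x^6$ the step is still far larger than the plain gradient step.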