Newton Method

Important: Please read the installation page for details about how to install the toolboxes. $\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$ $\newcommand{\enscond}[2]{\lbrace #1, #2 \rbrace}$ $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\umax}[1]{\underset{#1}{\max}\;}$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\uargmin}[1]{\underset{#1}{argmin}\;}$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$ $\newcommand{\pa}[1]{\left(#1\right)}$ $\newcommand{\diag}[1]{{diag}\left( #1 \right)}$ $\newcommand{\qandq}{\quad\text{and}\quad}$ $\newcommand{\qwhereq}{\quad\text{where}\quad}$ $\newcommand{\qifq}{ \quad \text{if} \quad }$ $\newcommand{\qarrq}{ \quad \Longrightarrow \quad }$ $\newcommand{\ZZ}{\mathbb{Z}}$ $\newcommand{\CC}{\mathbb{C}}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\EE}{\mathbb{E}}$ $\newcommand{\Zz}{\mathcal{Z}}$ $\newcommand{\Ww}{\mathcal{W}}$ $\newcommand{\Vv}{\mathcal{V}}$ $\newcommand{\Nn}{\mathcal{N}}$ $\newcommand{\NN}{\mathcal{N}}$ $\newcommand{\Hh}{\mathcal{H}}$ $\newcommand{\Bb}{\mathcal{B}}$ $\newcommand{\Ee}{\mathcal{E}}$ $\newcommand{\Cc}{\mathcal{C}}$ $\newcommand{\Gg}{\mathcal{G}}$ $\newcommand{\Ss}{\mathcal{S}}$ $\newcommand{\Pp}{\mathcal{P}}$ $\newcommand{\Ff}{\mathcal{F}}$ $\newcommand{\Xx}{\mathcal{X}}$ $\newcommand{\Mm}{\mathcal{M}}$ $\newcommand{\Ii}{\mathcal{I}}$ $\newcommand{\Dd}{\mathcal{D}}$ $\newcommand{\Ll}{\mathcal{L}}$ $\newcommand{\Tt}{\mathcal{T}}$ $\newcommand{\si}{\sigma}$ $\newcommand{\al}{\alpha}$ $\newcommand{\la}{\lambda}$ $\newcommand{\ga}{\gamma}$ $\newcommand{\Ga}{\Gamma}$ $\newcommand{\La}{\Lambda}$ $\newcommand{\si}{\sigma}$ $\newcommand{\Si}{\Sigma}$ $\newcommand{\be}{\beta}$ $\newcommand{\de}{\delta}$ $\newcommand{\De}{\Delta}$ $\newcommand{\phi}{\varphi}$ $\newcommand{\th}{\theta}$ $\newcommand{\om}{\omega}$ 
$\newcommand{\Om}{\Omega}$

This tour explores the use of the Newton method for the unconstrained optimization of a smooth function.

In [2]:

Newton Method for Unconstrained Problems

We test here Newton's method for the minimization of a 2-D function.

We define a highly anisotropic function, the Rosenbrock function

$$ f(x) = (1-x_1)^2 + 100 (x_2-x_1^2)^2. $$

In [3]:
f = @(x1,x2)(1-x1).^2 + 100*(x2-x1.^2).^2;

The minimum of the function is reached at $x^\star=(1,1)$ and $f(x^\star)=0$.

Evaluate the function on a regular grid.

In [4]:
x1 = linspace(-2,2,150);
x2 = linspace(-.5,3,150);
[X2,X1] = meshgrid(x2,x1);
F = f(X1,X2);

3-D display.

In [5]:
surf(x2,x1, F, perform_hist_eq(F, 'linear') ); 
% shading interp;
% camlight;
% axis tight;
% colormap jet;

2-D display (histogram equalization helps to better visualize the iso-contours).

In [6]:
imageplot( perform_hist_eq(F, 'linear') );
colormap jet(256);

Gradient descent methods, which only use first-order (gradient) information about $f$, cannot minimize this function efficiently because of its high anisotropy.
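
To illustrate this slow behavior, here is a minimal sketch of gradient descent with a fixed step on the Rosenbrock function. The step size `tau`, the iteration budget `niter` and the starting point are assumptions chosen for illustration, not values taken from this tour.

```matlab
% Rosenbrock function and its gradient, redefined so the block is self-contained.
f     = @(x)(1-x(1))^2 + 100*(x(2)-x(1)^2)^2;
Gradf = @(x)[2*(x(1)-1) + 400*x(1)*(x(1)^2-x(2)); 200*(x(2)-x(1)^2)];

tau   = 5e-4;       % hypothetical fixed step size
niter = 5000;       % hypothetical iteration budget
x     = [-1.5; 2];  % hypothetical starting point
E     = zeros(niter,1);
for i = 1:niter
    E(i) = f(x);              % record the energy before the update
    x = x - tau*Gradf(x);     % plain gradient step
end
% Energy decay in log scale.
semilogy(max(E,eps)); axis tight;
xlabel('iteration'); ylabel('f(x)');
```

The step must be small to stay stable across the steep walls of the valley, so the progress along the flat valley floor $x_2 \approx x_1^2$ is slow.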

Define the gradient of $f$: $$ \nabla f(x) = \pa{ \pd{f(x)}{x_1}, \pd{f(x)}{x_2} } = \pa{ 2 (x_1-1) + 400 x_1 (x_1^2-x_2), 200 (x_2-x_1^2) } \in \RR^2. $$

In [7]:
gradf = @(x1,x2)[2*(x1-1) + 400*x1.*(x1.^2-x2); 200*(x2-x1.^2)];
Gradf = @(x)gradf(x(1),x(2));

Compute its Hessian $$ Hf(x) = \begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \frac{\partial^2 f(x)}{\partial x_2^2} \end{pmatrix} = \begin{pmatrix} 2 + 400 (x_1^2-x_2) + 800 x_1^2 & -400 x_1 \\ -400 x_1 & 200 \end{pmatrix} \in \RR^{2 \times 2}. $$


In [8]:
hessf = @(x1,x2)[2 + 400*(x1.^2-x2) + 800*x1.^2, -400*x1; ...
                -400*x1,  200];
Hessf = @(x)hessf(x(1),x(2));
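
Closed-form derivatives are easy to get wrong, so it can help to compare them with centered finite differences. The sketch below does this at a single point; the test point `x` and the step `h` are assumptions chosen for illustration.

```matlab
% Rosenbrock function, gradient and Hessian, redefined so the block is self-contained.
f     = @(x)(1-x(1))^2 + 100*(x(2)-x(1)^2)^2;
Gradf = @(x)[2*(x(1)-1) + 400*x(1)*(x(1)^2-x(2)); 200*(x(2)-x(1)^2)];
Hessf = @(x)[2 + 400*(x(1)^2-x(2)) + 800*x(1)^2, -400*x(1); -400*x(1), 200];

x = [-0.7; 1.3];   % hypothetical test point
h = 1e-5;          % hypothetical finite-difference step
I = eye(2);
g_fd = zeros(2,1);
H_fd = zeros(2,2);
for k = 1:2
    % centered differences along the k-th coordinate axis
    g_fd(k)   = ( f(x+h*I(:,k)) - f(x-h*I(:,k)) ) / (2*h);
    H_fd(:,k) = ( Gradf(x+h*I(:,k)) - Gradf(x-h*I(:,k)) ) / (2*h);
end
err_grad = norm(g_fd - Gradf(x)) / norm(Gradf(x));
err_hess = norm(H_fd - Hessf(x), 'fro') / norm(Hessf(x), 'fro');
fprintf('Relative gradient error: %.2e\n', err_grad);
fprintf('Relative Hessian error:  %.2e\n', err_hess);
```

Both relative errors should be small if the formulas and the code agree.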

The Newton descent method, starting from some $x^{(0)} \in \RR^2$, iterates $$ x^{(\ell+1)} = x^{(\ell)} - Hf( x^{(\ell)} )^{-1} \nabla f(x^{(\ell)}). $$

Exercise 1

Implement the Newton algorithm. Display the evolution of $f(x^{(\ell)})$ and $\norm{x^{(\ell)}-x^{(+\infty)}}$ during the iterations.

In [9]:
In [10]:
%% Insert your code here.
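
One possible sketch of this exercise, not the reference solution of the tour. It assumes a pure Newton step without line search, uses the known minimizer $(1,1)$ in place of $x^{(+\infty)}$, and picks a hypothetical starting point and iteration count.

```matlab
% Rosenbrock function, gradient and Hessian, redefined so the block is self-contained.
f     = @(x)(1-x(1))^2 + 100*(x(2)-x(1)^2)^2;
Gradf = @(x)[2*(x(1)-1) + 400*x(1)*(x(1)^2-x(2)); 200*(x(2)-x(1)^2)];
Hessf = @(x)[2 + 400*(x(1)^2-x(2)) + 800*x(1)^2, -400*x(1); -400*x(1), 200];

xstar = [1; 1];      % known minimizer, used as x^(+infinity)
x     = [-1.5; 2];   % hypothetical starting point
niter = 10;          % hypothetical number of iterations
E = zeros(niter,1);  % energy f(x^(l))
D = zeros(niter,1);  % distance to the minimizer
for i = 1:niter
    E(i) = f(x);
    D(i) = norm(x - xstar);
    % Newton step: solve Hf(x) d = grad f(x) instead of forming the inverse.
    x = x - Hessf(x) \ Gradf(x);
end
% Log-scale plots; values are clamped at eps so that exact zeros can be shown.
subplot(2,1,1); semilogy(max(E,eps), '.-'); axis tight; title('f(x^{(l)})');
subplot(2,1,2); semilogy(max(D,eps), '.-'); axis tight; title('||x^{(l)} - x^*||');
```

Without a line search the energy is not guaranteed to decrease at every iteration, so a temporary increase in the first plot is expected rather than a sign of a bug.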

Exercise 2

Display the evolution of $x^{(\ell)}$, from several starting points.

In [11]:
In [12]:
%% Insert your code here.
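
A possible sketch for this exercise, again with pure Newton steps. The starting points in `starts`, the number of iterations and the use of `log(1+f)` for the background contours are assumptions made for illustration.

```matlab
% Rosenbrock function, gradient and Hessian, redefined so the block is self-contained.
f     = @(x1,x2)(1-x1).^2 + 100*(x2-x1.^2).^2;
Gradf = @(x)[2*(x(1)-1) + 400*x(1)*(x(1)^2-x(2)); 200*(x(2)-x(1)^2)];
Hessf = @(x)[2 + 400*(x(1)^2-x(2)) + 800*x(1)^2, -400*x(1); -400*x(1), 200];

% Background: level sets of log(1+f), so that the flat valley stays visible.
x1 = linspace(-2,2,150); x2 = linspace(-.5,3,150);
[A,B] = meshgrid(x1,x2);               % A holds x1 values, B holds x2 values
contour(x1, x2, log(1 + f(A,B)), 20); hold on;

starts = [-1.5 -0.5 1.5; 2.5 2.8 0];   % hypothetical starting points, one per column
niter  = 10;                           % hypothetical number of iterations
xfinal = zeros(2, size(starts,2));
for k = 1:size(starts,2)
    X = zeros(2, niter+1); X(:,1) = starts(:,k);
    for i = 1:niter
        X(:,i+1) = X(:,i) - Hessf(X(:,i)) \ Gradf(X(:,i));   % Newton step
    end
    xfinal(:,k) = X(:,end);
    plot(X(1,:), X(2,:), '.-', 'LineWidth', 2, 'MarkerSize', 15);
end
plot(1, 1, 'k*', 'MarkerSize', 12); hold off;   % the minimizer x* = (1,1)
xlabel('x_1'); ylabel('x_2');
```

Some intermediate iterates can jump far from the valley and leave the initial display window before coming back toward $(1,1)$.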

Gradient and Divergence of Images

Local differential operators such as the gradient, the divergence and the Laplacian are the building blocks of variational image processing.

Load an image $g \in \RR^N$ of $N=n \times n$ pixels.

In [13]:
n = 256;
g = rescale( load_image('lena',n) );

Display it.

In [14]:

For continuous functions, the gradient reads $$ \nabla g(x) = \pa{ \pd{g(x)}{x_1}, \pd{g(x)}{x_2} } \in \RR^2. $$ (note that here, the variable $x$ denotes the 2-D spatial position).

We discretize this differential operator using first-order finite differences. $$ (\nabla g)_i = ( g_{i_1,i_2}-g_{i_1-1,i_2}, g_{i_1,i_2}-g_{i_1,i_2-1} ) \in \RR^2. $$ Note that for simplicity we use periodic boundary conditions.

Compute its gradient, using finite differences.

In [15]:
s = [n 1:n-1];
grad = @(f)cat(3, f-f(s,:), f-f(:,s));

The discrete gradient is thus a linear map $ \nabla : \RR^N \to \RR^{N \times 2}. $
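
To see the effect of the periodic boundary conditions, here is a small sketch on a $4 \times 4$ ramp image. The tiny size and the ramp itself are assumptions chosen so that the result can be checked by hand.

```matlab
n = 4;
s = [n 1:n-1];                           % index of the previous pixel, wrapping around
grad = @(f)cat(3, f-f(s,:), f-f(:,s));   % same definition as in the tour

f = repmat((1:n)', [1 n]);   % ramp: f(i1,i2) = i1
v = grad(f);

disp(size(v));      % prints 4 4 2: one 2-D vector per pixel
disp(v(:,1,1)');    % prints -3 1 1 1: the first row wraps around to the last one
v2 = v(:,:,2);      % derivative along the second index, zero for this ramp
disp(max(abs(v2(:))));   % prints 0
```

The value $-3$ in the first entry comes from the wrap-around difference $f_{1,i_2} - f_{n,i_2} = 1 - 4$.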

In [16]:
v = grad(g);

One can display each of its components.

In [17]:
imageplot(v(:,:,1), 'd/dx', 1,2,1);
imageplot(v(:,:,2), 'd/dy', 1,2,2);

One can also display it using a color image.

In [18]:

One can display its magnitude $\norm{\nabla g(x)}$, which is large near edges.

In [19]:
imageplot( sqrt( sum(v.^2,3) ) );

The divergence operator maps vector fields to images. For continuous vector fields $v(x) \in \RR^2$, it is defined as $$ \text{div}(v)(x) = \pd{v_1(x)}{x_1} + \pd{v_2(x)}{x_2} \in \RR. $$ (note that here, the variable $x$ denotes the 2-D spatial position). It is minus the adjoint of the gradient, i.e. $\text{div} = - \nabla^*$.

It is discretized, for $v=(v^1,v^2)$ as $$ \text{div}(v)_i = v^1_{i_1+1,i_2} - v^1_{i_1,i_2} + v^2_{i_1,i_2+1} - v^2_{i_1,i_2} . $$

In [20]:
t = [2:n 1];
div = @(v)v(t,:,1)-v(:,:,1) + v(:,t,2)-v(:,:,2);
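
The adjoint relation $\text{div} = -\nabla^*$ means that $\dotp{\nabla f}{v} = -\dotp{f}{\text{div}(v)}$ for every image $f$ and vector field $v$. The sketch below checks this numerically on random data; the size `n = 16` and the random test arrays are assumptions made for illustration.

```matlab
n = 16;                                  % hypothetical small size for the test
s = [n 1:n-1]; t = [2:n 1];              % previous / next pixel with wrap-around
grad = @(f)cat(3, f-f(s,:), f-f(:,s));
div  = @(v)v(t,:,1)-v(:,:,1) + v(:,t,2)-v(:,:,2);
dotp = @(a,b)sum(a(:).*b(:));            % canonical inner product

f = randn(n);        % random image
v = randn(n,n,2);    % random vector field
gap = dotp(grad(f), v) + dotp(f, div(v));
fprintf('Should be close to 0: %.2e\n', gap);
```

The gap is zero only up to floating-point round-off, which is why the comparison is made against a small tolerance rather than exact equality.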

The Laplacian operator is defined as $\Delta=\text{div} \circ \nabla = -\nabla^* \circ \nabla$. It is thus a symmetric negative semi-definite operator.

In [21]:
delta = @(f)div(grad(f));
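
Composing the two discrete operators gives the usual five-point stencil with periodic wrap-around, $(\Delta f)_i = f_{i_1+1,i_2} + f_{i_1-1,i_2} + f_{i_1,i_2+1} + f_{i_1,i_2-1} - 4 f_{i_1,i_2}$. The following sketch checks this identity and the sign of $\dotp{\Delta f}{f}$ on a random image; the size `n = 16` and the test image are assumptions made for illustration.

```matlab
n = 16;                                  % hypothetical small size for the test
s = [n 1:n-1]; t = [2:n 1];              % previous / next pixel with wrap-around
grad  = @(f)cat(3, f-f(s,:), f-f(:,s));
div   = @(v)v(t,:,1)-v(:,:,1) + v(:,t,2)-v(:,:,2);
delta = @(f)div(grad(f));

f = randn(n);
% Five-point stencil written out explicitly.
stencil = f(s,:) + f(t,:) + f(:,s) + f(:,t) - 4*f;
err = norm(delta(f) - stencil, 'fro');
fprintf('Stencil mismatch (should be close to 0): %.2e\n', err);

% <Delta f, f> = -||grad f||^2 is never positive.
q = sum(sum(delta(f).*f));
fprintf('<Delta f, f> = %.3f (should be <= 0)\n', q);

% Constant images lie in the kernel, so the operator is only semi-definite.
c = max(max(abs(delta(ones(n)))));
fprintf('Laplacian of a constant image: %g\n', c);
```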

Display $\Delta g$.

In [22]:

Check the relation $ \norm{\nabla f}^2 = - \dotp{\Delta f}{f} $, here on the image $g$.

In [23]:
dotp = @(a,b)sum(a(:).*b(:));
fprintf('Should be 0: %.3i\n', dotp(grad(g), grad(g)) + dotp(delta(g),g) );
Should be 0: 000

Newton Method in Image Processing

We consider now the problem of denoising an image $y \in \RR^N$ where $N = n \times n$ is the number of pixels ($n$ being the number of rows/columns in the image).

Add noise to the clean image, to simulate a noisy image $y$.

In [24]:
sigma = .1;
y = g + randn(n)*sigma;

Display the noisy image $y$.

In [25]: