
This tour explores the use of gradient descent method for unconstrained and constrained optimization of a smooth function

In [2]:
addpath('toolbox_signal')


## Gradient Descent for Unconstrained Problems¶

We consider the problem of finding a minimum of a function $f$, hence solving $$\umin{x \in \RR^d} f(x)$$ where $f : \RR^d \rightarrow \RR$ is a smooth function.

Note that the minimum is not necessarily unique. In the general case, $f$ might exhibit local minima, in which case the proposed algorithms is not expected to find a global minimizer of the problem. In this tour, we restrict our attention to convex function, so that the methods will converge to a global minimizer.

The simplest method is the gradient descent, that computes $$x^{(k+1)} = x^{(k)} - \tau_k \nabla f(x^{(k)}),$$ where $\tau_k>0$ is a step size, and $\nabla f(x) \in \RR^d$ is the gradient of $f$ at the point $x$, and $x^{(0)} \in \RR^d$ is any initial point.

In the convex case, if $f$ is of class $C^2$, in order to ensure convergence, the step size should satisfy $$0 < \tau_k < \frac{2}{ \sup_x \norm{Hf(x)} }$$ where $Hf(x) \in \RR^{d \times d}$ is the Hessian of $f$ at $x$ and $\norm{\cdot}$ is the spectral operator norm (largest eigenvalue).

We consider a simple problem, corresponding to the minimization of a 2-D quadratic form $$f(x) = \frac{1}{2} \pa{ x_1^2 + \eta x_2^2, }$$ where $\eta>0$ controls the anisotropy, and hence the difficulty, of the problem.

Anisotropy parameter $\eta$.

In [3]:
eta = 10;


Function $f$.

In [4]:
f = @(x)( x(1)^2 + eta*x(2)^2 ) /2;


Background image of the function.

In [5]:
t = linspace(-.7,.7,101);
[u,v] = meshgrid(t,t);
F = ( u.^2 + eta*v.^2 )/2 ;


Display the function as a 2-D image.

In [6]:
clf; hold on;
imagesc(t,t,F); colormap jet(256);
contour(t,t,F, 20, 'k');
axis off; axis equal;


In [7]:
Gradf = @(x)[x(1); eta*x(2)];


The step size should satisfy $\tau_k < 2/\eta$. We use here a constrant step size.

In [8]:
tau = 1.8/eta;


Exercise 1

Perform the gradient descent using a fixed step size $\tau_k=\tau$. Display the decay of the energy $f(x^{(k)})$ through the iteration. Save the iterates so that |X(:,k)| corresponds to $x^{(k)}$.

In [9]:
exo1()

In [10]:
%% Insert your code here.


Display the iterations.

In [11]:
clf; hold on;
imagesc(t,t,F); colormap jet(256);
contour(t,t,F, 20, 'k');
h = plot(X(1,:), X(2,:), 'k.-');
set(h, 'LineWidth', 2);
set(h, 'MarkerSize', 15);
axis off; axis equal;


Exercise 2

Display the iteration for several different step sizes.

In [12]:
exo2()

In [13]:
%% Insert your code here.


## Gradient and Divergence of Images¶

Local differential operators like gradient, divergence and laplacian are the building blocks for variational image processing.

Load an image $x_0 \in \RR^N$ of $N=n \times n$ pixels.

In [14]:
n = 256;


Display it.

In [15]:
clf;
imageplot(x0);


For a continuous function $g$, the gradient reads $$\nabla g(s) = \pa{ \pd{g(s)}{s_1}, \pd{g(s)}{s_2} } \in \RR^2.$$ (note that here, the variable $s$ denotes the 2-D spacial position).

We discretize this differential operator on a discrete image $x \in \RR^N$ using first order finite differences. $$(\nabla x)_i = ( x_{i_1,i_2}-x_{i_1-1,i_2}, x_{i_1,i_2}-x_{i_1,i_2-1} ) \in \RR^2.$$ Note that for simplity we use periodic boundary conditions.

Compute its gradient, using finite differences.

In [16]:
grad = @(x)cat(3, x-x([end 1:end-1],:), x-x(:,[end 1:end-1]));


One thus has $\nabla : \RR^N \mapsto \RR^{N \times 2}.$

In [17]:
v = grad(x0);


One can display each of its components.

In [18]:
clf;
imageplot(v(:,:,1), 'd/dx', 1,2,1);
imageplot(v(:,:,2), 'd/dy', 1,2,2);


One can also display it using a color image.

In [19]:
clf;
imageplot(v);


One can display its magnitude $\norm{(\nabla x)_i}$, which is large near edges.

In [20]:
clf;
imageplot( sqrt( sum3(v.^2,3) ) );


The divergence operator maps vector field to images. For continuous vector fields $v(s) \in \RR^2$, it is defined as $$\text{div}(v)(s) = \pd{v_1(s)}{s_1} + \pd{v_2(s)}{s_2} \in \RR.$$ (note that here, the variable $s$ denotes the 2-D spacial position). It is minus the adjoint of the gadient, i.e. $\text{div} = - \nabla^*$.

It is discretized, for $v=(v^1,v^2)$ as $$\text{div}(v)_i = v^1_{i_1+1,i_2} - v^1_{i_1,i_2} + v^2_{i_1,i_2+1} - v^2_{i_1,i_2} .$$

In [21]:
div = @(v)v([2:end 1],:,1)-v(:,:,1) + v(:,[2:end 1],2)-v(:,:,2);


The Laplacian operatore is defined as $\Delta=\text{div} \circ \nabla = -\nabla^* \circ \nabla$. It is thus a negative symmetric operator.

In [22]:
delta = @(x)div(grad(x));


Display $\Delta x_0$.

In [23]:
clf;
imageplot(delta(x0));


Check that the relation $\norm{\nabla x} = - \dotp{\Delta x}{x}.$

In [24]:
dotp = @(a,b)sum(a(:).*b(:));

Should be 0: 000


## Gradient Descent in Image Processing¶

We consider now the problem of denoising an image $y \in \RR^d$ where $d = n \times n$ is the number of pixels ($n$ being the number of rows/columns in the image).

Add noise to the original image, to simulate a noisy image.

In [25]:
sigma = .1;
y = x0 + randn(n)*sigma;


Display the noisy image $y$.

In [26]:
clf;
imageplot(clamp(y));