#!/usr/bin/env python
# coding: utf-8

# Forward-Backward Splitting
# ===================================
#
# $\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$
# $\newcommand{\enscond}[2]{\lbrace #1, #2 \rbrace}$
# $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$
# $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$
# $\newcommand{\umax}[1]{\underset{#1}{\max}\;}$
# $\newcommand{\uargmin}[1]{\underset{#1}{\arg\min}\;}$
# $\newcommand{\norm}[1]{\|#1\|}$
# $\newcommand{\abs}[1]{\left|#1\right|}$
# $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$
# $\newcommand{\pa}[1]{\left(#1\right)}$
# $\newcommand{\diag}[1]{{diag}\left( #1 \right)}$
# $\newcommand{\qandq}{\quad\text{and}\quad}$
# $\newcommand{\qwhereq}{\quad\text{where}\quad}$
# $\newcommand{\qifq}{ \quad \text{if} \quad }$
# $\newcommand{\qarrq}{ \quad \Longrightarrow \quad }$
# $\newcommand{\ZZ}{\mathbb{Z}}$
# $\newcommand{\CC}{\mathbb{C}}$
# $\newcommand{\RR}{\mathbb{R}}$
# $\newcommand{\EE}{\mathbb{E}}$
# $\newcommand{\Zz}{\mathcal{Z}}$
# $\newcommand{\Ww}{\mathcal{W}}$
# $\newcommand{\Vv}{\mathcal{V}}$
# $\newcommand{\Nn}{\mathcal{N}}$
# $\newcommand{\NN}{\mathcal{N}}$
# $\newcommand{\Hh}{\mathcal{H}}$
# $\newcommand{\Bb}{\mathcal{B}}$
# $\newcommand{\Ee}{\mathcal{E}}$
# $\newcommand{\Cc}{\mathcal{C}}$
# $\newcommand{\Gg}{\mathcal{G}}$
# $\newcommand{\Ss}{\mathcal{S}}$
# $\newcommand{\Pp}{\mathcal{P}}$
# $\newcommand{\Ff}{\mathcal{F}}$
# $\newcommand{\Xx}{\mathcal{X}}$
# $\newcommand{\Mm}{\mathcal{M}}$
# $\newcommand{\Ii}{\mathcal{I}}$
# $\newcommand{\Dd}{\mathcal{D}}$
# $\newcommand{\Ll}{\mathcal{L}}$
# $\newcommand{\Tt}{\mathcal{T}}$
# $\newcommand{\si}{\sigma}$
# $\newcommand{\al}{\alpha}$
# $\newcommand{\la}{\lambda}$
# $\newcommand{\ga}{\gamma}$
# $\newcommand{\Ga}{\Gamma}$
# $\newcommand{\La}{\Lambda}$
# $\newcommand{\Si}{\Sigma}$
# $\newcommand{\be}{\beta}$
# $\newcommand{\de}{\delta}$
# $\newcommand{\De}{\Delta}$
# $\newcommand{\phi}{\varphi}$
# $\newcommand{\th}{\theta}$
# $\newcommand{\om}{\omega}$
# $\newcommand{\Om}{\Omega}$
#

# In[1]:

from __future__ import division
get_ipython().run_line_magic('pylab', 'inline')
get_ipython().run_line_magic('load_ext', 'autoreload')
get_ipython().run_line_magic('autoreload', '2')


# This numerical tour presents the Forward-Backward (FB) algorithm to
# minimize the sum of a smooth and a simple function. It shows an
# application to sparse deconvolution.

# Forward-Backward Algorithm
# --------------------------
# We consider the problem of minimizing the sum of two functions
# $$ E^\star = \umin{x \in \RR^N} E(x) = f(x) + g(x). $$
#
# So, we want to find a vector $x^\star$ solution to the problem, i.e. a minimizer of $E=f+g$.
#
#
# We assume that $f$ is a $C^1$ function with $\beta$-Lipschitz gradient.
#
#
# We also assume that $g$ is "simple", in the sense that one can compute exactly and quickly its proximity operator, which is defined, for any $\ga > 0$, as
# $$ \text{prox}_{\ga g}(x) = \uargmin{y \in \RR^N} \frac{1}{2}\norm{x-y}^2 + \ga g(y). $$
#
#
# The forward-backward algorithm reads, after initializing $x^{(0)} \in \RR^N$,
# $$ x^{(k+1)} = \text{prox}_{\ga g}\pa{ x^{(k)} - \ga \nabla f( x^{(k)} ) }. $$
#
#
# If $0 < \ga < 2/\beta$, then this scheme converges to a minimizer of
# $f+g$.
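# As a concrete illustration of the scheme above, here is a minimal sketch of the generic
# iteration (the function `forward_backward` and the toy problem below are ours and are not
# part of the tour; the problem-specific gradient and proximity operator are defined later on).

# In[ ]:

def forward_backward(grad_f, prox_g, x0, gamma, nbiter=100):
    # Generic forward-backward iteration (sketch).
    # grad_f : gradient of the smooth term f
    # prox_g : (x, gamma) -> prox_{gamma g}(x) of the simple term g
    # gamma  : stepsize, to be chosen in (0, 2/beta)
    x = x0.copy()
    for _ in range(nbiter):
        x = prox_g(x - gamma*grad_f(x), gamma)  # forward (gradient) step, then backward (proximal) step
    return x

# For instance, with $f(x) = \frac{1}{2}\norm{x-b}^2$ (so $\beta=1$) and $g = \la \norm{x}_1$,
# whose proximity operator is soft-thresholding, the iterates converge to the soft-thresholded
# vector:

# In[ ]:

b = array([3.0, -0.2, 0.05])
lam = 0.5
soft = lambda u, gamma : sign(u)*maximum(abs(u) - lam*gamma, 0)  # prox of gamma*lam*||.||_1
x_toy = forward_backward(lambda u : u - b, soft, zeros(3), gamma=1.0, nbiter=200)
x_toy  # expected: array([2.5, 0., 0.])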
# Sparse Regularization of Inverse Problems
# -----------------------------------------
# We consider a linear inverse problem
# $$ y = A x^\sharp + w \in \RR^P, $$
# where $x^\sharp \in \RR^N$ is the (unknown) signal to recover, $w \in
# \RR^P$ is a noise vector, and $A \in \RR^{P \times N}$ models the
# acquisition device.
#
#
# To recover an estimate of the signal $x^\sharp$, we consider basis
# pursuit denoising, which makes use of the $\ell^1$ norm as a sparsity-enforcing
# penalty:
# $$ \umin{x \in \RR^N} \frac{1}{2} \norm{A x-y}^2 + \la \norm{x}_1, $$
# where the $\ell^1$ norm is defined as
# $$ \norm{x}_1 = \sum_i \abs{x_i}. $$
#
#
# The parameter $\la$ should be set in accordance with the noise level
# $\norm{w}$.
#
#
# This minimization problem can be cast in the form of minimizing $f+g$
# where
# $$ f(x) = \frac{1}{2} \norm{Ax-y}^2
# \qandq g(x) = \la \norm{x}_1. $$
#
#
# $f$ is smooth; we have
# $$ \nabla f(x) = A^* (A x - y), $$
# which is $\beta$-Lipschitz continuous, with
# $$ \beta = \norm{ A^* A }. $$
#
#
# The $\ell^1$ norm is "simple", because its proximity operator is soft
# thresholding:
# $$ \big(\text{prox}_{\ga g}(x)\big)_n = \max\pa{ 0, 1 - \frac{\la \ga}{\abs{x_n}} } x_n. $$

# Signal Deconvolution on Synthetic Sparse Data
# -----------------------------------------------------
# A simple linearized model of seismic acquisition considers a linear filtering
# operator (convolution):
# $$ A x = h \ast x. $$
#
# The filter $h$ is called the impulse response, or the point spread function, of the acquisition process $x\mapsto Ax$.

# In[2]:

N = 1024

# We define the width of the filter $h$.

# In[3]:

s = 5

# We define $h$ as the second derivative of a
# Gaussian.

# In[4]:

t = arange(-N/2,N/2)
h = (1-t**2/s**2)*exp(-(t**2)/(2*s**2))
h = h - h.mean()

# We define the operator $A$. For simplicity, periodic boundary conditions are used here, so that the convolution is efficiently implemented as a product in the Fourier domain.

# In[5]:

h_tf = fft.fft(fft.fftshift(h))
opA = lambda u : real(fft.ifft(fft.fft(u) * h_tf))

# We display the filter $h$ and its spectrum (amplitude of its Fourier transform).

# In[6]:

figsize(7,5)
plot(t,h)
xlim(-100,100)

# In[7]:

plot(t,fft.fftshift(abs(h_tf)))

# We generate a synthetic sparse signal $x^\sharp$, with only a small number of nonzero
# coefficients.

# In[8]:

random.seed(80)  # we set the seed of the random number generator for reproducibility purposes

# In[9]:

s = int(round(N*.01))  # number of nonzero elements of xsharp
sel = random.permutation(N)
sel = sel[0:s]  # indices of the nonzero elements of xsharp
xsharp = zeros(N)
xsharp[sel] = sign(randn(s)) * (1-0.3*rand(s))

# In[10]:

noiselevel = 0.2

# We compute the measurements $y=A x^\sharp + w$, where $w$ is a realization
# of white Gaussian noise.

# In[11]:

y = opA(xsharp) + noiselevel * randn(N)

# In[12]:

figsize(14,5)
stem(sel,xsharp[sel])
xlim(0,N-1)
title(r'signal $x^\sharp$')

# In[13]:

figsize(14,5)
plot(range(N),y)
xlim(0,N-1)
title('signal $y$')

# Deconvolution
# ---------------------------
# We now implement the forward-backward algorithm to recover an estimate of the sparse signal $x^\sharp$.
#
#
# We define the regularization parameter $\la$.

# In[14]:

Lambda = 3

# We define the proximity operator of $\ga g$.

# In[15]:

def prox_gamma_g(x, gamma, Lambda):
    return x - x/maximum(abs(x)/(Lambda*gamma), 1)  # soft-thresholding

# We define the gradient operator of $f$. Note that $A^*=A$, because the filter $h$ is symmetric.
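# As a quick sanity check (a sketch; the test vectors `u` and `v` below are ours), one can
# verify numerically that $\dotp{Au}{v} = \dotp{u}{Av}$, which justifies replacing $A^*$ by $A$
# in the gradient of $f$.

# In[ ]:

u = randn(N)
v = randn(N)
print(abs(dot(opA(u), v) - dot(u, opA(v))))  # should be close to zero, up to floating-point error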
# __Exercise__
#
# Write the code of the function grad_f.

# In[16]:

grad_f = lambda x : opA(opA(x) - y)  # one possible solution: nabla f(x) = A^*(Ax - y), with A^* = A here

# We define the Lipschitz constant $\beta$ of $\nabla f$.

# In[17]:

beta = abs(fft.fft(h)).max()**2
beta

# We define the stepsize $\ga$, which must be smaller than $2/\beta$.

# In[18]:

gamma = 1.9 / beta

# We compute the solution of the $\ell^1$ deconvolution problem (basis pursuit denoising).
# We keep track of the energy $E_k=f(x^{(k)})+g(x^{(k)})$.

# __Exercise__
#
# Write the code of the forward-backward iteration.

# In[19]:

nbiter = 2000
x = y
En_array = zeros(nbiter+1)
En_array[0] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)
for iter in range(nbiter):  # iter goes from 0 to nbiter-1
    # one possible solution: the forward-backward update x <- prox_{gamma g}(x - gamma*grad_f(x))
    x = prox_gamma_g(x - gamma*grad_f(x), gamma, Lambda)
    En_array[iter+1] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)
x_restored = x

# We display the result.

# In[21]:

fig, (subfig1,subfig2) = subplots(1,2,figsize=(16,7))  # one figure with two horizontal subfigures
subfig1.stem(xsharp)
subfig2.stem(x_restored)
subfig1.set_title(r'$x^\sharp$')
subfig2.set_title(r'$x_\mathrm{restored}$')

# We plot the relative error $(E_k-E^\star)/(E_0-E^\star)$ in log-scale with respect to $k$.

# In[22]:

plot(log10((En_array[0:1800]-En_array[-1])/(En_array[0]-En_array[-1])))

# Over-relaxed Forward-Backward
# -----------------------------
# It is possible to introduce a relaxation parameter $\rho$ with $0 < \rho < 1$. The over-relaxed forward-backward algorithm initializes $x^{(0)} \in \RR^N$,
# and then iterates, for $k=1,2,\ldots$
# $$ z^{(k)} = \text{prox}_{\ga g}\pa{
# x^{(k-1)} - \ga \nabla f( x^{(k-1)} ) }, $$
# $$ x^{(k)} = z^{(k)} +
# \rho \pa{ z^{(k)} - x^{(k-1)} }. $$
#
#
# Let us assume that $\gamma=1/\beta$. Convergence of the iterates $x^{(k)}$ and $z^{(k)}$ to a solution is guaranteed
# for $ 0 < \rho < 1/2 $. The weaker property of convergence of $ E(x^{(k)}) $
# to $E^\star$ is proved when $ 1/2\leq \rho <1 $.

# __Exercise__
#
# Write the code of the over-relaxed forward-backward iteration.

# In[27]:

gamma = 1/beta
nbiter = 1700
rho = 0.95
x = y
En_array_overrelaxed = zeros(nbiter+1)
En_array_overrelaxed[0] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)
for iter in range(nbiter):
    # one possible solution: proximal-gradient step, then over-relaxation
    z = prox_gamma_g(x - gamma*grad_f(x), gamma, Lambda)
    x = z + rho*(z - x)
    En_array_overrelaxed[iter+1] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)

# In[28]:

plot(log10((En_array[0:1800]-En_array[-1])/(En_array[0]-En_array[-1])))
plot(log10((En_array_overrelaxed[0:1800]-En_array[-1])/(En_array[0]-En_array[-1])))

# As we can see, in this example over-relaxation does not bring any speedup, because the stepsize
# $\gamma=1/\beta$ is smaller than the value $1.9/\beta$ used without over-relaxation. For other
# parameter settings or other problems, over-relaxation does bring a significant speedup.
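# To make the comparison fair, one can also re-run the plain forward-backward with the same
# stepsize $\gamma = 1/\beta$ and overlay its energy decay. This is only a sketch (the variable
# names `gamma_cmp`, `x_cmp` and `En_array_cmp` are ours); it reuses `nbiter = 1700` set above.

# In[ ]:

gamma_cmp = 1/beta
x_cmp = y
En_array_cmp = zeros(nbiter+1)
En_array_cmp[0] = norm(opA(x_cmp) - y)**2/2 + Lambda*norm(x_cmp, ord=1)
for it in range(nbiter):
    x_cmp = prox_gamma_g(x_cmp - gamma_cmp*grad_f(x_cmp), gamma_cmp, Lambda)
    En_array_cmp[it+1] = norm(opA(x_cmp) - y)**2/2 + Lambda*norm(x_cmp, ord=1)
plot(log10((En_array_cmp[0:1700]-En_array[-1])/(En_array[0]-En_array[-1])))
plot(log10((En_array_overrelaxed[0:1700]-En_array[-1])/(En_array[0]-En_array[-1])))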
# FISTA-like Accelerated Forward-Backward Algorithm
# ---------------------------
# We consider the FISTA algorithm introduced in:
#
#
# A. Beck and M. Teboulle,
# "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems",
# _SIAM Journal on Imaging Sciences_, 2009.
#
# More precisely, we consider a slightly modified version of FISTA, whose convergence is proved; see
# A. Chambolle and C. Dossal, "On the convergence of the iterates of "FISTA"", preprint, 2015.
#
# Given an initial estimate $x^{(0)}$ of the solution and a parameter $a>2$, the algorithm sets $\gamma=1/\beta$, sets $z^{(0)}=x^{(0)} \in \RR^N$,
# and iterates, for $k=1,2,\ldots$
# $$ x^{(k)} = \text{prox}_{\ga g}\pa{
# z^{(k-1)} - \ga \nabla f( z^{(k-1)} ) }, $$
# $$ \alpha_k=(k-1)/(k+a), $$
# $$ z^{(k)} = x^{(k)} +
# \alpha_k
# \pa{ x^{(k)} - x^{(k-1)} }. $$
#
#
# It is proved that the iterates $x^{(k)}$ converge to a solution $x^\star$ of the problem. Moreover,
# the optimal convergence rate for this class of problems is reached,
# namely
# $$ E_k - E^\star = O(1/k^2), $$
# whereas the convergence rate for the normal forward-backward is only
# $O(1/k)$.
#
# Note the difference between the over-relaxed forward-backward and the accelerated forward-backward: the latter is based on an inertia mechanism, which is of a different nature than over-relaxation.

# __Exercise__
#
# Write the code of the accelerated forward-backward iteration.

# In[31]:

gamma = 1/beta
nbiter = 1700
a = 10
x = y
z = x  # initialization z^(0) = x^(0)
En_array_fista = zeros(nbiter+1)
En_array_fista[0] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)
for iter in range(nbiter):
    # one possible solution: proximal-gradient step at z, then inertial extrapolation
    x_old = x
    x = prox_gamma_g(z - gamma*grad_f(z), gamma, Lambda)
    alpha = iter/(iter+1+a)  # alpha_k = (k-1)/(k+a) with k = iter+1
    z = x + alpha*(x - x_old)
    En_array_fista[iter+1] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)

# In[32]:

plot(log10((En_array[0:1800]-En_array[-1])/(En_array[0]-En_array[-1])))
plot(log10((En_array_overrelaxed[0:1800]-En_array[-1])/(En_array[0]-En_array[-1])))
plot(log10((En_array_fista[0:1800]-En_array[-1])/(En_array[0]-En_array[-1])))

# __Exercise__
#
# Try different values of $a$. You can try $a=3, 10, 30, 50, 100$ (see the sketch below).

# We can note that the accelerated forward-backward is not monotone: the cost function $E$ does not decrease at every iteration, and some oscillations are present.
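# A possible way to carry out the last exercise (a sketch; the loop and the array `En_a`
# below are ours): re-run the FISTA iteration for several values of $a$ and overlay the
# energy decay curves.

# In[ ]:

for a in [3, 10, 30, 50, 100]:
    x = y
    z = x
    En_a = zeros(nbiter+1)
    En_a[0] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)
    for it in range(nbiter):
        x_old = x
        x = prox_gamma_g(z - gamma*grad_f(z), gamma, Lambda)
        z = x + (it/(it+1+a))*(x - x_old)  # alpha_k = (k-1)/(k+a) with k = it+1
        En_a[it+1] = norm(opA(x) - y)**2/2 + Lambda*norm(x, ord=1)
    plot(log10((En_a[0:1700]-En_array[-1])/(En_array[0]-En_array[-1])), label='a = %d' % a)
legend()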