#!/usr/bin/env python
# coding: utf-8

# Forward-Backward Method on the Dual Problem
# ===========================
#
# This numerical tour is an introduction to convex duality with an
# application to total variation denoising.
#
# $\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$
# $\newcommand{\enscond}[2]{\lbrace #1, #2 \rbrace}$
# $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$
# $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$
# $\newcommand{\umax}[1]{\underset{#1}{\max}\;}$
# $\newcommand{\uargmin}[1]{\underset{#1}{argmin}\;}$
# $\newcommand{\norm}[1]{\|#1\|}$
# $\newcommand{\abs}[1]{\left|#1\right|}$
# $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$
# $\newcommand{\pa}[1]{\left(#1\right)}$
# $\newcommand{\diag}[1]{{diag}\left( #1 \right)}$
# $\newcommand{\qandq}{\quad\text{and}\quad}$
# $\newcommand{\qwhereq}{\quad\text{where}\quad}$
# $\newcommand{\qifq}{ \quad \text{if} \quad }$
# $\newcommand{\qarrq}{ \quad \Longrightarrow \quad }$
# $\newcommand{\ZZ}{\mathbb{Z}}$
# $\newcommand{\CC}{\mathbb{C}}$
# $\newcommand{\RR}{\mathbb{R}}$
# $\newcommand{\EE}{\mathbb{E}}$
# $\newcommand{\Zz}{\mathcal{Z}}$
# $\newcommand{\Ww}{\mathcal{W}}$
# $\newcommand{\Vv}{\mathcal{V}}$
# $\newcommand{\Nn}{\mathcal{N}}$
# $\newcommand{\NN}{\mathcal{N}}$
# $\newcommand{\Hh}{\mathcal{H}}$
# $\newcommand{\Bb}{\mathcal{B}}$
# $\newcommand{\Ee}{\mathcal{E}}$
# $\newcommand{\Cc}{\mathcal{C}}$
# $\newcommand{\Gg}{\mathcal{G}}$
# $\newcommand{\Ss}{\mathcal{S}}$
# $\newcommand{\Pp}{\mathcal{P}}$
# $\newcommand{\Ff}{\mathcal{F}}$
# $\newcommand{\Xx}{\mathcal{X}}$
# $\newcommand{\Mm}{\mathcal{M}}$
# $\newcommand{\Ii}{\mathcal{I}}$
# $\newcommand{\Dd}{\mathcal{D}}$
# $\newcommand{\Ll}{\mathcal{L}}$
# $\newcommand{\Tt}{\mathcal{T}}$
# $\newcommand{\si}{\sigma}$
# $\newcommand{\al}{\alpha}$
# $\newcommand{\la}{\lambda}$
# $\newcommand{\ga}{\gamma}$
# $\newcommand{\Ga}{\Gamma}$
# $\newcommand{\La}{\Lambda}$
# $\newcommand{\Si}{\Sigma}$
# $\newcommand{\be}{\beta}$
# $\newcommand{\de}{\delta}$
# $\newcommand{\De}{\Delta}$
# $\newcommand{\phi}{\varphi}$
# $\newcommand{\th}{\theta}$
# $\newcommand{\om}{\omega}$
# $\newcommand{\Om}{\Omega}$

# In[1]:

from __future__ import division
get_ipython().run_line_magic('pylab', 'inline')
get_ipython().run_line_magic('load_ext', 'autoreload')
get_ipython().run_line_magic('autoreload', '2')


# Convex Duality
# --------------
# Given a convex, proper, and lower semi-continuous function $f$ defined on $\RR^N$, its
# _conjugate_ $f^*$ is the convex function defined as
# $$ \forall u \in \RR^N, \quad f^*(u) = \umax{x \in \RR^N} \dotp{x}{u} - f(x). $$
#
# One has $(f^*)^*=f$.
#
# One can show that if $f(x) = \frac{1}{2} \norm{A x - b}^2$, where $A \in \RR^{N \times N}$
# is an invertible matrix, then
# $$ f^*(u) = \frac{1}{2} \norm{(A^*)^{-1} u + b}^2 - \frac{1}{2}\norm{b}^2. $$
#
# One can show that in the case of $\ell^p$ norms
# $$ f(x) = \norm{x}_p = \pa{ \sum_{i=1}^N \abs{x_i}^p }^{1/p}, $$
# with the usual extension to $p=+\infty$,
# $$ \norm{x}_\infty = \umax{1 \leq i \leq N} \abs{x_i}, $$
# then one has
# $$ f^* = \iota_{\norm{\cdot}_q \leq 1}
# \qwhereq \frac{1}{p}+\frac{1}{q}=1, $$
# where $\iota_{\Cc}$ is the indicator function of the convex set
# $\Cc$.
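# As a quick sanity check of the quadratic case above, one can compare the closed-form
# expression of $f^*$ with a direct numerical maximization of $\dotp{x}{u} - f(x)$.
# The following cell is a small sketch added for illustration (it is not part of the
# original tour); it assumes `scipy` is available and uses a random matrix $A$ shifted
# to make it invertible.

# In[ ]:

import numpy as np
from scipy.optimize import minimize

np.random.seed(0)
n = 5
A = np.random.randn(n, n) + n*np.eye(n)   # shifted so that A is invertible
b = np.random.randn(n)
u = np.random.randn(n)

f = lambda x: 0.5*np.linalg.norm(A.dot(x) - b)**2

# numerical conjugate: maximize <x,u> - f(x), i.e. minimize f(x) - <x,u>
res = minimize(lambda x: f(x) - x.dot(u), np.zeros(n))
fconj_numerical = -res.fun

# closed-form conjugate from the formula above
Ainvstar = np.linalg.inv(A.T)
fconj_formula = 0.5*np.linalg.norm(Ainvstar.dot(u) + b)**2 - 0.5*np.linalg.norm(b)**2

print(fconj_numerical - fconj_formula)   # should be ~ 0, up to solver tolerance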
# Forward-backward on the Dual Problem
# ------------------------------------------
# We are concerned with the minimization of composite problems of the form
# $$ \umin{x \in \RR^N} f(x) + g(Lx) $$
# where $ L \in \RR^{P \times N} $ is a linear operator (a matrix),
# $f : \RR^N \rightarrow \RR $
# and $g : \RR^P \rightarrow \RR $
# are convex functionals.
#
# We now assume that $f$ is $\beta$-strongly convex.
# Then, $f^*$ is differentiable and
# its gradient is $1/\beta$-Lipschitz continuous.
#
# In this case, the Fenchel-Rockafellar theorem shows that one can solve
# the following _dual problem_
# $$ \umin{u \in \RR^P} f^*( -L^* u ) + g^*(u) $$
# and recover the unique solution $x^\star$
# of the primal problem from any (not necessarily unique) solution $u^\star$
# of the dual problem, as
# $$ x^\star = \nabla f^*( -L^* u^\star ). $$
#
# Moreover, the optimal primal and dual costs satisfy
# $$ E^\star = f(x^\star) + g(Lx^\star) =
# - f^*( -L^* u^\star ) - g^*(u^\star). $$
#
# Denoting $F(u) = f^*( -L^* u )$ and $G(u) = g^*(u)$, one thus needs
# to solve the problem
# $$ \umin{u \in \RR^P} F(u) + G(u). $$
#
# We assume that the function $g$ is simple, in the sense that
# one can compute the proximity operator $\mathrm{prox}_{\ga g}$
# for any $\ga > 0$.
#
# Then, $G=g^*$ is simple as well, because of the Moreau identity
# $$ x = \text{prox}_{\ga g^*}(x) + \ga\, \text{prox}_{g/\ga}(x/\ga). $$
#
# Since $F$ is smooth and $G$ is simple, one can apply the
# forward-backward algorithm, which reads,
# after initializing $u^{(0)} \in \RR^P$,
# $$ u^{(k+1)} = \text{prox}_{\ga G}\pa{ u^{(k)} - \ga \nabla F( u^{(k)} ) }. $$
#
# The primal iterates are defined as
# $$ x^{(k)} = \nabla f^*( -L^* u^{(k)} ). $$
#
# So, we can write the algorithm in the primal-dual form
# $$
# \left|\begin{array}{l}
# x^{(k)} = \nabla f^*( -L^* u^{(k)} )\\
# u^{(k+1)} = \text{prox}_{\ga g^*}\pa{ u^{(k)} + \ga Lx^{(k)}}.
# \end{array}\right.
# $$
#
# If $0 < \ga < 2\beta/\norm{L}^2$, then $x^{(k)}$ converges to the primal solution $x^\star$ and $u^{(k)}$ converges to a dual solution $u^\star$.
#
# Total Variation
# ---------------
#
# As in lab 1, we are interested in imaging inverse problems: we want to estimate the unknown sharp image $x^\sharp$ from the data $y=Ax^\sharp\ +\mbox{ noise}$, where $A$ is the linear observation operator.
#
# In lab 1, we defined the 'discrete gradient' $D$ and we used as regularizer the Tikhonov functional $\frac{1}{2}\|Dx\|^2_{2,2}=\frac{1}{2}\sum_{n_1,n_2} \|(Dx)_{n_1,n_2}\|^2_2=\frac{1}{2}\sum_{n_1,n_2} \big((Dx)_{n_1,n_2,v}^2+(Dx)_{n_1,n_2,h}^2\big)$.
#
# A better regularizer for imaging problems is the _total variation_, which corresponds to taking the $\ell_1$ norm instead of the squared $\ell_2$ norm of the discrete gradient:
#
# $$\mathrm{TV}(x)=\|Dx\|_{1,2}=\sum_{n_1,n_2} \|(Dx)_{n_1,n_2}\|_2=\sum_{n_1,n_2} \sqrt{ (Dx)_{n_1,n_2,v}^2+(Dx)_{n_1,n_2,h}^2}.$$
#
# The total variation was first introduced in L. I. Rudin, S. Osher, E. Fatemi,
# "Nonlinear total variation based noise removal algorithms," _Physica D_, vol. 60, 1992.
#
# The difficulty is that there is no fast exact algorithm to compute the proximity operator of the total variation in 2-D (such algorithms exist in 1-D). So, we must view it as the simple functional $\lambda \|\cdot\|_{1,2}$ composed with the linear operator $D$.
# Indeed, the proximity operator of $\lambda \|\cdot\|_{1,2}$ is simply elementwise soft-thresholding of the magnitude of the vector field.
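# To make this last point concrete, here is a small sketch (added for illustration, not part
# of the original tour) of the proximity operator of $\ga\lambda\|\cdot\|_{1,2}$, together with
# a numerical check of the Moreau identity stated above, which links it to the projection onto
# the dual ball $\{\|\cdot\|_{\infty,2}\leq\lambda\}$. The helper names `prox_g` and
# `proj_dual_ball` are introduced here only for this check.

# In[ ]:

import numpy as np

def prox_g(v, gamma, Lambda):
    # prox of gamma*Lambda*||.||_{1,2}: soft-threshold the magnitude of each
    # 2-D vector v[n1,n2,:] by gamma*Lambda
    nrm = np.maximum(np.sqrt(np.sum(v**2, axis=2, keepdims=True)), 1e-12)
    return v * np.maximum(1 - gamma*Lambda/nrm, 0)

def proj_dual_ball(u, Lambda):
    # prox of gamma*g^* for any gamma: projection onto {u : ||u||_{inf,2} <= Lambda}
    nrm = np.sqrt(np.sum(u**2, axis=2, keepdims=True))
    return u / np.maximum(nrm/Lambda, 1)

# numerical check of x = prox_{gamma g*}(x) + gamma * prox_{g/gamma}(x/gamma)
np.random.seed(0)
x = 3*np.random.randn(4, 4, 2)   # some magnitudes above and some below Lambda
gamma, Lambda = 0.7, 1.0
rhs = proj_dual_ball(x, Lambda) + gamma*prox_g(x/gamma, 1.0/gamma, Lambda)
print(np.max(np.abs(x - rhs)))   # should be of the order of machine precision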
# So, if $A$ is invertible, the dual forward-backward algorithm is appropriate to solve least-squares problems regularized with the total variation.
#
# ## Denoising Lena
#
# In the following, we consider the image denoising problem, i.e. $A=\mathrm{Id}$, applied to the Lena image. So, given the noisy image $y$, we want to estimate the clean Lena image $x^\sharp$ by the image $x^\star$, solution to the strongly convex primal problem
#
# $$\umin{x \in \RR^{N_1\times N_2}} E_p(x)= f(x)+g(Dx)=\frac{1}{2}\|x-y\|^2+\lambda\|Dx\|_{1,2}.$$
#
# The dual problem is
#
# $$\umin{u \in \RR^{N_1\times N_2\times 2}} E_d(u)= f^*(-D^*u)+g^*(u)=\frac{1}{2}\|-D^*u+y\|^2- \frac{1}{2}\norm{y}^2+\iota_{\|\cdot\|_{\infty,2}\leq\lambda}(u).$$
#
# Chambolle Dual Algorithm
# -------------------------
#
# We will solve the total-variation regularized denoising problem by applying the forward-backward algorithm to the dual problem. This is sometimes called the _Chambolle Dual Algorithm_, because this idea was initially proposed in
# A. Chambolle,
# "An Algorithm for Total Variation Minimization and Applications",
# _Journal of Mathematical Imaging and Vision_, 20(1-2), 2004.
#
# Note that since $g^*$ is an indicator function, the forward-backward algorithm reverts to projected gradient descent.
#
# For a more general framework, see
# P. L. Combettes, D. Dung, and B. C. Vu,
# "Dualization of signal recovery problems",
# _Set-Valued and Variational Analysis_, vol. 18, 2010.
#
# We create the noisy image $y$.

# In[2]:

from scipy import misc
xsharp = misc.lena()
xsharp = xsharp.astype(float32)


# In[3]:

figsize(9,9)
imshow(xsharp, interpolation='nearest', cmap=cm.gray, vmin=0, vmax=255)
title('This is Lena')


# In[4]:

(N1,N2) = shape(xsharp)
noiselevel = 20
y = xsharp + noiselevel * randn(N1,N2)


# In[5]:

figsize(9,9)
imshow(y, interpolation='nearest', cmap=cm.gray, vmin=0, vmax=255)
title('Noisy version of Lena')


# We define the discrete gradient operator $D$ and its adjoint $D^*$.

# In[6]:

D = lambda x : c_['2,3', r_[diff(x,1,0), zeros([1,x.shape[1]])], c_[diff(x,1,1), zeros([x.shape[0],1])]]


# In[7]:

Dadj = lambda v : r_['0,2', -v[0,:,0], -diff(v[:-1,:,0],1,0), v[-2,:,0]] + c_['1,2', -v[:,0,1], -diff(v[:,:-1,1],1,1), v[:,-2,1]]


# The gradient of $f^*$ is simply $\nabla f^*:x\mapsto x+y$.

# In[8]:

def grad_f_conj(x, y):
    return x + y


# The proximity operator $\text{prox}_{\ga g^*}$ does not depend on $\gamma$ and is such that $$\big(\text{prox}_{\ga g^*}(u)\big)_{n_1,n_2}=\frac{1}{\max(\|u_{n_1,n_2}\|_2/\lambda,1)}u_{n_1,n_2}\in\mathbb{R}^2,\quad\forall (n_1,n_2).$$

# In[9]:

def prox_g_conj(u, Lambda):
    return u/tile(maximum(sqrt(sum(u**2,2,keepdims=True))/Lambda,1),(1,1,2))  # projection onto the ball {||.||_{inf,2} <= Lambda}


# __Exercise__
#
# Write the code of the dual forward-backward iteration for denoising.

# In[11]:

Lambda = 25
gamma = 1.9/8  # we must have 0 < gamma < 2/8
nbiter = 400
(N1,N2) = y.shape
u = zeros([N1,N2,2])
Ep_array = zeros(nbiter)  # array for the primal energy E_p
Ed_array = zeros(nbiter)  # array for the dual energy E_d
sqnormy = norm(y)**2/2
for iter in range(nbiter):
    # put your code here
    Ep_array[iter] = norm(x-y)**2/2 + Lambda*sum(sqrt(sum(D(x)**2,2)))
    Ed_array[iter] = norm(y-Dadj(u))**2/2 - sqnormy
xdenoised = x
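# For reference, here is one possible way to fill in the loop above (a sketch, not necessarily
# the intended solution). It implements the primal-dual form of the iteration,
# $x^{(k)} = \nabla f^*(-D^* u^{(k)})$ followed by
# $u^{(k+1)} = \text{prox}_{\ga g^*}(u^{(k)} + \ga D x^{(k)})$,
# using the functions `D`, `Dadj`, `grad_f_conj` and `prox_g_conj` defined in the cells above.

# In[ ]:

Lambda = 25
gamma = 1.9/8          # 0 < gamma < 2/8, since beta = 1 and ||D||^2 <= 8
nbiter = 400
u = zeros([N1,N2,2])   # dual variable
Ep_array = zeros(nbiter)
Ed_array = zeros(nbiter)
sqnormy = norm(y)**2/2
for it in range(nbiter):
    x = grad_f_conj(-Dadj(u), y)               # primal iterate x = y - D* u
    u = prox_g_conj(u + gamma*D(x), Lambda)    # projected gradient step on the dual
    Ep_array[it] = norm(x-y)**2/2 + Lambda*sum(sqrt(sum(D(x)**2,2)))
    Ed_array[it] = norm(y-Dadj(u))**2/2 - sqnormy
xdenoised = x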
# We display the denoised image.

# In[12]:

figsize(9,9)
imshow(xdenoised, interpolation='nearest', cmap=cm.gray, vmin=0, vmax=255)
title('Denoised image')


# We check that the primal energy and the opposite of the dual energy converge to the same value.

# In[13]:

figsize(15,7)
plot(Ep_array)
plot(-Ed_array)


# We plot the duality gap $E_p+E_d$ in log-scale.

# In[14]:

figsize(15,7)
plot(log10(Ep_array+Ed_array))


# We observe that the convergence is slow: the duality gap decreases like $O(1/k)$.
#
# __Exercise__
#
# Write the code with the forward-backward algorithm replaced by the FISTA-like accelerated forward-backward algorithm (see the NT on Forward-Backward).

# In[91]:

gamma =   # put the right value here
a = 3     # try different values
Lambda = 25
nbiter = 400
(N1,N2) = y.shape
Ep_array_fista = zeros(nbiter)  # array for the primal energy E_p
Ed_array_fista = zeros(nbiter)  # array for the dual energy E_d
sqnormy = norm(y)**2/2
# put your code here
for iter in range(nbiter):
    # put your code here
    Ep_array_fista[iter] = norm(x-y)**2/2 + Lambda*sum(sqrt(sum(D(x)**2,2)))
    Ed_array_fista[iter] = norm(y-Dadj(u))**2/2 - sqnormy
xdenoised = x


# We compare the convergence speed of the two algorithms.

# In[93]:

figsize(15,7)
plot(log10(Ep_array+Ed_array))
plot(log10(Ep_array_fista+Ed_array_fista))


# We observe again that the accelerated forward-backward algorithm, with its convergence in $O(1/k^2)$, is much faster than the forward-backward algorithm.
#
# But for image processing applications, a rough precision is sufficient. Here, the image obtained with 50 iterations of the accelerated forward-backward algorithm is visually identical to the one obtained with many more iterations.
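# For reference, here is one possible way to fill in the accelerated iteration of the exercise
# above (a sketch, not necessarily the intended solution). It uses the FISTA-like extrapolation
# weight $k/(k+a+1)$ at iteration $k$ (starting from $k=0$) with $a\geq 3$, and the step size
# $\gamma = 1/\|D\|^2 = 1/8$, a common choice for this type of accelerated scheme; it assumes
# the cells above have been run.

# In[ ]:

gamma = 1/8.           # accelerated schemes typically use gamma <= 1/||D||^2 = 1/8
a = 3                  # inertial parameter, a >= 3
Lambda = 25
nbiter = 400
u = zeros([N1,N2,2])   # dual variable
v = u.copy()           # extrapolated (auxiliary) dual variable
Ep_array_fista = zeros(nbiter)
Ed_array_fista = zeros(nbiter)
sqnormy = norm(y)**2/2
for k in range(nbiter):
    x = grad_f_conj(-Dadj(v), y)                  # primal point at the extrapolated dual variable
    unew = prox_g_conj(v + gamma*D(x), Lambda)    # projected gradient step on the dual
    v = unew + (k/(k+1.+a))*(unew - u)            # FISTA-like extrapolation
    u = unew
    xk = grad_f_conj(-Dadj(u), y)                 # primal iterate associated with u
    Ep_array_fista[k] = norm(xk-y)**2/2 + Lambda*sum(sqrt(sum(D(xk)**2,2)))
    Ed_array_fista[k] = norm(y-Dadj(u))**2/2 - sqnormy
xdenoised = xk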