Phase transition diagrams of the Basis Pursuit algorithm for various random matrices with different "concentration properties"

$\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$ $\newcommand{\enscond}[2]{\lbrace #1, #2 \rbrace}$ $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\umax}[1]{\underset{#1}{\max}\;}$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\uargmin}[1]{\underset{#1}{argmin}\;}$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$ $\newcommand{\pa}[1]{\left(#1\right)}$ $\newcommand{\diag}[1]{{diag}\left( #1 \right)}$ $\newcommand{\qandq}{\quad\text{and}\quad}$ $\newcommand{\qwhereq}{\quad\text{where}\quad}$ $\newcommand{\qifq}{ \quad \text{if} \quad }$ $\newcommand{\qarrq}{ \quad \Longrightarrow \quad }$ $\newcommand{\ZZ}{\mathbb{Z}}$ $\newcommand{\CC}{\mathbb{C}}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\R}{\mathbb{R}}$ $\newcommand{\E}{\mathbb{E}}$ $\newcommand{\EE}{\mathbb{E}}$ $\newcommand{\Zz}{\mathcal{Z}}$ $\newcommand{\Ww}{\mathcal{W}}$ $\newcommand{\Vv}{\mathcal{V}}$ $\newcommand{\Nn}{\mathcal{N}}$ $\newcommand{\NN}{\mathcal{N}}$ $\newcommand{\Hh}{\mathcal{H}}$ $\newcommand{\Bb}{\mathcal{B}}$ $\newcommand{\Ee}{\mathcal{E}}$ $\newcommand{\Cc}{\mathcal{C}}$ $\newcommand{\Gg}{\mathcal{G}}$ $\newcommand{\Ss}{\mathcal{S}}$ $\newcommand{\Pp}{\mathcal{P}}$ $\newcommand{\Ff}{\mathcal{F}}$ $\newcommand{\Xx}{\mathcal{X}}$ $\newcommand{\Mm}{\mathcal{M}}$ $\newcommand{\Ii}{\mathcal{I}}$ $\newcommand{\Dd}{\mathcal{D}}$ $\newcommand{\Ll}{\mathcal{L}}$ $\newcommand{\Tt}{\mathcal{T}}$ $\newcommand{\si}{\sigma}$ $\newcommand{\al}{\alpha}$ $\newcommand{\la}{\lambda}$ $\newcommand{\ga}{\gamma}$ $\newcommand{\Ga}{\Gamma}$ $\newcommand{\La}{\Lambda}$ $\newcommand{\si}{\sigma}$ $\newcommand{\Si}{\Sigma}$ $\newcommand{\be}{\beta}$ $\newcommand{\de}{\delta}$ $\newcommand{\De}{\Delta}$ $\newcommand{\phi}{\varphi}$ $\newcommand{\th}{\theta}$ $\newcommand{\om}{\omega}$ $\newcommand{\Om}{\Omega}$

This notebook provides empirical evidences to the theoretical results from

"On the gap between RIP-properties and sparse recovery conditions" Sjoerd Dirksen, Guillaume Lecué, Holger Rauhut

available at http://arxiv.org/abs/1504.05073

In [1]:
from __future__ import division
import numpy as np
%pylab inline
#%load_ext autoreload
#%autoreload 2
Populating the interactive namespace from numpy and matplotlib

Compressed Sensing Recovery with the Douglas-Rachford algorithm

In Compressed sensing, one is given a highly underdetermined linear system with $m$ equations and $n$ unknown variables with $n>>m$: find $\hat x$ such that $y=A \hat x$ where $A\in\R^{m\times n}$ is a compression matrix, i.e. $n>m$. This problem is ill posed because $\hat x$ is higher dimensional than $y$ and so there is an infinite number of solutions. What saves the day is that the target vector $\hat x$ we want to recover is sparse in the sense that it has a short support.

A natural idea is therefore to look for the vector with the shortest support in the space of solutions but this problem is usually NP-hard because the function "size of the support" of a vector is not convex. So one may look for the convex function the "closest" to the last one, which is here the $\ell_1$-norm. So we end up with a procedure that aims to find the solution with the smallest $\ell_1$ norm: $$ x_{recovered} \in \arg\min_{x\in\R^n} \norm{x}_1 \quad\mbox{s.t.}\quad Ax=y$$ where the $\ell^1$ norm is defined as $$ \norm{x}_1 = \sum_{j=1}^n \abs{x_j}. $$ This procedure is called the basis pursuit.

It is a convex optimization problem of a non-smooth function under affine constraints. This can be shown to be equivalent to a linear programming problem, for which various algorithms can be used (simplex, interior points). We propose here to use the Douglas-Rachford algorithm.

Douglas-Rachford Algorithm

The Douglas-Rachford (DR) algorithm is an iterative scheme designed to solve convex optimization of the form $$ \umin{x} f(x) + g(x) $$ where $f$ and $g$ are convex functions, of which one is able to compute the proximal operators.

We recall that the proximal operator of a function $h$ is defined by $$\mathrm{prox}_{h}(x) \in {\rm argmin}_{u\in\R^N}\Big(\frac{1}{2}\norm{x-u}_2^2+h(u)\Big)$$

The Douglas-Rachford algorithm takes an arbitrary initialization element $s^{(0)}$, a parameter $\ga>0$ and iterates, for $k=1,2,\ldots$

$$ \left|\begin{array}{l} x^{(k)} = \mathrm{prox}_{\gamma f} (s^{(k-1)} )\\ z^{(k)} = \text{prox}_{\ga g}( 2x^{(k)}-s^{(k-1)})\\ s^{(k)} = s^{(k-1)}+ z^{(k)}-x^{(k)}. \end{array}\right. $$

It is of course possible to inter-change the roles of $f$ and $g$, which defines a different algorithm.

The iterates $x^{(k)}$ converge to a solution of the minimization problem $\min f+g$. DR algorithm can be understood as an approximation of a fixed point solution to $$ s = F(s) \mbox{ where } F(s) = s + \mathrm{prox}_{\gamma f}(2\mathrm{prox}_{\gamma g}(s)- s) - \mathrm{prox}_{\gamma g}(s) $$

Implementation of the Basis Pursuit via the Douglas-Rachford algorithm

It is possible to recast the basis pursuit procedure as the minimization of a sum $f+g$ where $g(x) = \norm{x}_1$ and $f(x)=\iota_{\Omega}$ where $\Omega = \enscond{x}{Ax=y}$ is an affine space, and $\iota_\Omega$ is the indicator function $$ \iota_\Omega(x) = \choice{ 0 \qifq x \in \Omega, \\ +\infty \qifq x \notin \Omega. } $$

The proximal operator of the $\ell^1$ norm is a soft thresholding operator: $$ \text{prox}_{\gamma \norm{\cdot}_1}(x)_j = \max\pa{ 0, 1-\frac{\ga}{\abs{x_j}} } x_j, \quad j=1,\ldots,n $$

Phase transition diagrams

A phase transition diagram is a usefull (simulation) tool to answer the following question:

Given a compression matrix $A$ of size $m\times n$ and a reconstrcution algorithm (for instance the Basis Pursuit via Douglas-Rachford) what is the maximal sparsity parameter $s$ such that "many" $s$-sparse signal $\hat x\in\R^n$ can be exactly reconstructed from $y=A\hat x$ (and $A$) using the algorithm?

To answer this question, we iterate over $s=1,2,\cdots,m$ (no need to go beyond $m$) and for each $s$ we simulate nbtest signals $\hat x$ with sparsity $s$, run the reconstruction algorithm with $y=A\hat x$ and check if the reconstruction is exact (that is, let say, $||x_{restored} - \hat x||_2\leq 0.001$). If the reconstruction is exact for the nbtest signals then we plot a white pixel if reconstruction failed for all nbtest signals then we plot a black pixel. For every result inbetween we plot a grey pixel. (in what follows, we use colour red in place of the white colour and colour blue in place of black colour).

This gives a line of the diagram matrix for a given number of measurements $m$. Then, we repeat this construction for all $m=1,2,\ldots,n$.

We end up with a $n\times n$ matrix and use some heatmap representation to draw the phase transition diagram. (Actually, we stop at $s\leq n/2$ since the proportional case $s\sim m \sim n$ is not very interesting in Compressed sensing (even though this regime, sometimes called the "Kashin regime", has been extensively studied in the theory of Banach spaces since Dvoretsky theorem).

Douglas-Rachford procedure

In [2]:
#proximal operator of \ell_1 norm
def prox_gamma_g(x, gamma):
    return x - x/maximum(abs(x)/gamma,1) # soft-thresholding

#proximal operator of the indicator function of an affine space 
def prox_f(x, y, A, pA):
    return x + pA.dot(y-A.dot(x))

def DR(n, y, A, pA, nbiter, gamma):
    s = zeros(n)
    for iter in range(nbiter):  # iter goes from 0 to nbiter-1
        x = prox_f(s, y, A, pA)
        z = prox_gamma_g(2*x-s, gamma)
        s = s + z - x
    return x

Phase transition matrix

In [3]:
#initialization of the seed for reproducibility
random.seed(0)
In [4]:
def signal(n, sparsity):
    sel = random.permutation(n)
    sel = sel[0:sparsity]   # indices of the nonzero elements of xsharp
    xsharp = zeros(n)
    xsharp[sel] = 1
    return xsharp
In [9]:
def phase_transition_mat(n, nbiter, gamma, nbtest):
    """return a n.n matrix with the number of reconstruction success for every  1\leq m \leq n measurements 
    and sparsity 1\leq s\leq n/2
    n : ambiant dimension of the signals
    nbiter : number of iteration in the DR algorithm
    gamma : threshold parameter in the soft threshold proximal function
    nbtest : number of tests for each pixel"""
    PTM = zeros((n,int(n/2)))
    set_ind_failure = []
    for m in range(1,n+1):#construct one line of the Phase transition matrix for a given number of measurements P
        if (m % 20) == 0:
            print("line number {} done".format(m))
        A = randn(m,n) / sqrt(m)
        pA = pinv(A) # pseudo-inverse. Equivalent to pA = A.T.dot(inv(A.dot(A.T)))
        ind_failure = 0
        for sparsity in range(1,min(m, int(n/2))+1):
            nb_success = 0         
            for i in range(nbtest):
                xsharp = signal(n, sparsity)
                y = A.dot(xsharp)
                x_restored = DR(n, y, A, pA, nbiter, gamma)
                if norm(x_restored-xsharp, ord=2) <0.001:
                    nb_success = nb_success + 1
            PTM[m-1, sparsity-1] = nb_success
    return PTM

Construction of the Phase transition matrix

n: the size of the ambiant space

nbiter: the number of iteration in the Douglas Rachford algorithm

gamma: theshold parameter in the soft thresholding operator

nbtest: number of test (=number of signals) constructed for each pixel

In [13]:
n, nbiter, gamma, nbtest = 30, 20, 1, 10
mat = phase_transition_mat(n, nbiter, gamma, nbtest)# construction of the matrix with the number of success among nbtest
line number 20 done

Construction of the 'frontier'

In [14]:
def frontier(mat, n):
    """construction of the phase transition frontier, i.e. first time the number of success goes below nbtest/2"""
    L = []
    N = len(mat)
    for s in range(int(n/2)):
        m = 0
        while mat[m,s]<nbtest/2 and m<N-1:
            m = m + 1
        L.append(m)
    return L

Draw a heatmap of the phase transition matrix together with the frontier

In [15]:
P_min, P_max, S_min, S_max = 0, n, 0, int(n/2)
fig = plt.imshow(mat[P_min:P_max, S_min:S_max], interpolation="gaussian",  
                 aspect='auto', origin = 'lower', extent=[S_min, S_max, P_min, P_max])
titre = "Gaussian measurements"
plt.title(titre)
plt.xlabel('sparsity')
plt.ylabel('number of measurements')

#empirical phase transition
X = range(int(n/2))
L = frontier(mat, n)
plot(X,L, linewidth=4, color = 'black', label='phase transition')
plt.legend(loc=4)
#Theoretical phase transition curve
#Y = [3*s*log(0.6*N/(s+1)) for s in X]
#plot(X,Y, linewidth=3, color = 'white')

#plt.savefig("gaussian_100.png",bbox_inches='tight')
Out[15]:
<matplotlib.legend.Legend at 0x10877c390>

repeat the construction of the frontier a hundred times to "smooth it"

nb_curves: number of phase transition curves constructed. Those curves are then averaged in order to "smooth" the effect of randomness and get a "stable" phase transition

In [16]:
n, nbiter, gamma, nbtest, nb_curves = 30, 20, 1, 10, 10
L = zeros(int(n/2))
for i in range(nb_curves):
    if (i % 10) == 0:
        print('step {} done'.format(i))
    mat = phase_transition_mat(n, nbiter, gamma, nbtest)
    F = frontier(mat, n)
    L = [sum(a) for a in zip(L,F)] 
L_gauss = [i/nb_curves for i in L]
step 0 done
line number 20 done
line number 20 done
line number 20 done
line number 20 done
line number 20 done
line number 20 done
line number 20 done
line number 20 done
line number 20 done
line number 20 done
In [19]:
#For next use, save the Gaussian phase transition frontier in the  'gaussian_phase_transition_n.p' file
import pickle
filename = 'gaussian_phase_transition_n_{}_nbtest_{}_nbcurves_{}.p'.format(n, nbtest, nb_curves)
#for saving
#with open(filename, "wb") as f:
#    pickle.dump(L_gauss, f)
#for loading
with open(filename, "rb") as f:
    L_gauss = pickle.load(f)

Draw the phase transition frontier for Gaussian measurements

In [20]:
X = range(len(L_gauss))
plot(X,L_gauss, linewidth=4, color = 'blue', label="Gaussian phase transition")
titre = "Gaussian measurements"
plt.title(titre)
plt.xlabel('sparsity')
plt.ylabel('number of measurements')
plt.legend(loc=4)
#plt.savefig("phase_transition_curve_gaussian_100.png",bbox_inches='tight')
Out[20]:
<matplotlib.legend.Legend at 0x1084c4d50>

Construction of (smoothed) phase transition frontiers for other type of measurements

In the two papers (available here)

S. Dirksen, G. Lecué and H. Rauhut. On the gap between RIP-properties and sparse recovery conditions

G. Lecué and S. Mendelson. Sparse recovery under weak moment assumptions

we proved that reconstruction properties (stability and robustness) of the basis pursuit algorithm goes far beyond Gaussian measurements. For instance, random matrices with i.i.d. components having only $\log n$ sub-gaussian moments have the same "reconstruction properties" has the Gaussian one ($n$ being the size of the signals).

The aim of this notebook is to highlight this "universality phenomenum".

Random measurements matrices with $\psi_\alpha$ coordinates for $0<\alpha\leq2$

We construct matrices with i.i.d. coordinates with exponential moments. Note that if $g$ is a standard Gaussian variable then $ {\rm sign}(g) |g|^{2/\alpha}$ is exactly a $\psi_\alpha$ random variable (with mean zero).

In [26]:
def mat_exp_power(m, n, alpha):
    A = randn(m, n)/ sqrt(m)
    return np.multiply(np.power(np.absolute(A),int(2/alpha)), sign(A))
In [27]:
def phase_transition_mat_exp_power(n, nbiter, gamma, nbtest, alpha):
    PTM = zeros((n,int(n/2)))
    set_ind_failure = []
    for m in range(20,n+1):#construct one line of the Phase transition matrix for a given number of measurements P
        A = mat_exp_power(m, n, alpha) 
        pA = pinv(A) # pseudo-inverse. Equivalent to pA = A.T.dot(inv(A.dot(A.T)))
        ind_failure = 0
        for sparsity in range(1,min(m, int(n/2))+1):
            nb_success = 0         
            for i in range(nbtest):
                xsharp = signal(n, sparsity)
                y = A.dot(xsharp)
                x_restored = DR(n, y, A, pA, nbiter, gamma)
                if norm(x_restored-xsharp, ord=2) <0.001:
                    nb_success = nb_success + 1
            PTM[m-1, sparsity-1] = nb_success
    return PTM
In [28]:
n, nbiter, gamma, nbtest, alpha = 40, 20, 1, 10, 1
mat = phase_transition_mat_exp_power(n, nbiter, gamma, nbtest, alpha)
In [29]:
P_min, P_max, S_min, S_max = 0, n, 0, int(n/2)
fig = plt.imshow(mat[P_min:P_max, S_min:S_max], interpolation="gaussian",  
                 aspect='auto', origin = 'lower', extent=[S_min, S_max, P_min, P_max])
titre = "Psi_alpha measurements for alpha = {}".format(alpha)
plt.title(titre)
plt.xlabel('sparsity')
plt.ylabel('number of measurements')

#empirical phase transition
X = range(int(n/2))
L = frontier(mat, n)
plot(X,L, linewidth=4, color = 'black', label='phase transition')
plt.legend(loc=4)
#filename = "exp_power_{}_{}.png".format(alpha, n)
#plt.savefig(filename,bbox_inches='tight')
Out[29]:
<matplotlib.legend.Legend at 0x108c6a1d0>
In [30]:
n, nbiter, gamma, nbtest, nb_curves = 40, 20, 1, 15, 20
L = zeros(int(n/2))
dict_exp_power = {2: L, 1.8: L, 1.5: L, 1.2: L, 1: L, 0.8: L, 0.5: L, 0.1: L, 0.05: L, 0.01: L}
for alpha in dict_exp_power.keys():
    print("power {} running".format(alpha))
    for i in range(nb_curves):            
        mat = phase_transition_mat_exp_power(n, nbiter, gamma, nbtest, alpha)
        F = frontier(mat, n)
        dict_exp_power[alpha] = [sum(a) for a in zip(dict_exp_power[alpha], F)] 
    dict_exp_power[alpha] = [ele/nb_curves for ele in dict_exp_power[alpha]]
power 1.5 running
power 1 running
power 2 running
power 1.8 running
power 0.1 running
power 0.8 running
power 1.2 running
power 0.05 running
power 0.01 running
power 0.5 running
In [6]:
#save the dictionary of phase transitions curves for later use
import pickle
#To save the dictionary
#with open('dictionnaries_exp_power_100_v3.p', 'wb') as fp:
#    pickle.dump(dict_exp_power, fp)
    
#to load the dictionary
#n, nbiter, gamma, nbtest, nb_curves = 100, 20, 1, 16, 30
#with open('dictionnaries_exp_power_100_v3.p', 'rb') as fp:
#    dict_exp_power = pickle.load(fp)
In [9]:
#N = 100
plt.figure(figsize=(8,5))
X = range(int(n/2))
for key in sort(dict_exp_power.keys()):
    L = dict_exp_power[key]
    text = 'exp power {}'.format(key)
    if key == 2:
        plot(X, L, 'r--', label = text, linewidth=4)
    else:
        plot(X, L, label = text)
plt.legend(loc=4)
plt.xlabel('sparsity')
plt.ylabel('number of measurements')
plt.title('phase transition curves -- exponential moments variables')
#plot(X,L_gauss, linewidth=3, color = 'blue', label="Gaussian phase transition")
#plt.legend(loc=4)
#plt.savefig("phase_transition_curves_exp_power_100_v3.png",bbox_inches='tight')

Random measurements matrices with coordinates in $L_p$

For this example, we construct $m\times n$ i.i.d. Student variables with $k$ degrees of liberty. It has $p$ moments if and only if $k>p$.

In [26]:
def mat_power(m ,n, p):
    return random.standard_t(p, size=(m, n))
In [27]:
def phase_transition_mat_power(n, nbiter, gamma, nbtest, p):
    PTM = zeros((n, int(n/2)))
    set_ind_failure = []
    for m in range(1, n+1):#construct one line of the Phase transition matrix for a given number of measurements P
        A = mat_power(m, n, p) 
        pA = pinv(A) # pseudo-inverse. Equivalent to pA = A.T.dot(inv(A.dot(A.T)))
        ind_failure = 0
        for sparsity in range(1, min(m, int(n/2))+1):
            nb_success = 0         
            for i in range(nbtest):
                xsharp = signal(n, sparsity)
                y = A.dot(xsharp)
                x_restored = DR(n, y, A, pA, nbiter, gamma)
                if norm(x_restored-xsharp, ord=2) <0.001:
                    nb_success = nb_success + 1
            PTM[m-1, sparsity-1] = nb_success
    return PTM
In [28]:
n, nbiter, gamma, nbtest, nb_curves = 30, 20, 1, 10, 20
L = zeros(int(n/2))
dict_power = {30: L, 20: L, 15: L, 10: L, 5: L, 4: L, 3: L, 2: L}
for p in dict_power.keys():
    print('power {} running'.format(p))
    for i in range(nb_curves):
        mat = phase_transition_mat_power(n, nbiter, gamma, nbtest, p)
        F = frontier(mat, n)
        dict_power[p] = [sum(a) for a in zip(dict_power[p], F)] 
    dict_power[p] = [ele/nb_curves for ele in dict_power[p]]
power 2 running
power 3 running
power 20 running
power 5 running
power 4 running
power 10 running
power 30 running
power 15 running
In [16]:
#save the dictionary of phase transitions curves to speed up later investigation
import pickle
#Save the dictionary
#with open('dictionnaries_power_100.p', 'wb') as fp:
#    pickle.dump(dict_power, fp)

#Load the dictionary
#with open('dictionnaries_power_100.p', 'rb') as fp:
#    dict_power = pickle.load(fp)
In [25]:
X = range(int(n/2))
plt.figure(figsize=(8,5))
for key in dict_power.keys():
    if key != 15:
        L = dict_power[key]
        text = 'degree {}'.format(key)
        plot(X, L, label = text)
plt.xlabel('sparsity')
plt.ylabel('number of measurements')
plt.title('phase transition curves - Student variables')
#Gaussian phase transition
plot(X,L_gauss, 'r--', linewidth=3, label="Gaussian phase transition")
#plt.legend(loc=4)
#plt.savefig("phase_transition_curves_student_100.png",bbox_inches='tight')

Theoretical guarentee on phase transition

Asymptotically when $n\to\infty$, existence of a phase transition, which is roughtly $$ m \geq 2e s \log\Big(\frac{n}{\sqrt{\pi}s}\Big)\approx 5.4 s \log\Big(\frac{0.6 n}{s}\Big)$$ has been proved in several papers:

D. L. Donoho. High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete and Computational Geometry, 35(4):617–652, 2006.

D. L. Donoho and J. Tanner. Neighborliness of randomly projected simplices in high dimensions. Proceedings of the National Academy of Sciences of the United States of America, 102(27):9452–9457, 2005.

D. L. Donoho and J. Tanner. Counting faces of randomly projected polytopes when the projection radically lowers dimension. Journal of the American Mathematical Society, 22(1):1–53, 2009.

D. L. Donoho and J. Tanner. Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1906):4273–4293, 2009.

S. Oymak and J. A. Tropp. Universality laws for randomized dimension reduction, with applications

In [ ]: