#!/usr/bin/env python # coding: utf-8 # Phase transition diagrams of the Basis Pursuit algorithm for various random matrices with different "concentration properties" # =================================== # # $\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$ # $\newcommand{\enscond}[2]{\lbrace #1, #2 \rbrace}$ # $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$ # $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ # $\newcommand{\umax}[1]{\underset{#1}{\max}\;}$ # $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ # $\newcommand{\uargmin}[1]{\underset{#1}{argmin}\;}$ # $\newcommand{\norm}[1]{\|#1\|}$ # $\newcommand{\abs}[1]{\left|#1\right|}$ # $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$ # $\newcommand{\pa}[1]{\left(#1\right)}$ # $\newcommand{\diag}[1]{{diag}\left( #1 \right)}$ # $\newcommand{\qandq}{\quad\text{and}\quad}$ # $\newcommand{\qwhereq}{\quad\text{where}\quad}$ # $\newcommand{\qifq}{ \quad \text{if} \quad }$ # $\newcommand{\qarrq}{ \quad \Longrightarrow \quad }$ # $\newcommand{\ZZ}{\mathbb{Z}}$ # $\newcommand{\CC}{\mathbb{C}}$ # $\newcommand{\RR}{\mathbb{R}}$ # $\newcommand{\R}{\mathbb{R}}$ # $\newcommand{\E}{\mathbb{E}}$ # $\newcommand{\EE}{\mathbb{E}}$ # $\newcommand{\Zz}{\mathcal{Z}}$ # $\newcommand{\Ww}{\mathcal{W}}$ # $\newcommand{\Vv}{\mathcal{V}}$ # $\newcommand{\Nn}{\mathcal{N}}$ # $\newcommand{\NN}{\mathcal{N}}$ # $\newcommand{\Hh}{\mathcal{H}}$ # $\newcommand{\Bb}{\mathcal{B}}$ # $\newcommand{\Ee}{\mathcal{E}}$ # $\newcommand{\Cc}{\mathcal{C}}$ # $\newcommand{\Gg}{\mathcal{G}}$ # $\newcommand{\Ss}{\mathcal{S}}$ # $\newcommand{\Pp}{\mathcal{P}}$ # $\newcommand{\Ff}{\mathcal{F}}$ # $\newcommand{\Xx}{\mathcal{X}}$ # $\newcommand{\Mm}{\mathcal{M}}$ # $\newcommand{\Ii}{\mathcal{I}}$ # $\newcommand{\Dd}{\mathcal{D}}$ # $\newcommand{\Ll}{\mathcal{L}}$ # $\newcommand{\Tt}{\mathcal{T}}$ # $\newcommand{\si}{\sigma}$ # $\newcommand{\al}{\alpha}$ # $\newcommand{\la}{\lambda}$ # $\newcommand{\ga}{\gamma}$ # $\newcommand{\Ga}{\Gamma}$ # $\newcommand{\La}{\Lambda}$ # $\newcommand{\si}{\sigma}$ # $\newcommand{\Si}{\Sigma}$ # $\newcommand{\be}{\beta}$ # $\newcommand{\de}{\delta}$ # $\newcommand{\De}{\Delta}$ # $\newcommand{\phi}{\varphi}$ # $\newcommand{\th}{\theta}$ # $\newcommand{\om}{\omega}$ # $\newcommand{\Om}{\Omega}$ # # # This notebook provides empirical evidences to the theoretical results from # # >"On the gap between RIP-properties and sparse recovery conditions" Sjoerd Dirksen, Guillaume Lecué, Holger Rauhut # # available at http://arxiv.org/abs/1504.05073 # In[1]: from __future__ import division import numpy as np get_ipython().run_line_magic('pylab', 'inline') #%load_ext autoreload #%autoreload 2 # Compressed Sensing Recovery with the Douglas-Rachford algorithm # ------------------------------------ # In Compressed sensing, one is given a highly underdetermined linear system with $m$ equations and $n$ unknown variables with $n>>m$: find $\hat x$ such that $y=A \hat x$ where $A\in\R^{m\times n}$ is a compression matrix, i.e. $n>m$. This problem is ill posed because $\hat x$ is higher dimensional than $y$ and so there is an infinite number of solutions. What saves the day is that the target vector $\hat x$ we want to recover is sparse in the sense that it has a short support. # # A natural idea is therefore to look for the vector with the shortest support in the space of solutions but this problem is usually NP-hard because the function "size of the support" of a vector is not convex. So one may look for the convex function the "closest" to the last one, which is here the $\ell_1$-norm. So we end up with a procedure that aims to find the solution with the smallest $\ell_1$ norm: # $$x_{recovered} \in \arg\min_{x\in\R^n} \norm{x}_1 \quad\mbox{s.t.}\quad Ax=y$$ # where the $\ell^1$ norm is defined as # $$\norm{x}_1 = \sum_{j=1}^n \abs{x_j}.$$ # This procedure is called the **basis pursuit**. # # # It is a convex optimization problem of a non-smooth function under affine # constraints. This can be shown to be equivalent to a linear programming # problem, for which various algorithms can be used (simplex, interior # points). We propose here to use the Douglas-Rachford algorithm. # Douglas-Rachford Algorithm # -------------------------- # The Douglas-Rachford (DR) algorithm is an iterative scheme designed to solve convex optimization of the form # $$\umin{x} f(x) + g(x)$$ # where $f$ and $g$ are convex functions, of which one is able to # compute the proximal operators. # # We recall that the proximal operator of a function $h$ is defined by # $$\mathrm{prox}_{h}(x) \in {\rm argmin}_{u\in\R^N}\Big(\frac{1}{2}\norm{x-u}_2^2+h(u)\Big)$$ # # The Douglas-Rachford algorithm takes an arbitrary initialization element $s^{(0)}$, a parameter $\ga>0$ and iterates, for $k=1,2,\ldots$ # # $$# \left|\begin{array}{l} # x^{(k)} = \mathrm{prox}_{\gamma f} (s^{(k-1)} )\\ # z^{(k)} = \text{prox}_{\ga g}( 2x^{(k)}-s^{(k-1)})\\ # s^{(k)} = s^{(k-1)}+ z^{(k)}-x^{(k)}. # \end{array}\right. #$$ # # It is of course possible to inter-change the roles of $f$ and $g$, # which defines a different algorithm. # # The iterates $x^{(k)}$ converge to a solution of the minimization problem $\min f+g$. DR algorithm can be understood as an approximation of a fixed point solution to # $$# s = F(s) \mbox{ where } F(s) = s + \mathrm{prox}_{\gamma f}(2\mathrm{prox}_{\gamma g}(s)- s) - \mathrm{prox}_{\gamma g}(s) #$$ # Implementation of the Basis Pursuit via the Douglas-Rachford algorithm # --------------------------------- # It is possible to recast the basis pursuit procedure as the minimization of a sum $f+g$ # where $g(x) = \norm{x}_1$ and $f(x)=\iota_{\Omega}$ where $\Omega = # \enscond{x}{Ax=y}$ is an affine space, and $\iota_\Omega$ is the indicator # function # $$\iota_\Omega(x) = \choice{ 0 \qifq x \in \Omega, \\ +\infty \qifq x \notin \Omega. }$$ # # # The proximal operator of the $\ell^1$ norm is a soft thresholding operator: # $$\text{prox}_{\gamma \norm{\cdot}_1}(x)_j = \max\pa{ 0, 1-\frac{\ga}{\abs{x_j}} } x_j, \quad j=1,\ldots,n$$ # # Phase transition diagrams # A phase transition diagram is a usefull (simulation) tool to answer the following question: # # > Given a compression matrix $A$ of size $m\times n$ and a reconstrcution algorithm (for instance the Basis Pursuit via Douglas-Rachford) what is the maximal sparsity parameter $s$ such that "many" $s$-sparse signal $\hat x\in\R^n$ can be exactly reconstructed from $y=A\hat x$ (and $A$) using the algorithm? # # To answer this question, we iterate over $s=1,2,\cdots,m$ (no need to go beyond $m$) and for each $s$ we simulate *nbtest* signals $\hat x$ with sparsity $s$, run the reconstruction algorithm with $y=A\hat x$ and check if the reconstruction is exact (that is, let say, $||x_{restored} - \hat x||_2\leq 0.001$). If the reconstruction is exact for the *nbtest* signals then we plot a white pixel if reconstruction failed for all *nbtest* signals then we plot a black pixel. For every result inbetween we plot a grey pixel. (in what follows, we use colour red in place of the white colour and colour blue in place of black colour). # # This gives a line of the diagram matrix for a given number of measurements $m$. Then, we repeat this construction for all $m=1,2,\ldots,n$. # # We end up with a $n\times n$ matrix and use some heatmap representation to draw the phase transition diagram. (Actually, we stop at $s\leq n/2$ since the proportional case $s\sim m \sim n$ is not very interesting in Compressed sensing (even though this regime, sometimes called the "Kashin regime", has been extensively studied in the theory of Banach spaces since Dvoretsky theorem). # ### Douglas-Rachford procedure # In[2]: #proximal operator of \ell_1 norm def prox_gamma_g(x, gamma): return x - x/maximum(abs(x)/gamma,1) # soft-thresholding #proximal operator of the indicator function of an affine space def prox_f(x, y, A, pA): return x + pA.dot(y-A.dot(x)) def DR(n, y, A, pA, nbiter, gamma): s = zeros(n) for iter in range(nbiter): # iter goes from 0 to nbiter-1 x = prox_f(s, y, A, pA) z = prox_gamma_g(2*x-s, gamma) s = s + z - x return x # ### Phase transition matrix # In[3]: #initialization of the seed for reproducibility random.seed(0) # In[4]: def signal(n, sparsity): sel = random.permutation(n) sel = sel[0:sparsity] # indices of the nonzero elements of xsharp xsharp = zeros(n) xsharp[sel] = 1 return xsharp # In[9]: def phase_transition_mat(n, nbiter, gamma, nbtest): """return a n.n matrix with the number of reconstruction success for every 1\leq m \leq n measurements and sparsity 1\leq s\leq n/2 n : ambiant dimension of the signals nbiter : number of iteration in the DR algorithm gamma : threshold parameter in the soft threshold proximal function nbtest : number of tests for each pixel""" PTM = zeros((n,int(n/2))) set_ind_failure = [] for m in range(1,n+1):#construct one line of the Phase transition matrix for a given number of measurements P if (m % 20) == 0: print("line number {} done".format(m)) A = randn(m,n) / sqrt(m) pA = pinv(A) # pseudo-inverse. Equivalent to pA = A.T.dot(inv(A.dot(A.T))) ind_failure = 0 for sparsity in range(1,min(m, int(n/2))+1): nb_success = 0 for i in range(nbtest): xsharp = signal(n, sparsity) y = A.dot(xsharp) x_restored = DR(n, y, A, pA, nbiter, gamma) if norm(x_restored-xsharp, ord=2) <0.001: nb_success = nb_success + 1 PTM[m-1, sparsity-1] = nb_success return PTM # ### Construction of the Phase transition matrix # # > n: the size of the ambiant space # # > nbiter: the number of iteration in the Douglas Rachford algorithm # # > gamma: theshold parameter in the soft thresholding operator # # > nbtest: number of test (=number of signals) constructed for each pixel # In[13]: n, nbiter, gamma, nbtest = 30, 20, 1, 10 mat = phase_transition_mat(n, nbiter, gamma, nbtest)# construction of the matrix with the number of success among nbtest # ### Construction of the 'frontier' # In[14]: def frontier(mat, n): """construction of the phase transition frontier, i.e. first time the number of success goes below nbtest/2""" L = [] N = len(mat) for s in range(int(n/2)): m = 0 while mat[m,s] nb_curves: number of phase transition curves constructed. Those curves are then averaged in order to "smooth" the effect of randomness and get a "stable" phase transition # In[16]: n, nbiter, gamma, nbtest, nb_curves = 30, 20, 1, 10, 10 L = zeros(int(n/2)) for i in range(nb_curves): if (i % 10) == 0: print('step {} done'.format(i)) mat = phase_transition_mat(n, nbiter, gamma, nbtest) F = frontier(mat, n) L = [sum(a) for a in zip(L,F)] L_gauss = [i/nb_curves for i in L] # In[19]: #For next use, save the Gaussian phase transition frontier in the 'gaussian_phase_transition_n.p' file import pickle filename = 'gaussian_phase_transition_n_{}_nbtest_{}_nbcurves_{}.p'.format(n, nbtest, nb_curves) #for saving #with open(filename, "wb") as f: # pickle.dump(L_gauss, f) #for loading with open(filename, "rb") as f: L_gauss = pickle.load(f) # ### Draw the phase transition frontier for Gaussian measurements # In[20]: X = range(len(L_gauss)) plot(X,L_gauss, linewidth=4, color = 'blue', label="Gaussian phase transition") titre = "Gaussian measurements" plt.title(titre) plt.xlabel('sparsity') plt.ylabel('number of measurements') plt.legend(loc=4) #plt.savefig("phase_transition_curve_gaussian_100.png",bbox_inches='tight') # # Construction of (smoothed) phase transition frontiers for other type of measurements # # In the two papers (available [here](http://lecueguillaume.github.io/research/)) # > S. Dirksen, G. Lecué and H. Rauhut. # *On the gap between RIP-properties and sparse recovery conditions* # # >G. Lecué and S. Mendelson. # *Sparse recovery under weak moment assumptions* # # we proved that reconstruction properties (stability and robustness) of the basis pursuit algorithm goes far beyond Gaussian measurements. For instance, random matrices with i.i.d. components having only $\log n$ sub-gaussian moments have the same "reconstruction properties" has the Gaussian one ($n$ being the size of the signals). # # The aim of this notebook is to highlight this "universality phenomenum". # ### Random measurements matrices with $\psi_\alpha$ coordinates for $0<\alpha\leq2$ # # We construct matrices with i.i.d. coordinates with exponential moments. Note that if $g$ is a standard Gaussian variable then ${\rm sign}(g) |g|^{2/\alpha}$ is exactly a $\psi_\alpha$ random variable (with mean zero). # In[26]: def mat_exp_power(m, n, alpha): A = randn(m, n)/ sqrt(m) return np.multiply(np.power(np.absolute(A),int(2/alpha)), sign(A)) # In[27]: def phase_transition_mat_exp_power(n, nbiter, gamma, nbtest, alpha): PTM = zeros((n,int(n/2))) set_ind_failure = [] for m in range(20,n+1):#construct one line of the Phase transition matrix for a given number of measurements P A = mat_exp_power(m, n, alpha) pA = pinv(A) # pseudo-inverse. Equivalent to pA = A.T.dot(inv(A.dot(A.T))) ind_failure = 0 for sparsity in range(1,min(m, int(n/2))+1): nb_success = 0 for i in range(nbtest): xsharp = signal(n, sparsity) y = A.dot(xsharp) x_restored = DR(n, y, A, pA, nbiter, gamma) if norm(x_restored-xsharp, ord=2) <0.001: nb_success = nb_success + 1 PTM[m-1, sparsity-1] = nb_success return PTM # In[28]: n, nbiter, gamma, nbtest, alpha = 40, 20, 1, 10, 1 mat = phase_transition_mat_exp_power(n, nbiter, gamma, nbtest, alpha) # In[29]: P_min, P_max, S_min, S_max = 0, n, 0, int(n/2) fig = plt.imshow(mat[P_min:P_max, S_min:S_max], interpolation="gaussian", aspect='auto', origin = 'lower', extent=[S_min, S_max, P_min, P_max]) titre = "Psi_alpha measurements for alpha = {}".format(alpha) plt.title(titre) plt.xlabel('sparsity') plt.ylabel('number of measurements') #empirical phase transition X = range(int(n/2)) L = frontier(mat, n) plot(X,L, linewidth=4, color = 'black', label='phase transition') plt.legend(loc=4) #filename = "exp_power_{}_{}.png".format(alpha, n) #plt.savefig(filename,bbox_inches='tight') # In[30]: n, nbiter, gamma, nbtest, nb_curves = 40, 20, 1, 15, 20 L = zeros(int(n/2)) dict_exp_power = {2: L, 1.8: L, 1.5: L, 1.2: L, 1: L, 0.8: L, 0.5: L, 0.1: L, 0.05: L, 0.01: L} for alpha in dict_exp_power.keys(): print("power {} running".format(alpha)) for i in range(nb_curves): mat = phase_transition_mat_exp_power(n, nbiter, gamma, nbtest, alpha) F = frontier(mat, n) dict_exp_power[alpha] = [sum(a) for a in zip(dict_exp_power[alpha], F)] dict_exp_power[alpha] = [ele/nb_curves for ele in dict_exp_power[alpha]] # In[6]: #save the dictionary of phase transitions curves for later use import pickle #To save the dictionary #with open('dictionnaries_exp_power_100_v3.p', 'wb') as fp: # pickle.dump(dict_exp_power, fp) #to load the dictionary #n, nbiter, gamma, nbtest, nb_curves = 100, 20, 1, 16, 30 #with open('dictionnaries_exp_power_100_v3.p', 'rb') as fp: # dict_exp_power = pickle.load(fp) # In[9]: #N = 100 plt.figure(figsize=(8,5)) X = range(int(n/2)) for key in sort(dict_exp_power.keys()): L = dict_exp_power[key] text = 'exp power {}'.format(key) if key == 2: plot(X, L, 'r--', label = text, linewidth=4) else: plot(X, L, label = text) plt.legend(loc=4) plt.xlabel('sparsity') plt.ylabel('number of measurements') plt.title('phase transition curves -- exponential moments variables') #plot(X,L_gauss, linewidth=3, color = 'blue', label="Gaussian phase transition") #plt.legend(loc=4) #plt.savefig("phase_transition_curves_exp_power_100_v3.png",bbox_inches='tight') # ### Random measurements matrices with coordinates in $L_p$ # For this example, we construct $m\times n$ i.i.d. Student variables with $k$ degrees of liberty. It has $p$ moments if and only if $k>p$. # In[26]: def mat_power(m ,n, p): return random.standard_t(p, size=(m, n)) # In[27]: def phase_transition_mat_power(n, nbiter, gamma, nbtest, p): PTM = zeros((n, int(n/2))) set_ind_failure = [] for m in range(1, n+1):#construct one line of the Phase transition matrix for a given number of measurements P A = mat_power(m, n, p) pA = pinv(A) # pseudo-inverse. Equivalent to pA = A.T.dot(inv(A.dot(A.T))) ind_failure = 0 for sparsity in range(1, min(m, int(n/2))+1): nb_success = 0 for i in range(nbtest): xsharp = signal(n, sparsity) y = A.dot(xsharp) x_restored = DR(n, y, A, pA, nbiter, gamma) if norm(x_restored-xsharp, ord=2) <0.001: nb_success = nb_success + 1 PTM[m-1, sparsity-1] = nb_success return PTM # In[28]: n, nbiter, gamma, nbtest, nb_curves = 30, 20, 1, 10, 20 L = zeros(int(n/2)) dict_power = {30: L, 20: L, 15: L, 10: L, 5: L, 4: L, 3: L, 2: L} for p in dict_power.keys(): print('power {} running'.format(p)) for i in range(nb_curves): mat = phase_transition_mat_power(n, nbiter, gamma, nbtest, p) F = frontier(mat, n) dict_power[p] = [sum(a) for a in zip(dict_power[p], F)] dict_power[p] = [ele/nb_curves for ele in dict_power[p]] # In[16]: #save the dictionary of phase transitions curves to speed up later investigation import pickle #Save the dictionary #with open('dictionnaries_power_100.p', 'wb') as fp: # pickle.dump(dict_power, fp) #Load the dictionary #with open('dictionnaries_power_100.p', 'rb') as fp: # dict_power = pickle.load(fp) # In[25]: X = range(int(n/2)) plt.figure(figsize=(8,5)) for key in dict_power.keys(): if key != 15: L = dict_power[key] text = 'degree {}'.format(key) plot(X, L, label = text) plt.xlabel('sparsity') plt.ylabel('number of measurements') plt.title('phase transition curves - Student variables') #Gaussian phase transition plot(X,L_gauss, 'r--', linewidth=3, label="Gaussian phase transition") #plt.legend(loc=4) #plt.savefig("phase_transition_curves_student_100.png",bbox_inches='tight') # ### Theoretical guarentee on phase transition # # Asymptotically when $n\to\infty$, existence of a phase transition, which is roughtly $$m \geq 2e s \log\Big(\frac{n}{\sqrt{\pi}s}\Big)\approx 5.4 s \log\Big(\frac{0.6 n}{s}\Big)$$ has been proved in several papers: # >D. L. Donoho. *High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension.* Discrete and Computational Geometry, 35(4):617–652, 2006. # # > D. L. Donoho and J. Tanner. *Neighborliness of randomly projected simplices in high dimensions.* Proceedings of the National Academy of Sciences of the United States of America, 102(27):9452–9457, 2005. # # > D. L. Donoho and J. Tanner. *Counting faces of randomly projected polytopes when the projection radically lowers dimension.* Journal of the American Mathematical Society, 22(1):1–53, 2009. # # >D. L. Donoho and J. Tanner. *Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing*. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1906):4273–4293, 2009. # # > S. Oymak and J. A. Tropp. *Universality laws for randomized dimension reduction, with applications* # # In[ ]: