Decomposition: $$ \min_{x\in\R^n} \;\; f(x) = \underbrace{F(x)}_{\text{convex, smooth}} + \underbrace{\Psi(x)}_{\text{convex, nonsmooth, simple}} $$
Ex. Convex regularization problems, such as the lasso: $F(x) = \frac12\|Ax-y\|^2$, $\Psi(x) = \lambda\|x\|_1$
For $k=0,1,2,\dots$:
\begin{align*} x_{k+1} &= \text{prox}_{\alpha_k \Psi} ( x_k - \alpha_k \nabla F(x_k)) \end{align*}
Required Conditions: $\nabla F$ is $L$-Lipschitz continuous and the step size is fixed at $\alpha_k \equiv \alpha \le 1/L$.
Monotone convergence with rate: $$ f(x_{k}) - f(x^*) \le \frac{\|x_0 - x^*\|^2}{2\alpha k} $$
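For the lasso instance, the iteration is one gradient step on $F$ followed by elementwise soft-thresholding. A minimal modern-Julia sketch (the names `ista` and `soft` are illustrative, not from the course code; fixed step $\alpha = 1/L$ with $L = \|A\|_2^2$):

```julia
using LinearAlgebra

# Prox of t‖·‖₁: elementwise soft-thresholding.
soft(v, t) = sign.(v) .* max.(abs.(v) .- t, 0)

# Proximal gradient (ISTA) for min_x 0.5‖Ax − y‖² + λ‖x‖₁.
function ista(A, y, λ; maxit=500)
    x = zeros(size(A, 2))
    α = 1 / opnorm(A)^2                 # fixed step α = 1/L
    for k = 1:maxit
        g = A' * (A*x - y)              # ∇F(x)
        x = soft(x - α*g, α*λ)          # prox_{αΨ}(x − α∇F(x))
    end
    return x
end
```

For an orthonormal $A$ the fixed point is coordinatewise soft-thresholding of $A^T y$.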
For $k=0,1,2,\dots$:
\begin{align*} y_k &= (1-\theta_k) x_k + \theta_k u_k \\ x_{k+1} &= \text{prox}_{\alpha_k \Psi} ( y_k - \alpha_k \nabla F(y_k)) \\ u_{k+1} &= x_k + \frac{1}{\theta_k} (x_{k+1} - x_k) \end{align*}
Required Conditions: $\nabla F$ is $L$-Lipschitz, $\alpha_k \equiv \alpha \le 1/L$, $\theta_0 = 1$, and $\theta_k \in (0,1]$ satisfies $\frac{1-\theta_{k+1}}{\theta_{k+1}^2} \le \frac{1}{\theta_k^2}$.
Non-monotone convergence with rate (for $\theta_k = 2/(k+2)$):
$$ f(x_k) - f(x^*) \le \frac{2\|x_0 - x^*\|^2}{\alpha (k+1)^2} $$
With $\theta_k = \frac{2}{k+2}$, the iteration can be rewritten in momentum form:
\begin{align*} y_k &= x_k + \frac{k-1}{k+2} (x_k - x_{k-1})\\ x_{k+1} &= \text{prox}_{\alpha_k \Psi} ( y_k - \alpha_k \nabla F(y_k)) \end{align*}
$\Leftrightarrow$
\begin{align*} y_k &= (1-\theta_k) x_k + \theta_k u_k \\ x_{k+1} &= \text{prox}_{\alpha_k \Psi} ( y_k - \alpha_k \nabla F(y_k)) \\ u_{k+1} &= x_k + \frac{1}{\theta_k} (x_{k+1} - x_k) \end{align*}
Does this $\theta_k$ satisfy $\frac{1-\theta_{k+1}}{\theta_{k+1}^2} \le \frac{1}{\theta_k^2}$?
# Symbolic check that θ_k = 2/(k+2) satisfies the condition
using SymPy
@vars k
θ  = 2/(k+2)   # θ_k
θp = 2/(k+3)   # θ_{k+1}
expr = (1-θp)/θp^2 - 1/θ^2
simplify(expr)   # simplifies to -1/4 ≤ 0, so the condition holds
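The accelerated scheme with $\theta_k = 2/(k+2)$ can be sketched for the lasso the same way (modern-Julia; the names `apg` and `soft` are illustrative; fixed step $\alpha = 1/\|A\|_2^2$):

```julia
using LinearAlgebra

soft(v, t) = sign.(v) .* max.(abs.(v) .- t, 0)   # prox of t‖·‖₁

# Accelerated proximal gradient with θ_k = 2/(k+2) for min_x 0.5‖Ax − y‖² + λ‖x‖₁.
function apg(A, y, λ; maxit=500)
    x = zeros(size(A, 2)); u = copy(x)
    α = 1 / opnorm(A)^2                  # fixed step α = 1/L
    for k = 0:maxit-1
        θ  = 2 / (k + 2)
        yk = (1 - θ) .* x .+ θ .* u      # extrapolation point y_k
        xn = soft(yk - α * (A' * (A*yk - y)), α*λ)
        u  = x .+ (xn .- x) ./ θ         # u_{k+1}
        x  = xn
    end
    return x
end
```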
$y_1 = x_0 \in \R^n$, $t_1 = 1$
For $k=1,2,\dots$:
\begin{align*} x_k &= \prox_{\alpha_k\Psi} (y_k - \alpha_k\nabla F(y_k))\\ t_{k+1} &= \frac{1+\sqrt{1+4t_k^2}}{2}\\ y_{k+1} &= x_k + \left( \frac{t_k-1}{t_{k+1}} \right) (x_k - x_{k-1}) \end{align*}
Here, $\theta_k = \frac{1}{t_k}$ (so that the momentum coefficient is $\theta_{k+1}\left(\frac{1}{\theta_k}-1\right) = \frac{t_k-1}{t_{k+1}}$): does it satisfy $\frac{1-\theta_{k+1}}{\theta_{k+1}^2} \le \frac{1}{\theta_k^2}$?
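Before checking symbolically, note that the $t$-update is, by construction, the positive root of a quadratic linking consecutive terms:
$$ t_{k+1} = \frac{1+\sqrt{1+4t_k^2}}{2} \quad\Longleftrightarrow\quad t_{k+1}^2 - t_{k+1} - t_k^2 = 0, $$
so $t_{k+1}^2 - t_{k+1} = t_k^2$ holds exactly at every step.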
tnext(t) = (1+sqrt(1+4t^2))/2   # t_{k+1} from t_k
tseq = function(n)
    vals = zeros(n); vals[1] = 1
    for i = 2:n
        vals[i] = tnext(vals[i-1])
    end
    return vals
end
using SymPy, PyPlot
@vars t
tt = (1+sqrt(1+4t^2))/2   # t_{k+1} as a function of t_k
θ  = 1/t                  # θ_k = 1/t_k
θ2 = 1/tt                 # θ_{k+1} = 1/t_{k+1}
expr = (1-θ2)/θ2^2 - 1/θ^2
simplify(expr)   # simplifies to 0: the condition holds with equality
tvals = tseq(100)
expr_vals = map(expr, tvals)
plot(tvals, N(expr_vals))   # a flat line at 0 (up to rounding)
xlabel("t")
ylabel("expr")
#
# Noisy Compressed sensing
#
using StatsBase, PyPlot
include("../../julia/lasso.ISTA.jl")
include("../../julia/lasso.FISTA.jl")
n=1000
s=10
x=zeros(n)
x[sample(1:n,s)] = rand([-1,1], s).*sqrt(log(n))
k=round(Int, 3*s*log(n/s))   # number of measurements
A=randn(k,n)
println("A: k=$k, n=$n\n")
# noisy measurement
y=A*x + .1*randn(k)
lam=10
maxiter=1000
eps = 1e-4
# reconstruction
ista = lassoISTA(A, y, step="const", lam=lam, maxit=maxiter, eps=eps, repf=true, repg=true);
simple = lassoFISTA(A, y, step="simple", lam=lam, maxit=maxiter, eps=eps, repf=true, repg=true);
fista = lassoFISTA(A, y, step="const", lam=lam, maxit=maxiter, eps=eps, repf=true, repg=true);
z = ista[1]
println("True signal: sparsity=$(countnz(x)), dim=$n")
println("Recovered signal: sparsity=$(countnz(z)), mse=$(vecnorm(x-z,2)^2/n)")
sx=find(abs(x).>0)
sz=find(abs(z).>0)
print(sx)
print(sz)
subplot(411); plot(1:n, x); ylabel("True signal")
subplot(412); plot(1:n, ista[1]); ylabel("Recovered (PGD)")
subplot(413); plot(1:n, simple[1]); ylabel("Recovered (AGD:simple)")
subplot(414); plot(1:n, fista[1]); ylabel("Recovered (AGD:fista)")
PyPlot.subplots_adjust(hspace=.5)
A: k=138, n=1000
L = 1896.6593157280447
L = 1896.6593157280447
L = 1896.6593157280447
True signal: sparsity=10, dim=1000
Recovered signal: sparsity=10, mse=4.057473641809406e-5
[152,162,423,424,556,609,613,852,875,990]
[152,162,423,424,556,609,613,852,875,990]
subplot(131)
PyPlot.semilogy(1:maxiter, ista[2], color="blue");
PyPlot.semilogy(1:maxiter, simple[2], color="green");
PyPlot.semilogy(1:maxiter, fista[2], color="red", linestyle="dashed");
xlabel("Iterations")
ylabel("Objective function value")
ax = gca()
ax[:set_ylim]([100, 1000])
ax[:legend](("PGD", "AGD (simple)", "AGD (fista)"), loc=1)
subplot(132)
PyPlot.semilogy(1:maxiter, ista[3], color="blue");
PyPlot.semilogy(1:maxiter, simple[3], color="green");
PyPlot.semilogy(1:maxiter, fista[3], color="red", linestyle="dashed");
xlabel("Iterations")
ylabel("Gradient inf norm")
ax = gca()
ax[:set_ylim]([0, 100])
subplot(133)
PyPlot.semilogy(1:maxiter, ista[4], color="blue");
PyPlot.semilogy(1:maxiter, simple[4], color="green");
PyPlot.semilogy(1:maxiter, fista[4], color="red", linestyle="dashed");
xlabel("Iterations")
ylabel("Sparsity")
ax = gca()
ax[:set_ylim]([0, n])
PyPlot.subplots_adjust(right=1.2, wspace=.4)
where $D \in \R^{|E|\times |V|}$ is an ordered incidence matrix of a graph $G = (V,E)$,
$$ D_{ev} = \begin{cases} +1 & \text{if $v$ is the head of the edge $e$} \\ -1 & \text{if $v$ is the tail of the edge $e$} \\ 0 & \text{otherwise} \end{cases} $$
We write $n=|V|$ and $m=|E|$.
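For instance, on a hypothetical 3-node path graph with edge $e_1$ between nodes 1 and 2 and edge $e_2$ between nodes 2 and 3 (using the $\pm 1$ convention above), $Dx$ stacks the differences of $x$ across edges, so $\|Dx\|_1$ is the total variation of $x$ over $G$:

```julia
# Incidence matrix of the path 1–2–3: row e has +1 and −1 at the endpoints of edge e.
D = [ 1.0 -1.0  0.0 ;
      0.0  1.0 -1.0 ]
x = [5.0, 3.0, 3.0]
D * x              # differences across edges: [2.0, 0.0]
sum(abs.(D * x))   # total variation of x over the path: 2.0
```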
PGD:
$$ x_{k+1} = \prox_{\alpha_k \lambda \|D\cdot\|_1} (x_k + \alpha_k (y-x_k)) $$
Is this the same type of soft-thresholding operator we've seen for $\|x\|_1$?
We would like to reuse the prox operator for $\|x\|_1$. How would you change the formulation to do so?
Introduce an auxiliary variable $z$ with the constraint $Dx = z$. The augmented Lagrangian (with multipliers $\alpha \in \R^m$) is
\begin{align*} L_\rho(x,z; \alpha) = \frac12 \|y - x\|^2 + \lambda \|z\|_1 + \alpha^T(Dx-z) + \frac{\rho}{2} \|Dx-z\|^2 \end{align*}
using Images, ImageView, Colors, PyPlot
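Minimizing $L_\rho$ alternately in $x$ and $z$, then taking a gradient-ascent step in $\alpha$, yields the ADMM updates that the denoising loop below implements:
\begin{align*} x^{+} &= (I + \rho D^TD)^{-1}\,(y - D^T\alpha + \rho D^T z) \\ z^{+} &= \mathrm{soft}_{\lambda/\rho}\!\left(Dx^{+} + \tfrac{1}{\rho}\,\alpha\right) \\ \alpha^{+} &= \alpha + \rho\,(Dx^{+} - z^{+}) \end{align*}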
pic = load("./tu32.tiff");
pic
y = separate(pic);
picx,picy = size(y)
y = y[:,:,1];
y = float(y[:]);
# noise
r = .05
y[rand(picx*picy) .> 1-r/2] = 1; # salt
y[rand(picx*picy) .< r/2] = 0; # pepper
print("picture dim = ($picx, $picy)\n")
yim = reshape(y, picx, picy);
Image(map(RGB, yim, yim, yim)')
picture dim = (32, 32)
n = picx*picy
m = 2n - (picx + picy-2) - 2   # number of grid edges: picx*(picy-1) + picy*(picx-1)
print("m,n = $m, $n, length img vec = $(length(y))")
m,n = 1984, 1024, length img vec = 1024
D = spzeros(m,n);
e=1;
valid(coord) = 1 <= coord[1] <= picx && 1 <= coord[2] <= picy   # inside the grid?
c2i(coord) = (coord[2]-1)*picx + coord[1]
for i=1:picx, j=1:picy
    coords = ( (i,j+1), (i+1,j) )   # right and down neighbors
    for c in coords
        if valid(c)
            D[e, c2i((i,j))] = 1
            D[e, c2i(c)] = -1
            e = e+1
        end
    end
end
print("total added edges = $(e-1)")
total added edges = 1984
# Soft-thresholding function
soft(v, lam) = (sign(v) .* max(abs(v) - lam, 0))
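A quick numeric sanity check that `soft` solves the scalar prox problem $\min_z \frac12(z-v)^2 + \lambda|z|$ (modern-Julia sketch with dotted elementwise ops; the brute-force grid is purely illustrative):

```julia
soft(v, lam) = sign.(v) .* max.(abs.(v) .- lam, 0)

# Brute-force the scalar prox objective on a fine grid and compare.
v, lam = 0.8, 0.3
obj(z) = 0.5*(z - v)^2 + lam*abs(z)
grid  = -2:1e-4:2
zstar = grid[argmin(obj.(grid))]   # numeric minimizer, ≈ 0.5
soft(v, lam)                       # closed form: 0.5
```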
DD = D'*D
ρ = 1000;
δ = 0;
@time L = chol(ρ*DD + (1+δ)*eye(n), Val{:L});
0.044810 seconds (26 allocations: 40.085 MB, 19.31% gc time)
maxiter = 25000
λ = .3
x = y
z = D*x;
α = zeros(m)
Dalp = zeros(n)
prevX, prevZ, prevAlp = x, z, α
@printf "iter primal infeas rho diff_x diff_z diff_alp pd_gap\n"
for k=1:maxiter
    tmp = y - Dalp + ρ*(D'*z)
    x = L'\(L\tmp)                  # x-update: solve (I + ρD'D)x = tmp
    Dx = D*x
    z = soft(Dx + (1/ρ)*α, λ/ρ)     # z-update
    r = Dx-z                        # primal residual
    α = α + ρ*r                     # multiplier update
    Dalp = D'*α
    if k % 1000 == 0
        primal = .5*vecnorm(y-x)^2 + λ*vecnorm(z,1)
        pd_gap = (λ*vecnorm(Dx,1) + dot(Dalp,x))/max(1,primal)
        infeas = vecnorm(r,Inf)
        reld_x = vecnorm(x-prevX)/max(1,vecnorm(x))
        reld_z = vecnorm(z-prevZ)/max(1,vecnorm(z))
        reld_alp = vecnorm(α-prevAlp)/max(1,vecnorm(α))
        ρ = ρ*1.2                   # increase penalty and refactor
        L = chol(ρ*DD + (1+δ)*eye(n), Val{:L})
        @printf "%04d: %8.3e %8.3e %8.3e %8.3e %8.3e %8.3e %8.3e\n" k primal infeas ρ reld_x reld_z reld_alp pd_gap
    end
    prevX, prevZ, prevAlp = x, z, α
end
iter  primal   infeas    rho       diff_x    diff_z    diff_alp  pd_gap
1000: 4.227e+01 1.550e-05 1.200e+03 7.269e-05 4.433e-04 4.209e-03 1.720e+00
2000: 3.572e+01 3.630e-05 1.440e+03 3.931e-05 3.553e-04 1.163e-02 1.444e+00
3000: 3.277e+01 4.115e-06 1.728e+03 2.458e-05 3.025e-04 1.833e-03 1.242e+00
4000: 3.123e+01 1.928e-07 2.074e+03 1.614e-05 2.399e-04 1.701e-04 1.101e+00
5000: 3.038e+01 1.674e-08 2.488e+03 1.124e-05 1.927e-04 1.945e-05 9.982e-01
6000: 2.985e+01 4.686e-08 2.986e+03 8.318e-06 1.571e-04 8.193e-05 9.222e-01
7000: 2.950e+01 6.449e-09 3.583e+03 6.216e-06 1.286e-04 1.319e-05 8.639e-01
8000: 2.926e+01 1.834e-07 4.300e+03 4.810e-06 1.053e-04 1.600e-04 8.188e-01
9000: 2.908e+01 3.258e-07 5.160e+03 3.755e-06 8.461e-05 8.298e-04 7.837e-01
10000: 2.895e+01 2.240e-09 6.192e+03 2.951e-06 6.876e-05 6.836e-06 7.565e-01
11000: 2.885e+01 1.385e-09 7.430e+03 2.357e-06 5.595e-05 5.478e-06 7.346e-01
12000: 2.878e+01 5.527e-09 8.916e+03 1.818e-06 3.962e-05 3.765e-05 7.181e-01
13000: 2.874e+01 1.976e-08 1.070e+04 1.264e-06 2.665e-05 1.595e-04 7.072e-01
14000: 2.871e+01 3.789e-08 1.284e+04 1.006e-06 2.052e-05 3.615e-04 6.996e-01
15000: 2.870e+01 3.046e-10 1.541e+04 7.436e-07 1.551e-05 3.784e-06 6.939e-01
16000: 2.869e+01 1.448e-10 1.849e+04 5.931e-07 1.269e-05 2.497e-06 6.897e-01
17000: 2.868e+01 1.512e-09 2.219e+04 4.819e-07 9.704e-06 1.878e-05 6.864e-01
18000: 2.867e+01 1.288e-10 2.662e+04 3.816e-07 7.972e-06 2.805e-06 6.838e-01
19000: 2.866e+01 4.606e-09 3.195e+04 2.892e-07 5.334e-06 9.588e-05 6.819e-01
20000: 2.866e+01 6.141e-11 3.834e+04 2.311e-07 4.390e-06 2.046e-06 6.806e-01
21000: 2.866e+01 4.456e-11 4.601e+04 1.869e-07 3.622e-06 1.516e-06 6.795e-01
22000: 2.866e+01 3.180e-11 5.521e+04 1.421e-07 2.555e-06 1.434e-06 6.787e-01
23000: 2.866e+01 2.162e-11 6.625e+04 1.155e-07 2.112e-06 1.087e-06 6.782e-01
24000: 2.866e+01 1.419e-11 7.950e+04 9.441e-08 1.749e-06 8.322e-07 6.777e-01
25000: 2.865e+01 7.384e-12 9.540e+04 7.612e-08 1.432e-06 5.896e-07 6.773e-01
xim = reshape(x, picx, picy);
recovered = Image(map(RGB, xim, xim, xim)')