Decomposition: $$ \min_{x\in\R^n} \;\; f(x) = \underbrace{F(x)}_{\text{convex, smooth}} + \underbrace{\Psi(x)}_{\text{convex, nonsmooth, simple}} $$
Ex. Convex regularization problems, such as the lasso: $F(x) = \frac12\|Ax-y\|^2$, $\Psi(x) = \lambda\|x\|_1$
For $k=0,1,2,\dots$:
\begin{align*} x_{k+1} &= \text{prox}_{\alpha_k \Psi} ( x_k - \alpha_k \nabla F(x_k)) \end{align*}
Required Conditions: $\nabla F$ is $L$-Lipschitz continuous and the step size is fixed at $\alpha_k \equiv \alpha \le 1/L$.
Monotone convergence with rate: $$ f(x_{k}) - f(x^*) \le \frac{\|x_0 - x^*\|^2}{2\alpha k} $$
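For the lasso instance, the iteration is one gradient step on $F$ followed by elementwise soft-thresholding. A minimal modern-Julia sketch (the names `ista` and `soft` are illustrative, not from the course code; fixed step $\alpha = 1/L$ with $L = \|A\|_2^2$):

```julia
using LinearAlgebra

# Prox of t‖·‖₁: elementwise soft-thresholding.
soft(v, t) = sign.(v) .* max.(abs.(v) .- t, 0)

# Proximal gradient (ISTA) for min_x 0.5‖Ax − y‖² + λ‖x‖₁.
function ista(A, y, λ; maxit=500)
    x = zeros(size(A, 2))
    α = 1 / opnorm(A)^2                 # fixed step α = 1/L
    for k = 1:maxit
        g = A' * (A*x - y)              # ∇F(x)
        x = soft(x - α*g, α*λ)          # prox_{αΨ}(x − α∇F(x))
    end
    return x
end
```

For an orthonormal $A$ the fixed point is coordinatewise soft-thresholding of $A^T y$.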
For $k=0,1,2,\dots$:
\begin{align*} y_k &= (1-\theta_k) x_k + \theta_k u_k \\ x_{k+1} &= \text{prox}_{\alpha_k \Psi} ( y_k - \alpha_k \nabla F(y_k)) \\ u_{k+1} &= x_k + \frac{1}{\theta_k} (x_{k+1} - x_k) \end{align*}
Required Conditions: $\nabla F$ is $L$-Lipschitz, $\alpha_k \equiv \alpha \le 1/L$, $\theta_0 = 1$, and $\theta_k \in (0,1]$ satisfies $\frac{1-\theta_{k+1}}{\theta_{k+1}^2} \le \frac{1}{\theta_k^2}$.
Non-monotone convergence with rate (for $\theta_k = 2/(k+2)$):
$$ f(x_k) - f(x^*) \le \frac{2\|x_0 - x^*\|^2}{\alpha (k+1)^2} $$
With $\theta_k = \frac{2}{k+2}$, the iteration can be rewritten in momentum form:
\begin{align*} y_k &= x_k + \frac{k-1}{k+2} (x_k - x_{k-1})\\ x_{k+1} &= \text{prox}_{\alpha_k \Psi} ( y_k - \alpha_k \nabla F(y_k)) \end{align*}
$\Leftrightarrow$
\begin{align*} y_k &= (1-\theta_k) x_k + \theta_k u_k \\ x_{k+1} &= \text{prox}_{\alpha_k \Psi} ( y_k - \alpha_k \nabla F(y_k)) \\ u_{k+1} &= x_k + \frac{1}{\theta_k} (x_{k+1} - x_k) \end{align*}
Does this $\theta_k$ satisfy $\frac{1-\theta_{k+1}}{\theta_{k+1}^2} \le \frac{1}{\theta_k^2}$?
# Symbolic check that θ_k = 2/(k+2) satisfies the condition
using SymPy
@vars k
θ  = 2/(k+2)   # θ_k
θp = 2/(k+3)   # θ_{k+1}
expr = (1-θp)/θp^2 - 1/θ^2
simplify(expr)   # simplifies to -1/4 ≤ 0, so the condition holds
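The accelerated scheme with $\theta_k = 2/(k+2)$ can be sketched for the lasso the same way (modern-Julia; the names `apg` and `soft` are illustrative; fixed step $\alpha = 1/\|A\|_2^2$):

```julia
using LinearAlgebra

soft(v, t) = sign.(v) .* max.(abs.(v) .- t, 0)   # prox of t‖·‖₁

# Accelerated proximal gradient with θ_k = 2/(k+2) for min_x 0.5‖Ax − y‖² + λ‖x‖₁.
function apg(A, y, λ; maxit=500)
    x = zeros(size(A, 2)); u = copy(x)
    α = 1 / opnorm(A)^2                  # fixed step α = 1/L
    for k = 0:maxit-1
        θ  = 2 / (k + 2)
        yk = (1 - θ) .* x .+ θ .* u      # extrapolation point y_k
        xn = soft(yk - α * (A' * (A*yk - y)), α*λ)
        u  = x .+ (xn .- x) ./ θ         # u_{k+1}
        x  = xn
    end
    return x
end
```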
$y_1 = x_0 \in \R^n$, $t_1 = 1$
For $k=1,2,\dots$:
\begin{align*} x_k &= \prox_{\alpha_k\Psi} (y_k - \alpha_k\nabla F(y_k))\\ t_{k+1} &= \frac{1+\sqrt{1+4t_k^2}}{2}\\ y_{k+1} &= x_k + \left( \frac{t_k-1}{t_{k+1}} \right) (x_k - x_{k-1}) \end{align*}
Here, $\theta_k = \frac{1}{t_k}$ (so that the momentum coefficient is $\theta_{k+1}\left(\frac{1}{\theta_k}-1\right) = \frac{t_k-1}{t_{k+1}}$): does it satisfy $\frac{1-\theta_{k+1}}{\theta_{k+1}^2} \le \frac{1}{\theta_k^2}$?
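Before checking symbolically, note that the $t$-update is, by construction, the positive root of a quadratic linking consecutive terms:
$$ t_{k+1} = \frac{1+\sqrt{1+4t_k^2}}{2} \quad\Longleftrightarrow\quad t_{k+1}^2 - t_{k+1} - t_k^2 = 0, $$
so $t_{k+1}^2 - t_{k+1} = t_k^2$ holds exactly at every step.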
tnext(t) = (1+sqrt(1+4t^2))/2   # t_{k+1} from t_k
tseq = function(n)
    vals = zeros(n); vals[1] = 1
    for i = 2:n
        vals[i] = tnext(vals[i-1])
    end
    return vals
end
using SymPy, PyPlot
@vars t
tt = (1+sqrt(1+4t^2))/2   # t_{k+1} as a function of t_k
θ  = 1/t                  # θ_k = 1/t_k
θ2 = 1/tt                 # θ_{k+1} = 1/t_{k+1}
expr = (1-θ2)/θ2^2 - 1/θ^2
simplify(expr)   # simplifies to 0: the condition holds with equality
tvals = tseq(100)
expr_vals = map(expr, tvals)
plot(tvals, N(expr_vals))   # a flat line at 0 (up to rounding)
xlabel("t")
ylabel("expr")
#
# Noisy Compressed sensing
#
using StatsBase, PyPlot
include("../../julia/lasso.ISTA.jl")
include("../../julia/lasso.FISTA.jl")
n=1000
s=10
x=zeros(n)
x[sample(1:n,s)] = rand([-1,1], s).*sqrt(log(n))
k=round(Int, 3*s*log(n/s))   # number of measurements
A=randn(k,n)
println("A: k=$k, n=$n\n")
# noisy measurement
y=A*x + .1*randn(k)
lam=10
maxiter=1000
eps = 1e-4
# reconstruction
ista = lassoISTA(A, y, step="const", lam=lam, maxit=maxiter, eps=eps, repf=true, repg=true);
simple = lassoFISTA(A, y, step="simple", lam=lam, maxit=maxiter, eps=eps, repf=true, repg=true);
fista = lassoFISTA(A, y, step="const", lam=lam, maxit=maxiter, eps=eps, repf=true, repg=true);
z = ista[1]
println("True signal: sparsity=$(countnz(x)), dim=$n")
println("Recovered signal: sparsity=$(countnz(z)), mse=$(vecnorm(x-z,2)^2/n)")
sx=find(abs(x).>0)
sz=find(abs(z).>0)
print(sx)
print(sz)
subplot(411); plot(1:n, x); ylabel("True signal")
subplot(412); plot(1:n, ista[1]); ylabel("Recovered (PGD)")
subplot(413); plot(1:n, simple[1]); ylabel("Recovered (AGD:simple)")
subplot(414); plot(1:n, fista[1]); ylabel("Recovered (AGD:fista)")
PyPlot.subplots_adjust(hspace=.5)
A: k=138, n=1000
L = 1896.6593157280447
L = 1896.6593157280447
L = 1896.6593157280447
True signal: sparsity=10, dim=1000
Recovered signal: sparsity=10, mse=4.057473641809406e-5
[152,162,423,424,556,609,613,852,875,990]
[152,162,423,424,556,609,613,852,875,990]
subplot(131)
PyPlot.semilogy(1:maxiter, ista[2], color="blue");
PyPlot.semilogy(1:maxiter, simple[2], color="green");
PyPlot.semilogy(1:maxiter, fista[2], color="red", linestyle="dashed");
xlabel("Iterations")
ylabel("Objective function value")
ax = gca()
ax[:set_ylim]([100, 1000])
ax[:legend](("PGD", "AGD (simple)", "AGD (fista)"), loc=1)
subplot(132)
PyPlot.semilogy(1:maxiter, ista[3], color="blue");
PyPlot.semilogy(1:maxiter, simple[3], color="green");
PyPlot.semilogy(1:maxiter, fista[3], color="red", linestyle="dashed");
xlabel("Iterations")
ylabel("Gradient inf norm")
ax = gca()
ax[:set_ylim]([0, 100])
subplot(133)
PyPlot.semilogy(1:maxiter, ista[4], color="blue");
PyPlot.semilogy(1:maxiter, simple[4], color="green");
PyPlot.semilogy(1:maxiter, fista[4], color="red", linestyle="dashed");
xlabel("Iterations")
ylabel("Sparsity")
ax = gca()
ax[:set_ylim]([0, n])
PyPlot.subplots_adjust(right=1.2, wspace=.4)
where $D \in \R^{|E|\times |V|}$ is an ordered incidence matrix of a graph $G = (V,E)$,
$$ D_{ev} = \begin{cases} +1 & \text{if $v$ is the head of the edge $e$} \\ -1 & \text{if $v$ is the tail of the edge $e$} \\ 0 & \text{otherwise} \end{cases} $$
We write $n=|V|$ and $m=|E|$.
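For instance, on a hypothetical 3-node path graph with edge $e_1$ between nodes 1 and 2 and edge $e_2$ between nodes 2 and 3 (using the $\pm 1$ convention above), $Dx$ stacks the differences of $x$ across edges, so $\|Dx\|_1$ is the total variation of $x$ over $G$:

```julia
# Incidence matrix of the path 1–2–3: row e has +1 and −1 at the endpoints of edge e.
D = [ 1.0 -1.0  0.0 ;
      0.0  1.0 -1.0 ]
x = [5.0, 3.0, 3.0]
D * x              # differences across edges: [2.0, 0.0]
sum(abs.(D * x))   # total variation of x over the path: 2.0
```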
PGD:
$$ x_{k+1} = \prox_{\alpha_k \lambda \|D\cdot\|_1} (x_k + \alpha_k (y-x_k)) $$
Is this the same type of soft-thresholding operator we've seen for $\|x\|_1$?
We would like to reuse the prox operator for $\|x\|_1$. How would you change the formulation to do so?
Introduce an auxiliary variable $z$ with the constraint $Dx = z$. The augmented Lagrangian (with multipliers $\alpha \in \R^m$) is
\begin{align*} L_\rho(x,z; \alpha) = \frac12 \|y - x\|^2 + \lambda \|z\|_1 + \alpha^T(Dx-z) + \frac{\rho}{2} \|Dx-z\|^2 \end{align*}
using Images, ImageView, Colors, PyPlot
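Minimizing $L_\rho$ alternately in $x$ and $z$, then taking a gradient-ascent step in $\alpha$, yields the ADMM updates that the denoising loop below implements:
\begin{align*} x^{+} &= (I + \rho D^TD)^{-1}\,(y - D^T\alpha + \rho D^T z) \\ z^{+} &= \mathrm{soft}_{\lambda/\rho}\!\left(Dx^{+} + \tfrac{1}{\rho}\,\alpha\right) \\ \alpha^{+} &= \alpha + \rho\,(Dx^{+} - z^{+}) \end{align*}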
pic = load("./tu32.tiff");
pic
y = separate(pic);
picx,picy = size(y)
y = y[:,:,1];
y = float(y[:]);
# noise
r = .05
y[rand(picx*picy) .> 1-r/2] = 1; # salt
y[rand(picx*picy) .< r/2] = 0; # pepper
print("picture dim = ($picx, $picy)\n")
yim = reshape(y, picx, picy);
Image(map(RGB, yim, yim, yim)')
picture dim = (32, 32)
n = picx*picy
m = 2n - (picx + picy-2) - 2   # number of grid edges: picx*(picy-1) + picy*(picx-1)
print("m,n = $m, $n, length img vec = $(length(y))")
m,n = 1984, 1024, length img vec = 1024
D = spzeros(m,n);
e=1;
valid(coord) = 1 <= coord[1] <= picx && 1 <= coord[2] <= picy   # inside the grid?
c2i(coord) = (coord[2]-1)*picx + coord[1]
for i=1:picx, j=1:picy
    coords = ( (i,j+1), (i+1,j) )   # right and down neighbors
    for c in coords
        if valid(c)
            D[e, c2i((i,j))] = 1
            D[e, c2i(c)] = -1
            e = e+1
        end
    end
end
print("total added edges = $(e-1)")
total added edges = 1984
# Soft-thresholding function
soft(v, lam) = (sign(v) .* max(abs(v) - lam, 0))
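A quick numeric sanity check that `soft` solves the scalar prox problem $\min_z \frac12(z-v)^2 + \lambda|z|$ (modern-Julia sketch with dotted elementwise ops; the brute-force grid is purely illustrative):

```julia
soft(v, lam) = sign.(v) .* max.(abs.(v) .- lam, 0)

# Brute-force the scalar prox objective on a fine grid and compare.
v, lam = 0.8, 0.3
obj(z) = 0.5*(z - v)^2 + lam*abs(z)
grid  = -2:1e-4:2
zstar = grid[argmin(obj.(grid))]   # numeric minimizer, ≈ 0.5
soft(v, lam)                       # closed form: 0.5
```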
DD = D'*D
ρ = 1000;
δ = 0;
@time L = chol(ρ*DD + (1+δ)*eye(n), Val{:L});
0.044810 seconds (26 allocations: 40.085 MB, 19.31% gc time)
maxiter = 25000
λ = .3
x = y
z = D*x;
α = zeros(m)
Dalp = zeros(n)
prevX, prevZ, prevAlp = x, z, α
@printf "iter primal infeas rho diff_x diff_z diff_alp pd_gap\n"
for k=1:maxiter
    tmp = y - Dalp + ρ*(D'*z)
    x = L'\(L\tmp)                  # x-update: solve (I + ρD'D)x = tmp
    Dx = D*x
    z = soft(Dx + (1/ρ)*α, λ/ρ)     # z-update
    r = Dx-z                        # primal residual
    α = α + ρ*r                     # multiplier update
    Dalp = D'*α
    if k % 1000 == 0
        primal = .5*vecnorm(y-x)^2 + λ*vecnorm(z,1)
        pd_gap = (λ*vecnorm(Dx,1) + dot(Dalp,x))/max(1,primal)
        infeas = vecnorm(r,Inf)
        reld_x = vecnorm(x-prevX)/max(1,vecnorm(x))
        reld_z = vecnorm(z-prevZ)/max(1,vecnorm(z))
        reld_alp = vecnorm(α-prevAlp)/max(1,vecnorm(α))
        ρ = ρ*1.2                   # increase penalty and refactor
        L = chol(ρ*DD + (1+δ)*eye(n), Val{:L})
        @printf "%04d: %8.3e %8.3e %8.3e %8.3e %8.3e %8.3e %8.3e\n" k primal infeas ρ reld_x reld_z reld_alp pd_gap
    end
    prevX, prevZ, prevAlp = x, z, α
end
iter  primal   infeas    rho       diff_x    diff_z    diff_alp  pd_gap
1000: 4.227e+01 1.550e-05 1.200e+03 7.269e-05 4.433e-04 4.209e-03 1.720e+00
2000: 3.572e+01 3.630e-05 1.440e+03 3.931e-05 3.553e-04 1.163e-02 1.444e+00
3000: 3.277e+01 4.115e-06 1.728e+03 2.458e-05 3.025e-04 1.833e-03 1.242e+00
4000: 3.123e+01 1.928e-07 2.074e+03 1.614e-05 2.399e-04 1.701e-04 1.101e+00
5000: 3.038e+01 1.674e-08 2.488e+03 1.124e-05 1.927e-04 1.945e-05 9.982e-01
6000: 2.985e+01 4.686e-08 2.986e+03 8.318e-06 1.571e-04 8.193e-05 9.222e-01
7000: 2.950e+01 6.449e-09 3.583e+03 6.216e-06 1.286e-04 1.319e-05 8.639e-01
8000: 2.926e+01 1.834e-07 4.300e+03 4.810e-06 1.053e-04 1.600e-04 8.188e-01
9000: 2.908e+01 3.258e-07 5.160e+03 3.755e-06 8.461e-05 8.298e-04 7.837e-01
10000: 2.895e+01 2.240e-09 6.192e+03 2.951e-06 6.876e-05 6.836e-06 7.565e-01
11000: 2.885e+01 1.385e-09 7.430e+03 2.357e-06 5.595e-05 5.478e-06 7.346e-01
12000: 2.878e+01 5.527e-09 8.916e+03 1.818e-06 3.962e-05 3.765e-05 7.181e-01
13000: 2.874e+01 1.976e-08 1.070e+04 1.264e-06 2.665e-05 1.595e-04 7.072e-01
14000: 2.871e+01 3.789e-08 1.284e+04 1.006e-06 2.052e-05 3.615e-04 6.996e-01
15000: 2.870e+01 3.046e-10 1.541e+04 7.436e-07 1.551e-05 3.784e-06 6.939e-01
16000: 2.869e+01 1.448e-10 1.849e+04 5.931e-07 1.269e-05 2.497e-06 6.897e-01
17000: 2.868e+01 1.512e-09 2.219e+04 4.819e-07 9.704e-06 1.878e-05 6.864e-01
18000: 2.867e+01 1.288e-10 2.662e+04 3.816e-07 7.972e-06 2.805e-06 6.838e-01
19000: 2.866e+01 4.606e-09 3.195e+04 2.892e-07 5.334e-06 9.588e-05 6.819e-01
20000: 2.866e+01 6.141e-11 3.834e+04 2.311e-07 4.390e-06 2.046e-06 6.806e-01
21000: 2.866e+01 4.456e-11 4.601e+04 1.869e-07 3.622e-06 1.516e-06 6.795e-01
22000: 2.866e+01 3.180e-11 5.521e+04 1.421e-07 2.555e-06 1.434e-06 6.787e-01
23000: 2.866e+01 2.162e-11 6.625e+04 1.155e-07 2.112e-06 1.087e-06 6.782e-01
24000: 2.866e+01 1.419e-11 7.950e+04 9.441e-08 1.749e-06 8.322e-07 6.777e-01
25000: 2.865e+01 7.384e-12 9.540e+04 7.612e-08 1.432e-06 5.896e-07 6.773e-01
xim = reshape(x, picx, picy);
recovered = Image(map(RGB, xim, xim, xim)')