Introduction to scientific computing with Python

Maxime Sangnier

September, 2020

Part 6: Practical work on convex optimization

Table of contents

  1. Exercises

Exercises

Exercise 1

In [1]:
import numpy as np
import matplotlib.pyplot as plt

def plot_contour(Q, c):
    """
    Plot the contour of a quadratic function 1/2 x^T Q x + c^T x.
    
    Input:
        - Q: a 2x2 numpy array.
        - c: a numpy array of size 2.
    """
    f = lambda x: x.T.dot(Q.dot(x))/2 + c.dot(x)
    sol = np.linalg.solve(Q, -c)
    sol_x, sol_y = sol[0], sol[1]

    X, Y = np.meshgrid(np.linspace(-10+sol_x, 10+sol_x, 100), np.linspace(-10+sol_y, 10+sol_y, 100))
    X, Y = X.ravel(), Y.ravel()
    Z = np.asarray([f(np.r_[x, y]) for x, y in zip(X, Y)])

    X.shape = (int(np.sqrt(X.size)), -1)
    Y.shape = (int(np.sqrt(Y.size)), -1)
    Z.shape = (int(np.sqrt(Z.size)), -1)

    levels = np.logspace(0, np.log10(Z.max()-Z.min()), 20) + Z.min()
    plt.contour(X, Y, Z, levels)

def plot_path(p):
    """
    Plot the path [x_0, x_1, …, x_k] of an optimization algorithm.
    
    Input:
        - p: a list of numpy arrays of size 2 (the points x_t, t = 0, …, k).
    """
    plt.plot([x for x, y in p], [y for x, y in p], '*-')

Let us consider the optimization problem: $$ \operatorname{minimize}_{x \in \mathbb R ^d} \frac 1 2 x^\top Q x + c^\top x, $$ where $Q$ is a symmetric positive definite matrix and $c$ a vector. These parameters can be randomly generated (here in dimension 2):

In [2]:
Q = np.random.randn(2, 2)
Q = Q.dot(Q.T)
c = np.random.randn(2)
c *= 10 / np.linalg.norm(np.linalg.solve(Q, -c))

plot_contour(Q, c)

Question 1. Define a function gradient_descent(Q, c, eps=1e-3) that:

  • performs a gradient descent for the optimization problem of interest with fixed step size;
  • terminates when the norm of the current gradient is less than eps;
  • returns the list of iterates $(x_k)_k$.

Plot the path of the descent.

In [ ]:
# Answer
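
A possible sketch for Question 1 (the starting point $x_0 = 0$, the default step $\frac 1 L$ with $L$ the largest eigenvalue of $Q$, and the safety cap n_it are choices made here, not imposed by the statement):

In [ ]:
def gradient_descent(Q, c, eps=1e-3, step=None, n_it=10000):
    """Fixed-step gradient descent on f(x) = 1/2 x^T Q x + c^T x."""
    if step is None:
        step = 1. / np.linalg.eigvalsh(Q).max()  # Default step 1/L (a choice, not required)
    x = np.zeros(Q.shape[0])  # Starting point (a choice)
    path = [x]
    for _ in range(n_it):  # Safety cap on the number of iterations
        grad = Q.dot(x) + c
        if np.linalg.norm(grad) < eps:
            break
        x = x - step * grad
        path.append(x)
    return path

path = gradient_descent(Q, c)
plot_contour(Q, c)
plot_path(path)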

Question 2. Let $L$ be the Lipschitz constant of the gradient of the objective function. Try different step sizes between $\frac{0.1}{L}$ and $\frac{2.5}{L}$. What do you observe?

In [ ]:
# Answer
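
A possible way to carry out the comparison for Question 2, reusing the gradient_descent sketch above (the particular factors and the iteration cap are arbitrary choices):

In [ ]:
L = np.linalg.eigvalsh(Q).max()  # Lipschitz constant of the gradient
plt.figure(figsize=(12, 8))
for i, factor in enumerate([0.1, 0.5, 1., 1.9, 2.5]):
    path = gradient_descent(Q, c, step=factor / L, n_it=50)
    plt.subplot(2, 3, i + 1)
    plot_contour(Q, c)
    plot_path(path)
    plt.title("step = {:.1f}/L, {} iterations".format(factor, len(path) - 1))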

Question 3. Implement a line search with Armijo's rule and compare the path obtained.

In [ ]:
# Answer
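
One possible implementation of the backtracking line search with Armijo's rule for Question 3 (the constants gamma0, alpha and beta below are common but arbitrary choices):

In [ ]:
def gradient_descent_armijo(Q, c, eps=1e-3, gamma0=1., alpha=0.3, beta=0.5, n_it=10000):
    """Gradient descent with backtracking line search (Armijo's rule)."""
    f = lambda x: x.dot(Q.dot(x)) / 2 + c.dot(x)
    x = np.zeros(Q.shape[0])  # Starting point (a choice)
    path = [x]
    for _ in range(n_it):
        grad = Q.dot(x) + c
        if np.linalg.norm(grad) < eps:
            break
        gamma = gamma0
        # Backtrack until the sufficient decrease (Armijo) condition holds
        while f(x - gamma * grad) > f(x) - alpha * gamma * grad.dot(grad):
            gamma *= beta
        x = x - gamma * grad
        path.append(x)
    return path

plot_contour(Q, c)
plot_path(gradient_descent_armijo(Q, c))
plot_path(gradient_descent(Q, c))  # Comparison with the fixed-step path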

Question 4. Show that the exact line search boils down to $\gamma_k = \frac{\|v_k\|_2^2}{v_k^\top Q v_k}$, where $v_k$ is the gradient of the objective function at $x_k$.

Implement a gradient descent with this step size and compare it with the two previous strategies.

In [ ]:
# Answer
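
A possible implementation for the second part of Question 4, directly using the step size $\gamma_k$ given in the statement:

In [ ]:
def gradient_descent_exact(Q, c, eps=1e-3, n_it=10000):
    """Gradient descent with exact line search for the quadratic objective."""
    x = np.zeros(Q.shape[0])  # Starting point (a choice)
    path = [x]
    for _ in range(n_it):
        v = Q.dot(x) + c  # Gradient at the current iterate
        if np.linalg.norm(v) < eps:
            break
        gamma = v.dot(v) / v.dot(Q.dot(v))  # Exact line search step
        x = x - gamma * v
        path.append(x)
    return path

plot_contour(Q, c)
plot_path(gradient_descent_exact(Q, c))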

Question 5. Compare with Newton's method.

In [ ]:
# Answer
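
A possible sketch for Question 5; since the objective is quadratic with Hessian $Q$, a single Newton step $x_1 = x_0 - Q^{-1}(Qx_0 + c)$ lands exactly on the minimizer (the starting point below is a choice):

In [ ]:
def newton(Q, c):
    """Newton's method for the quadratic objective (one step suffices here)."""
    x = np.zeros(Q.shape[0])  # Starting point (a choice)
    x_new = x - np.linalg.solve(Q, Q.dot(x) + c)  # Newton step = exact minimizer
    return [x, x_new]

plot_contour(Q, c)
plot_path(newton(Q, c))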

Exercise 2

In [9]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

Let us consider the problem of least absolute deviations: $$ \operatorname{minimize}_{\beta \in \mathbb R ^d} \left\| Y - X\beta \right\|_1, $$ where $Y \in \mathbb R^n$ and $X \in \mathbb R^{n \times d}$.

The matrices $X$ and $Y$ can be randomly generated:

In [10]:
n, d = 200, 2  # Sample size, dimension

X = np.random.randn(n, d-1)
X = np.concatenate((X, np.ones((n, 1))), axis=1)  # Last column of ones (intercept)

beta = np.random.randn(d)
# beta[1:-1] = 0  # Only the first and the last components are nonzero

Y = X.dot(beta) + np.random.randn(n)*5e-1

Question 1. Rewrite the previous optimization problem as a linear program.

Answer:

Once the optimization problem is rewritten as a linear program, it can be solved with standard libraries such as cvxopt.
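For reference, the cell below encodes one standard reformulation, which introduces an auxiliary vector $t \in \mathbb R^n$ bounding the absolute residuals: $$ \operatorname{minimize}_{\beta \in \mathbb R^d,\, t \in \mathbb R^n} \; \mathbf 1^\top t \quad \text{subject to} \quad X\beta - t \le Y \quad \text{and} \quad -X\beta - t \le -Y. $$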

In [11]:
from cvxopt import matrix, solvers

# Variables: beta, t
c = matrix(np.r_[np.zeros(d), np.ones(n)])  # Linear objective
G = matrix(np.r_[np.c_[-X, -np.eye(n)],
                 np.c_[X, -np.eye(n)]])  # Inequality lhs
h = matrix(np.r_[-Y, Y])  # Inequality rhs

sol = solvers.lp(c, G, h)
     pcost       dcost       gap    pres   dres   k/t
 0:  0.0000e+00  2.0539e-15  5e+02  3e+00  1e-15  1e+00
 1:  3.4129e+01  3.4135e+01  7e+01  4e-01  2e-15  2e-01
 2:  7.2083e+01  7.2085e+01  1e+01  7e-02  4e-15  3e-02
 3:  7.8320e+01  7.8321e+01  2e+00  1e-02  2e-15  5e-03
 4:  7.9162e+01  7.9162e+01  8e-01  4e-03  8e-15  2e-03
 5:  7.9531e+01  7.9531e+01  2e-01  9e-04  1e-14  4e-04
 6:  7.9610e+01  7.9610e+01  2e-02  1e-04  1e-14  5e-05
 7:  7.9619e+01  7.9619e+01  3e-04  1e-06  6e-15  6e-07
 8:  7.9620e+01  7.9620e+01  3e-06  1e-08  8e-15  6e-09
Optimal solution found.
In [12]:
beta_sol = np.asarray(sol['x'][:d]).ravel()
obj_sol = np.linalg.norm(Y - X.dot(beta_sol), ord=1)

print("True beta:", beta)
print("Estimation of beta:", beta_sol)
print("Minimum objective:", obj_sol)
True beta: [-0.34926163 -0.74205859]
Estimation of beta: [-0.38284007 -0.68895343]
Minimum objective: 79.61957600076465

Question 2. Scatter the observations and draw the linear model. Add the predictions obtained from the estimator beta_sol.

In [ ]:
# Answer
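
A possible sketch for Question 2, assuming $d = 2$ as above so that the first column of X is the only feature and the last column is the intercept:

In [ ]:
order = np.argsort(X[:, 0])  # Sort by the feature to get clean line plots
plt.scatter(X[:, 0], Y, alpha=0.5, label="Observations")
plt.plot(X[order, 0], X[order].dot(beta), label="True linear model")
plt.plot(X[order, 0], X[order].dot(beta_sol), '--', label="LAD predictions")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()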

Question 3. Define a function subgradient_descent(Y, X, n_it=100, step=1., mode='fixed') that:

  • performs a subgradient descent for the original optimization problem (we note $g_k$ a subgradient at iteration $k$) with either
    • a fixed step length: $\frac{\textrm{step}}{\|g_{k}\|}$;
    • a $\sqrt \cdot$-diminishing step size: $\frac{\textrm{step}}{\sqrt k}$;
    • a linearly diminishing step size: $\frac{\textrm{step}}{k}$;
  • terminates after n_it iterations;
  • returns the list of iterates $(\beta_k)_k$.

Run the algorithm for different fixed step lengths and plot, with respect to the iterations, the normalized objective values $\frac{f(\beta_k) - \textrm{obj\_sol}}{\textrm{obj\_sol}}$ (use a logarithmic scale on the y-axis).

In [ ]:
# Answer
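
One possible implementation for Question 3; a subgradient of $\beta \mapsto \|Y - X\beta\|_1$ is $g = -X^\top \mathrm{sign}(Y - X\beta)$. The mode names 'sqrt' and 'linear', the zero starting point and the step lengths tried below are choices made here (only the default mode='fixed' is imposed by the statement):

In [ ]:
def subgradient_descent(Y, X, n_it=100, step=1., mode='fixed'):
    """Subgradient descent for the least absolute deviations problem."""
    beta = np.zeros(X.shape[1])  # Starting point (a choice)
    path = [beta]
    for k in range(1, n_it + 1):
        g = -X.T.dot(np.sign(Y - X.dot(beta)))  # A subgradient of the objective
        if mode == 'fixed':
            gamma = step / np.linalg.norm(g)  # Fixed step length
        elif mode == 'sqrt':
            gamma = step / np.sqrt(k)  # Square-root diminishing step size
        else:  # 'linear'
            gamma = step / k  # Linearly diminishing step size
        beta = beta - gamma * g
        path.append(beta)
    return path

for s in [10., 1., 0.1]:
    path = subgradient_descent(Y, X, n_it=200, step=s)
    obj = np.array([np.linalg.norm(Y - X.dot(b), ord=1) for b in path])
    plt.semilogy((obj - obj_sol) / obj_sol, label="step = {}".format(s))
plt.xlabel("Iteration")
plt.ylabel("Normalized suboptimality")
plt.legend()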

Question 4. Choose a value for step and compare the three ways of defining the step size (fixed length, $\sqrt \cdot$, linear) for a large number of iterations by plotting the objective values. What do you observe?

In [ ]:
# Answer
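
A possible comparison for Question 4, reusing subgradient_descent from the sketch above (the value step=1. and the number of iterations are arbitrary choices):

In [ ]:
for mode in ['fixed', 'sqrt', 'linear']:
    path = subgradient_descent(Y, X, n_it=5000, step=1., mode=mode)
    obj = np.array([np.linalg.norm(Y - X.dot(b), ord=1) for b in path])
    plt.semilogy((obj - obj_sol) / obj_sol, label=mode)
plt.xlabel("Iteration")
plt.ylabel("Normalized suboptimality")
plt.legend()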

Question 5. In the same manner as for the previous question, plot the norm of the difference between beta_sol and the iterates $(\beta_k)_k$ of the subgradient methods.

In [ ]:
# Answer