Problem Sheet 1 Part A - Solutions

$\newcommand{\vct}[1]{\mathbf{#1}}$ $\newcommand{\mtx}[1]{\mathbf{#1}}$ $\newcommand{\e}{\varepsilon}$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\minimize}{\text{minimize}\quad}$ $\newcommand{\maximize}{\text{maximize}\quad}$ $\newcommand{\subjto}{\quad\text{subject to}\quad}$ $\newcommand{\R}{\mathbb{R}}$ $\newcommand{\trans}{T}$ $\newcommand{\ip}[2]{\langle {#1}, {#2} \rangle}$ $\newcommand{\zerovct}{\vct{0}}$ $\newcommand{\diff}[1]{\mathrm{d}{#1}}$

Solution to Problem 1.1

(a) The function $f(x)=x^4$ has a strict minimum at $x=0$, but the second derivative satisfies $f''(0)=0$.

(b) We construct a function that has a strict minimizer $x^*$, but such that every open neighourhood $U$ of $x^*$ contains other local minimizers. One such function is

\begin{equation*} f(x) = \begin{cases} x^4 (\cos(1/x)+2) & \text{ for } x\neq 0\\ 0 & \text{ for } x=0. \end{cases} \end{equation*}

In [3]:
import matplotlib.pyplot as plt
import numpy as np

xx = np.linspace(-0.01,0.01,1000)
def f(x):
    return (x**4)*(np.cos(1/x)+2)

% matplotlib inline

We explain the construction of this function:

  • Start out with $g(x) = \cos(1/x)+2$ for $x\neq 0$ and $g(0)=1$. This function has minimizers $x_0=0$ and $x_k = 1/(\pi k)$ for $k>0$, with values $g(x_k)=1$ at all minimizers. Therefore, any open interval around $0$ contains (infinitely many) local minimizers $x_k$ other than $x_0=0$.
  • Multipy $x^4$ to the function: $f(x) = x^4g(x)$. This ensures that $f(0)=0$ and $f(x)>0$ for $x\neq 0$. There are still local minima in every neighbourhood of $0$. To see this, compute the derivative

    \begin{equation*} f'(x) = x^2(4x\cos(1/x)+\sin(1/x)+8x). \end{equation*}

Set $z_m=1/(\pi/2+m\pi)$ for $m>0$. Since $\sin(1/z_m)=\sin(\pi/2+m\pi)= 1$ for $m$ even and $-1$ for $m$ odd, for $m$ sufficiently large the derivative changes signs between successive $z_m$. Since $f'(x)$ is continuous, it has roots between any $z_m$ and $z_{m+1}$ for large enough $m$, and these correspond to maxima and minima of $f$.

The function is in $C^2(\R)$. For $x\neq 0$ this is clear, and to verify this at $x=0$, one shows that the right and left limits as $x\to 0$ of $f'(x)$ and $f''(x)$ coincide (they are in fact $0$).

Note the subtle point that one minimizer $x^*$ can have local minimizers that are arbitrary close: while each open interval $I$ surrounding $x^*$ has another local minimizer $\tilde{x}$, every such $\tilde{x}$ has an interval $\tilde{I}$ surrounding it where this $\tilde{x}$ is the only minimizer!

Solution to Problem 1.2

We want to show that the function

\begin{equation*} f(\vct{x}) = \frac{1}{2}\vct{x}^{\trans}\mtx{A}\vct{x}+\vct{b}^{\trans}\vct{x}+c \end{equation*}

is convex if and only $\mtx{A}$ is positive semidefinite. To see this, we compute the partial derivatives and the Hessian of $f$. The parts $\vct{b}^{\trans}\vct{x}$ and $c$ disappear when computing second derivatives. The function $\vct{x}^{\trans}\mtx{A}\vct{x}$ can be written as

\begin{equation*} \vct{x}^{\trans}\mtx{A}\vct{x} = \sum_{i=1}^m\sum_{j=1}^n a_{ij}x_ix_j, \end{equation*}

so that the first derivative is

\begin{equation*} \frac{\partial f}{\partial x_i} = \frac{1}{2} \sum_{j\neq i} (a_{ij}+a_{ji}) x_j + a_{ii}x_i+b_i = \sum_{j=1}^n a_{ij}x_j+b_i, \end{equation*}

where we used the symmetry of $\mtx{A}$ (i.e., $a_{ij}=a_{ji}$). The gradient and Hessian are therefore just given by

\begin{equation*} \nabla f(\vct{x}) = \mtx{A}\vct{x}+\vct{b}, \quad \nabla^2f(\vct{x}) = \mtx{A}. \end{equation*}

An interesting special case is when the quadratic function arises in the form

\begin{equation*} f(\vct{x}) = \frac{1}{2}\norm{\mtx{A}\vct{x}-\vct{b}}_2^2. \end{equation*}

The quadratic equation then has the form

\begin{equation*} \norm{\mtx{A}\vct{x}-\vct{b}}_2^2 = (\mtx{A}\vct{x}-\vct{b})^{\trans}(\vct{A}\vct{x}-\vct{b}) = \vct{x}^{\trans}\mtx{A}^{\trans}\mtx{A}\vct{x}-2\vct{b}^{\trans}\mtx{A}\vct{x}+\norm{\vct{b}}_2^2. \end{equation*}

The matrix $\mtx{A}^{\trans}\mtx{A}$ is always symmetric and positive semidefinite:

\begin{equation*} (\mtx{A}^{\trans}\mtx{A})^{\trans} = \mtx{A}^{\trans}(\mtx{A}^{\trans})^{\trans} = \mtx{A}^{\trans}\mtx{A}, \quad \text{ and } \quad \vct{x}^{\trans}\mtx{A}^\trans\mtx{A}\vct{x} = \norm{\mtx{A}\vct{x}}_2^2>0 \text{ if and only if } \vct{x}\neq \zerovct, \end{equation*}

so that the least squares function is convex. We also see that the derivative is

\begin{equation*} \mtx{A}^{\trans}(\mtx{A}\vct{x}-\vct{b}). \end{equation*}

Solution to Problem 1.3

(a) This set is not convex: take $\vct{x}=(1,0,0)^{\trans}$ and $\vct{y}=(-1,0,0)^{\trans}$, then $\frac{1}{2}\vct{x}+\frac{1}{2}\vct{y}=\zerovct \not\in S$.

(b) This set is convex: if $\vct{x},\vct{y}\in S$, then $1\leq x_1-x_2<2$ and $1\leq y_1-y_2<2$, and

\begin{equation*} \lambda x_1+(1-\lambda)y_1-\lambda x_2-(1-\lambda)y_2 = \lambda (x_1-x_2)+(1-\lambda)(y_1-y_2)<\lambda 2+(1-\lambda)2 = 2, \end{equation*}

with the same argument giving the lower bound.

(c) This set is convex. In fact, $S$ is the unit ball of the $1$-norm

\begin{equation*} \norm{\vct{x}}_1 = \sum_{i=1}^d |x_i|. \end{equation*}

Given $\vct{x},\vct{y}\in S$,

\begin{equation*} \norm{\lambda \vct{x}+(1-\lambda)\vct{y}}_1 \leq \lambda \norm{\vct{x}}_1+(1-\lambda)\norm{\vct{y}}_1 \leq \lambda \cdot 1+(1-\lambda)\cdot 1 = 1. \end{equation*}

(d) This set is convex. Here, one needs to show that convex combinations preserve symmetry and positive definiteness of a matrix. The symmetry is clear. As for the positive definiteness, let $\vct{x}\neq \zerovct$ be given. Then

\begin{equation*} \vct{x}^{\trans}(\lambda \mtx{A}+(1-\lambda)\mtx{B})\vct{x} = \lambda \vct{x}^{\trans}\mtx{A}\vct{x}+(1-\lambda)\vct{x}^{\trans}\mtx{B}\vct{x} \geq 0, \end{equation*}

which shows that positive definiteness is also preserved.

Solution to Problem 1.4

(a) This function is not convex. There are various ways of deriving this. For example, one can verify that the Hessian, or second derivative, is $-1/x^2$, which is not positive semidefinite.

Alternatively, one can also prove the statement using a pedestrian approach. We have to show that there are points $y\neq x$ and $\lambda \in [0,1]$ such that

\begin{equation*} \log(\lambda x+(1-\lambda)y) > \lambda \log(x)+(1-\lambda)\log(y). \end{equation*}

Let's choose $y=0$. Then what needs to be shown is that for the points $\vct{p}_1=(1,0)$ and $\vct{p}_2=(x,\log(x))$, the line joining $\vct{p}_1$ and $\vct{p}_2$ lies {\em below} the curve $(t,\log(t))$ between $1$ and $x$. The line is given by the equation

\begin{equation*} \ell(t) = \frac{\log(x)}{x-1}(t-1). \end{equation*}

Evaluating this, for example, at $x=2$ and $t=1.5$, one sees that $\ell(t)>\log(t)$, which is enough evidence that $\log(t)$ is not convex. With a little more effort one can deduce that the function is actually concave.

(b) The function $f(x)=x^4$ is convex, as we will verify using Theorem 2.4. First, note that the derivative $4x^3$ is an increasing function with $x$. Given two points $(x,x^4)$ and $(y,y^4)$ with $y>x$, the line connecting them has slope $(y^4-x^4)/(y-x)$. By the mean value theorem, there exists a $z\in (x,y)$ such that

\begin{equation*} \frac{y^4-x^4}{y-x} = f'(z) = 4z^3 \geq 4x^3. \end{equation*}

Rearranging this inequality, we get

\begin{equation*} f(y)-f(x) = y^4-x^4 \geq 4x^3(y-x) = f'(x)(y-x), \end{equation*}

which is precisely the criterium for convexity in Theorem 2.4(1).

(c) Using Theorem 2.4(2), we compute the Hessian as

\begin{equation*} \nabla^2f(\vct{x}) = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}. \end{equation*}

This matrix is positive semidefinite on $\R^2_{++}$, since for all $\vct{x}\in \R^2_{++}$ we have

\begin{equation*} \vct{x}^{\trans}\nabla^2f(\vct{x})\vct{x} = 2x_1x_2>0. \end{equation*}

It follows that the function $f(\vct{x})=x_1x_2$ is convex.

(d) The Hessian matrix of $f(\vct{x})=x_1/x_2$ is

\begin{equation*} \nabla^2f(\vct{x}) = \begin{pmatrix} 0 & -\frac{1}{x_2^2}\\ -\frac{1}{x_2^2} & 2\frac{x_1}{x_2^3} \end{pmatrix}. \end{equation*}

This matrix is not positive semidefinite for all valid values of $\vct{x}$ (take for example $\vct{x}=(1,1)^{\trans}$, which leads to a negative eigenvalue).

(e) The function $e^x-1$ is convex, as is easily seen using Theorem 2.4(2) by computing the second derivative.

(f) The function $f(\vct{x})=\max_i x_i$ is convex. Here, we can't use the criteria from Theorem 2.4 since the function is not differentiable, so we have to verify convexity directly: \begin{equation*} \max_i \lambda x_i+(1-\lambda)y_i \leq \lambda \max_i x_i+(1-\lambda) \max_i y_i. \end{equation*}