%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt
$\LaTeX \text{ commands here} \newcommand{\R}{\mathbb{R}} \newcommand{\im}{\text{im}\,} \newcommand{\norm}[1]{||#1||} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\span}{\mathrm{span}} \newcommand{\proj}{\mathrm{proj}} $
Georgia Tech, CS 4540
Jacob Abernethy
Tuesday, September 03, 2019
Quiz password: longweekend
For a good description, see these notes.
Imagine we have $g : \R^n \to \R^m$ and $f : \R^m \to \R^k$. Then we want to compute the gradient (Jacobian) of the composition $f \circ g$. Is there any easy way to do this? The answer is YES, according to the multivariate chain rule: $$ D_{f \circ g}[x] = D_f[g(x)] \cdot D_g[x]$$ This is a powerful statement, although it may not be immediately clear why. It says we can obtain the Jacobian of the composed function by simply composing the Jacobians, treated as linear transformations.
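We can sanity-check the chain rule numerically. The particular maps below (a linear $g$ and a small nonlinear $f$) are illustrative choices, not from the lecture; we compare $D_f[g(x)] \cdot D_g[x]$ against a finite-difference Jacobian of the composition.

```python
import numpy as np

# g : R^3 -> R^2 is linear (so D_g = A); f : R^2 -> R^2 is nonlinear.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))

def g(x):
    return A @ x

def f(y):
    return np.array([np.sin(y[0]), y[0] * y[1]])

def Df(y):
    # Jacobian of f at y, computed by hand
    return np.array([[np.cos(y[0]), 0.0],
                     [y[1],         y[0]]])

x = rng.standard_normal(3)
chain = Df(g(x)) @ A   # D_f[g(x)] . D_g[x], per the chain rule

# Central-difference Jacobian of the composition f(g(x))
eps = 1e-6
num = np.column_stack([
    (f(g(x + eps * e)) - f(g(x - eps * e))) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(chain, num, atol=1e-5))  # True
```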
Let $f(x) = \sum_{i=1}^m -\log (A_i \cdot x - b_i)$ where $A$ is an $m\times n$ matrix and $A_i$ denotes its $i$th row. Notice that we can write $f(x) = h(g(x)) = h \circ g (x)$ where $h(y) = -\sum_i \log y_i$ and $g(x) = A x - b$.
Use the multivariate chain rule to compute $\nabla_x f$.
We can see that $D_h[y] = \left[-1/y_1, \ldots, -1/y_m\right]$ and $D_g[x] = A$.
So this means \begin{eqnarray*} D_f[x] & = & D_h[g(x)] \cdot D_g[x] = [-1/(A_1 x - b_1), \ldots, -1/(A_m x - b_m)] \cdot A \\ & = & \sum_{i=1}^m \frac{-1}{A_i x - b_i}A_i \end{eqnarray*}
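As a quick check of this derivation, we can compare the closed-form gradient $\sum_i \frac{-1}{A_i x - b_i} A_i$ against finite differences. The random $A$, $b$, $x$ below are arbitrary, chosen only so that $Ax - b > 0$ keeps the logs in their domain.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
b = A @ x - rng.uniform(0.5, 1.5, size=m)  # guarantees A x - b > 0

def f(x):
    return -np.log(A @ x - b).sum()

# A.T @ v = sum_i v_i * A_i, so this is sum_i -A_i / (A_i x - b_i)
grad = -(A.T @ (1.0 / (A @ x - b)))

eps = 1e-6
num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                for e in np.eye(n)])
print(np.allclose(grad, num, atol=1e-5))  # True
```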
Let $f(x) = A(2x + \mathbf{1})$ where $A$ is a matrix, and $\mathbf{1}$ is the all ones vector.
Let's write $f(x) = h \circ g (x)$ where $g(x) = 2x + \mathbf{1}$ and $h(x) = A x$. What is the gradient of $f$?
$D_g[x] = \nabla g(x) = 2I$
$D_h[x] = \nabla h(x) = A$
$D_f[x] = D_h[g(x)] \cdot D_g[x] = 2AI = 2A$
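The conclusion $D_f[x] = 2A$ is easy to verify numerically as well; the dimensions below are an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

def f(x):
    # f(x) = A (2x + 1), with 1 the all-ones vector
    return A @ (2 * x + np.ones(3))

x = rng.standard_normal(3)
eps = 1e-6
# Central-difference Jacobian, one column per coordinate direction
num = np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                       for e in np.eye(3)])
print(np.allclose(num, 2 * A, atol=1e-5))  # True
```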
Let $w$ be an $n$-dim vector. Calculate the first and second derivatives of the following functions:
Let $w$ be an $n$-dim vector. Calculate the derivatives of the following functions:
Definition: A matrix $M \in \R^{n\times n}$ is positive semidefinite if $x^\top M x \geq 0\; \forall x \in \R^n$
Commonly, we work with matrices $M$ that are symmetric (that is, $M = M^\top$). In this case, the following are equivalent:
1. $x^\top M x \geq 0$ for every $x \in \R^n$ (i.e. $M$ is positive semidefinite)
2. Every eigenvalue of $M$ is nonnegative
3. $M = B^\top B$ for some matrix $B$
Problem: Show that (3) $\implies$ (1) $\implies$ (2)
If $M$ can be written as $B^\top B$ for some $B \in \R^{m\times n}$, then:
$$x^\top M x = x^\top B^\top B x = (Bx)^\top Bx = \|Bx\|_2^2 \geq 0$$This shows (3) $\implies$ (1).
To show (1) $\implies$ (2), consider an eigenvector $v_i$ of $M$, so that $Mv_i = \lambda_i v_i$. Since $M$ is positive semidefinite:
$$v_i^\top M v_i = v_i^\top \lambda_i v_i = \lambda_i v_i^\top v_i = \lambda_i \|v_i\|_2^2 \geq 0$$Since $\|v_i\|_2^2 > 0$ (eigenvectors are nonzero), it must be the case that $\lambda_i \geq 0$.
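Both implications are easy to observe numerically: build $M = B^\top B$ for a random $B$ (an arbitrary choice for illustration), then check the quadratic form and the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 4))
M = B.T @ B   # symmetric and PSD by construction, i.e. property (3)

# Property (1): x^T M x = ||Bx||^2 >= 0 for every x
quad = [x @ M @ x for x in rng.standard_normal((1000, 4))]
print(min(quad) >= 0)  # True

# Property (2): all eigenvalues nonnegative (up to roundoff);
# eigvalsh is numpy's eigenvalue routine for symmetric matrices
eigvals = np.linalg.eigvalsh(M)
print(np.all(eigvals >= -1e-10))  # True
```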
A function $f : \R^d \to \R$ is convex if, for any $x,y \in \text{dom}(f)$ and any $\theta \in [0,1]$, we have $$f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta) f(y)$$
First-order Condition When $f$ is differentiable, then $f$ is convex if and only if $$f(y) \geq f(x) + \nabla f(x)^\top(y - x) \quad \text{ for any } x,y \in \text{dom}(f)$$
Second-order Condition When $f$ is twice differentiable, then $f$ is convex if and only if $$u^\top (\nabla^2f(x)) u \geq 0 \quad \text{ for any } x \in \text{dom}(f) \text{ and any } u \in \R^d$$ This last condition $\equiv$ "$f$ is convex if its hessian is always positive semi-definite"
Use the first order condition to show that, for any convex and differentiable $f$, we have $$(\nabla f(y) - \nabla f(x))^\top(y - x) \geq 0 \text{ for any } x,y \in \text{dom}(f)$$
Let's apply the first order condition twice, once at $x$ and once at $y$: \begin{eqnarray*} f(y) & \geq & f(x) + \nabla f(x)^\top(y - x) \\ f(x) & \geq & f(y) + \nabla f(y)^\top(x - y). \end{eqnarray*} Add these two inequalities together, and subtract $f(x) + f(y)$ from both sides and you are done!
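We can watch the monotonicity inequality $(\nabla f(y) - \nabla f(x))^\top(y - x) \geq 0$ hold on a concrete convex function. Log-sum-exp is a standard convex example used here for illustration (it is not from the lecture); its gradient is the softmax.

```python
import numpy as np

def grad_lse(x):
    # Gradient of f(x) = log(sum_i exp(x_i)) is the softmax of x
    e = np.exp(x - x.max())  # shift by max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(4)
# For many random pairs (x, y), evaluate (grad f(y) - grad f(x)).(y - x)
vals = [(grad_lse(y) - grad_lse(x)) @ (y - x)
        for x, y in rng.standard_normal((1000, 2, 3))]
print(min(vals) >= 0)  # True: the inner product is never negative
```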
Find conditions under which the following function is convex $f(x) = \frac 1 2 x^\top A x$ for a symmetric matrix $A \in \R^{d \times d}$
As we computed earlier in lecture, for $f(x) = \frac 1 2 x^\top A x$, the hessian is $\nabla^2 f(x) = A$. We know that a twice-differentiable function is convex if its hessian is positive semi-definite. Therefore, $f$ is convex if and only if $A$ is positive semi-definite.
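To see this conclusion in action, we can test the convexity definition at the midpoint $\theta = 1/2$ for a PSD matrix versus an indefinite one; the specific matrices below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(5)

def midpoint_gap(A, trials=1000):
    # Max over random pairs of f(midpoint) - average of f.
    # Convexity requires this gap to be <= 0 for every pair.
    def f(x):
        return 0.5 * x @ A @ x
    gaps = [f((x + y) / 2) - (f(x) + f(y)) / 2
            for x, y in rng.standard_normal((trials, 2, 3))]
    return max(gaps)

B = rng.standard_normal((3, 3))
psd = B.T @ B                      # PSD, so f is convex
indef = np.diag([1.0, -1.0, 1.0])  # has a negative eigenvalue

print(midpoint_gap(psd) <= 1e-10)  # True: convexity holds
print(midpoint_gap(indef) > 0)     # True: convexity fails
```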