%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt
$\LaTeX \text{ commands here} \newcommand{\R}{\mathbb{R}} \newcommand{\im}{\text{im}\,} \newcommand{\norm}[1]{||#1||} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\span}{\mathrm{span}} \newcommand{\proj}{\mathrm{proj}} $
Georgia Tech, CS 4540
Jacob Abernethy
Tuesday, September 03, 2019
Quiz password: longweekend
For a good description, see these notes.
Imagine we have $g : \R^n \to \R^m$ and $f : \R^m \to \R^k$. Then we want to compute the gradient (Jacobian) of the composition $f \circ g$. Is there any easy way to do this? The answer is YES, according to the multivariate chain rule: $$ D_{f \circ g}[x] = D_f[g(x)] \cdot D_g[x]$$ This is a powerful statement, although it may not be immediately clear why. It says we can obtain the Jacobian of the composed function by simply composing the Jacobians, treated as linear transformations.
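We can sanity-check the chain rule numerically. The particular maps below (a linear $g$ and a small nonlinear $f$) are illustrative choices, not from the lecture; we compare $D_f[g(x)] \cdot D_g[x]$ against a finite-difference Jacobian of the composition.

```python
import numpy as np

# g : R^3 -> R^2 is linear (so D_g = A); f : R^2 -> R^2 is nonlinear.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))

def g(x):
    return A @ x

def f(y):
    return np.array([np.sin(y[0]), y[0] * y[1]])

def Df(y):
    # Jacobian of f at y, computed by hand
    return np.array([[np.cos(y[0]), 0.0],
                     [y[1],         y[0]]])

x = rng.standard_normal(3)
chain = Df(g(x)) @ A   # D_f[g(x)] . D_g[x], per the chain rule

# Central-difference Jacobian of the composition f(g(x))
eps = 1e-6
num = np.column_stack([
    (f(g(x + eps * e)) - f(g(x - eps * e))) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(chain, num, atol=1e-5))  # True
```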
Let $f(x) = \sum_{i=1}^m -\log (A_i \cdot x - b_i)$ where $A$ is an $m\times n$ matrix and $A_i$ denotes its $i$th row. Notice that we can write $f(x) = h(g(x)) = h \circ g (x)$ where $h(y) = -\sum_i \log y_i$ and $g(x) = A x - b$.
Use the multivariate chain rule to compute $\nabla_x f$.
We can see that $D_h[y] = \left[-1/y_1, \ldots, -1/y_m\right]$ and $D_g[x] = A$.
So this means \begin{eqnarray*} D_f[x] & = & D_h[g(x)] \cdot D_g[x] = [-1/(A_1 x - b_1), \ldots, -1/(A_m x - b_m)] \cdot A \\ & = & \sum_{i=1}^m \frac{-1}{A_i x - b_i}A_i \end{eqnarray*}
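As a quick check of this derivation, we can compare the closed-form gradient $\sum_i \frac{-1}{A_i x - b_i} A_i$ against finite differences. The random $A$, $b$, $x$ below are arbitrary, chosen only so that $Ax - b > 0$ keeps the logs in their domain.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
b = A @ x - rng.uniform(0.5, 1.5, size=m)  # guarantees A x - b > 0

def f(x):
    return -np.log(A @ x - b).sum()

# A.T @ v = sum_i v_i * A_i, so this is sum_i -A_i / (A_i x - b_i)
grad = -(A.T @ (1.0 / (A @ x - b)))

eps = 1e-6
num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                for e in np.eye(n)])
print(np.allclose(grad, num, atol=1e-5))  # True
```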
Let $f(x) = A(2x + \mathbf{1})$ where $A$ is a matrix, and $\mathbf{1}$ is the all ones vector.
Let's write $f(x) = h \circ g (x)$ where $g(x) = 2x + \mathbf{1}$ and $h(x) = A x$. What is the gradient of $f$?
$D_g[x] = \nabla g(x) = 2I$
$D_h[x] = \nabla h(x) = A$
$D_f[x] = D_h[g(x)] \cdot D_g[x] = 2AI = 2A$
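The conclusion $D_f[x] = 2A$ is easy to verify numerically as well; the dimensions below are an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

def f(x):
    # f(x) = A (2x + 1), with 1 the all-ones vector
    return A @ (2 * x + np.ones(3))

x = rng.standard_normal(3)
eps = 1e-6
# Central-difference Jacobian, one column per coordinate direction
num = np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                       for e in np.eye(3)])
print(np.allclose(num, 2 * A, atol=1e-5))  # True
```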
Let $w$ be an $n$-dim vector. Calculate the first and second derivatives of the following functions:
Let $w$ be an $n$-dim vector. Calculate the derivatives of the following functions:
Definition: A matrix $M \in \R^{n\times n}$ is positive semidefinite if $x^\top M x \geq 0\; \forall x \in \R^n$
Commonly, we work with matrices $M$ that are symmetric (that is, $M = M^\top$). In this case, the following are equivalent:
1. $x^\top M x \geq 0$ for every $x \in \R^n$ (i.e. $M$ is positive semidefinite)
2. Every eigenvalue of $M$ is nonnegative
3. $M = B^\top B$ for some matrix $B$
Problem: Show that (3) $\implies$ (1) $\implies$ (2)
If $M$ can be written as $B^\top B$ for some $B \in \R^{m\times n}$, then:
$$x^\top M x = x^\top B^\top B x = (Bx)^\top Bx = \|Bx\|_2^2 \geq 0$$This shows (3) $\implies$ (1).
To show (1) $\implies$ (2), consider an eigenvector $v_i$ of $M$, so that $Mv_i = \lambda_i v_i$. Since $M$ is positive semidefinite:
$$v_i^\top M v_i = v_i^\top \lambda_i v_i = \lambda_i v_i^\top v_i = \lambda_i \|v_i\|_2^2 \geq 0$$Since $\|v_i\|_2^2 > 0$ (eigenvectors are nonzero), it must be the case that $\lambda_i \geq 0$.
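Both implications are easy to observe numerically: build $M = B^\top B$ for a random $B$ (an arbitrary choice for illustration), then check the quadratic form and the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 4))
M = B.T @ B   # symmetric and PSD by construction, i.e. property (3)

# Property (1): x^T M x = ||Bx||^2 >= 0 for every x
quad = [x @ M @ x for x in rng.standard_normal((1000, 4))]
print(min(quad) >= 0)  # True

# Property (2): all eigenvalues nonnegative (up to roundoff);
# eigvalsh is numpy's eigenvalue routine for symmetric matrices
eigvals = np.linalg.eigvalsh(M)
print(np.all(eigvals >= -1e-10))  # True
```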
A function $f : \R^d \to \R$ is convex if, for any $x,y \in \text{dom}(f)$ and any $\theta \in [0,1]$, we have $$f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta) f(y)$$
First-order Condition When $f$ is differentiable, then $f$ is convex if and only if $$f(y) \geq f(x) + \nabla f(x)^\top(y - x) \quad \text{ for any } x,y \in \text{dom}(f)$$
Second-order Condition When $f$ is twice differentiable, then $f$ is convex if and only if $$u^\top (\nabla^2f(x)) u \geq 0 \quad \text{ for any } x \in \text{dom}(f) \text{ and any } u \in \R^d$$ This last condition $\equiv$ "$f$ is convex if its hessian is always positive semi-definite"
Use the first order condition to show that, for any convex and differentiable $f$, we have $$(\nabla f(y) - \nabla f(x))^\top(y - x) \geq 0 \text{ for any } x,y \in \text{dom}(f)$$
Let's apply the first order condition twice, once at $x$ and once at $y$: \begin{eqnarray*} f(y) & \geq & f(x) + \nabla f(x)^\top(y - x) \\ f(x) & \geq & f(y) + \nabla f(y)^\top(x - y). \end{eqnarray*} Add these two inequalities together, and subtract $f(x) + f(y)$ from both sides and you are done!
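We can watch the monotonicity inequality $(\nabla f(y) - \nabla f(x))^\top(y - x) \geq 0$ hold on a concrete convex function. Log-sum-exp is a standard convex example used here for illustration (it is not from the lecture); its gradient is the softmax.

```python
import numpy as np

def grad_lse(x):
    # Gradient of f(x) = log(sum_i exp(x_i)) is the softmax of x
    e = np.exp(x - x.max())  # shift by max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(4)
# For many random pairs (x, y), evaluate (grad f(y) - grad f(x)).(y - x)
vals = [(grad_lse(y) - grad_lse(x)) @ (y - x)
        for x, y in rng.standard_normal((1000, 2, 3))]
print(min(vals) >= 0)  # True: the inner product is never negative
```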
Find conditions under which the following function is convex $f(x) = \frac 1 2 x^\top A x$ for a symmetric matrix $A \in \R^{d \times d}$
As we computed earlier in lecture, for $f(x) = \frac 1 2 x^\top A x$, the hessian is $\nabla^2 f(x) = A$. We know that a twice-differentiable function is convex if its hessian is positive semi-definite. Therefore, $f$ is convex if and only if $A$ is positive semi-definite.
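To see this conclusion in action, we can test the convexity definition at the midpoint $\theta = 1/2$ for a PSD matrix versus an indefinite one; the specific matrices below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(5)

def midpoint_gap(A, trials=1000):
    # Max over random pairs of f(midpoint) - average of f.
    # Convexity requires this gap to be <= 0 for every pair.
    def f(x):
        return 0.5 * x @ A @ x
    gaps = [f((x + y) / 2) - (f(x) + f(y)) / 2
            for x, y in rng.standard_normal((trials, 2, 3))]
    return max(gaps)

B = rng.standard_normal((3, 3))
psd = B.T @ B                      # PSD, so f is convex
indef = np.diag([1.0, -1.0, 1.0])  # has a negative eigenvalue

print(midpoint_gap(psd) <= 1e-10)  # True: convexity holds
print(midpoint_gap(indef) > 0)     # True: convexity fails
```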