```
%matplotlib inline
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

np.random.seed(1337)
kwargs = {'linewidth': 3.5}
font = {'weight': 'normal', 'size': 24}
matplotlib.rc('font', **font)

def error_plot(ys, yscale='log'):
    plt.figure(figsize=(8, 8))
    plt.xlabel('Step')
    plt.ylabel('Error')
    plt.yscale(yscale)
    plt.plot(range(len(ys)), ys, **kwargs)
```

$\LaTeX \text{ commands here} \newcommand{\R}{\mathbb{R}} \newcommand{\im}{\text{im}\,} \newcommand{\norm}[1]{||#1||} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\span}{\mathrm{span}} \newcommand{\proj}{\mathrm{proj}} \newcommand{\OPT}{\mathrm{OPT}} $

**Georgia Tech, CS 4540**

Jake Abernethy & Benjamin Bray

*Thursday, September 13, 2018*

\begin{align}
\max_{x \in \R^n} &\quad c^T x \\
\text{such that} &\quad Ax \leq b & \text{(only $\leq$ constraints)} \\
&\quad x \geq 0 & \text{(variables nonnegative)}
\end{align}

- Decision variables $x \in \R^n$
- Linear objective $c^T x$ for $c \in \R^n$
- Constraint matrix $A \in \R^{m \times n}$ and vector $b \in \R^m$. Each row corresponds to a constraint.

A company has $m$ different resources which can be used to manufacture $n$ different products.

- $x = (x_1,\dots,x_n) \in \R^n$ is the amount of each product manufactured
- $r = (r_1,\dots,r_m) \in \R^m$ is the supply of each resource
- $c = (c_1,\dots,c_n) \in \R^n$ is the profit generated by each product
- $A = [a_{ij}] \in \R^{m \times n}$, where $a_{ij}$ is the amount of resource $i$ needed to make product $j$

$$\max_{x \in \R^n} \quad c^T x \quad
\text{s.t.} \quad A x \leq r \quad\text{and}\quad x \geq 0$$
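As a sanity check, this primal program can be solved numerically. The sketch below uses `scipy.optimize.linprog` on a small made-up instance (the resource, profit, and usage numbers are all hypothetical, chosen only for illustration); since `linprog` minimizes, we negate $c$:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical rope-factory data: 2 resources, 3 products.
A = np.array([[1.0, 2.0, 1.0],    # kg of resource 1 per meter of each product
              [3.0, 1.0, 2.0]])   # kg of resource 2 per meter of each product
r = np.array([40.0, 60.0])        # kg of each resource available
c = np.array([3.0, 2.0, 4.0])     # euros of profit per meter of each product

# linprog minimizes, so negate c to maximize c^T x subject to Ax <= r, x >= 0.
res = linprog(-c, A_ub=A, b_ub=r, bounds=[(0, None)] * 3)
print(res.x, -res.fun)            # optimal production plan and profit in euros
```

The solver returns a vertex of the feasible polytope; the resource constraints $Ax \leq r$ and nonnegativity $x \geq 0$ both hold at the optimum.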

If it helps, assume the amount of each resource available is measured in kilograms, and our profit is measured in euros (€). Maybe we're manufacturing rope, so the amount of each product is measured in meters.

Using our rope factory example,

- $r_i$ is measured in $(kg)$
- $x_j$ is measured in $(m)$
- $c_j$ is measured in $(€/m)$
- $a_{ij}$ is measured in $(kg/m)$

The dual program has the constraint $y^T A \geq c^T$, whose $j$-th coordinate reads $\sum_i y_i a_{ij} \geq c_j$. The units of each term are

$$ \bigg( ? \bigg) \times \bigg( \frac{kg}{m} \bigg) \geq \bigg( \frac{€}{m} \bigg) $$

- For this to make sense, $y_i$ needs to have units $(€/kg)$ or **euros per kilogram**.
- Then the dual objective $y^T r$ has units $(€/kg)\times (kg) = (€)$.

We decided that $y_i$ has units $(€/kg)$, and $y^T r$ has units $(€)$.

- Imagine that we also have the option of **selling our resources** directly to a buyer, for a price of $y_i$ euros per kilogram, rather than turning them into rope.
- This is only worth it to us if the resources used to make each rope are **more valuable** than the ropes themselves. This is the dual constraint!
- The buyer knows this, so he'll try to **minimize the amount he pays for our raw materials** by choosing $y_i$ as small as possible.

$$
\begin{align}
\min_{y \in \R^m} &\quad y^T r \\
\text{such that} &\quad y^T A \geq c^T & \text{(dominates objective)} \\
&\quad y \geq 0 & \text{(nonnegative combinations)}
\end{align}
$$

**Lemma.** (Farkas) Let $A \in \R^{m \times n}$ and $b \in \R^m$. Then exactly one of the following two statements is true:

- There exists $x \in \R^n$ such that $Ax \geq b$ and $x \geq 0$.
- There exists $y \in \R^m$ such that $y^T A \leq 0$ and $b^T y > 0$ and $y \geq 0$.

**Prove** that strong duality holds using the Farkas Lemma (v2). That is, show that the following two are *equal*: the optimal value of the primal, $\max\{c^T x : Ax \leq b,\ x \geq 0\}$, and the optimal value of the dual, $\min\{y^T b : y^T A \geq c^T,\ y \geq 0\}$.
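Before proving it, strong duality can be observed numerically: solving the primal and dual of the same instance yields the same optimal value. A minimal sketch with `scipy.optimize.linprog` (the matrix and vectors below are made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data for a small primal/dual pair.
A = np.array([[1.0, 2.0], [2.0, 1.0]])
b = np.array([4.0, 4.0])
c = np.array([1.0, 1.0])

# Primal: max c^T x  s.t.  Ax <= b, x >= 0  (negate c since linprog minimizes).
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)

# Dual: min y^T b  s.t.  y^T A >= c^T, y >= 0  (rewritten as -A^T y <= -c).
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

print(-primal.fun, dual.fun)   # strong duality: the two values coincide
```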

**Definition**: A matrix $M \in \R^{n\times n}$ is *positive semidefinite* (PSD) if $x^\top M x \geq 0\; \forall x \in \R^n$

Commonly, we work with matrices $M$ that are *symmetric* (that is, $M = M^\top$). In this case, the following are equivalent:

1. $M$ is PSD
2. The eigenvalues of $M$ are all $\geq 0$
3. The matrix $M$ can be written as $B^\top B$ for some $B \in \R^{n \times n}$

Show that (3) $\implies$ (1) $\implies$ (2)

- (3) $\implies$ (1): If $M = B^\top B$ then for any $x$ we have $$x^\top M x = x^\top (B^\top B) x = (x^\top B^\top) (B x) = (B x)^\top (B x) = \|B x\|_2^2 \geq 0$$
- (1) $\implies$ (2): If $M$ is PSD, then for any vector $x$ we have $x^\top M x \geq 0$. Let's take an eigenvector $v$ of $M$, with eigenvalue $\lambda$, so that $M v = \lambda v$. We need to show that $\lambda \geq 0$. Notice that $v \ne 0$; hence, multiplying $Mv = \lambda v$ on the left by $v^\top$ and using the fact that $M$ is PSD, we get $$ 0 \leq v^\top M v = v^\top(\lambda v) = \lambda v^\top v = \lambda \| v\|^2.$$ Since $\|v\|^2 > 0$, the inequality $0 \leq \lambda \| v \|^2$ can only hold if $\lambda \geq 0$.
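These implications are easy to check numerically. A small sketch (the matrix size, seed, and number of test vectors are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
M = B.T @ B                       # (3): M = B^T B by construction

# (1): x^T M x = ||Bx||^2 >= 0, checked on random test vectors.
xs = rng.standard_normal((100, 4))
min_quad_form = min(x @ M @ x for x in xs)

# (2): the eigenvalues of the symmetric matrix M are all >= 0.
eigvals = np.linalg.eigvalsh(M)
print(min_quad_form, eigvals.min())   # both nonnegative (up to float error)
```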

We often use the notation $M \succcurlyeq 0$ to denote that $M$ is PSD. The notation $M \succcurlyeq N$ is equivalent to $M - N \succcurlyeq 0$, i.e. $M - N$ is PSD.

Let $M \in \R^{n \times n}$ be an arbitrary symmetric matrix. Let $P \in \R^{n \times n}$ be any nonsingular matrix. Show that $$M \succcurlyeq 0 \Longleftrightarrow P^\top M P \succcurlyeq 0$$

Show that the set $S_+^n := \{ M \in \R^{n\times n}: M = M^\top, M \succcurlyeq 0 \}$ forms a convex cone.

- (Problem A)
  - First assume that $M \succcurlyeq 0$. Then $x^\top M x \geq 0$ for all $x$. Now take an arbitrary vector $y$, and observe that $y^\top (P^\top M P) y = (P y)^\top M (P y)$. Of course, $P y$ is just some arbitrary vector, hence $(P y)^\top M (P y) \geq 0$ as desired.
  - For the second direction, assume that for all $y$ we have $y^\top (P^\top M P) y \geq 0$. We need to show that for all $x$, $x^\top M x \geq 0$. Notice that $P$ is invertible, hence we can set $y = P^{-1} x$, and observe that $$ 0 \leq (P^{-1} x)^\top (P^\top M P) (P^{-1} x) = x^\top (P^{-1})^\top P^\top M P P^{-1} x = x^\top M x$$ as desired.

- (Problem B) Let us first check convexity. Let $N,M \in S_+^n$, and observe that for any $x$ we have $x^\top M x \geq 0$ and $x^\top N x \geq 0$. Hence, if we take a convex combination $\theta M + (1-\theta) N$, with $\theta \in [0,1]$, we have $$ x^\top (\theta M + (1-\theta) N) x = \theta x^\top M x + (1-\theta) x^\top N x,$$ which is nonnegative since both terms are nonnegative. To see that $S_+^n$ is also a cone, note that for any $\alpha \geq 0$ we have $x^\top (\alpha M) x = \alpha\, x^\top M x \geq 0$, so $\alpha M \in S_+^n$.
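Both problems can be spot-checked numerically on random matrices. A sketch (the dimension, seed, and test values of $\theta$ and $\alpha$ are arbitrary; a random Gaussian $P$ is nonsingular with probability one):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

def is_psd(S, tol=1e-9):
    # PSD iff all eigenvalues of the symmetric matrix are >= 0 (up to tolerance).
    return np.linalg.eigvalsh((S + S.T) / 2).min() >= -tol

B1, B2 = rng.standard_normal((2, n, n))
M, N = B1.T @ B1, B2.T @ B2       # two random PSD matrices of the form B^T B

# Problem A: congruence by an (almost surely nonsingular) random P preserves PSD.
P = rng.standard_normal((n, n))
congruence_ok = is_psd(P.T @ M @ P)

# Problem B: convex combinations and nonnegative scalings stay in the cone.
theta = 0.3
cone_ok = is_psd(theta * M + (1 - theta) * N) and is_psd(2.5 * M)

print(congruence_ok, cone_ok)
```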

A function $f : \R^d \to \R$ is convex if, for any $x,y \in \text{dom}(f)$ and any $\theta \in [0,1]$, we have $$f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta) f(y)$$

**First-order Condition** When $f$ is differentiable, then $f$ is convex if and only if
$$f(y) \geq f(x) + \nabla f(x)^\top(y - x) \quad \text{ for any } x,y \in \text{dom}(f)$$
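The first-order condition says the linearization at $x$ never overestimates $f$. A quick numerical check for the convex function $f(x) = \|x\|^2$, whose gradient is $2x$ (the choice of $f$, dimension, and sample count are arbitrary):

```python
import numpy as np

f = lambda x: x @ x        # f(x) = ||x||^2, a convex function
grad = lambda x: 2 * x     # its gradient

rng = np.random.default_rng(2)
violations = 0
for _ in range(1000):
    x, y = rng.standard_normal((2, 3))
    # f(y) >= f(x) + grad(x)^T (y - x) should hold for every pair (x, y).
    if f(y) < f(x) + grad(x) @ (y - x) - 1e-12:
        violations += 1
print(violations)          # 0: the tangent plane lies below the graph
```

For this particular $f$, the gap $f(y) - f(x) - \nabla f(x)^\top(y-x)$ equals $\|y-x\|^2$ exactly, so the inequality is tight only when $y = x$.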


Use the first order condition to show that, for any convex and differentiable $f$, we have $$(\nabla f(y) - \nabla f(x))^\top(y - x) \geq 0 \text{ for any } x,y \in \text{dom}(f)$$

Let's apply the first order condition twice, once at $x$ and once at $y$: \begin{eqnarray*} f(y) & \geq & f(x) + \nabla f(x)^\top(y - x) \\ f(x) & \geq & f(y) + \nabla f(y)^\top(x - y). \end{eqnarray*} Add these two inequalities together, and subtract $f(x) + f(y)$ from both sides and you are done!
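The resulting monotonicity of the gradient can also be checked numerically. A sketch for the convex quadratic $f(x) = \frac12 x^\top A x$, so that $\nabla f(x) = Ax$ (the PSD matrix $A$ below is a made-up example):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # symmetric with positive eigenvalues
grad = lambda x: A @ x                    # gradient of f(x) = 0.5 x^T A x

rng = np.random.default_rng(3)
min_inner = np.inf
for _ in range(1000):
    x, y = rng.standard_normal((2, 2))
    # (grad(y) - grad(x))^T (y - x) = (y - x)^T A (y - x) >= 0 since A is PSD.
    min_inner = min(min_inner, (grad(y) - grad(x)) @ (y - x))
print(min_inner)
```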

**Second-order Condition** When $f$ is twice differentiable, then $f$ is convex if and only if
the Hessian satisfies $\nabla^2 f(x) \succcurlyeq 0$ for all $x \in \text{dom}(f)$.

Find conditions under which the following function is convex $f(x) = \frac 1 2 x^\top A x$ for a symmetric matrix $A \in \R^{d \times d}$

As we computed earlier in lecture, for $f(x) = \frac 1 2 x^\top A x$, the Hessian is $\nabla^2 f(x) = A$. We know that a twice-differentiable function is convex if and only if its Hessian is positive semidefinite everywhere. Therefore, $f$ is convex if and only if $A$ is positive semidefinite.
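This criterion is straightforward to test in code: check whether the eigenvalues of $A$ are all nonnegative. A sketch (the two example matrices are made up; the first is PSD, the second indefinite):

```python
import numpy as np

def is_convex_quadratic(A, tol=1e-9):
    # f(x) = 0.5 x^T A x is convex iff A is PSD, i.e. all eigenvalues >= 0.
    return np.linalg.eigvalsh((A + A.T) / 2).min() >= -tol

A_psd = np.array([[2.0, -1.0], [-1.0, 2.0]])    # eigenvalues 1 and 3
A_indef = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues -1 and 3

print(is_convex_quadratic(A_psd), is_convex_quadratic(A_indef))   # True False
```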