%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt
$\LaTeX \text{ commands here} \newcommand{\R}{\mathbb{R}} \newcommand{\im}{\text{im}\,} \newcommand{\norm}[1]{||#1||} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\span}{\mathrm{span}} \newcommand{\proj}{\mathrm{proj}} \newcommand{\E}{\mathbb{E}} $
Georgia Tech, CS 4540
Jake Abernethy & Benjamin Bray
Thursday, October 3, 2019
In many fields, especially Machine Learning, we deal with functions that have randomized inputs. For example, given some distribution $D$, we often consider functions of the form $$ F(\theta) := \E_{\xi \sim D} [f(\theta;\xi)]. $$ Here, $\theta$ are the parameters we want to optimize, and $\xi$ is some random input, such as a datapoint. We'd like to optimize $\theta$ "on average", i.e. in expectation over a random sample $\xi \sim D$.
We can optimize functions defined through expectations, like the one above, using randomized (stochastic) gradients.
Theorem (Hazan book): As long as $f(\cdot;\xi)$ is convex, $\|\nabla f(\cdot; \cdot) \| \leq G$ and $\text{diam}(K) \leq D$, then we have $$\E[F(\bar \theta_T)] - \min_{\theta \in K} F(\theta) \leq \frac{3GD}{2\sqrt{T}}$$
Let $F(\theta) := \frac 1 n \sum_{i=1}^n f(\theta; \xi_i)$ be some objective function, defined as an average over samples $\xi_1, \ldots, \xi_n$. Consider two algorithms for minimizing $F$:
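As a quick illustration (not from the lecture), here is a sketch comparing full-batch gradient descent with stochastic gradient descent on a made-up least-squares objective $F(\theta) = \frac 1 n \sum_i (x_i^T \theta - y_i)^2$; the data, step sizes, and iteration counts are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: f(theta; xi_i) = (x_i^T theta - y_i)^2 for samples xi_i = (x_i, y_i)
n, d = 200, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

def objective(theta):
    return np.mean((X @ theta - y) ** 2)

def full_gradient(theta):
    # exact gradient of F(theta) = (1/n) sum_i (x_i^T theta - y_i)^2
    return (2.0 / n) * X.T @ (X @ theta - y)

def stochastic_gradient(theta):
    # unbiased estimate of the gradient from a single random sample
    i = rng.integers(n)
    return 2.0 * X[i] * (X[i] @ theta - y[i])

# Full gradient descent: each step touches all n samples
theta_gd = np.zeros(d)
for t in range(200):
    theta_gd -= 0.05 * full_gradient(theta_gd)

# SGD: each step touches one sample; decaying step size ~ 1/sqrt(t)
theta_sgd = np.zeros(d)
for t in range(2000):
    theta_sgd -= (0.1 / np.sqrt(t + 1)) * stochastic_gradient(theta_sgd)

print(objective(theta_gd), objective(theta_sgd))
```

Each SGD step is $n$ times cheaper than a full gradient step, which is why SGD dominates in large-scale machine learning even though individual steps are noisy.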
Note: this is material for HW4 and will NOT be on the midterm!
For vectors $u,v \in \R^m$, define elementwise inequalities as: $$ \begin{align} u \leq v &\iff u_i \leq v_i & (i=1,\dots,m) \end{align} $$ Similarly, let $x \in \R^n$ and $b \in \R^m$. If $A \in \R^{m \times n}$ is a matrix with rows $a_1^T, \dots, a_m^T \in \R^{1 \times n}$, then $$ \begin{align} A x \leq b &\iff a_i^T x \leq b_i & (i=1,\dots,m) \\ \end{align} $$
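These elementwise conventions match numpy directly; a tiny check with made-up values:

```python
import numpy as np

A = np.array([[ 1.0, 2.0],
              [ 3.0, 4.0],
              [-1.0, 0.0]])   # rows a_1^T, a_2^T, a_3^T
x = np.array([0.5, 0.25])
b = np.array([2.0, 3.0, 0.0])

# A x <= b holds iff every row inequality a_i^T x <= b_i holds
feasible = bool(np.all(A @ x <= b))
print(feasible)
```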
A linear program consists of:
The reading contained the following example:
Remember that a hyperplane is a level set of the linear functional $f(x) = a^T x$ for some $a \in \R^n$, and a half-space is the region on either side of a hyperplane: $$ \begin{align} \text{hyperplane:} \quad& \{ x \in \R^n \mid a^T x = b \} \\ \text{half-space:} \quad& \{ x \in \R^n \mid a^T x \leq b \} \end{align} $$
The feasible region is a polytope or polyhedron, since it is an intersection of half-spaces. This means it's also convex!
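We can sanity-check this convexity claim numerically: if $x$ and $y$ both satisfy $Ax \leq b$, then every convex combination $\lambda x + (1-\lambda) y$ must as well. A sketch with a randomly generated polytope (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random half-space data A x <= b; choosing b > 0 guarantees 0 is feasible
A = rng.normal(size=(6, 2))
b = rng.uniform(0.5, 1.5, size=6)

def in_polytope(x):
    return bool(np.all(A @ x <= b + 1e-12))

# Find two feasible points by rejection sampling
pts = []
while len(pts) < 2:
    cand = rng.uniform(-2, 2, size=2)
    if in_polytope(cand):
        pts.append(cand)
x, y = pts

# Every convex combination of two feasible points stays in the polytope
all_feasible = all(in_polytope(lam * x + (1 - lam) * y)
                   for lam in np.linspace(0, 1, 11))
print(all_feasible)
```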
Using the resources available, how much of each product should be manufactured to maximize profits?
For convenience, define
Notice that:
The total profit earned from production plan $x = (x_1,\dots,x_n) \in \R^n$ is $$ c^T x = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n $$ In doing so, we expend the following amount of each resource $i=1,\dots,m$: $$ a_i^T x = a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n $$ We only have $r_i$ units of resource $i$, so we need to enforce $\boxed{a_i^T x \leq r_i}$. We also can't produce a negative amount of any product, so $\boxed{x \geq 0}$.
Written more compactly, the linear program we want to solve is:
Notice that we can rewrite the constraints as $Ax \leq r$.
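For concreteness, an LP of this form can be handed to an off-the-shelf solver. Here is a sketch using `scipy.optimize.linprog` (assuming SciPy is available), with made-up profits, resource usages, and resource limits; note that `linprog` minimizes, so we negate $c$:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: 3 products, 2 resources
c = np.array([3.0, 2.0, 4.0])          # profit per unit of each product
A = np.array([[1.0, 2.0, 1.0],         # resource usage per unit produced
              [2.0, 1.0, 3.0]])
r = np.array([10.0, 15.0])             # available units of each resource

# max c^T x  s.t.  A x <= r, x >= 0   <=>   min -c^T x, same constraints
res = linprog(-c, A_ub=A, b_ub=r, bounds=[(0, None)] * 3)
print(res.x, -res.fun)                 # optimal production plan and profit
```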
Our goal is to meet the needs of all the factories at minimum transportation cost.
The standard form of a linear program for $c,x \in \R^n$, $b \in \R^m$ and $A \in \R^{m \times n}$ is
We will show that every linear program can be converted to standard form! Only $\leq$ constraints and nonnegative variables are necessary.
The following linear program is equivalent, but all the constraints are now $\leq$:
$$ \begin{align} \max_{x \in \R^n} &\quad c^T x \\ \text{such that} \quad-& a^T x \leq -\alpha \\ \quad & b^T x \leq \beta \\ \quad-& b^T x \leq -\beta \\ \end{align} $$
Hint: Any $x \in \R$ can be written as $x = a - b$ where $a,b \geq 0$.
Split each unconstrained decision variable into two nonnegative decision variables: $$ \begin{align} x_1 &= y_1 - y_2 &(y_1,y_2 \geq 0)\\ x_2 &= y_3 - y_4 &(y_3,y_4 \geq 0)\\ \end{align} $$
Let $y = (y_1,y_2,y_3,y_4)$. Replace $x_1$ and $x_2$ with the expressions above to obtain: $$ \begin{align} \max_{y \in \R^4} \quad& (y_1 - y_2) - (y_3 - y_4) \\ \text{such that} \quad & (y_1 - y_2) + (y_3 - y_4) \leq 1 \\ \quad-& (y_1 - y_2) + (y_3 - y_4) \leq 1 \\ \quad & y \geq 0 \end{align} $$
By finding an optimal solution for the linear program for $y$, we can reconstruct an optimal solution for $x$'s linear program!
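We can check this kind of conversion numerically on a small made-up LP with free variables (assuming SciPy is available): solve it directly, solve the split nonnegative version, and confirm the optimal values agree. All numbers below are hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

# A small LP with unconstrained (free) variables, made up for illustration:
#   max  x1 + 2 x2   s.t.   x1 + x2 <= 4,   -x1 + x2 <= 2,   x free
c = np.array([1.0, 2.0])
A = np.array([[ 1.0, 1.0],
              [-1.0, 1.0]])
b = np.array([4.0, 2.0])

# Solve directly, allowing negative variables (linprog minimizes: negate c)
direct = linprog(-c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)

# Standard-form version: x = y_plus - y_minus with y_plus, y_minus >= 0
A_split = np.hstack([A, -A])           # A (y+ - y-) = [A, -A] @ (y+, y-)
c_split = np.concatenate([c, -c])
split = linprog(-c_split, A_ub=A_split, b_ub=b, bounds=[(0, None)] * 4)

# Recover x from the split solution and compare objective values
y = split.x
x_recovered = y[:2] - y[2:]
print(-direct.fun, -split.fun, x_recovered)
```

The recovered $x = y^+ - y^-$ is feasible for the original problem and attains the same objective value, which is exactly the equivalence the conversion promises.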
The previous two problems show that any linear program can be converted to standard form! From now on, we'll assume every linear program is written in this form.
We will work towards proving the following Lemma:
Hint: Use the separating hyperplane theorem, where one set is $\mathrm{cone}(a_1,\dots,a_k)$.