In this lecture we begin our study of the theory underlying constrained convex optimization. One way to define a convex optimization problem is
\begin{align*} \begin{split} \minimize & f(\vct{x})\\ \subjto & f_1(\vct{x})\leq 0\\ & \cdots \\ & f_m(\vct{x})\leq 0\\ & \vct{x}\in \Omega \end{split} \end{align*}where $f,f_1,\dots,f_m\colon \R^n\to \R$ are convex functions and $\Omega\subseteq \R^n$ is a convex set. The special case where the $f$ and the $f_i$ are linear functions and $\Omega=\R^n$ is known as linear programming, and is studied first. Before embarking on the study of models and algorithms for convex optimization, we need to study convex sets in more depth.
We recall the definition of a convex set.
Definition. A set $C\subseteq \R^n$ is a convex set, if for all $\vct{x},\vct{y}\in C$ and $\lambda\in [0,1]$, $\lambda\vct{x}+(1-\lambda)\vct{y}\in C$. In words, for every two points in $C$, the line joining them is also in $C$. A compact (closed and bounded) convex set is called a convex body.
We will denote by $\mathcal{C}(\R^n)$ the collection of convex sets and by $\mathcal{K}(\R^n)$ the collection of convex bodies. The following Lemma is left as an exercise.
Lemma. Let $C,D\in \mathcal{C}(\R^n)$ be convex sets. Then the following are also convex.
The convex hull $\conv{S}$ of a set $S$ is the intersection of all convex sets containing $S$. Clearly, if $S$ is convex, then $S=\conv{S}$.
Example. Let $S=\{(1,1)^{\trans},(1,-1)^{\trans},(-1,1)^{\trans},(-1,-1)^{\trans},(0,0)^{\trans}\}$. The convex hull of this set is the square.
A convex combination of points $\vct{x}_1,\dots,\vct{x}_k$ is a linear combination
\begin{equation*} \sum_{i=1}^k \lambda_i \vct{x}_i \end{equation*}such that $\lambda_i\geq 0$ and $\sum_{i=1}^k \lambda_i = 1$. It can be shown inductively that convex sets are closed under convex combinations: any convex combination of points in $C\in \mathcal{C}(\R^n)$ is still in $C$. In fact, the set of all convex combinations of points in a set $S$ is the convex hull of $S$.
Lemma. Let $S$ be a set. Then \begin{equation*} \conv{S} = \{\vct{x}\in \R^n \mid \vct{x}=\sum_{i=1}^k \lambda_i \vct{x}_k, \ \vct{x}_i\in S, \ \sum_{i=1}^k \lambda_i = 1, \ \lambda_i\geq 0\}. \end{equation*}
Example. A hyperplane, defined as the solution set of one linear equation,
\begin{equation*} H = \{\vct{x} \mid \ip{\vct{a}}{\vct{x}} = b\}, \end{equation*}is a convex set. Define the halfspaces $H_+$ and $H_-$ as the two sides that $H$ divides $\R^n$ into:
\begin{equation*} H_- = \{\vct{x}\mid \ip{\vct{a}}{\vct{x}}\leq b\}, \quad H_+ = \{\vct{x}\mid \ip{\vct{a}}{\vct{x}}\geq b\} \end{equation*}These are also convex sets.
Example. Euclidean balls and ellipsoids are common examples of convex sets. Let $\mtx{P}$ be a positive semidefinite symmetric matrix. Then an ellipsoid with center $\vct{x}_0$ is a set of the form
\begin{equation*} \mathcal{E} = \{\vct{x} \mid \ip{\vct{x}-\vct{x}_0}{\mtx{P}^{-1}(\vct{x}-\vct{x}_0)}\leq 1\}. \end{equation*}
A Euclidean unit ball is the special case $\mtx{P}=\mtx{I}$.
A convex cone is a set $C$ such that for all $\vct{x},\vct{y}$ and $\lambda\geq 0, \mu\geq 0$, $\lambda \vct{x}+\mu \vct{y}\in C$. It is easily verified that such a set is convex. Three important cones are the following:
The non-negative orthant $\R^n_{+}=\{\vct{x}\in \R^n \mid x_i\geq 0, 1\leq i\leq n\}$,
The second order (ice cream) cone (or Lorentz cone)
\begin{equation*} C_{\alpha} = \{\vct{x} \mid \sum_{i=1}^{n-1}x_i^2\leq x_n^2\}, \end{equation*}The cone $\mathcal{S}_{+}^n$ of positive semidefinite symmetric matrices.
Possibly the most important result in convex geometry is the {\em hyperplane separation theorem}. We first need the following.
Lemma. Let $C$ be a non-empty convex set and $\vct{x}\not\in C$. Then there exists a point $\vct{y}\in C$ that minimizes the distance $\norm{\vct{x}-\vct{y}}$. Moreover, for all $\vct{z}\in C$ we have
\begin{equation*} \ip{\vct{z}-\vct{y}}{\vct{x}-\vct{y}}\leq 0. \end{equation*}
In words, the vectors $\vct{z}-\vct{y}$ and $\vct{x}-\vct{y}$ form an obtuse angle.
Proof. Since $C\neq \emptyset$, there exists $r>0$ such that the ball $B(\vct{x},r):=\{\vct{y}\in \R^n \mid \norm{\vct{y}-\vct{x}}\leq \e\}$ intersected with $C$ is not empty. Since $K:=C\cap B(\vct{x},r)$ is compact (closed and bounded) and the function $\norm{\vct{y}-\vct{x}}$ is continuous on $K$, it has a minimizer $\vct{y}\in K$. For the second claim, note that since $C$ is convex, for every $\lambda\in [0,1]$,
\begin{equation*} \vct{w} = \lambda \vct{z}+(1-\lambda)\vct{y} \in C. \end{equation*}
For the distance between $\vct{z}$ and $\vct{x}$ we then get
\begin{align*} \norm{\vct{w}-\vct{x}}^2 &= \norm{\lambda\vct{z}+(1-\lambda) \vct{y}-\vct{x}}^2 = \norm{\lambda (\vct{z}-\vct{y})-(\vct{x}-\vct{y})}^2\\ &= \lambda^2\norm{\vct{z}-\vct{y}}^2-2\lambda \ip{\vct{z}-\vct{y}}{\vct{x}-\vct{y}}+\norm{\vct{x}-\vct{y}}^2. \end{align*}We now prove the claim by contradition. Assume $\ip{\vct{z}-\vct{y}}{\vct{x}-\vct{y}}>0$. Then we can choose $\lambda$ such that
\begin{equation*} 0< \lambda < \min\left\{ \frac{2\ip{\vct{x}-\vct{y}}{\vct{z}-\vct{y}}}{\norm{\vct{z}-\vct{y}}^2} , 1\right\}. \end{equation*}With such a $\lambda$ we get
\begin{equation*} \norm{\vct{w}-\vct{x}}^2 = \lambda^2\norm{\vct{z}-\vct{y}}^2-2\lambda \ip{\vct{z}-\vct{y}}{\vct{x}-\vct{y}}+\norm{\vct{x}-\vct{y}}^2 < \norm{\vct{x}-\vct{y}}^2. \end{equation*}This inequality, however, contradicts the assumption that $\vct{y}$ is a closest point, so that $\ip{\vct{z}-\vct{y}}{\vct{x}-\vct{y}}\leq 0$ has to hold.
In what follows write $\inter S$ for the interior of a set $S$.
Theorem. Let $C$ be a closed convex set and $\vct{x}\not\in C$. Then there exists a hyperplane $H$ such that $C\subset \inter H_-$ and $\vct{x}\in \inter H_+$.
Proof. Let $\vct{y}\in C$ be a nearest point to $\vct{x}$ in $C$, i.e., a point such that for all other $\vct{z}\in C$, $\norm{\vct{x}-\vct{y}}\leq \norm{\vct{x}-\vct{z}}$. Define
\begin{equation*} \vct{a}= \vct{x}-\vct{y}, \quad b = (\norm{\vct{x}}^2-\norm{\vct{y}}^2)/2. \end{equation*}
We aim to show that $\ip{\vct{a}}{\vct{x}} = b$ defines a separating hyperplane.
For this we have to show that
For (1), note that
\begin{equation*} \ip{\vct{a}}{\vct{x}} = \ip{\vct{x}-\vct{y}}{\vct{x}}>\ip{\vct{x}-\vct{y}}{\vct{x}}-\frac{1}{2}\norm{\vct{x}-\vct{y}}^2 = \frac{1}{2}(\norm{\vct{x}}^2-\norm{\vct{y}}^2) = b. \end{equation*}To prove (2), assume on the contrary that there exists a $\vct{z}\in C$ such that $\ip{\vct{a}}{\vct{z}}\geq b$. We know that the point $\vct{y}\in C$ satisfies the inequality (2), since
\begin{equation*} \ip{\vct{a}}{\vct{y}} < \ip{\vct{a}}{\vct{y}}+\frac{1}{2}\norm{\vct{a}}^2 = \ip{\vct{a}}{\vct{y}}+\frac{1}{2}\norm{\vct{x}-\vct{y}}^2 = b. \end{equation*}Therefore,
\begin{equation*} \ip{\vct{a}}{\vct{z}-\vct{y}} = \ip{\vct{a}}{\vct{z}}-\ip{\vct{a}}{\vct{y}} > b-b = 0, \end{equation*}but this contradicts the above Lemma. We therefore conclude $\ip{\vct{a}}{\vct{z}}<b$. The separating hyperplane $H$ is thus defined by the equation $\ip{\vct{a}}{\vct{x}}=b$.