$\newcommand{\vct}[1]{\mathbf{#1}}$ $\newcommand{\mtx}[1]{\mathbf{#1}}$ $\newcommand{\e}{\varepsilon}$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\minimize}{\text{minimize}\quad}$ $\newcommand{\maximize}{\text{maximize}\quad}$ $\newcommand{\subjto}{\quad\text{subject to}\quad}$ $\newcommand{\R}{\mathbb{R}}$ $\newcommand{\trans}{T}$ $\newcommand{\ip}[2]{\langle {#1}, {#2} \rangle}$ $\newcommand{\zerovct}{\vct{0}}$ $\newcommand{\diff}[1]{\mathrm{d}{#1}}$ $\newcommand{\conv}{\operatorname{conv}}$ $\newcommand{\inter}{{\operatorname{int}}}$
In this final lecture we address several topics related to the exam. The main talking points are listed below. Other than that, the past exam, the midterm test, and Part A of the problem sheets should serve as good guidance.
It is essential that you are comfortable manipulating vectors and matrices, and are able to compute gradients of multivariate functions with ease. If, for example, $\mtx{A}\in \R^{n\times n}$, $\vct{w}\in \R^n$ and $\vct{x}\in \R^n$, then
\begin{equation*} \vct{w}^{\trans}\mtx{A}\vct{x} = \vct{x}^{\trans}\mtx{A}^{\trans}\vct{w} = \ip{\vct{w}}{\mtx{A}\vct{x}} = \ip{\mtx{A}^{\trans}\vct{w}}{\vct{x}}. \end{equation*}The gradient with respect to $\vct{x}$ of $f(\vct{x}) = \vct{w}^{\trans}\mtx{A}\vct{x}$ is
\begin{equation*} \nabla_{\vct{x}} f(\vct{x}) = \mtx{A}^{\trans}\vct{w}. \end{equation*}Note in particular that the gradient $\nabla_{\vct{x}}$ of a linear form $\vct{w}^{\trans}\vct{x}$ is $\vct{w}$, and the gradient $\nabla_{\vct{w}}$ is $\vct{x}$. For a quadratic function $f(\vct{x})=\vct{x}^{\trans}\mtx{A}\vct{x}$,
\begin{equation*} \nabla_{\vct{x}} f(\vct{x}) = (\mtx{A}+\mtx{A}^{\trans})\vct{x}, \end{equation*}which equals $2\mtx{A}\vct{x}$ when $\mtx{A}$ is symmetric. It is also recommended to have a look at the Preliminaries document, which is included as an appendix in the complete set of lecture notes. If in doubt, you may want to write the functions out in coordinates and work as in Example 2.11.
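These identities are easy to sanity-check numerically. The following sketch (not part of the notes; `finite_diff_grad` is a hypothetical helper) compares the analytic gradients above against central finite differences using NumPy.

```python
import numpy as np

def finite_diff_grad(f, x, h=1e-6):
    """Approximate the gradient of f at x by central differences."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
w = rng.standard_normal(n)
x = rng.standard_normal(n)

# f(x) = w^T A x has gradient A^T w.
assert np.allclose(finite_diff_grad(lambda x: w @ A @ x, x), A.T @ w, atol=1e-4)

# f(x) = x^T A x has gradient (A + A^T) x, i.e. 2 A x for symmetric A.
assert np.allclose(finite_diff_grad(lambda x: x @ A @ x, x), (A + A.T) @ x, atol=1e-4)
```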
We have seen various criteria for sets to be convex, and for functions to be convex.
Convexity of functions.
Convexity of sets.
It helps to have a collection of convex functions and convex sets in mind. When determining whether a set in $\R^2$ is convex (or whether a function of one variable is), a small sketch may help.
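Another cheap way to build intuition, complementing the criteria above, is to spot-check the defining inequality $f(t\vct{x} + (1-t)\vct{y}) \leq t f(\vct{x}) + (1-t) f(\vct{y})$ at random points. The sketch below (illustrative only; a failed check certifies non-convexity, while a passed check proves nothing) does this for functions of one variable.

```python
import numpy as np

def looks_convex(f, rng, samples=1000):
    """Spot-check f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) at random points."""
    for _ in range(samples):
        x, y = rng.uniform(-5, 5, size=2)
        t = rng.uniform()
        if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + 1e-12:
            return False  # a certificate of non-convexity
    return True  # no violation found (not a proof of convexity)

rng = np.random.default_rng(1)
print(looks_convex(np.exp, rng))  # True: exp is convex
print(looks_convex(np.sin, rng))  # False (almost surely): sin is not convex
```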
For convex functions defined on $\R^2$, one should be able to sketch the level sets.
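For example, for $f(x_1,x_2) = x_1^2 + 4x_2^2$ the level sets are nested ellipses. A quick plotting sketch (assuming matplotlib is available) for checking your hand-drawn pictures:

```python
import numpy as np
import matplotlib.pyplot as plt

# Level sets of the convex quadratic f(x1, x2) = x1^2 + 4*x2^2: nested ellipses.
x1, x2 = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
plt.contour(x1, x2, x1**2 + 4 * x2**2, levels=[0.5, 1, 2, 4, 8])
plt.gca().set_aspect("equal")
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.show()
```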
We have seen a class of algorithms called descent algorithms, the most important of which are gradient descent and Newton's method.
Note that we encountered Newton's method in two guises: as a minimization algorithm, and as a root-finding algorithm. The minimization version of Newton's algorithm for a function $f$ is merely the root-finding version of Newton's method for the gradient $\nabla f$.
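To make the correspondence concrete, here is a minimal one-dimensional sketch (illustrative, with a hypothetical example function): the root-finding iteration $x \leftarrow x - g(x)/g'(x)$ applied to $g = f'$ is exactly the minimization iteration $x \leftarrow x - f'(x)/f''(x)$.

```python
def newton_root(g, dg, x, steps=20):
    """Newton's method for solving g(x) = 0."""
    for _ in range(steps):
        x = x - g(x) / dg(x)
    return x

# Minimizing f(x) = x^4 - 3x^2 + x means finding a root of f'(x) = 4x^3 - 6x + 1.
df = lambda x: 4 * x**3 - 6 * x + 1    # gradient of f
d2f = lambda x: 12 * x**2 - 6          # second derivative of f
print(newton_root(df, d2f, x=2.0))     # converges to a critical point of f
```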
For the descent methods considered, it will be important to
Associated with iterative algorithms is the notion of rates of convergence. You should be able to recognise the rates of convergence of the algorithms discussed, as well as those of given sequences of numbers. When do gradient descent or Newton's method fail to converge?
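As a reminder (the standard definitions, stated for a sequence of errors $e_k \to 0$):
\begin{align*}
\text{linear:} \quad & \lim_{k\to\infty} \frac{e_{k+1}}{e_k} = c \in (0,1), && \text{e.g. } e_k = 2^{-k},\\
\text{superlinear:} \quad & \lim_{k\to\infty} \frac{e_{k+1}}{e_k} = 0, && \text{e.g. } e_k = k^{-k},\\
\text{quadratic:} \quad & \limsup_{k\to\infty} \frac{e_{k+1}}{e_k^2} < \infty, && \text{e.g. } e_k = 2^{-2^k}.
\end{align*}Typically, gradient descent converges linearly (for smooth, strongly convex functions), while Newton's method converges quadratically once the iterates are close enough to the minimizer.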
We have seen LP from the point of view of theory and from the point of view of applications.
Applications.
One should be able to switch easily between primal and dual formulations; one standard primal-dual pair is recalled below.
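For reference (in the notation of these notes; the course may have used an equivalent standard form), the primal problem
\begin{equation*} \minimize \ip{\vct{c}}{\vct{x}} \subjto \mtx{A}\vct{x}\geq \vct{b}, \ \vct{x}\geq \zerovct \end{equation*}
has the dual
\begin{equation*} \maximize \ip{\vct{b}}{\vct{y}} \subjto \mtx{A}^{\trans}\vct{y}\leq \vct{c}, \ \vct{y}\geq \zerovct. \end{equation*}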
Algorithms / solving.
Theory.
One should also be comfortable working with convex and conic hulls of points. For example, given vectors $\vct{a}_1,\dots,\vct{a}_p\in \R^n$, the set of points \begin{equation*} \sum_{i=1}^p x_i \vct{a}_i, \quad x_i\geq 0, \end{equation*} is a convex set (indeed a convex cone), and can be written as $\{\mtx{A}\vct{x} : \vct{x}\geq \zerovct\}$, where $\mtx{A}$ is the matrix with columns $\vct{a}_i$.
Also, if the matrix $\mtx{A}$ consists of rows $\vct{a}_i^{\trans}$, then the set of $\vct{x}$ such that $\mtx{A}\vct{x}\leq \zerovct$ is the same as the set of $\vct{x}$ such that $\ip{\vct{x}}{\vct{a}_i}\leq 0$ for all $i$.
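Membership of a point $\vct{b}$ in such a cone is itself an LP feasibility problem: does $\mtx{A}\vct{x}=\vct{b}$ have a solution with $\vct{x}\geq \zerovct$, where $\mtx{A}$ has columns $\vct{a}_i$? A sketch using SciPy (assuming it is available; `in_conic_hull` is a hypothetical helper):

```python
import numpy as np
from scipy.optimize import linprog

def in_conic_hull(A, b):
    """Check whether b lies in the conic hull of the columns of A,
    i.e. whether A x = b has a solution with x >= 0 (an LP feasibility problem)."""
    res = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=b,
                  bounds=[(0, None)] * A.shape[1])
    return res.success

A = np.array([[1.0, 0.0], [0.0, 1.0]])   # columns span the nonnegative quadrant
print(in_conic_hull(A, np.array([2.0, 3.0])))    # True
print(in_conic_hull(A, np.array([-1.0, 1.0])))   # False
```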
An important point to notice here is the close analogy with linear programming theory.
Theory (in order of complexity).
The above tasks can only be
It can be useful to remember some special cases. These include:
The barrier function is a concept that we did not strictly encounter in LP and is important in the context of non-linear convex optimization; its definition is recalled below. It can be helpful to be able to switch quickly between matrix notation and listing all the constraints individually!
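As a reminder (assuming, as is standard, the logarithmic barrier for the linear constraints $\mtx{A}\vct{x}\leq \vct{b}$ with rows $\vct{a}_i^{\trans}$):
\begin{equation*} F(\vct{x}) = -\sum_{i=1}^m \log\left(b_i - \ip{\vct{a}_i}{\vct{x}}\right), \end{equation*}
defined on the interior of the feasible set, with $F(\vct{x})\to \infty$ as $\vct{x}$ approaches the boundary. Note how the matrix form $\mtx{A}\vct{x}\leq \vct{b}$ and the individual constraints $\ip{\vct{a}_i}{\vct{x}}\leq b_i$, $i=1,\dots,m$, appear side by side here.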
There are two things to remember with respect to semidefinite programming:
As for examples, one should know how to transform simple problems into semidefinite programs. For example:
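One classic instance (a standard textbook example, not necessarily the one used in the lectures) is minimizing the largest eigenvalue of a symmetric matrix $\mtx{A}(\vct{x}) = \mtx{A}_0 + x_1\mtx{A}_1 + \cdots + x_k\mtx{A}_k$ depending affinely on $\vct{x}$:
\begin{equation*} \minimize t \subjto t\mtx{I} - \mtx{A}(\vct{x}) \succeq 0, \end{equation*}
with variables $t\in \R$ and $\vct{x}\in \R^k$. The constraint is a linear matrix inequality, so this is a semidefinite program, and its optimal value is $\min_{\vct{x}} \lambda_{\max}(\mtx{A}(\vct{x}))$.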
Dispersed throughout the course we had a look at various applications. These included
It is not so crucial to remember every detail in the treatment of these examples, but one should be able to explain the main ideas: in particular, how exactly the different optimization types enter, some of the problems that can arise, and potential concrete applications.
Thought experiment. Imagine you need to explain to someone who is reasonably mathematically literate some of these applications, and how the different optimization types (gradient descent, quadratic programming, SDP) enter the picture. That is about the level of detail you need to remember.