For each of the following, state clearly if the set is convex or not. Then prove your statement.
(i) Rectangle: (vector inequality is to be understood elementwise)
$$\{x\in\mathbb{R}^{n} \mid \alpha \leq x \leq \beta, \quad \alpha,\beta\in\mathbb{R}^{n}\}.$$(ii) Wedge:
$$\{x\in\mathbb{R}^{n} \mid a_{1}^{\top}x \leq b_{1}, \quad a_{2}^{\top}x \leq b_{2}, \quad a_{1},a_{2}\in\mathbb{R}^{n}, \quad b_{1},b_{2}\in\mathbb{R}\}.$$(iii) Set of points closer to a given point than a given set:
$$\{x\in\mathbb{R}^{n} \mid \| x - x_{0}\|_{2} \:\leq\: \| x - y\|_{2}, \quad \forall\: y\in\mathcal{S}\subseteq\mathbb{R}^{n}, \quad x_{0}\in\mathbb{R}^{n}\}.$$(iv) Set of points closer to one set than another:
$$\{x\in\mathbb{R}^{n} \mid \text{dist}\left(x,\mathcal{A}\right)\leq \text{dist}\left(x,\mathcal{B}\right), \quad \mathcal{A},\mathcal{B}\subset\mathbb{R}^{n}\}, \quad \text{dist}\left(x,\mathcal{A}\right):= \underset{a\in\mathcal{A}}{\inf}\|x-a\|_{2}.$$(v) The set: $$\{c\in\mathbb{R}^{n+1} \mid c_{0} + c_{n} = 1, \big|c_{0} + c_{1}x + c_{2}x^{2} + ... + c_{n}x^{n}\big| \leq 1 \;\text{for}\; 2020\leq x\leq 2021\}.$$
(i) Convex. In fact, it is a polyhedron, since the set is an intersection of $2n$ halfspaces in $\mathbb{R}^{n}$.
(ii) Convex. In fact, a polyhedron since it is an intersection of 2 halfspaces. It is a convex (polyhedral) cone for $b_1 = b_2 =0$.
(iii) Convex. This is because the set can be expressed as $$\bigcap_{y\in\mathcal{S}}\{x\in\mathbb{R}^{n} \:\mid\: \parallel x - x_{0}\parallel_{2} \:\leq\: \parallel x - y\parallel_{2}\},$$ that is, an intersection of halfspaces. Notice that for fixed $y\in\mathbb{R}^{n}$, the set $\{x\in\mathbb{R}^{n} \:\mid\: \parallel x - x_{0}\parallel_{2} \:\leq\: \parallel x - y\parallel_{2}\}$ is a halfspace: squaring both sides and cancelling the common term $x^{\top}x$ reduces the inequality to the linear condition $2(y - x_{0})^{\top}x \leq \|y\|_{2}^{2} - \|x_{0}\|_{2}^{2}$.
(iv) In general, this set is nonconvex. To disprove convexity, consider the counterexample $n=1$ with $\mathcal{A}=\{-1,1\}$ and $\mathcal{B}=\{0\}$. Then $$\{x\in\mathbb{R} \:\mid\: {\rm{dist}}(x,\mathcal{A})\leq {\rm{dist}}(x,\mathcal{B})\} = \{x \leq -1/2\} \cup \{x \geq 1/2\},$$ which is not convex: for instance, $\pm 1$ belong to the set, but their midpoint $0$ does not.
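This counterexample is easy to check numerically; below is a minimal sketch (pure Python, with `dist` computed by brute force over the finite sets):

```python
# Counterexample for (iv): A = {-1, 1}, B = {0} in R^1.
def dist(x, S):
    """Euclidean distance from a point x to a finite set S."""
    return min(abs(x - s) for s in S)

A, B = [-1.0, 1.0], [0.0]

def in_set(x):
    """Membership in {x : dist(x, A) <= dist(x, B)}."""
    return dist(x, A) <= dist(x, B)

# x = -1 and x = 1 belong to the set, but their midpoint x = 0 does not:
print(in_set(-1.0), in_set(1.0), in_set(0.0))  # True True False
```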
(v) Convex. For fixed $x\in[2020,2021]$, let $$\mathcal{S}_{x} := \{c\in\mathbb{R}^{n+1}\mid c_{0}+c_{n}=1, \big|c_{0} + c_{1}x + c_{2}x^{2} + ... + c_{n}x^{n}\big| \leq 1\}.$$ For each fixed $x$, the set $\mathcal{S}_{x}$ is convex since it is an intersection of 2 halfspaces and 1 hyperplane in $\mathbb{R}^{n+1}$. The given set is $\bigcap_{2020\leq x\leq2021}\mathcal{S}_{x}$, that is, an uncountable intersection of convex sets, and hence convex (see Lec. 6, p. 2).
The support function of a compact convex set $\mathcal{X}\subset\mathbb{R}^{n}$ is defined as
$$h(y) := \sup_{x\in\mathcal{X}}\: x^{\top}y, \quad y\in\mathbb{R}^{n},$$which can be geometrically interpreted as the perpendicular distance from the origin to the supporting hyperplane of $\mathcal{X}$ with normal $y$ (exactly so when $\|y\|_{2}=1$). Therefore, the support function uniquely determines a compact convex set.
(i) Suppose two compact convex sets $\mathcal{X}_{1},\mathcal{X}_{2}\subset\mathbb{R}^{n}$ have support functions $h_1(\cdot)$ and $h_2(\cdot)$, respectively. Derive the support function $h(\cdot)$ for the set sum/Minkowski sum $\mathcal{X}_{1}+\mathcal{X}_{2}$ (which is a convex set, see Lec. 6, p. 4). In other words, express $h(\cdot)$ in terms of $h_{1}(\cdot)$ and $h_2(\cdot)$.
By definition, the Minkowski sum $$\mathcal{X} := \mathcal{X}_{1} + \mathcal{X}_{2} := \{x\in\mathbb{R}^{n} \:\mid\: x = x_{1} + x_{2}, \: x_{1} \in \mathcal{X}_{1}, \: x_{2} \in \mathcal{X}_{2}\}.$$ Therefore, for any $y\in\mathbb{R}^{n}$, its support function $$h(y) = \sup_{x = (x_{1} + x_{2})\in\mathcal{X}}\: x^{\top}y = \sup_{x_{1} \in \mathcal{X}_{1},\, x_{2}\in\mathcal{X}_{2}}\:(x_{1} + x_{2})^{\top}y = \sup_{x_{1} \in \mathcal{X}_{1}}\: x_{1}^{\top}y \:+\: \sup_{x_{2}\in\mathcal{X}_{2}}\: x_{2}^{\top}y = h_{1}(y) + h_{2}(y),$$ where the supremum splits because $x_{1}$ and $x_{2}$ range over their respective sets independently.
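The identity $h = h_{1} + h_{2}$ can be spot-checked numerically. A minimal sketch, using two hypothetical polytopes given by vertex lists (the sup of a linear function over a polytope is attained at a vertex, so a brute-force max over vertices computes the support function):

```python
import itertools, random

# Brute-force check of h_{X1+X2}(y) = h_{X1}(y) + h_{X2}(y) for two small
# polytopes described by their vertex lists (illustrative example sets).
def support(V, y):
    """Support function of the convex hull of vertex list V at y."""
    return max(sum(vi * yi for vi, yi in zip(v, y)) for v in V)

V1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]                   # a triangle
V2 = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]   # a square
# every vertex of X1 + X2 is a sum of a vertex of X1 and a vertex of X2
V_sum = [tuple(a + b for a, b in zip(v1, v2))
         for v1, v2 in itertools.product(V1, V2)]

random.seed(0)
max_err = max(abs(support(V_sum, y) - (support(V1, y) + support(V2, y)))
              for y in [(random.uniform(-1, 1), random.uniform(-1, 1))
                        for _ in range(100)])
print(max_err)  # zero up to floating-point rounding
```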
(ii) Prove that $h(y+z) \leq h(y) + h(z)$ for any compact convex set $\mathcal{X}\subset\mathbb{R}^{n}$, and any $y,z\in\mathbb{R}^{n}$.
Using the definition of support function, it is clear that we need to establish: $\underset{x\in\mathcal{X}}{\sup}\{x^{\top}y + x^{\top}z\} \leq \underset{x\in\mathcal{X}}{\sup}\{x^{\top}y\} + \underset{x\in\mathcal{X}}{\sup}\{x^{\top}z\}$.
Let us establish the more general result: $$\underset{x\in\mathcal{X}}{\sup}\{f(x) + g(x)\} \leq \left(\underset{x\in\mathcal{X}}{\sup}f(x)\right) + \left(\underset{x\in\mathcal{X}}{\sup}g(x)\right), \quad\text{for any}\;f,g.$$ For any $\widetilde{x}\in\mathcal{X}$, a function evaluation cannot exceed its supremum: $$f(\widetilde{x}) \leq \underset{x\in\mathcal{X}}{\sup}f(x), \qquad g(\widetilde{x}) \leq \underset{x\in\mathcal{X}}{\sup}g(x).$$ Adding the above two and then taking the supremum over $\widetilde{x}\in\mathcal{X}$ on the left, the result follows. (We take a supremum over $\widetilde{x}$ rather than evaluating at a maximizer of the LHS, since the supremum need not be attained.)
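A tiny discrete illustration (sup replaced by max over a finite grid; the functions $f(x)=x$, $g(x)=-x$ are illustrative choices) showing that the inequality can be strict:

```python
# sup(f + g) <= sup f + sup g, with max over a finite grid playing sup.
X = [k / 10.0 for k in range(-10, 11)]   # grid on [-1, 1]
f = lambda x: x
g = lambda x: -x

lhs = max(f(x) + g(x) for x in X)                   # sup of the sum: 0
rhs = max(f(x) for x in X) + max(g(x) for x in X)   # 1 + 1 = 2
print(lhs, rhs)  # 0.0 2.0 -- the inequality is strict here
```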
(iii) Derive the support function $h(y)$ for the Euclidean ball $\mathcal{B}(0,r)\subset\mathbb{R}^{n}$ (see Lec. 5, p. 12) with center at origin and radius $r>0$. (Hint: Cauchy-Schwarz inequality)
For any $x\in\mathcal{B}(0,r)\subset\mathbb{R}^{n}$, we have $\|x\|_{2}\leq r$. By the Cauchy-Schwarz inequality, we get
$$ |x^{\top} y| \leq \|x\|_{2} \|y\|_{2} \leq r \|y\|_{2} \quad \Rightarrow\quad -r \|y\|_{2}\leq x^{\top} y \leq r \|y\|_{2}.$$But for $y \neq 0$, the specific choice $x = r\frac{y}{\|y\|_{2}} \in \mathcal{B}(0,r)$ yields $x^{\top} y = r \|y\|_{2}$, thus achieving the upper bound above (for $y = 0$, the supremum is trivially $0 = r\|y\|_{2}$). Therefore,
$$h(y) = \sup_{x\in\mathcal{B}(0,r)}\: x^{\top}y = r \|y\|_{2}.$$

For each of the following, state clearly if the function is convex or concave or both or neither. Then prove your statement.
(i) $f(x,t) = \log\left(\dfrac{1}{t^{2} - x^{\top}x}\right)$ over ${\rm{\mathbf{dom}}}(f) = \{(x,t) \in \mathbb{R}^{n} \times \mathbb{R} \:\mid\: \parallel x\parallel_{2} \:<\: t\}$.
(ii) $f(p,q) = \displaystyle\sum_{i=1}^{n} p_{i}\log\left(\displaystyle\frac{p_{i}}{q_{i}}\right)$ over ${\rm{\mathbf{dom}}}(f) = \{(p,q)\in\mathbb{R}_{\geq 0}^{n}\times\mathbb{R}_{\geq 0}^{n} \:\mid\: \mathbf{1}^{\top}p = \mathbf{1}^{\top}q = 1\}$.
(iii) $f(x) = \dfrac{1}{x}\displaystyle\int_{0}^{x}\!g(t)\:{\mathrm{d}}t$ over ${\rm{\mathbf{dom}}}(f) = \mathbb{R}_{>0}$, where $g:\mathbb{R}\mapsto\mathbb{R}$ is convex and differentiable.
$f(x,t)$ is a convex function.
First, notice that $\textbf{dom}(f)$ is the second order/Lorentz/ice cream cone, which is a convex set (see Lec. 5, p. 8-9). An alternative way to show that $\textbf{dom}(f)$ is a convex set is to notice that
$\textbf{dom}(f)$ is the preimage of the open unit 2-norm ball $\{x\in\mathbb{R}^{n}\mid \|x\|_{2}< 1\}$ under the perspective function (Lec. 6, p. 4-6); note the ball must be open to match the strict inequality $\|x\|_{2} < t$. Since the open unit 2-norm ball is convex, by Lec. 6, p. 6, $\textbf{dom}(f)$ is a convex set.
Having established that the set convexity requirement for the domain checks out, there are two ways to argue the rest:
Approach 1: identify the operations preserving function convexity
Write: $$f(x,t) = - \log(t^{2} - x^{\top}x) = -\log t - \log\left(t - \frac{x^{\top}x}{t}\right).$$ The first term $-\log t$ is convex over $t>0$ (Lec. 7, p. 15). Let us examine the second term. The quadratic-over-linear function $x^{\top}x/t$ is convex on $\{(x,t)\:|\: t>0\}$, following the same arguments as in Lec. 7, p. 17. Therefore, its negative $-x^{\top}x/t$ is concave. Consequently, $(t - x^{\top}x/t)$ is a sum of concave functions (linear is both convex and concave), and hence concave; it is also positive on $\textbf{dom}(f)$, since $t > \|x\|_{2}$ implies $t^{2} > x^{\top}x$. Since $\log(\cdot)$ is concave and increasing, by the composition rules in textbook p. 84 (we mentioned this in Lec. 8, p. 16), the composite function $\log(t - x^{\top}x/t)$ is concave. Hence $\log t + \log(t - x^{\top}x/t)$ is a sum of concave functions, and $f(x,t)$ is its negative. So we have shown that $f(x,t)$ is the negative of a concave function, hence is convex in $(x,t)$.
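The convexity claim can be spot-checked numerically; a minimal sketch that samples random pairs in the domain and tests the defining inequality of convexity (the dimension $n=3$ and the sampling scheme are arbitrary choices):

```python
import math, random

# Monte-Carlo check of convexity of f(x, t) = -log(t^2 - x^T x)
# on the cone ||x||_2 < t (illustrative dimension n = 3).
random.seed(2)
n = 3

def f(x, t):
    return -math.log(t * t - sum(c * c for c in x))

def rand_point():
    """A random point of the domain: pick x, then t strictly above ||x||_2."""
    x = [random.uniform(-1, 1) for _ in range(n)]
    t = math.sqrt(sum(c * c for c in x)) + random.uniform(0.1, 2.0)
    return x, t

max_gap = -float("inf")
for _ in range(500):
    (x1, t1), (x2, t2) = rand_point(), rand_point()
    th = random.random()
    xm = [th * a + (1 - th) * b for a, b in zip(x1, x2)]
    tm = th * t1 + (1 - th) * t2
    # convexity demands f(convex combination) <= combination of f values
    max_gap = max(max_gap,
                  f(xm, tm) - (th * f(x1, t1) + (1 - th) * f(x2, t2)))
print(max_gap)  # should be <= 0 up to rounding
```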
Approach 2: keep calm and do calculus
Let us check the second order (Hessian) condition. For $h:\mathbb{R}\mapsto\mathbb{R}$, $g:\mathbb{R}^{m}\mapsto\mathbb{R}$, notice that the chain rule for Hessian of the composition $f(z) = h \circ g (z)$ is
$$\text{Hess}\left(f\right) = h^{\prime}\left(g(z)\right)\:\text{Hess}(g) \: + \: h^{\prime\prime}(g(z))\:\left(\nabla g\right)\left(\nabla g\right)^{\top}, \quad \text{which is an $m\times m$ symmetric matrix}.$$For us, $h(.) = -\log(.)$, $g(z) = z^{\top}Dz$ where $z=(t,x)\in\mathbb{R}^{n+1}$, and the $(n+1)\times(n+1)$ diagonal matrix $D={\rm{diag}}(1,-\mathbf{1})$. Direct application of the above chain rule gives
$$\text{Hess}\left(f\right) = \displaystyle\frac{2}{\left(z^{\top}Dz\right)^2}\left[2Dzz^{\top}D \: - \: \left(z^{\top}Dz\right)D\right].$$Since the pre-factor of the Hessian above is positive, all that remains to show is: $y^{\top}\left[2Dzz^{\top}D \: - \: \left(z^{\top}Dz\right)D\right]y \geq 0$ for all $y\in\mathbb{R}^{n+1}$. This follows from noting that
\begin{aligned} y^{\top}\left[2Dzz^{\top}D \: - \: \left(z^{\top}Dz\right)D\right]y &= 2 \left(y^{\top}Dz\right)^{2} \: - \: \left(y^{\top}Dy\right) \left(z^{\top}Dz\right)\\ &= \left(y^{\top}Dz\right)^{2} \: + \: \left[\left(y^{\top}Dz\right)^{2} \: - \: \left(y^{\top}Dy\right)\left(z^{\top}Dz\right)\right]\\ &\geq 0. \end{aligned}
Here the bracketed term is nonnegative by the reverse Cauchy-Schwarz inequality for the Lorentz quadratic form, which holds since $z^{\top}Dz = t^{2} - x^{\top}x > 0$ on $\textbf{dom}(f)$. Indeed, $z_{1} = t > 0$ on the domain, so taking $\lambda = y_{1}/z_{1}$, the vector $w := y - \lambda z$ satisfies $w_{1} = 0$, whence $q(\lambda) := (y - \lambda z)^{\top}D(y - \lambda z) = w^{\top}Dw = -\sum_{i\geq 2}w_{i}^{2} \leq 0$. Since the quadratic $q(\lambda) = \left(z^{\top}Dz\right)\lambda^{2} - 2\left(y^{\top}Dz\right)\lambda + y^{\top}Dy$ has positive leading coefficient and is nonpositive at a real point, its discriminant must be nonnegative, i.e., $\left(y^{\top}Dz\right)^{2} \geq \left(y^{\top}Dy\right)\left(z^{\top}Dz\right)$.

$f(p,q)$ is a convex function.
First, notice that $\textbf{dom}(f)$ is the Cartesian product of two standard/probability simplices (Lec. 5, p. 15). Each simplex, being an intersection of one hyperplane ($\mathbf{1}^{\top}p = 1$) and $n$ halfspaces ($p_{i} \geq 0$), is a convex set, in fact a polyhedron. Since the Cartesian product of convex sets is convex (Lec. 6, p. 3), $\textbf{dom}(f)$ is a convex set.
Having established that the set convexity requirement for the domain checks out, there are two ways to argue the rest:
Approach 1: identify the operations preserving function convexity
See Example 3.19 in textbook.
Approach 2: check the zeroth order condition (Lec. 7, p. 7-8)
Choose two different pairs of probability vectors $(p_{1},q_{1})\in\textbf{dom}(f)$ and $(p_{2},q_{2})\in\textbf{dom}(f)$. For any $0\leq\theta\leq 1$, let $$p = \theta p_{1} + (1-\theta)p_{2}, \quad q = \theta q_{1} + (1-\theta)q_{2}.$$ We will now prove that $$f(p,q) \leq \theta f(p_{1},q_{1}) \:+\: (1-\theta) f(p_{2},q_{2}).$$ We start with the following Lemma.
Lemma (Log sum inequality): Let $(a,b)\in\mathbb{R}^{n}_{\geq 0} \times \mathbb{R}^{n}_{\geq 0}$. Then
$$\displaystyle\sum_{i=1}^{n} a_{i}\log\displaystyle\frac{a_{i}}{b_{i}} \geq \left(\displaystyle\sum_{i=1}^{n}a_{i}\right)\log\displaystyle\frac{\displaystyle\sum_{i=1}^{n}a_{i}}{\displaystyle\sum_{i=1}^{n}b_{i}}.$$
Proof: For notational ease, let us denote $\sum_{i}a_{i}=\alpha$, and $\sum_{i}b_{i}=\beta$. Recalling that $g(x) = x\log x$ is a convex function (easily checked: $g^{\prime\prime}(x) = 1/x > 0$ for $x > 0$), we have
$$ \displaystyle\sum_{i=1}^{n} a_{i}\log\displaystyle\frac{a_{i}}{b_{i}} = \displaystyle\sum_{i=1}^{n} b_{i}\, g\left(\displaystyle\frac{a_{i}}{b_{i}}\right) = \beta \displaystyle\sum_{i=1}^{n} \displaystyle\frac{b_{i}}{\beta}\, g\left(\displaystyle\frac{a_{i}}{b_{i}}\right) \geq \beta\, g\left(\displaystyle\sum_{i=1}^{n}\displaystyle\frac{b_{i}}{\beta}\displaystyle\frac{a_{i}}{b_{i}}\right) = \beta\, g\left(\displaystyle\frac{1}{\beta}\displaystyle\sum_{i=1}^{n}a_{i}\right) = \beta\, g\left(\frac{\alpha}{\beta}\right) = \alpha \log\frac{\alpha}{\beta},$$
where the inequality follows from Jensen's inequality applied to the convex function $g(\cdot)$. We have also used $b_{i}/\beta \geq 0$, $\sum_{i}b_{i}/\beta = 1$.
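The lemma can be spot-checked numerically on random positive data; a minimal sketch:

```python
import math, random

# Quick numerical check of the log sum inequality:
# sum_i a_i log(a_i/b_i) >= (sum_i a_i) log(sum_i a_i / sum_i b_i).
random.seed(3)

def lhs_rhs(a, b):
    lhs = sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))
    alpha, beta = sum(a), sum(b)
    rhs = alpha * math.log(alpha / beta)
    return lhs, rhs

min_gap = float("inf")
for _ in range(1000):
    a = [random.uniform(0.01, 5) for _ in range(6)]
    b = [random.uniform(0.01, 5) for _ in range(6)]
    lhs, rhs = lhs_rhs(a, b)
    min_gap = min(min_gap, lhs - rhs)
print(min_gap)  # should be >= 0 up to rounding
```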
Returning to our function, the Kullback-Leibler divergence, let $a_{1i}=\theta p_{1i}$, $a_{2i}=(1-\theta) p_{2i}$, $b_{1i}=\theta q_{1i}$, $b_{2i}=(1-\theta) q_{2i}$. Then, $$\begin{aligned} f(p,q) = \displaystyle\sum_{i=1}^{n} \left(\theta p_{1i} + (1-\theta)p_{2i}\right)\log\displaystyle\frac{\theta p_{1i} + (1-\theta)p_{2i}}{\theta q_{1i} + (1-\theta)q_{2i}} &= \displaystyle\sum_{i=1}^{n} \left(a_{1i} + a_{2i}\right) \log\displaystyle\frac{a_{1i}+a_{2i}}{b_{1i}+b_{2i}}\\ &\leq \displaystyle\sum_{i=1}^{n} \left(a_{1i}\log\displaystyle\frac{a_{1i}}{b_{1i}} + a_{2i}\log\displaystyle\frac{a_{2i}}{b_{2i}}\right) \qquad \text{(using the log sum inequality from the lemma above)}\\ &= \displaystyle\sum_{i=1}^{n} \left(\theta p_{1i}\log\displaystyle\frac{\theta p_{1i}}{\theta q_{1i}} \: + \: (1-\theta)p_{2i}\log\displaystyle\frac{(1-\theta)p_{2i}}{(1-\theta)q_{2i}}\right) \\ &= \theta\:f(p_{1},q_{1}) \: + \: (1-\theta)\:f(p_{2},q_{2}). \end{aligned}$$
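The convexity inequality just proved can also be verified numerically; a minimal sketch with random probability vectors (the dimension and sample counts are arbitrary choices):

```python
import math, random

# Check f(theta*(p1,q1) + (1-theta)*(p2,q2)) <= theta*f(p1,q1) + (1-theta)*f(p2,q2)
# for the KL divergence f(p, q) = sum_i p_i log(p_i/q_i), jointly in (p, q).
random.seed(4)

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def rand_simplex(n):
    """A random strictly positive probability vector of length n."""
    w = [random.uniform(0.01, 1) for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

max_violation = -float("inf")
for _ in range(500):
    n = 5
    p1, q1, p2, q2 = (rand_simplex(n) for _ in range(4))
    th = random.random()
    p = [th * a + (1 - th) * b for a, b in zip(p1, p2)]
    q = [th * a + (1 - th) * b for a, b in zip(q1, q2)]
    max_violation = max(max_violation,
                        kl(p, q) - (th * kl(p1, q1) + (1 - th) * kl(p2, q2)))
print(max_violation)  # should be <= 0 up to rounding
```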
$f(x)$ is a convex function.
First, $\textbf{dom}(f) = \mathbb{R}_{>0}$, being an open halfline (an open halfspace in $\mathbb{R}$), is a convex set.
Next, we check the second order condition for $f(x)$ as follows: \begin{align} f^{\prime}(x) &= -\dfrac{1}{x^2}\int_{0}^{x}g(t)\:{\mathrm{d}}t \:+\: \frac{1}{x}g(x)\\ \Rightarrow f^{\prime\prime}(x) &= \frac{2}{x^3} \bigg\{\left(\int_{0}^{x}g(t)\:{\mathrm{d}}t\right) - xg(x) + \frac{x^2}{2}g^{\prime}(x)\bigg\} \\ &= \frac{2}{x^3} \int_{0}^{x}\left(g(t) - g(x) -g^{\prime}(x)(t-x)\right){\mathrm{d}}t \geq 0, \end{align} where the last inequality is due to the first order condition of convexity for the differentiable function $g(\cdot)$, namely $g(t) \geq g(x) + g^{\prime}(x)(t-x)$, which makes the above integrand, and hence the integral, nonnegative.
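The nonnegativity of $f^{\prime\prime}$ can be spot-checked numerically; a minimal sketch using the illustrative convex choice $g(t) = e^{t}$, for which $f(x) = (e^{x}-1)/x$ in closed form:

```python
import math

# For the convex g(t) = e^t, f(x) = (1/x) * integral_0^x g(t) dt should be
# convex on x > 0; we test f'' >= 0 by central finite differences.
def f(x):
    return (math.exp(x) - 1.0) / x   # closed form of (1/x) * int_0^x e^t dt

def f2(x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

vals = [f2(0.5 + 0.25 * k) for k in range(20)]  # sample x in [0.5, 5.25]
print(min(vals))  # should be positive
```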