versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
using Pkg
Pkg.activate("../..")
Pkg.status()
Activating environment at `~/Dropbox/class/M1399.000200/2021/M1399_000200-2021fall/Project.toml`
Status `~/Dropbox/class/M1399.000200/2021/M1399_000200-2021fall/Project.toml`
  [7d9fca2a] Arpack v0.4.0
  [6e4b80f9] BenchmarkTools v1.2.0
  [1e616198] COSMO v0.8.1
  [f65535da] Convex v0.14.16
  [a93c6f00] DataFrames v1.2.2
  [31a5f54b] Debugger v0.6.8
  [31c24e10] Distributions v0.24.18
  [e2685f51] ECOS v0.12.3
  [f6369f11] ForwardDiff v0.10.21
  [28b8d3ca] GR v0.61.0
  [c91e804a] Gadfly v1.3.3
  [bd48cda9] GraphRecipes v0.5.7
  [2e9cd046] Gurobi v0.9.14
  [f67ccb44] HDF5 v0.14.3
  [82e4d734] ImageIO v0.5.8
  [6218d12a] ImageMagick v1.2.1
  [916415d5] Images v0.24.1
  [b6b21f68] Ipopt v0.7.0
  [42fd0dbc] IterativeSolvers v0.9.1
  [4076af6c] JuMP v0.21.10
  [b51810bb] MatrixDepot v1.0.4
  [6405355b] Mosek v1.2.1
  [1ec41992] MosekTools v0.9.4
  [76087f3c] NLopt v0.6.3
  [47be7bcc] ORCA v0.5.0
  [a03496cd] PlotlyBase v0.4.3
  [f0f68f2c] PlotlyJS v0.15.0
  [91a5bcdd] Plots v1.22.6
  [438e738f] PyCall v1.92.3
  [d330b81b] PyPlot v2.10.0
  [dca85d43] QuartzImageIO v0.7.3
  [6f49c342] RCall v0.13.12
  [ce6b1742] RDatasets v0.7.5
  [c946c3f1] SCS v0.7.1
  [276daf66] SpecialFunctions v1.7.0
  [2913bbd2] StatsBase v0.33.12
  [f3b207a7] StatsPlots v0.14.28
  [b8865327] UnicodePlots v2.4.6
  [0f1e0344] WebIO v0.8.16
  [8f399da3] Libdl
  [2f01184e] SparseArrays
  [10745b16] Statistics
A function $f: \mathbb{R}^n_{++} \to \mathbb{R}$ of the form
$$
f(\mathbf{x}) = c x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n},
$$
where $c > 0$ and $a_i \in \mathbb{R}$, is called a monomial.
A sum of monomials,
$$
f(\mathbf{x}) = \sum_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}},
$$
where $c_k > 0$, is called a posynomial.
Examples
Monomials are closed under multiplication and division.
Posynomials are closed under addition, multiplication, and positive scaling.
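These closure rules follow directly from the definitions; for instance, the product of two monomials is again a monomial:

```latex
% product of two monomials (c, d > 0) is a monomial with coefficient cd > 0
\left( c\, x_1^{a_1} \cdots x_n^{a_n} \right)
\left( d\, x_1^{b_1} \cdots x_n^{b_n} \right)
= (c d)\, x_1^{a_1 + b_1} \cdots x_n^{a_n + b_n}.
```

Division works the same way with exponents subtracted. Note that posynomials are not closed under subtraction or general division, since all coefficients must stay positive.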
A geometric program (GP) is an optimization problem of the form \begin{eqnarray*} &\text{minimize}& f_0(\mathbf{x}) \\ &\text{subject to}& f_i(\mathbf{x}) \le 1, \quad i = 1,\ldots,m \\ & & h_i(\mathbf{x}) = 1, \quad i = 1,\ldots,p, \end{eqnarray*} where $f_0, \ldots, f_m$ are posynomials and $h_1, \ldots, h_p$ are monomials. The constraint $\mathbf{x} > \mathbf{0}$ is implicit.
Q: Is GP a convex optimization problem?
With change of variable $y_i = \log x_i$, a monomial
can be written as \begin{eqnarray*} f(\mathbf{x}) = f(e^{y_1}, \ldots, e^{y_n}) = c (e^{y_1})^{a_1} \cdots (e^{y_n})^{a_n} = e^{\mathbf{a}^T \mathbf{y} + b}, \end{eqnarray*} where $b = \log c$. Similarly, we can write a posynomial as \begin{eqnarray*} f(\mathbf{x}) &=& \sum_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}} \\ &=& \sum_{k=1}^K e^{\mathbf{a}_k^T \mathbf{y} + b_k}, \end{eqnarray*} where $\mathbf{a}_k = (a_{1k}, \ldots, a_{nk})$ and $b_k = \log c_k$.
where $\mathbf{a}_{ik}, \mathbf{g}_i \in \mathbb{R}^n$. Taking log of both objective and constraint functions, we obtain the geometric program in convex form \begin{eqnarray*} &\text{minimize}& \log \left(\sum_{k=1}^{K_0} e^{\mathbf{a}_{0k}^T \mathbf{y} + b_{0k}}\right) \\ &\text{subject to}& \log \left(\sum_{k=1}^{K_i} e^{\mathbf{a}_{ik}^T \mathbf{y} + b_{ik}}\right) \le 0, \quad i = 1,\ldots,m \\ & & \mathbf{g}_i^T \mathbf{y} + h_i = 0, \quad i=1,\ldots,p. \end{eqnarray*}
*Why do we require monomials for equality constraints?*
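A quick numerical sanity check (in Python, for illustration, with a made-up two-variable posynomial): after the change of variables $y_i = \log x_i$ and taking logs, the objective becomes a log-sum-exp of affine functions, which is convex and hence satisfies the midpoint inequality.

```python
import math

def posynomial(x):
    # f(x) = x1^{-1} x2 + 2 x1^{0.5} x2^{0.5}, a posynomial on (x1, x2) > 0
    return x[0]**-1 * x[1] + 2 * x[0]**0.5 * x[1]**0.5

def f_convex(y):
    # same function after substituting x_i = exp(y_i) and taking log:
    # log( e^{-y1 + y2} + e^{log 2 + 0.5 y1 + 0.5 y2} )
    return math.log(math.exp(-y[0] + y[1])
                    + math.exp(math.log(2) + 0.5 * y[0] + 0.5 * y[1]))

y1, y2 = [0.0, 1.0], [2.0, -1.0]
mid = [(a + b) / 2 for a, b in zip(y1, y2)]

# midpoint convexity holds for the transformed objective
assert f_convex(mid) <= 0.5 * f_convex(y1) + 0.5 * f_convex(y2)

# the transformation is exact: log f(e^y) == f_convex(y)
assert abs(math.log(posynomial([math.exp(v) for v in y1])) - f_convex(y1)) < 1e-9
```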
Let a random variable $X$ have a multinomial distribution with parameters $n$ and $\mathbf{p} = (p_1, \dotsc, p_r)$, with order constraints $p_1 \le \dotsb \le p_r$.
Observed frequencies of the classes are $(n_1, \dotsc, n_r)$.
The MLE is obtained by solving \begin{eqnarray*} &\text{maximize}& \sum_{i=1}^r n_i \log p_i \\ &\text{subject to}& p_1 \le p_2 \le \cdots \le p_r \\ & & \sum_{i=1}^r p_i = 1. \end{eqnarray*}
This is not a GP (why?).
It is equivalent to the GP \begin{eqnarray*} &\text{minimize}& \prod_{i=1}^r p_i^{-n_i} \\ &\text{subject to}& p_i p_{i+1}^{-1} \le 1, \quad i = 1, \ldots, r-1 \\ & & \sum_{i=1}^r p_i \le 1 \end{eqnarray*} (why is this a GP? Why the equivalence?)
The MLE solves \begin{eqnarray*} &\text{minimize}& \prod_{i:y_i=1} e^{-\mathbf{x}_i^T \beta} \prod_{i=1}^n \left( 1 + e^{\mathbf{x}_i^T \beta} \right). \end{eqnarray*} Let $z_j = e^{\beta_j}$, $j=1,\ldots,p$. The objective becomes \begin{eqnarray*} \prod_{i:y_i=1} \prod_{j=1}^p z_j^{-x_{ij}} \prod_{i=1}^n \left( 1 + \prod_{j=1}^p z_j^{x_{ij}} \right). \end{eqnarray*} This leads to a GP \begin{eqnarray*} &\text{minimize}& \prod_{i:y_i=1} s_i \prod_{i=1}^n t_i \\ &\text{subject to}& \prod_{j=1}^p z_j^{-x_{ij}} \le s_i, \quad i = 1,\ldots,m \\ & & 1 + \prod_{j=1}^p z_j^{x_{ij}} \le t_i, \quad i = 1, \ldots, n, \end{eqnarray*} in variables $\mathbf{s} \in \mathbb{R}^{m}$, $\mathbf{t} \in \mathbb{R}^n$, and $\mathbf{z} \in \mathbb{R}^p$. Here $m$ is the number of observations with $y_i=1$.
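As a sanity check (a small Python sketch with made-up data), the product objective above is exactly the exponentiated negative log-likelihood of the logistic model:

```python
import math

# toy data: 3 observations, 2 predictors (made up for illustration)
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
y = [1, 0, 1]
beta = [0.3, -0.7]

# linear predictors x_i^T beta
xb = [sum(xi * bj for xi, bj in zip(row, beta)) for row in X]

# negative log-likelihood of logistic regression
nll = (sum(math.log(1 + math.exp(t)) for t in xb)
       - sum(t for t, yi in zip(xb, y) if yi == 1))

# product form used in the GP:
# prod_{i: y_i = 1} e^{-x_i^T beta} * prod_i (1 + e^{x_i^T beta})
prod_form = 1.0
for t, yi in zip(xb, y):
    if yi == 1:
        prod_form *= math.exp(-t)
    prod_form *= 1 + math.exp(t)

# minimizing the product is minimizing exp(NLL), i.e., maximizing the likelihood
assert abs(prod_form - math.exp(nll)) < 1e-9
```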
Lim, J., Kim, S.J. and Wang, X., 2009. Estimating stochastically ordered survival functions via geometric programming. Journal of Computational and Graphical Statistics, 18(4), pp.978-994. https://doi.org/10.1198/jcgs.2009.06140
A function $f$ is a generalized posynomial if it can be formed from posynomials using addition, multiplication, positive (real) powers, and maximum (hence generalized posynomials are closed under these operations). For example,
$$
f(x, y, z) = \max\left\{ x + z, \ 1 + (y + z)^{1/2} \right\}
$$
is a generalized posynomial.
A generalized geometric program is of form
where $f_0, \ldots, f_m$ are generalized posynomials and $h_1, \ldots, h_p$ are monomials. The constraint $\mathbf{x} > \mathbf{0}$ is implicit.
For example, the generalized geometric program
$$
\begin{array}{ll}
\text{minimize} & \max\{x + z, \ 1 + (y+z)^{1/2}\} \\
\text{subject to} & \max\{y, z^2\} + \max\{yz, 0.3\} \le 1 \\
& 3xy/z = 1
\end{array}
$$
is equivalent to the GP
$$ \begin{array}{ll} \text{minimize} & t_1 \\ \text{subject to} & x+z \le t_1, \quad 1+t_2^{1/2} \le t_1, \\ & y + z \le t_2, \\ & t_3 + t_4 \le 1, \\ & y \le t_3, \quad z^2 \le t_3, \\ & yz \le t_4, \quad 0.3 \le t_4, \\ & 3xy/z = 1, \end{array} $$
in variables $x, y, z, t_1, t_2, t_3, t_4$.

Mosek is capable of solving GPs. cvx has a GP mode that recognizes and transforms GP problems.
Unfortunately, Convex.jl does not directly support GP. However, it supports the more general class of exponential cone representable problems. Doubly unfortunately, Mosek does not support general exponential cone constraints, so we have to turn to SCS, COSMO, or ECOS.
Exponential cone representability is defined in the usual sense: a problem is exponential cone representable if it can be expressed using affine functions and constraints of the form $(x, y, z) \in K_{\exp}$, where $K_{\exp} = \operatorname{cl}\{(x, y, z): y e^{x/y} \le z, \ y > 0\}$ is the exponential cone.
The log-sum-exp function $f(\mathbf{x})=\log(\sum_{i=1}^n e^{x_i})$ is exponential cone representable: $f(\mathbf{x}) \le t$ if and only if there exists $\mathbf{u} \in \mathbb{R}^n$ such that $\sum_{i=1}^n u_i \le 1$ and $e^{x_i - t} \le u_i$, i.e., $(x_i - t, 1, u_i) \in K_{\exp}$, for $i = 1, \ldots, n$.
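A small numerical check (Python, for illustration) that the certificate $u_i = e^{x_i - t}$ works at $t = f(\mathbf{x})$:

```python
import math

def logsumexp(x):
    # numerically stable log(sum(exp(x_i)))
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

x = [1.0, -0.5, 2.3]
t = logsumexp(x)

# certificate: u_i = e^{x_i - t} satisfies each exponential-cone
# constraint e^{x_i - t} <= u_i (with equality), and sum(u) = 1 <= 1
u = [math.exp(xi - t) for xi in x]
assert all(math.exp(xi - t) <= ui + 1e-15 for xi, ui in zip(x, u))
assert abs(sum(u) - 1.0) < 1e-12
```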
using Convex
## Use SCS solver
#using SCS
#opt = () -> SCS.Optimizer(verbose=0)
## Use COSMO solver
#using COSMO
#opt = () -> COSMO.Optimizer(max_iter=10000, verbose=false)
# Use ECOS solver
using ECOS
opt = () -> ECOS.Optimizer(verbose=true)
n = [4, 2, 374]  # observed frequencies; trinomial with n = 380
x = Variable(length(n))  # x = log(p)
problem = maximize(dot(n, x))
# order constraints p_1 <= ... <= p_r, i.e., x_1 <= ... <= x_r
for i in 1:length(n)-1
    problem.constraints += x[i] - x[i+1] <= 0
end
# sum_i p_i <= 1, i.e., logsumexp(x) <= 0
problem.constraints += logsumexp(x) <= 0
# Solve the problem
@time solve!(problem, opt)
 22.295463 seconds (51.02 M allocations: 2.919 GiB, 6.02% gc time)

ECOS 2.0.5 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS

(23 interior-point iterations)

OPTIMAL (within feastol=6.5e-09, reltol=5.7e-13, abstol=2.0e-11).
Runtime: 0.000265 seconds.
exp.(x.value) # recover trinomial probabilities
3×1 Matrix{Float64}:
 0.00789473078831494
 0.007894736202882328
 0.9842105440627644
sum(exp.(x.value))
1.0000000110539617
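The solver output agrees (to solver tolerance) with the closed-form order-constrained MLE, which pools adjacent violating cells: here the unconstrained MLE $(4/380, 2/380, 374/380)$ violates $p_1 \le p_2$, so the first two cells are averaged. A quick check in Python:

```python
# pool-adjacent-violators solution for n = (4, 2, 374), total 380:
# 4/380 > 2/380 violates the order, so average the first two cells
n = [4, 2, 374]
total = sum(n)
p = [(n[0] + n[1]) / (2 * total),
     (n[0] + n[1]) / (2 * total),
     n[2] / total]

# matches the solver output (0.0078947..., 0.0078947..., 0.9842105...)
assert abs(p[0] - 0.007894736842105263) < 1e-12
assert abs(p[2] - 0.9842105263157894) < 1e-12
assert abs(sum(p) - 1.0) < 1e-12
```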
### https://github.com/JuliaOpt/Convex.jl/blob/master/docs/examples_literate/general_examples/logistic_regression.jl
using DataFrames
using Plots
using RDatasets
iris = dataset("datasets", "iris")
## outcome variable: +1 for versicolor, -1 otherwise
Y = [species == "versicolor" ? 1.0 : -1.0 for species in iris.Species]
## create data matrix with one column for each feature (first column corresponds to offset)
X = hcat(ones(size(iris, 1)), iris.SepalLength, iris.SepalWidth, iris.PetalLength, iris.PetalWidth);
## solve the logistic regression problem
n, p = size(X)
beta = Variable(p)
problem = minimize(logisticloss(-Y.*(X*beta)))
solve!(problem, opt)
ECOS 2.0.5 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS

(19 interior-point iterations)

OPTIMAL (within feastol=1.8e-09, reltol=7.1e-09, abstol=5.1e-07).
Runtime: 0.006829 seconds.
logistic(x::Real) = inv(exp(-x) + one(x))  # logistic function 1 / (1 + e^{-x})
perm = sortperm(vec(X*beta.value))
plot(1:n, (Y[perm] .+ 1)/2, st=:scatter)
plot!(1:n, logistic.(X*beta.value)[perm])
Boyd, S., Kim, S.J., Vandenberghe, L. and Hassibi, A., 2007. A tutorial on geometric programming. Optimization and Engineering, 8(1), p.67. https://doi.org/10.1007/s11081-007-9001-7
Glineur, F., 2000. An extended conic formulation for geometric optimization. Foundation of Computing and Decision Sciences, 25(3), p.161. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.9181
Chandrasekaran, V. and Shah, P., 2017. Relative entropy optimization and its applications. Mathematical Programming, 161(1-2), pp.1-32. https://doi.org/10.1007/s10107-016-0998-2
Many parts of this lecture note are based on Dr. Hua Zhou's 2019 Spring Statistical Computing course notes, available at http://hua-zhou.github.io/teaching/biostatm280-2019spring/index.html.