The section "Sum-of-Squares approach" of notebook uses features of SumOfSquares.jl and PolyJuMP.jl that are not yet released. Please do the following to use the "master" branch
Pkg.checkout("SumOfSquares")
Pkg.checkout("PolyJuMP")
You can undo these with the following two lines
Pkg.free("SumOfSquares")
Pkg.free("PolyJuMP")
Consider the polynomial optimization problem of minimizing the polynomial $x^3 - x^2 + 2xy -y^2 + y^3$ over the polyhedron defined by the inequalities $x \ge 0, y \ge 0$ and $x + y \geq 1$.
using DynamicPolynomials
@polyvar x y
p = x^3 - x^2 + 2x*y -y^2 + y^3
using SemialgebraicSets
S = @set x >= 0 && y >= 0 && x + y >= 1
p(x=>1, y=>0), p(x=>1//2, y=>1//2), p(x=>0, y=>1)
(0, 1//4, 0)
The optimal solutions are $(x, y) = (1, 0)$ and $(x, y) = (0, 1)$ with objective value $0$ but Ipopt only finds the local minimum $(1/2, 1/2)$ with objective value $1/4$.
using JuMP
using Ipopt
m = Model(solver=IpoptSolver(print_level=0))
@variable m a >= 0
@variable m b >= 0
@constraint m a + b >= 1
@NLobjective(m, Min, a^3 - a^2 + 2a*b - b^2 + b^3)
status = solve(m)
@show status
@show getvalue(a)
@show getvalue(b)
@show getobjectivevalue(m);
****************************************************************************** This program contains Ipopt, a library for large-scale nonlinear optimization. Ipopt is released as open source code under the Eclipse Public License (EPL). For more information visit http://projects.coin-or.org/Ipopt ****************************************************************************** status = :Optimal getvalue(a) = 0.49999999667053496 getvalue(b) = 0.4999999966706694 getobjectivevalue(m) = 0.2499999950059033
Surprisingly, with the following equivalent model, Ipopt finds the correct optimal solution.
using JuMP
using Ipopt
m = Model(solver=IpoptSolver(print_level=0))
@variable m a >= 0
@variable m b >= 0
@constraint m a + b >= 1
peval(a, b) = p(x=>a, y=>a)
JuMP.register(m, :peval, 2, peval, autodiff=true)
@NLobjective(m, Min, peval(a, b))
status = solve(m)
@show status
@show getvalue(a)
@show getvalue(b)
@show getobjectivevalue(m);
status = :Optimal getvalue(a) = 0.00011889415775002371 getvalue(b) = 1.0256125965753107 getobjectivevalue(m) = 3.361333003660522e-12
using CSDP
solver = CSDPSolver(printlevel=0);
using Mosek
solver = MosekSolver(LOG=0);
A Sum-of-Squares certificate that $p \ge \alpha$ over the domain S
, ensures that $\alpha$ is a lower bound to the polynomial optimization problem.
The following program searches for the largest upper bound and find zero.
using JuMP
using SumOfSquares
m = SOSModel(solver = solver)
@variable m α
@objective m Max α
c3 = @constraint m p >= α domain = S
status = solve(m)
@show status
@show getobjectivevalue(m);
status = :Optimal getobjectivevalue(m) = -2.0093032793155885e-10
Using the solution $(1/2, 1/2)$ found by Ipopt of objective value $1/4$
and this certificate of lower bound $0$ we know that the optimal objective value is in the interval $[0, 1/4]$
but we still do not know what it is (if we consider that we did not try the solutions $(1, 0)$ and $(0, 1)$ as done in the introduction).
If the dual of the constraint c3
was atomic, its atoms would have given optimal solutions of objective value $0$ but that is not the case.
using MultivariateMoments
μ3 = -1 * getdual(c3) # /!\ Need to multiply by -1 because the constraint is ">=" and not "<="
X3 = certificate_monomials(PolyJuMP.getdelegate(c3))
ν3 = matmeasure(μ3, X3)
extractatoms(ν3, 1e-3) # Returns nothing as the dual is not atomic
Nullable{MultivariateMoments.AtomicMeasure{Float64,Array{DynamicPolynomials.PolyVar{true},1}}}()
Fortunately, there is a hierarchy of programs with increasingly better programs that can be solved until we get one with atom dual variables.
This comes from the way the Sum-of-Squares constraint with domain S
is formulated.
The polynomial $p - \alpha$ is guaranteed to be nonnegative over the domain S
if there exists Sum-of-Squares polynomials $s_0$, $s_1$, $s_2$, $s_3$ such that
$$ p - \alpha = s_0 + s_1 x + s_2 y + s_3 (x + y - 1). $$
Indeed, in the domain S
, $x$, $y$ and $x + y - 1$ are nonnegative so the right-hand side is a sum of squares hence is nonnegative.
Once the degrees of $s_1$, $s_2$ and $s_3$ have been decided, the degree needed for $s_0$ will be determined but we have a freesom in choosing the degrees of $s_1$, $s_2$ and $s_3$.
By default, they are chosen so that the degrees of $s_1 x$, $s_2 y$ and $s_3 (x + y - 1)$ match those of $p - \alpha$ but this can be overwritten using the $maxdegree$ keyword argument.
The maximum total degree (i.e. maximum sum of the exponents of $x$ and $y$) of the monomials of $p$ is 3 so the constraint in the program above is equivalent to @constraint m p >= α domain = S maxdegree = 3
..
That is, since $x$, $y$ and $x + y - 1$ have total degree 1, the sum of squares polynomials $s_1$, $s_2$ and $s_3$ have been chosen with maximum total degree $2$.
Since these polynomials are sums of squares, their degree must be even so the next maximum total degree to try is 4.
For this reason, the keywords maxdegree = 4
and maxdegree = 5
have the same effect in this example.
In general, if the polynomials in the domain are not all odd or all even, each value of maxdegree
has different effect in the choice of the maximum total degree of $s_i$.
using JuMP
using SumOfSquares
m = SOSModel(solver = solver)
@variable m α
@objective m Max α
c5 = @constraint m p >= α domain = S maxdegree = 5
status = solve(m)
@show status
@show getobjectivevalue(m);
status = :Optimal getobjectivevalue(m) = -8.670577589242612e-10
This time, the dual variable is atomic as it is the moments of the measure $$0.5 \delta(x-1, y) + 0.5 \delta(x, y-1)$$ where $\delta(x, y)$ is the dirac measure centered at $(0, 0)$. Therefore the program provides both a certificate that $0$ is a lower bound and a certificate that it is also an upper bound since it is attained at the global minimizers $(1, 0)$ and $(0, 1)$.
using MultivariateMoments
μ5 = -1 * getdual(c5) # /!\ Need to multiply by -1 because the constraint is ">=" and not "<="
X5 = certificate_monomials(PolyJuMP.getdelegate(c5))
ν5 = matmeasure(μ5, X5)
extractatoms(ν5, 1e-3)
Nullable{MultivariateMoments.AtomicMeasure{Float64,Array{DynamicPolynomials.PolyVar{true},1}}}(Atomic measure on the variables DynamicPolynomials.PolyVar{true}[x, y] with 2 atoms: at [-0.00109071, 1.00109] with weight 0.4990820208096022 at [0.99992, 8.03849e-5] with weight 0.5006917404344113 )
The extractatoms
function requires a ranktol
argument that we have set to 1e-3
in the preceding section.
This argument is used to transform the dual variable into a system of polynomials equations whose solutions give the atoms.
This transformation uses the SVD decomposition of the matrix of moments and discard the equations corresponding to a singular value lower than ranktol
.
When this system of equation has an infinite number of solutions, extractatoms
concludes that the measure is not atomic.
For instance, with maxdegree = 3
, we obtain the system
$$x + y = 1$$
which contains a whole line of solution.
This explains extractatoms
returned nothing
.
ν3 = matmeasure(μ3, X3)
MultivariateMoments.computesupport!(ν3, 1e-3)
Algebraic Set defined by 1 equality -x - 0.9999999999999996y + 1.0000000000569829 == 0
With maxdegree = 5
, we obtain the system
\begin{align}
x + y & = 1\\
y^2 & = y\\
xy & = 0\\
x^2 + y & = 1
\end{align}
ν5 = matmeasure(μ5, X5)
MultivariateMoments.computesupport!(ν5, 1e-3)
Algebraic Set defined by 4 equalities -x - 1.0000000000000002y + 1.0000000002403162 == 0 -y^2 + 1.0011710975756178y - 8.047253705778523e-5 == 0 -xy - 3.157029709751049e-12y + 0.0001853598953271854 == 0 -x^2 - 1.0011710975546133y + 1.001090625265065 == 0
This system can be reduced to the equivalent system \begin{align} x + y & = 1\\ y^2 & = y \end{align} which has the solutions $(0, 1)$ and $(1, 0)$.
SemialgebraicSets.computegröbnerbasis!(ideal(get(ν5.support)))
get(ν5.support)
Algebraic Set defined by 2 equalities x + 1.0000000000000002y - 1.0000000002403162 == 0 y^2 - 1.0011710975756178y + 8.047253705778523e-5 == 0
The function extractatoms
then reuse the matrix of moments to find the weights $1/2$, $1/2$ corresponding to the diracs centered respectively at $(0, 1)$ and $(1, 0)$.
This details the how the function obtained the result
$$0.5 \delta(x-1, y) + 0.5 \delta(x, y-1)$$
given in the previous section.