NLopt with autodiff and numerical gradients
Presented by Chiyoung Ahn (https://github.com/chiyahn)
This notebook demonstrates how nonlinear optimization problems in NLopt can be solved without specifying analytic formulae for gradients, using automatic differentiation from ForwardDiff and Flux, or numerical gradients.
] add NLopt BenchmarkTools ForwardDiff NLSolversBase DiffResults Flux
using NLopt, BenchmarkTools, ForwardDiff, NLSolversBase, DiffResults, Flux
using Flux.Tracker: gradient_
NLopt
First, define the objective function f and the corresponding gradient g!. In NLopt, evaluation of f and execution of g! take place in the same call:
# call g! then return f(x)
function fg!(x::Vector, grad::Vector)
    if length(grad) > 0 # gradient of f(x) = -(x[1]^2 + x[2]^2)
        grad[1] = -2*x[1]
        grad[2] = -2*x[2]
    end
    return -(x[1]^2 + x[2]^2)
end
fg! (generic function with 1 method)
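Hand-coded gradients are a common source of bugs, so it can be worth checking them against finite differences before handing fg! to NLopt. A minimal Base-Julia sketch (the helper names `fd_grad` and `analytic_grad` are illustrative, not part of NLopt): for f(x) = -(x[1]^2 + x[2]^2), the gradient is (-2x[1], -2x[2]).

```julia
# Sanity-check an analytic gradient against central finite differences.
# `fd_grad` and `analytic_grad` are hypothetical helpers, not part of any package.
f(x) = -(x[1]^2 + x[2]^2)           # the objective above
analytic_grad(x) = [-2x[1], -2x[2]] # its gradient

function fd_grad(f, x; h = 1e-6)
    g = similar(x)
    for i in eachindex(x)
        e = zeros(length(x)); e[i] = h
        g[i] = (f(x .+ e) - f(x .- e)) / (2h) # central difference
    end
    return g
end

x = [0.7, -1.3]
err = maximum(abs.(analytic_grad(x) .- fd_grad(f, x)))
```

If the coded gradient had a stray factor (for example an extra `(x[1]^2 + x[2]^2)` term), `err` would be far from zero at a generic test point.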
and the corresponding optimization problem:
opt = Opt(:LD_LBFGS, 2)          # 2 indicates the length of `x`
lower_bounds!(opt, [-1.0, -1.0]) # find `x` above -1.0
upper_bounds!(opt, [2.0, 2.0])   # find `x` below 2.0
min_objective!(opt, fg!)         # specify a minimization problem
and solve it!
(minf,minx,ret) = @btime optimize($opt, [1.0, 1.0])
numevals = opt.numevals # the number of function evaluations
println("got $minf at $minx after $numevals iterations (returned $ret)")
3.429 ms (21 allocations: 976 bytes)
got -8.0 at [2.0, 2.0] after 2 iterations (returned SUCCESS)
Inequality constraints can be added as well. Consider minimizing sqrt(x[2]) subject to cubic constraints, following the NLopt tutorial example. Define fg! first:
function fg!(x::Vector, grad::Vector)
    if length(grad) > 0 # gradient of f(x)
        grad[1] = 0
        grad[2] = 0.5/sqrt(x[2])
    end
    return sqrt(x[2]) # f(x)
end
fg! (generic function with 1 method)
and the corresponding optimization problem:
opt = Opt(:LD_SLSQP, 2)        # 2 indicates the length of `x`
lower_bounds!(opt, [-Inf, 0.]) # forces `x[2]` to be non-negative
min_objective!(opt, fg!)       # specify a minimization problem
xtol_rel!(opt, 1e-4)           # relative tolerance on `x` for convergence
Similarly, define the constraint function, for which constraint_f(x) <= 0 will be imposed for all x:
function constraint_f(x::Vector, a, b)
    (a*x[1] + b)^3 - x[2] # constraint_f(x); constraint_f(x) <= 0 is imposed
end
function constraint_g!(x::Vector, grad::Vector, a, b)
    grad[1] = 3a * (a*x[1] + b)^2
    grad[2] = -1
end
function constraint_fg!(x::Vector, grad::Vector, a, b)
    if length(grad) > 0 # gradient of constraint_f(x)
        constraint_g!(x, grad, a, b)
    end
    return constraint_f(x, a, b)
end
constraint_fg! (generic function with 1 method)
Here, the parameters a and b make it easy to generate variants of constraint_f(x). For instance, to impose
(2*x[1] + 0)^3 - x[2] <= 0
AND
(-1*x[1] + 1)^3 - x[2] <= 0
one can simply run the following two lines:
inequality_constraint!(opt, (x,g) -> constraint_fg!(x,g,2,0), 1e-8)
inequality_constraint!(opt, (x,g) -> constraint_fg!(x,g,-1,1), 1e-8)
Ready to roll out:
(minf,minx,ret) = @btime optimize($opt, [1.234, 5.678])
numevals = opt.numevals # the number of function evaluations
println("got $minf at $minx after $numevals iterations (returned $ret)")
14.756 μs (242 allocations: 9.03 KiB)
got 0.5443310539518157 at [0.333333, 0.296296] after 13 iterations (returned XTOL_REACHED)
Instead of adding each constraint separately, it might be desirable to use a single vector-valued constraint function that evaluates all constraints at once. This can be done by calling inequality_constraint! with a vectorized tol parameter whose length equals the number of constraints. For instance, the two constraints above,
inequality_constraint!(opt, (x,g) -> constraint_fg!(x,g,2,0), 1e-8)
inequality_constraint!(opt, (x,g) -> constraint_fg!(x,g,-1,1), 1e-8)
can instead be added by:
function fg!(x::Vector, grad::Vector)
    if length(grad) > 0 # gradient of f(x)
        grad[1] = 0
        grad[2] = 0.5/sqrt(x[2])
    end
    return sqrt(x[2]) # f(x)
end
opt = Opt(:LD_SLSQP, 2)        # 2 indicates the length of `x`
lower_bounds!(opt, [-Inf, 0.]) # forces `x[2]` to be non-negative
min_objective!(opt, fg!)       # specify a minimization problem
xtol_rel!(opt, 1e-4)           # relative tolerance on `x` for convergence
# define a vectorized constraint
function constraints_fg!(result, x, jacobian_t, a, b)
    if length(jacobian_t) > 0 # transpose of the Jacobian matrix
        jacobian_t[1,1] = 3a[1] * (a[1]*x[1] + b[1])^2
        jacobian_t[2,1] = -1
        jacobian_t[1,2] = 3a[2] * (a[2]*x[1] + b[2])^2
        jacobian_t[2,2] = -1
    end
    result[:] = [constraint_f(x,a[1],b[1]);
                 constraint_f(x,a[2],b[2])]
end
# add a vectorized constraint
inequality_constraint!(opt, (result, x, jacobian_t) -> constraints_fg!(result, x, jacobian_t, [2; -1], [0; 1]),
[1e-8; 1e-8])
Note that the gradient field is now replaced by the transpose of the Jacobian matrix, since the constraint function is vector-valued.
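One way to convince yourself that the filled matrix is indeed the transposed Jacobian is to compare it against a finite-difference Jacobian. A self-contained sketch (the constraint is re-declared here, and the variable names are illustrative):

```julia
# Check that the matrix filled by the vectorized constraint is J', where
# J[i, j] = dc_i/dx_j. Self-contained: the constraint is re-declared here.
constraint_f(x, a, b) = (a*x[1] + b)^3 - x[2]
a, b = [2, -1], [0, 1]
x = [0.4, 0.1]

# the transpose filled in constraints_fg!: column i holds the gradient of c_i
jac_t = [3a[1]*(a[1]*x[1] + b[1])^2  3a[2]*(a[2]*x[1] + b[2])^2;
         -1.0                        -1.0]

# finite-difference Jacobian, row i = gradient of constraint i
h = 1e-6
J = [(constraint_f(x .+ h .* (1:2 .== j), a[i], b[i]) -
      constraint_f(x .- h .* (1:2 .== j), a[i], b[i])) / (2h)
     for i in 1:2, j in 1:2]

err = maximum(abs.(J' .- jac_t))
```

NLopt's C API stores the Jacobian of a vector constraint row-major, which the Julia wrapper exposes as this column-major transpose.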
Running the optimization routine yields the same solution as above:
(minf,minx,ret) = @btime optimize($opt, [1.234, 5.678])
numevals = opt.numevals # the number of function evaluations
println("got $minf at $minx after $numevals iterations (returned $ret)")
13.473 μs (203 allocations: 10.86 KiB)
got 0.5443310539518157 at [0.333333, 0.296296] after 13 iterations (returned XTOL_REACHED)
NLopt without analytic formulae for gradients
Suppose the f you want to optimize is written in a fairly complex form and you do not have access to the analytic formula for its gradient. Here is a solution using automatic differentiation:
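As background, forward-mode automatic differentiation works by propagating derivatives alongside values through the computation using dual numbers. A toy single-variable illustration of the idea (this is not how ForwardDiff is actually implemented, and `Dual`, `sq`, and `toy_f` are names invented for this sketch):

```julia
# Toy forward-mode AD with a dual number (value, derivative).
# Illustrative only -- ForwardDiff's real implementation is far more general.
struct Dual
    v::Float64 # value
    d::Float64 # derivative
end
Base.:+(a::Dual, b::Dual) = Dual(a.v + b.v, a.d + b.d)
Base.:*(a::Dual, b::Dual) = Dual(a.v * b.v, a.d * b.v + a.v * b.d) # product rule
Base.:-(a::Dual) = Dual(-a.v, -a.d)
sq(a) = a * a

# toy_f(x) = -(x^2 + 1)^2; seed x with derivative 1, constants with derivative 0
toy_f(x) = -sq(sq(x) + Dual(1.0, 0.0))
y = toy_f(Dual(2.0, 1.0))
# y.v == toy_f(2) == -25.0, and y.d == toy_f'(2) == -4*2*(2^2 + 1) == -40.0
```

Every arithmetic operation applies the chain rule as it executes, so the derivative comes out exact (up to floating point), with no finite-difference truncation error.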
ForwardDiff
function f(x)
    return -(x[1]^2 + x[2])^2
end
# compute gradient by forward automatic differentiation
function g!(G::Vector, x::Vector)
    ForwardDiff.gradient!(G, f, x)
end
function fg!(x::Vector, grad::Vector)
    if length(grad) > 0 # gradient of f(x)
        g!(grad, x)
    end
    return f(x)
end
fg! (generic function with 1 method)
Solve:
# define the optimization problem
opt = Opt(:LD_LBFGS, 2)          # 2 indicates the length of `x`
lower_bounds!(opt, [-1.0, -1.0]) # find `x` above -1.0
upper_bounds!(opt, [2.0, 2.0])   # find `x` below 2.0
min_objective!(opt, fg!)         # specify a minimization problem
# solve the optimization problem
(minf,minx,ret) = @btime optimize($opt, [1.0, 1.0])
numevals = opt.numevals # the number of function evaluations
println("got $minf at $minx after $numevals iterations (returned $ret)")
3.536 ms (32 allocations: 1.67 KiB)
got -36.0 at [2.0, 2.0] after 3 iterations (returned SUCCESS)
Since computing the gradient also evaluates f, the fg! above does redundant work by calling f(x) again. When evaluating f is expensive, one can instead use DiffResults to retrieve the value saved during the gradient computation:
function fg!(x::Vector, grad::Vector)
    result = DiffResults.GradientResult(x)       # allocate a result object
    result = ForwardDiff.gradient!(result, f, x) # compute f(x) and its gradient
    grad[:] = DiffResults.gradient(result)       # extract the gradient
    return DiffResults.value(result)             # return the saved f(x)
end
fg! (generic function with 1 method)
# define the optimization problem
opt = Opt(:LD_LBFGS, 2)          # 2 indicates the length of `x`
lower_bounds!(opt, [-1.0, -1.0]) # find `x` above -1.0
upper_bounds!(opt, [2.0, 2.0])   # find `x` below 2.0
min_objective!(opt, fg!)         # specify a minimization problem
# solve the optimization problem
(minf,minx,ret) = @btime optimize($opt, [1.0, 1.0])
numevals = opt.numevals # the number of function evaluations
println("got $minf at $minx after $numevals iterations (returned $ret)")
3.456 ms (41 allocations: 2.09 KiB)
got -36.0 at [2.0, 2.0] after 3 iterations (returned SUCCESS)
Flux
function fg!(x::Vector, grad::Vector)
    val = f(x)
    grad[:] = gradient_(f, x)[1]
    return val
end
# define the optimization problem
opt = Opt(:LD_LBFGS, 2)          # 2 indicates the length of `x`
lower_bounds!(opt, [-1.0, -1.0]) # find `x` above -1.0
upper_bounds!(opt, [2.0, 2.0])   # find `x` below 2.0
min_objective!(opt, fg!)         # specify a minimization problem
# solve the optimization problem
(minf,minx,ret) = @btime optimize($opt, [1.0, 1.0])
numevals = opt.numevals # the number of function evaluations
println("got $minf after $numevals iterations (returned $ret)")
3.481 ms (146 allocations: 5.28 KiB)
got -36.0 after 3 iterations (returned SUCCESS)
NLSolversBase interface
Add the following definition of NLoptAdapter to exploit the native support for numerical derivatives and automatic differentiation from NLSolversBase:
struct NLoptAdapter{T <: AbstractObjective} <: Function
    nlsolver_base::T
end
# implement fg!; note that the argument order is reversed
(adapter::NLoptAdapter)(x, df) = adapter.nlsolver_base.fdf(df, x)
(adapter::NLoptAdapter)(result, x, jacobian_transpose) = adapter.nlsolver_base.fdf(result, jacobian_transpose', x)
# constructors
NLoptAdapter(f, x, autodiff = :forward) = NLoptAdapter(OnceDifferentiable(f, x, autodiff = autodiff))
NLoptAdapter(f!, x::Vector, F::Vector, autodiff = :forward) = NLoptAdapter(OnceDifferentiable(f!, x, F, autodiff = autodiff))
NLoptAdapter
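The adapter's job is purely mechanical: NLopt invokes the objective as obj(x, grad), while NLSolversBase stores a fused value-and-gradient function with the arguments in the opposite order, so the adapter is just a callable struct that swaps them. A stripped-down, Base-only sketch of the same pattern (ToyAdapter and its fdf are illustrative names, with no NLSolversBase involved):

```julia
# Minimal illustration of the adapter pattern: a callable struct that
# forwards NLopt's (x, grad) calling convention to an fdf(grad, x) function.
# ToyAdapter is a hypothetical stand-in for NLoptAdapter.
struct ToyAdapter{F} <: Function
    fdf::F # fdf(grad, x): fills grad in place and returns f(x)
end
(adapter::ToyAdapter)(x, grad) = adapter.fdf(grad, x)

# f(x) = x'x, with its gradient 2x written in place
fdf(grad, x) = (grad .= 2 .* x; sum(abs2, x))
obj = ToyAdapter(fdf)

g = zeros(2)
val = obj([3.0, 4.0], g) # val == 25.0, g == [6.0, 8.0]
```

Subtyping Function lets instances be passed anywhere a plain function is expected, which is why min_objective! accepts the adapter directly.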
Let's roll out:
f_opt = NLoptAdapter(x -> -(x[1]^2 + x[2])^2, zeros(2), :forward)
# define the optimization problem
opt = Opt(:LD_LBFGS, 2)          # 2 indicates the length of `x`
lower_bounds!(opt, [-1.0, -1.0]) # find `x` above -1.0
upper_bounds!(opt, [2.0, 2.0])   # find `x` below 2.0
min_objective!(opt, f_opt)       # use the adapter, not the earlier fg!
# solve the optimization problem
(minf,minx,ret) = @btime optimize($opt, [1.0, 1.0])
numevals = opt.numevals # the number of function evaluations
println("got $minf at $minx after $numevals iterations (returned $ret)")
3.507 ms (146 allocations: 5.28 KiB)
got -36.0 at [2.0, 2.0] after 3 iterations (returned SUCCESS)
Compare this with the performance of the automatic differentiation examples above. Constrained problems work too; here is the tutorial example again, with the objective and constraints wrapped by NLoptAdapter:
myfunc(x) = sqrt(x[2])
x0 = [1.234, 5.678]
function myconstraint(x, a, b)
    (a*x[1] + b)^3 - x[2]
end
# define objective and constraint, using NLoptAdapter
f_opt = NLoptAdapter(myfunc, x0)
c_1_opt = NLoptAdapter(x -> myconstraint(x,2,0), x0)
c_2_opt = NLoptAdapter(x -> myconstraint(x,-1,1), x0)
# define the optimization problem
opt = Opt(:LD_MMA, 2)
lower_bounds!(opt, [-Inf, 0.])
xtol_rel!(opt,1e-4)
min_objective!(opt, f_opt)
inequality_constraint!(opt, c_1_opt, 1e-8)
inequality_constraint!(opt, c_2_opt, 1e-8)
# solve
(minf,minx,ret) = @btime optimize($opt, $x0)
numevals = opt.numevals # the number of function evaluations
println("got $minf at $minx after $numevals iterations (returned $ret)")
100.090 μs (271 allocations: 9.17 KiB)
got 0.5443310477213124 at [0.333333, 0.296296] after 11 iterations (returned XTOL_REACHED)
Works for central-difference methods too:
# define objective and constraint, using NLoptAdapter
f_opt = NLoptAdapter(myfunc, x0, :central)
c_1_opt = NLoptAdapter(x -> myconstraint(x,2,0), x0, :central)
c_2_opt = NLoptAdapter(x -> myconstraint(x,-1,1), x0, :central)
# define the optimization problem
opt = Opt(:LD_MMA, 2)
lower_bounds!(opt, [-Inf, 0.])
xtol_rel!(opt,1e-4)
min_objective!(opt, f_opt)
inequality_constraint!(opt, c_1_opt, 1e-8)
inequality_constraint!(opt, c_2_opt, 1e-8)
# solve
(minf,minx,ret) = @btime optimize($opt, $x0)
numevals = opt.numevals # the number of function evaluations
println("got $minf at $minx after $numevals iterations (returned $ret)")
99.769 μs (205 allocations: 7.63 KiB)
got 0.5443310476210945 at [0.333333, 0.296296] after 11 iterations (returned XTOL_REACHED)
When vectorized constraints are passed, the constraint function (here myconstraints!) should assign the evaluated constraints in place rather than return them:
function myconstraints!(F, x)
    F[:] = [myconstraint(x,2,0); myconstraint(x,-1,1)]
end
# define objective and constraint, using NLoptAdapter
f_opt = NLoptAdapter(myfunc, x0, :central)
c_opt = NLoptAdapter(myconstraints!, x0, zeros(2), :central) # zeros(2): buffer for the 2 constraint values
(::NLoptAdapter{OnceDifferentiable{Array{Float64,1},Array{Float64,2},Array{Float64,1}}}) (generic function with 2 methods)
The rest of the procedure is similar:
# define the optimization problem
opt = Opt(:LD_MMA, 2)
lower_bounds!(opt, [-Inf, 0.])
xtol_rel!(opt,1e-4)
min_objective!(opt, f_opt)
inequality_constraint!(opt, c_opt, fill(1e-8, 2))
# solve
(minf,minx,ret) = @btime optimize($opt, $x0)
numevals = opt.numevals # the number of function evaluations
println("got $minf at $minx after $numevals iterations (returned $ret)")
98.806 μs (205 allocations: 11.41 KiB)
got 0.5443310476210945 at [0.333333, 0.296296] after 11 iterations (returned XTOL_REACHED)
The wrapper works for derivative-free methods too. First, consider the following vanilla NLopt implementation:
function myfunc(x::Vector, grad::Vector)
    return sqrt(x[1]^2 + x[2]^2)
end
opt = Opt(:LN_NELDERMEAD, 2)
lower_bounds!(opt, [0.0, 0.0])
xtol_rel!(opt,1e-4)
min_objective!(opt, myfunc)
(minf,minx,ret) = optimize(opt, [1.234, 5.678])
numevals = opt.numevals # the number of function evaluations
println("got $minf at $minx after $numevals iterations (returned $ret)")
got 0.0 at [0.0, 0.0] after 11 iterations (returned XTOL_REACHED)
Here's one using NLoptAdapter
:
f_opt = NLoptAdapter(x -> sqrt(x[1]^2 + x[2]^2), x0, :central)
opt = Opt(:LN_NELDERMEAD, 2)
lower_bounds!(opt, [0.0, 0.0])
xtol_rel!(opt,1e-4)
min_objective!(opt, f_opt)
(minf,minx,ret) = optimize(opt, [1.234, 5.678])
numevals = opt.numevals # the number of function evaluations
println("got $minf at $minx after $numevals iterations (returned $ret)")
got 0.0 at [0.0, 0.0] after 11 iterations (returned XTOL_REACHED)
which returns an identical result.