In this lecture we introduce a few of the Julia libraries that we’ve found particularly useful for quantitative work in economics
using InstantiateFromURL
github_project("QuantEcon/quantecon-notebooks-julia", version = "0.3.0")
using LinearAlgebra, Statistics
using ForwardDiff, Flux, Optim, JuMP, Ipopt, BlackBoxOptim, Roots, NLsolve
using LeastSquaresOptim, Flux.Tracker
using Flux.Tracker: update!
using Optim: converged, maximum, maximizer, minimizer, iterations #some extra functions
Automatic differentiation (AD, sometimes called algorithmic differentiation) is a crucial way to increase the performance of both estimation and solution methods
There are essentially four ways to calculate the gradient or Jacobian on a computer
Calculation by hand
Finite differences
Symbolic differentiation
Automatic Differentiation
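Finite differences are the easiest of these to implement, but their accuracy is limited by the choice of step size, whereas AD is exact to machine precision. A minimal sketch (the helper name fd_derivative is ours, not from a package):

```julia
# Central finite differences: a minimal sketch (fd_derivative is our own
# helper, not from a package). Too large a step h gives truncation error,
# too small gives roundoff error; AD has neither problem.
fd_derivative(f, x; h = 1e-6) = (f(x + h) - f(x - h)) / (2h)

f(x) = sin(x) + x^2
fd_derivative(f, 1.0)  # ≈ cos(1.0) + 2.0, with O(h^2) truncation error
```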
We will explore AD packages in Julia rather than the alternatives
To summarize here, first recall the chain rule (adapted from Wikipedia)
$$
\frac{dy}{dx} = \frac{dy}{dw} \cdot \frac{dw}{dx}
$$

Consider functions composed of calculations with fundamental operations with known analytical derivatives, such as $ f(x_1, x_2) = x_1 x_2 + \sin(x_1) $
To compute $ \frac{d f(x_1,x_2)}{d x_1} $
$$
\begin{array}{l|l}
\text{Operations to compute value} & \text{Operations to compute } \frac{\partial f(x_1,x_2)}{\partial x_1} \\
\hline
w_1 = x_1 & \frac{d w_1}{d x_1} = 1 \text{ (seed)} \\
w_2 = x_2 & \frac{d w_2}{d x_1} = 0 \text{ (seed)} \\
w_3 = w_1 \cdot w_2 & \frac{\partial w_3}{\partial x_1} = w_2 \cdot \frac{d w_1}{d x_1} + w_1 \cdot \frac{d w_2}{d x_1} \\
w_4 = \sin w_1 & \frac{d w_4}{d x_1} = \cos w_1 \cdot \frac{d w_1}{d x_1} \\
w_5 = w_3 + w_4 & \frac{\partial w_5}{\partial x_1} = \frac{\partial w_3}{\partial x_1} + \frac{d w_4}{d x_1}
\end{array}
$$

One way to implement this (used in forward-mode AD) is to use dual numbers
Take a number $ x $ and augment it with an infinitesimal $ \epsilon $ such that $ \epsilon^2 = 0 $, i.e. $ x \to x + x' \epsilon $
All math is then done with this (mathematical, rather than Julia) tuple $ (x, x') $ where the $ x' $ may be hidden from the user
With this definition, we can write a general rule for differentiation of $ g(x,y) $ as
$$
g \big( \left(x,x'\right),\left(y,y'\right) \big) = \left(g(x,y),\partial_x g(x,y)x' + \partial_y g(x,y)y' \right)
$$

This calculation is simply the chain rule for the total derivative
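This rule can be implemented directly with a small dual number type. A minimal sketch (illustrative only; ForwardDiff.jl implements this idea robustly for arbitrary functions and types):

```julia
# A minimal dual number sketch (illustrative only; ForwardDiff.jl implements
# this idea robustly for arbitrary functions and types).
struct Dual
    v::Float64   # value x
    d::Float64   # derivative x'
end

# intrinsic rules: each operation propagates (value, derivative) pairs
Base.:+(a::Dual, b::Dual) = Dual(a.v + b.v, a.d + b.d)
Base.:*(a::Dual, b::Dual) = Dual(a.v * b.v, a.d * b.v + b.d * a.v)
Base.sin(a::Dual) = Dual(sin(a.v), cos(a.v) * a.d)

f(x1, x2) = x1 * x2 + sin(x1)

# seed dx1 = 1, dx2 = 0 to compute ∂f/∂x1, as in the table above
y = f(Dual(1.4, 1.0), Dual(2.2, 0.0))
y.d  # equals x2 + cos(x1) = 2.2 + cos(1.4)
```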
An AD library using dual numbers concurrently calculates the function and its derivatives, repeating the chain rule until it hits a set of intrinsic rules such as
$$
\begin{align*}
x + y \to \left(x,x'\right) + \left(y,y'\right) &= \left(x + y,\underbrace{x' + y'}_{\partial(x + y) = \partial x + \partial y}\right)\\
x y \to \left(x,x'\right) \times \left(y,y'\right) &= \left(x y,\underbrace{x'y + y'x}_{\partial(x y) = y \partial x + x \partial y}\right)\\
\exp(x) \to \exp(\left(x, x'\right)) &= \left(\exp(x),\underbrace{x'\exp(x)}_{\partial(\exp(x)) = \exp(x)\partial x} \right)
\end{align*}
$$

We have already seen one of the AD packages in Julia
using ForwardDiff
h(x) = sin(x[1]) + x[1] * x[2] + sinh(x[1] * x[2]) # multivariate.
x = [1.4 2.2]
@show ForwardDiff.gradient(h,x) # use AD, seeds from x
#Or, can use complicated functions of many variables
f(x) = sum(sin, x) + prod(tan, x) * sum(sqrt, x)
g = (x) -> ForwardDiff.gradient(f, x); # g() is now the gradient
@show g(rand(20)); # gradient at a random point
# ForwardDiff.hessian(f,x') # or the hessian
ForwardDiff.gradient(h, x) = [26.354764961030977 16.663053156992284]
g(rand(20)) = [0.8364799440001933, 0.8653634136413618, 0.9667548578232391, 0.8805083699745054, 0.9837916596947623, 0.9084678938547219, 0.8935302157393694, 0.8583646795753902, 0.9676300259774202, 0.8773990806452813, 0.9503798679339673, 0.9542441987684414, 0.8905833436093621, 0.9096513523703488, 0.7443813860489745, 0.7088964693063575, 0.995926189152923, 0.8335909032780905, 0.9832464202440931, 0.9956101188044758]
We can even auto-differentiate complicated functions with embedded iterations
function squareroot(x) #pretending we don't know sqrt()
z = copy(x) # Initial starting point for Newton’s method
while abs(z*z - x) > 1e-13
z = z - (z*z-x)/(2z)
end
return z
end
squareroot(2.0)
1.4142135623730951
using ForwardDiff
dsqrt(x) = ForwardDiff.derivative(squareroot, x)
dsqrt(2.0)
0.35355339059327373
using Flux
using Flux.Tracker
using Flux.Tracker: update!
f(x) = 3x^2 + 2x + 1
# df/dx = 6x + 2
df(x) = Tracker.gradient(f, x)[1]
df(2) # 14.0 (tracked)
14.0 (tracked)
A = rand(2,2)
f(x) = A * x
x0 = [0.1, 2.0]
f(x0)
Flux.jacobian(f, x0)
2×2 Array{Float64,2}:
 0.916489  0.923537
 0.759557  0.532211
As before, we can differentiate complicated functions
dsquareroot(x) = Tracker.gradient(squareroot, x)
dsquareroot (generic function with 1 method)
From the documentation, we can use a machine learning approach to a linear regression
W = rand(2, 5)
b = rand(2)
predict(x) = W*x .+ b
function loss(x, y)
ŷ = predict(x)
sum((y .- ŷ).^2)
end
x, y = rand(5), rand(2) # Dummy data
loss(x, y) # ~ 3
3.912093769282702
W = param(W)
b = param(b)
gs = Tracker.gradient(() -> loss(x, y), Params([W, b]))
Δ = gs[W]
# Update the parameter and reset the gradient
update!(W, -0.1Δ)
loss(x, y) # ~ 2.5
2.0892843208989103 (tracked)
There are a large number of packages intended to be used for optimization in Julia
Part of the reason for the diversity of options is that Julia makes it possible to efficiently implement a large number of variations on optimization routines
The other reason is that different types of optimization problems require different algorithms
Univariate optimization defaults to a robust hybrid optimization routine called Brent’s method
using Optim
using Optim: converged, maximum, maximizer, minimizer, iterations #some extra functions
result = optimize(x-> x^2, -2.0, 1.0)
Results of Optimization Algorithm
 * Algorithm: Brent's Method
 * Search Interval: [-2.000000, 1.000000]
 * Minimizer: 0.000000e+00
 * Minimum: 0.000000e+00
 * Iterations: 5
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 6
Always check if the results converged, and throw errors otherwise
converged(result) || error("Failed to converge in $(iterations(result)) iterations")
xmin = result.minimizer
result.minimum
0.0
The first line is a logical OR between converged(result) and error("..."). If the convergence check passes, the left side is true and execution proceeds to the next line; if not, the error is thrown
Or to maximize
f(x) = -x^2
result = maximize(f, -2.0, 1.0)
converged(result) || error("Failed to converge in $(iterations(result)) iterations")
xmin = maximizer(result)
fmax = maximum(result)
-0.0
Note: Notice that we access optimize results using result.minimizer, and maximize results using maximizer(result)
There are a variety of algorithms and options for multivariate optimization
From the documentation, the simplest version is
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x_iv = [0.0, 0.0]
results = optimize(f, x_iv) # i.e. optimize(f, x_iv, NelderMead())
 * Status: success

 * Candidate solution
    Minimizer: [1.00e+00, 1.00e+00]
    Minimum:   3.525527e-09

 * Found with
    Algorithm:     Nelder-Mead
    Initial Point: [0.00e+00, 0.00e+00]

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Iterations: 60
    f(x) calls: 118
The default algorithm is NelderMead, which is derivative-free and hence requires many function evaluations
To change the algorithm type to L-BFGS
results = optimize(f, x_iv, LBFGS())
println("minimum = $(results.minimum) with argmin = $(results.minimizer) in "*
"$(results.iterations) iterations")
minimum = 5.3784046148998115e-17 with argmin = [0.9999999926662393, 0.9999999853324786] in 24 iterations
Note that this has fewer iterations
As no derivative was given, it used finite differences to approximate the gradient of f(x)
However, since most of the algorithms require derivatives, you will often want to use auto differentiation or pass analytical gradients if possible
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x_iv = [0.0, 0.0]
results = optimize(f, x_iv, LBFGS(), autodiff=:forward) # i.e. use ForwardDiff.jl
println("minimum = $(results.minimum) with argmin = $(results.minimizer) in "*
"$(results.iterations) iterations")
minimum = 5.191703158437428e-27 with argmin = [0.999999999999928, 0.9999999999998559] in 24 iterations
Note that we did not need to use ForwardDiff.jl directly, as long as our f(x) function was written to be generic (see the generic programming lecture)
Alternatively, with an analytical gradient
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x_iv = [0.0, 0.0]
function g!(G, x)
G[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
G[2] = 200.0 * (x[2] - x[1]^2)
end
results = optimize(f, g!, x_iv, LBFGS()) # or ConjugateGradient()
println("minimum = $(results.minimum) with argmin = $(results.minimizer) in "*
"$(results.iterations) iterations")
minimum = 4.474575116368833e-22 with argmin = [0.9999999999788486, 0.9999999999577246] in 21 iterations
For derivative-free methods, you can change the algorithm – and have no need to provide a gradient
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x_iv = [0.0, 0.0]
results = optimize(f, x_iv, SimulatedAnnealing()) # or ParticleSwarm() or NelderMead()
 * Status: failure (reached maximum number of iterations) (line search failed)

 * Candidate solution
    Minimizer: [1.04e+00, 1.10e+00]
    Minimum:   2.896801e-02

 * Found with
    Algorithm:     Simulated Annealing
    Initial Point: [0.00e+00, 0.00e+00]

 * Convergence measures
    |x - x'|               = NaN ≰ 0.0e+00
    |x - x'|/|x'|          = NaN ≰ 0.0e+00
    |f(x) - f(x')|         = NaN ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
    |g(x)|                 = NaN ≰ 1.0e-08

 * Work counters
    Iterations: 1000
    f(x) calls: 1001
However, you will note that this did not converge, as stochastic methods typically require many more iterations as a tradeoff for their global-convergence properties
See the maximum likelihood example and the accompanying Jupyter notebook
The JuMP.jl package is an ambitious implementation of a modelling language for optimization problems in Julia
In that sense, it is more like an AMPL (or Pyomo) built on top of the Julia language with macros, and able to use a variety of different commercial and open source solvers
If you have a linear, quadratic, conic, mixed-integer linear, etc. problem then this will likely be the ideal “meta-package” for calling various solvers
For nonlinear problems, the modelling language may make things difficult for complicated functions (as it is not designed to be used as a general-purpose nonlinear optimizer)
See the quick start guide for more details on all of the options
The following is an example of calling a linear objective with a nonlinear constraint (provided by an external function)
Here, Ipopt stands for Interior Point OPTimizer, a nonlinear solver available in Julia through the Ipopt.jl package
using JuMP, Ipopt
# solve
# max( x[1] + x[2] )
# st sqrt(x[1]^2 + x[2]^2) <= 1
function squareroot(x) # pretending we don't know sqrt()
z = x # Initial starting point for Newton’s method
while abs(z*z - x) > 1e-13
z = z - (z*z-x)/(2z)
end
return z
end
m = Model(with_optimizer(Ipopt.Optimizer))
# need to register user defined functions for AD
JuMP.register(m,:squareroot, 1, squareroot, autodiff=true)
@variable(m, x[1:2], start=0.5) # start is the initial condition
@objective(m, Max, sum(x))
@NLconstraint(m, squareroot(x[1]^2+x[2]^2) <= 1)
@show JuMP.optimize!(m)
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        2
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        1
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        1

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0 -1.0000000e+00 0.00e+00 2.07e-01  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1 -1.4100714e+00 0.00e+00 5.48e-02  -1.7 3.94e-01    -  1.00e+00 7.36e-01f  1
   2 -1.4113851e+00 0.00e+00 2.83e-08  -2.5 9.29e-04    -  1.00e+00 1.00e+00f  1
   3 -1.4140632e+00 0.00e+00 1.50e-09  -3.8 1.89e-03    -  1.00e+00 1.00e+00f  1
   4 -1.4142117e+00 0.00e+00 1.84e-11  -5.7 1.05e-04    -  1.00e+00 1.00e+00f  1
   5 -1.4142136e+00 0.00e+00 8.23e-09  -8.6 1.30e-06    -  1.00e+00 1.00e+00f  1

Number of Iterations....: 5

                                   (scaled)                 (unscaled)
Objective...............:  -1.4142135740093271e+00   -1.4142135740093271e+00
Dual infeasibility......:   8.2280586788385790e-09    8.2280586788385790e-09
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   2.5059035815063646e-09    2.5059035815063646e-09
Overall NLP error.......:   8.2280586788385790e-09    8.2280586788385790e-09

Number of objective function evaluations             = 6
Number of objective gradient evaluations             = 6
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 6
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 6
Number of Lagrangian Hessian evaluations             = 5
Total CPU secs in IPOPT (w/o function evaluations)   =      2.012
Total CPU secs in NLP function evaluations           =      1.522

EXIT: Optimal Solution Found.
JuMP.optimize!(m) = nothing
And this is an example of a quadratic objective
# solve
# min (1-x)^2 + 100(y-x^2)^2
# st x + y >= 10
using JuMP,Ipopt
m = Model(with_optimizer(Ipopt.Optimizer)) # settings for the solver
@variable(m, x, start = 0.0)
@variable(m, y, start = 0.0)
@NLobjective(m, Min, (1-x)^2 + 100(y-x^2)^2)
JuMP.optimize!(m)
println("x = ", value(x), " y = ", value(y))
# adding a (linear) constraint
@constraint(m, x + y == 10)
JuMP.optimize!(m)
println("x = ", value(x), " y = ", value(y))
This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  1.0000000e+00 0.00e+00 2.00e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  9.5312500e-01 0.00e+00 1.25e+01  -1.0 1.00e+00    -  1.00e+00 2.50e-01f  3
   2  4.8320569e-01 0.00e+00 1.01e+00  -1.0 9.03e-02    -  1.00e+00 1.00e+00f  1
   3  4.5708829e-01 0.00e+00 9.53e+00  -1.0 4.29e-01    -  1.00e+00 5.00e-01f  2
   4  1.8894205e-01 0.00e+00 4.15e-01  -1.0 9.51e-02    -  1.00e+00 1.00e+00f  1
   5  1.3918726e-01 0.00e+00 6.51e+00  -1.7 3.49e-01    -  1.00e+00 5.00e-01f  2
   6  5.4940990e-02 0.00e+00 4.51e-01  -1.7 9.29e-02    -  1.00e+00 1.00e+00f  1
   7  2.9144630e-02 0.00e+00 2.27e+00  -1.7 2.49e-01    -  1.00e+00 5.00e-01f  2
   8  9.8586451e-03 0.00e+00 1.15e+00  -1.7 1.10e-01    -  1.00e+00 1.00e+00f  1
   9  2.3237475e-03 0.00e+00 1.00e+00  -1.7 1.00e-01    -  1.00e+00 1.00e+00f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  2.3797236e-04 0.00e+00 2.19e-01  -1.7 5.09e-02    -  1.00e+00 1.00e+00f  1
  11  4.9267371e-06 0.00e+00 5.95e-02  -1.7 2.53e-02    -  1.00e+00 1.00e+00f  1
  12  2.8189505e-09 0.00e+00 8.31e-04  -2.5 3.20e-03    -  1.00e+00 1.00e+00f  1
  13  1.0095040e-15 0.00e+00 8.68e-07  -5.7 9.78e-05    -  1.00e+00 1.00e+00f  1
  14  1.3288608e-28 0.00e+00 2.02e-13  -8.6 4.65e-08    -  1.00e+00 1.00e+00f  1

Number of Iterations....: 14

                                   (scaled)                 (unscaled)
Objective...............:   1.3288608467480825e-28    1.3288608467480825e-28
Dual infeasibility......:   2.0183854587685121e-13    2.0183854587685121e-13
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   0.0000000000000000e+00    0.0000000000000000e+00
Overall NLP error.......:   2.0183854587685121e-13    2.0183854587685121e-13

Number of objective function evaluations             = 36
Number of objective gradient evaluations             = 15
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 14
Total CPU secs in IPOPT (w/o function evaluations)   =      0.091
Total CPU secs in NLP function evaluations           =      0.011

EXIT: Optimal Solution Found.
x = 0.9999999999999899 y = 0.9999999999999792

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        2
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        1
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  1.0000000e+00 1.00e+01 1.00e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  9.6315968e+05 0.00e+00 3.89e+05  -1.0 9.91e+00    -  1.00e+00 1.00e+00h  1
   2  1.6901461e+05 0.00e+00 1.16e+05  -1.0 3.24e+00    -  1.00e+00 1.00e+00f  1
   3  2.5433173e+04 1.78e-15 3.18e+04  -1.0 2.05e+00    -  1.00e+00 1.00e+00f  1
   4  2.6527756e+03 0.00e+00 7.79e+03  -1.0 1.19e+00    -  1.00e+00 1.00e+00f  1
   5  1.1380324e+02 0.00e+00 1.35e+03  -1.0 5.62e-01    -  1.00e+00 1.00e+00f  1
   6  3.3745506e+00 0.00e+00 8.45e+01  -1.0 1.50e-01    -  1.00e+00 1.00e+00f  1
   7  2.8946196e+00 0.00e+00 4.22e-01  -1.0 1.07e-02    -  1.00e+00 1.00e+00f  1
   8  2.8946076e+00 0.00e+00 1.07e-05  -1.7 5.42e-05    -  1.00e+00 1.00e+00f  1
   9  2.8946076e+00 0.00e+00 5.91e-13  -8.6 1.38e-09    -  1.00e+00 1.00e+00f  1

Number of Iterations....: 9

                                   (scaled)                 (unscaled)
Objective...............:   2.8946075504894599e+00    2.8946075504894599e+00
Dual infeasibility......:   5.9130478291535837e-13    5.9130478291535837e-13
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   0.0000000000000000e+00    0.0000000000000000e+00
Overall NLP error.......:   5.9130478291535837e-13    5.9130478291535837e-13

Number of objective function evaluations             = 10
Number of objective gradient evaluations             = 10
Number of equality constraint evaluations            = 10
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 1
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 9
Total CPU secs in IPOPT (w/o function evaluations)   =      0.002
Total CPU secs in NLP function evaluations           =      0.000

EXIT: Optimal Solution Found.
x = 2.701147124098218 y = 7.2988528759017814
Another package for doing global optimization without derivatives is BlackBoxOptim.jl
To see an example from the documentation
using BlackBoxOptim
function rosenbrock2d(x)
return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end
results = bboptimize(rosenbrock2d; SearchRange = (-5.0, 5.0), NumDimensions = 2);
Starting optimization with optimizer DiffEvoOpt{FitPopulation{Float64},RadiusLimitedSelector,BlackBoxOptim.AdaptiveDiffEvoRandBin{3},RandomBound{ContinuousRectSearchSpace}}
0.00 secs, 0 evals, 0 steps

Optimization stopped after 10001 steps and 0.07 seconds
Termination reason: Max number of steps (10000) reached
Steps per second = 144525.53
Function evals per second = 147141.18
Improvements/step = 0.19950
Total function evaluations = 10182

Best candidate found: [1.0, 1.0]

Fitness: 0.000000000
An example for parallel execution of the objective is provided
A root of a real function $ f $ on $ [a,b] $ is an $ x \in [a, b] $ such that $ f(x)=0 $
For example, if we plot the function
$$ f(x) = \sin(4 (x - 1/4)) + x + x^{20} - 1 \tag{1} $$
with $ x \in [0,1] $ we get
The unique root is approximately 0.408
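Since $ f $ changes sign on $ [0, 1] $, such a root can be found by plain bisection. A hand-rolled sketch for intuition (the name bisect is ours; the package routines are faster and more robust):

```julia
# A hand-rolled bisection sketch (bisect is our own helper, not from a
# package). Requires a bracket [a, b] on which f changes sign.
function bisect(f, a, b; tol = 1e-12)
    f(a) * f(b) < 0 || error("f must change sign on [a, b]")
    while b - a > tol
        m = (a + b) / 2
        f(a) * f(m) <= 0 ? (b = m) : (a = m)  # keep the sign-changing half
    end
    return (a + b) / 2
end

f(x) = sin(4 * (x - 1/4)) + x + x^20 - 1
bisect(f, 0.0, 1.0)  # ≈ 0.408
```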
The Roots.jl package offers fzero() to find roots
using Roots
f(x) = sin(4 * (x - 1/4)) + x + x^20 - 1
fzero(f, 0, 1)
0.40829350427936706
The NLsolve.jl package provides functions to solve for multivariate systems of equations and fixed points
From the documentation, to solve for a system of equations without providing a Jacobian
using NLsolve
f(x) = [(x[1]+3)*(x[2]^3-7)+18
sin(x[2]*exp(x[1])-1)] # returns an array
results = nlsolve(f, [ 0.1; 1.2])
Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [0.1, 1.2]
 * Zero: [-7.775508345910301e-17, 0.9999999999999999]
 * Inf-norm of residuals: 0.000000
 * Iterations: 4
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 5
 * Jacobian Calls (df/dx): 5
In the above case, the algorithm used finite differences to calculate the Jacobian
Alternatively, if f(x)
is written generically, you can use auto-differentiation with a single setting
results = nlsolve(f, [ 0.1; 1.2], autodiff=:forward)
println("converged=$(NLsolve.converged(results)) at root=$(results.zero) in "*
"$(results.iterations) iterations and $(results.f_calls) function calls")
converged=true at root=[-3.487552479724522e-16, 1.0000000000000002] in 4 iterations and 5 function calls
Providing a function which operates inplace (i.e. modifies an argument) may help performance for large systems of equations (and hurt it for small ones)
function f!(F, x) # modifies the first argument
F[1] = (x[1]+3)*(x[2]^3-7)+18
F[2] = sin(x[2]*exp(x[1])-1)
end
results = nlsolve(f!, [ 0.1; 1.2], autodiff=:forward)
println("converged=$(NLsolve.converged(results)) at root=$(results.zero) in "*
"$(results.iterations) iterations and $(results.f_calls) function calls")
converged=true at root=[-3.487552479724522e-16, 1.0000000000000002] in 4 iterations and 5 function calls
Many optimization problems can be solved using linear or nonlinear least squares
Let $ x \in R^N $ and $ F(x) : R^N \to R^M $ with $ M \geq N $, then the nonlinear least squares problem is
$$
\min_x F(x)^T F(x)
$$

While $ F(x)^T F(x) \in R $, and hence this problem could technically use any nonlinear optimizer, it is useful to exploit the structure of the problem
In particular, the Jacobian of $ F(x) $, can be used to approximate the Hessian of the objective
As with most nonlinear optimization problems, the benefits will typically become evident only when analytical or automatic differentiation is possible
If $ M = N $ and we know a root $ F(x^*) = 0 $ to the system of equations exists, then NLS is the de facto method for solving large systems of equations
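To see why the structure helps, here is a bare-bones Gauss-Newton sketch (gauss_newton is our own illustrative helper, not the package's algorithm): since the gradient of $ F(x)^T F(x) $ is $ 2 J(x)^T F(x) $ and the Hessian is approximated by $ 2 J(x)^T J(x) $, each step only needs the Jacobian.

```julia
# A bare-bones Gauss-Newton sketch (gauss_newton is our own helper).
# For the objective F(x)'F(x), the gradient is 2 J(x)'F(x) and the Hessian
# is approximated by 2 J(x)'J(x), so each step only needs the Jacobian.
using ForwardDiff, LinearAlgebra

F(x) = [1 - x[1], 100 * (x[2] - x[1]^2)]  # Rosenbrock residuals

function gauss_newton(F, x; iters = 50)
    for _ in 1:iters
        J = ForwardDiff.jacobian(F, x)
        x = x - (J' * J) \ (J' * F(x))  # solve the local linear least squares
    end
    return x
end

gauss_newton(F, zeros(2))  # ≈ [1.0, 1.0]
```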
An implementation of NLS is given in LeastSquaresOptim.jl
From the documentation
using LeastSquaresOptim
function rosenbrock(x)
[1 - x[1], 100 * (x[2]-x[1]^2)]
end
LeastSquaresOptim.optimize(rosenbrock, zeros(2), Dogleg())
Results of Optimization Algorithm
 * Algorithm: Dogleg
 * Minimizer: [1.0,1.0]
 * Sum of squares at Minimum: 0.000000
 * Iterations: 51
 * Convergence: true
 * |x - x'| < 1.0e-08: false
 * |f(x) - f(x')| / |f(x)| < 1.0e-08: true
 * |g(x)| < 1.0e-08: false
 * Function Calls: 52
 * Gradient Calls: 36
 * Multiplication Calls: 159
Note: Because there is a name clash between Optim.jl and this package, to use both we need to qualify the use of the optimize function (i.e. LeastSquaresOptim.optimize)
Here, by default it will use AD with ForwardDiff.jl
to calculate the Jacobian,
but you could also provide your own calculation of the Jacobian (analytical or using finite differences) and/or calculate the function inplace
function rosenbrock_f!(out, x)
out[1] = 1 - x[1]
out[2] = 100 * (x[2]-x[1]^2)
end
LeastSquaresOptim.optimize!(LeastSquaresProblem(x = zeros(2),
f! = rosenbrock_f!, output_length = 2))
# if you want to use gradient
function rosenbrock_g!(J, x)
J[1, 1] = -1
J[1, 2] = 0
J[2, 1] = -200 * x[1]
J[2, 2] = 100
end
LeastSquaresOptim.optimize!(LeastSquaresProblem(x = zeros(2),
f! = rosenbrock_f!, g! = rosenbrock_g!, output_length = 2))
Results of Optimization Algorithm
 * Algorithm: Dogleg
 * Minimizer: [1.0,1.0]
 * Sum of squares at Minimum: 0.000000
 * Iterations: 51
 * Convergence: true
 * |x - x'| < 1.0e-08: false
 * |f(x) - f(x')| / |f(x)| < 1.0e-08: true
 * |g(x)| < 1.0e-08: false
 * Function Calls: 52
 * Gradient Calls: 36
 * Multiplication Calls: 159
Watch this video from one of Julia’s creators on automatic differentiation