Notebook

How to read this lecture…

If using QuantEcon lectures for the first time on a computer, execute ] add InstantiateFromURL inside of a notebook or the REPL
For some notebooks, enable content with “Trust” on the command tab of Jupyter lab
Code should execute sequentially if run in a Jupyter notebook
Please direct feedback to contact@quantecon.org or discourse forum

Solvers, Optimizers, and Automatic Differentiation¶

Contents¶

Solvers, Optimizers, and Automatic Differentiation

Overview¶

In this lecture we introduce a few of the Julia libraries that we’ve found particularly useful for quantitative work in economics

Setup¶

In [1]:

using InstantiateFromURL
activate_github("QuantEcon/QuantEconLectureAllPackages", tag = "v0.9.0") # activate the QuantEcon environment

using LinearAlgebra, Statistics, Compat # load common packages

Introduction to Automatic Differentiation¶

Automatic differentiation (sometimes called algorithmic differentiation) is a crucial way to increase the performance of both estimation and solution methods

There are essentially three four ways to calculate the gradient or jacobian on a computer

Calculation by hand
```
Where possible, you can calculate the derivative on “pen-and-paper” and potentially simplify the expression
```
Sometimes, though not always, the most accurate and fastest option if there are algebraic simplifications

The algebra is error prone for non-trivial setups-
Finite differences
```
Evaluate the function at least N times to get the gradient, jacobians are even worse
```
Large is numerically stable but inaccurate, too small of is numerically unstable but more accurate

Avoid if you can, and use packages (e.g. DiffEqDiffTools.jl ) to get a good choice of -
- $ \Delta $$ \Delta $
- https://github.com/JuliaDiffEq/DiffEqDiffTools.jl$ \Delta $

$$ \partial_{x_i}f(x_1,\ldots x_N) \approx \frac{f(x_1,\ldots x_i + \Delta,\ldots x_N) - f(x_1,\ldots x_i,\ldots x_N)}{\Delta} $$

Symbolic differentiation
```
If you put in an expression for a function, some packages will do symbolic differentiation
```
In effect, repeated applications of the chain-rule, produce-rule, etc.

Sometimes a good solution, if the package can handle your functions-

Automatic Differentiation

Essentially the same as symbolic differentiation, just occurring at a different time in the compilation process

Equivalent to analytical derivatives since it uses the chain-rule, etc.-

We will explore AD packages in Julia rather than the alternatives

Automatic Differentiation¶

To summarize here, first recall the chain rule (adapted from Wikipedia)

$$ \frac{dy}{dx} = \frac{dy}{dw} \frac{dw}{dx} $$

Consider functions composed of calculations with fundamental operations with known analytical derivatives, such as $ f(x_1, x_2) = x_1 x_2 + \sin(x_1) $

To compute $ \frac{d f(x_1,x_2)}{d x_1} $

$$ \begin{array}{l|l} \text{Operations to compute value} & \text{Operations to compute $\frac{d f(x_1,x_2)}{d x_1}$} \\ \hline w_1 = x_1 & \frac{d w_1}{d x_1} = 1 \text{ (seed)}\\ w_2 = x_2 & \frac{d w_2}{d x_1} = 0 \text{ (seed)} \\ w_3 = w_1 \cdot w_2 & \frac{d w_3}{d x_1} = w_2 \cdot \frac{d w_1}{d x_1} + w_1 \cdot \frac{d w_2}{d x_1} \\ w_4 = \sin w_1 & \frac{d w_4}{d x_1} = \cos w_1 \cdot \frac{d w_1}{d x_1} \\ w_5 = w_3 + w_4 & \frac{d w_5}{d x_1} = \frac{d w_3}{d x_1} + \frac{d w_4}{d x_1} \end{array} $$

Using Dual Numbers¶

One way to implement this (used in Forward-mode AD) is to use Dual Numbers

Take a number $ x $ and augment it with an infinitesimal $ \epsilon $ such that $ \epsilon^2 = 0 $, i.e. $ x \to x + x' \epsilon $

All math is then done with this (mathematical, rather than Julia) tuple $ (x, x') $ where the $ x' $ may be hidden from the user

With these definition, we can write a general rule for differentiation of $ g(x,y) $ as

$$ g(\left(x,x'\right),\left(y,y'\right)) = \left(g(x,y),\partial_x g(x,y)x' + \partial_y g(x,y)y' \right) $$

This calculation is simply the chain rule for the total derivative

An AD library using dual numbers will concurrently calculate the function and its derivatives, repeating the chain rule until it hits a set of intrinsic rules such as

$$ \begin{align*} x + y \to \left(x,x'\right) + \left(y,y'\right) &= \left(x + y,\underbrace{x' + y'}_{\partial(x + y) = \partial x + \partial y}\right)\\ x y \to \left(x,x'\right) \times \left(y,y'\right) &= \left(x y,\underbrace{x'y + y'x}_{\partial(x y) = y \partial x + x \partial y y}\right)\\ \exp(x) \to \exp(\left(x, x'\right)) &= \left(\exp(x),\underbrace{x'\exp(x)}_{\partial(\exp(x)) = \exp(x)\partial x} \right) \end{align*} $$

ForwardDiff.jl¶

We have already seen one of the AD packages in Julia

In [2]:

using ForwardDiff
h(x) = sin(x[1]) + x[1] * x[2] + sinh(x[1] * x[2]) # multivariate.
x = [1.4 2.2]
@show ForwardDiff.gradient(h,x) # use AD, seeds from x

#Or, can use complicated functions of many variables
f(x) = sum(sin, x) + prod(tan, x) * sum(sqrt, x)
g = (x) -> ForwardDiff.gradient(f, x); # g() is now the gradient
@show g(rand(20)); # gradient at a random point
# ForwardDiff.hessian(f,x2) # or the hessian

┌ Info: Recompiling stale cache file /Users/arnavsood/.julia/compiled/v1.0/ForwardDiff/k0ETY.ji for ForwardDiff [f6369f11-7733-5829-9624-2563aa707210]
└ @ Base loading.jl:1184

ForwardDiff.gradient(h, x) = [26.3548 16.6631]
g(rand(20)) = [0.93898, 0.893705, 0.917452, 0.87998, 1.0, 0.996822, 0.5439, 0.734483, 0.896878, 0.898396, 0.704589, 0.931328, 0.897029, 0.967132, 0.981794, 0.841244, 0.916563, 0.982085, 0.85773, 0.999822]

We can even auto-differenitate complicated functions with embedded iterations

In [3]:

function squareroot(x) #pretending we don't know sqrt()
    z = copy(x) # Initial starting point for Newton’s method
    while abs(z*z - x) > 1e-13
        z = z - (z*z-x)/(2z)
    end
    return z
end
sqrt(2.0)

Out[3]:

1.4142135623730951

In [4]:

using ForwardDiff
dsqrt(x) = ForwardDiff.derivative(squareroot, x)
dsqrt(2.0)

Out[4]:

0.35355339059327373

Flux.jl¶

Another is Flux.jl, which is a machine-learning library in Julia

AD is one of the main reasons that machine learning has become so powerful in recent years, and is an essential component of any ML package

In [5]:

using Flux
using Flux.Tracker
using Flux.Tracker: update!

f(x) = 3x^2 + 2x + 1

# df/dx = 6x + 2
df(x) = Tracker.gradient(f, x)[1]

df(2) # 14.0 (tracked)

┌ Info: Recompiling stale cache file /Users/arnavsood/.julia/compiled/v1.0/Flux/QdkVy.ji for Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
└ @ Base loading.jl:1184

Out[5]:

14.0 (tracked)

In [6]:

A = rand(2,2)
f(x) = A * x
x0 = [0.1, 2.0]
f(x0)
Flux.jacobian(f, x0)

Out[6]:

2×2 Adjoint{Float64,Array{Float64,2}}:
 0.697753  0.879331 
 0.828623  0.0438626

As before, we can differentiate complicated functions

In [7]:

dsquareroot(x) = Tracker.gradient(squareroot, x)

Out[7]:

dsquareroot (generic function with 1 method)

From the documentation, we can do a machine-learning approach to a linear regression

In [8]:

W = rand(2, 5)
b = rand(2)

predict(x) = W*x .+ b

function loss(x, y)
ŷ = predict(x)
sum((y .- ŷ).^2)
end

x, y = rand(5), rand(2) # Dummy data
loss(x, y) # ~ 3

Out[8]:

1.449798213832934

In [9]:

W = param(W)
b = param(b)

gs = Tracker.gradient(() -> loss(x, y), Params([W, b]))

Δ = gs[W]

# Update the parameter and reset the gradient
update!(W, -0.1Δ)

loss(x, y) # ~ 2.5

Out[9]:

1.0301487987564637 (tracked)

Optimization¶

There are a large number of packages intended to be used for optimization in Julia

Part of the reason for the diversity of options is that Julia makes it possible to efficiently implement a large number of variations on optimization routines

The other reason is that different types of optimization problems require different algorithms

TODO: need to describe different types of optimization problems and suggested packages at some point

Optim.jl¶

A good pure-Julia solution for the (unconstrained or box-bounded) optimization of univariate and multivariate function is Optim package

By default, the algorithms in Optim target minimization rather than maximization, so if a function is called “optimize” it will mean minimization

Univariate Functions on Bounded Intervals¶

Univariate optimization defaults to a robust hybrid optimization routine called Brent’s method

In [10]:

using Optim
using Optim: converged, maximum, maximizer, minimizer, iterations #some extra functions

result = optimize(x-> x^2, -2.0, 1.0)

┌ Info: Recompiling stale cache file /Users/arnavsood/.julia/compiled/v1.0/Optim/R5uoh.ji for Optim [429524aa-4258-5aef-a3af-852621145aeb]
└ @ Base loading.jl:1184

Out[10]:

Results of Optimization Algorithm
 * Algorithm: Brent's Method
 * Search Interval: [-2.000000, 1.000000]
 * Minimizer: 0.000000e+00
 * Minimum: 0.000000e+00
 * Iterations: 5
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 6

Always check if the results converged, and throw errors otherwise

In [11]:

converged(result) || error("Failed to converge in $(iterations(result)) iterations")
xmin = result.minimizer
result.minimum

Out[11]:

0.0

Or to maximize

In [12]:

f(x) = -x^2
result = maximize(f, -2.0, 1.0)
converged(result) || error("Failed to converge in $(iterations(result)) iterations")
xmin = maximizer(result)
fmax = maximum(result)

Out[12]:

-0.0

Note: There are a few inconsistencies with extracting optimize vs. maximize results, as shown above

Unconstrained Multivariate Optimization¶

There are a variety of algorithms and options for multivariate optimization

From the documentation, the simplest version is

In [13]:

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x_iv = [0.0, 0.0]
results = optimize(f, x_iv) # i.e. optimize(f, x_iv, NelderMead())

Out[13]:

Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999634355313174,0.9999315506115275]
 * Minimum: 3.525527e-09
 * Iterations: 60
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 117

The default algorithm in NelderMead, which is derivative-free and hence requires many function evaluations

To change the algorithm type to L-BFGS

In [14]:

results = optimize(f, x_iv, LBFGS())
println("minimum = $(results.minimum) with argmin = $(results.minimizer) in $(results.iterations) iterations")

minimum = 5.378388330692143e-17 with argmin = [1.0, 1.0] in 24 iterations

Note that this has fewer iterations

As no derivative was given, it used finite-differences to approximate the gradient of f(x)

However, since most of the algorithms require derivatives, you will often want to use auto differentiation or pass analytical gradients if possible

In [15]:

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x_iv = [0.0, 0.0]
results = optimize(f, x_iv, LBFGS(), autodiff=:forward) # i.e. use ForwardDiff.jl
println("minimum = $(results.minimum) with argmin = $(results.minimizer) in $(results.iterations) iterations")

minimum = 5.191703158437428e-27 with argmin = [1.0, 1.0] in 24 iterations

Note that we did not need to use ForwardDiff.jl directly, as long as our f(x) function was written to be generic (see the tips and trick )

Alternatively, with an analytical gradient

In [16]:

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x_iv = [0.0, 0.0]
function g!(G, x)
    G[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    G[2] = 200.0 * (x[2] - x[1]^2)
end

results = optimize(f, g!, x0, LBFGS()) # or ConjugateGradient()
println("minimum = $(results.minimum) with argmin = $(results.minimizer) in $(results.iterations) iterations")

minimum = 4.474575116368833e-22 with argmin = [1.0, 1.0] in 21 iterations

For derivative-free methods, you can change the algorithm–and have no need to provide a gradient

In [17]:

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x_iv = [0.0, 0.0]
results = optimize(f, x_iv, SimulatedAnnealing()) # or ParticleSwarm() or NelderMead()

Out[17]:

Results of Optimization Algorithm
 * Algorithm: Simulated Annealing
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.898375086557402,0.8001540211169486]
 * Minimum: 1.512149e-02
 * Iterations: 1000
 * Convergence: false
   * |x - x'| ≤ 0.0e+00: false 
     |x - x'| = NaN 
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = NaN |f(x)|
   * |g(x)| ≤ 1.0e-08: false 
     |g(x)| = NaN 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: true
 * Objective Calls: 1001

However, you will note that this did not converge, as stochastic methods typically require many more iterations as a tradeoff for their global-convergence properties

See the maximum likelihood example and the accompanying jupyter notebook

JuMP.jl¶

The JuMP.jl package is an ambitious implementation of a modelling language for optimization problems in Julia

In that sense, it is more like an AMPL (or Pyomo) built on top of the Julia language with macros, and able to use a variety of different commerical and open-source solvers

If you have a linear, quadratic, conic, mixed-integer linear, etc. problem then this will likely be the ideal “meta-package” for calling various solvers

For nonlinear problems, the modelling language may make things difficult for complicated functions (as it is not designed to be used as a general-purpose non-linear optimizer)

See the quickstart guide for more details on all of the options

The following is an example of calling a linear objective with a nonlinear constraint (provided by an external function)

In [18]:

using JuMP, Ipopt
# solve
# max( x[1] + x[2] )
# st sqrt(x[1]^2 + x[2]^2) <= 1

function squareroot(x) # pretending we don't know sqrt()
    z = x # Initial starting point for Newton’s method
    while abs(z*z - x) > 1e-13
        z = z - (z*z-x)/(2z)
    end
    return z
end
m = Model(solver = IpoptSolver())
JuMP.register(m,:squareroot, 1, squareroot, autodiff=true) # need to register user defined functions for AD

@variable(m, x[1:2], start=0.5) # start is the initial condition
@objective(m, Max, sum(x))
@NLconstraint(m, squareroot(x[1]^2+x[2]^2) <= 1)
solve(m)

┌ Info: Precompiling JuMP [4076af6c-e467-56ae-b986-b466b2749572]
└ @ Base loading.jl:1186
┌ Info: Precompiling Ipopt [b6b21f68-93f8-5de0-b562-5493be1d77c9]
└ @ Base loading.jl:1186

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.12.8, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        2
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        1
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        1

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0 -1.0000000e+00 0.00e+00 2.07e-01  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1 -1.4100714e+00 0.00e+00 5.48e-02  -1.7 3.94e-01    -  1.00e+00 7.36e-01f  1
   2 -1.4113851e+00 0.00e+00 2.83e-08  -2.5 9.29e-04    -  1.00e+00 1.00e+00f  1
   3 -1.4140632e+00 0.00e+00 1.50e-09  -3.8 1.89e-03    -  1.00e+00 1.00e+00f  1
   4 -1.4142117e+00 0.00e+00 1.84e-11  -5.7 1.05e-04    -  1.00e+00 1.00e+00f  1
   5 -1.4142136e+00 0.00e+00 8.23e-09  -8.6 1.30e-06    -  1.00e+00 1.00e+00f  1

Number of Iterations....: 5

                                   (scaled)                 (unscaled)
Objective...............:  -1.4142135740093271e+00   -1.4142135740093271e+00
Dual infeasibility......:   8.2280586788385790e-09    8.2280586788385790e-09
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   2.5059035815063646e-09    2.5059035815063646e-09
Overall NLP error.......:   8.2280586788385790e-09    8.2280586788385790e-09


Number of objective function evaluations             = 6
Number of objective gradient evaluations             = 6
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 6
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 6
Number of Lagrangian Hessian evaluations             = 5
Total CPU secs in IPOPT (w/o function evaluations)   =      0.113
Total CPU secs in NLP function evaluations           =      0.028

EXIT: Optimal Solution Found.

Out[18]:

:Optimal

And this is an example of a quadratic objective

In [19]:

# solve
# min (1-x)^2 + 100(y-x^2)^2)
# st x + y >= 10

using JuMP,Ipopt
m = Model(solver = IpoptSolver(print_level=0)) # settings for the solver, e.g. suppress output
@variable(m, x, start = 0.0)
@variable(m, y, start = 0.0)

@NLobjective(m, Min, (1-x)^2 + 100(y-x^2)^2)

solve(m)
println("x = ", getvalue(x), " y = ", getvalue(y))

# adding a (linear) constraint
@constraint(m, x + y == 10)
solve(m)
println("x = ", getvalue(x), " y = ", getvalue(y))

x = 0.9999999999999899 y = 0.9999999999999792
x = 2.7011471240982194 y = 7.2988528759017814

BlackBoxOptim.jl¶

Another package for doing global optimization without derivatives is BlackBoxOptim.jl

To see an example from the documentation,

In [20]:

using BlackBoxOptim

function rosenbrock2d(x)
return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end

results = bboptimize(rosenbrock2d; SearchRange = (-5.0, 5.0), NumDimensions = 2);

Starting optimization with optimizer DiffEvoOpt{FitPopulation{Float64},RadiusLimitedSelector,BlackBoxOptim.AdaptiveDiffEvoRandBin{3},RandomBound{RangePerDimSearchSpace}}
0.00 secs, 0 evals, 0 steps

Optimization stopped after 10001 steps and 0.07263016700744629 seconds

WARNING: using BlackBoxOptim.update! in module Main conflicts with an existing identifier.

Termination reason: Max number of steps (10000) reached
Steps per second = 137697.6043435872
Function evals per second = 139638.94643062307
Improvements/step = 0.2135
Total function evaluations = 10142


Best candidate found: [1.0, 1.0]

Fitness: 0.000000000

An example for parallel execution of the objective is provided

Systems of Equations and Least Squares¶

Roots.jl¶

A root of a real function $ f $ on $ [a,b] $ is an $ x \in [a, b] $ such that $ f(x)=0 $

For example, if we plot the function

$$ f(x) = \sin(4 (x - 1/4)) + x + x^{20} - 1 $$

(1)

with $ x \in [0,1] $ we get

The unique root is approximately 0.408

The Roots package offers the fzero() to find roots

In [21]:

using Roots
f(x) = sin(4 * (x - 1/4)) + x + x^20 - 1
fzero(f, 0, 1)

Out[21]:

0.40829350427936706

NLsolve.jl¶

The package NLsolve.jl provides functions to solve for multivariate systems of equations and fixed-points

From the documentation, to solve for a system of equations without providing a jacobian

In [22]:

using NLsolve

f(x) = [(x[1]+3)*(x[2]^3-7)+18
        sin(x[2]*exp(x[1])-1)] # returns an array

results = nlsolve(f, [ 0.1; 1.2])

┌ Info: Recompiling stale cache file /Users/arnavsood/.julia/compiled/v1.0/NLsolve/KFCNP.ji for NLsolve [2774e3e8-f4cf-5e23-947b-6d7e65073b56]
└ @ Base loading.jl:1184
WARNING: using NLsolve.converged in module Main conflicts with an existing identifier.

Out[22]:

Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [0.1, 1.2]
 * Zero: [-7.77555e-17, 1.0]
 * Inf-norm of residuals: 0.000000
 * Iterations: 4
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 5
 * Jacobian Calls (df/dx): 5

In the above case, the algorithm used finite-differences to calculate the jacobian

Alternatively, if f(x) is written generically, you can use auto-differentiation with a single setting

In [23]:

results = nlsolve(f, [ 0.1; 1.2], autodiff=:forward)

println("converged=$(NLsolve.converged(results)) at root=$(results.zero) in $(results.iterations) iterations and $(results.f_calls) function calls")

converged=true at root=[-3.7818e-16, 1.0] in 4 iterations and 5 function calls

Providing a function with operates in-place (i.e. modifying an argument) may help performance for large systems of equations (and hurt it for small ones)

In [24]:

function f!(F, x) # modifies the first argument
    F[1] = (x[1]+3)*(x[2]^3-7)+18
    F[2] = sin(x[2]*exp(x[1])-1)
end

results = nlsolve(f!, [ 0.1; 1.2], autodiff=:forward)

println("converged=$(NLsolve.converged(results)) at root=$(results.zero) in $(results.iterations) iterations and $(results.f_calls) function calls")

converged=true at root=[-3.7818e-16, 1.0] in 4 iterations and 5 function calls

LeastSquaresOptim.jl¶

Many optimization problems can be solved using linear or nonlinear least squares

Let $ x \in R^N $ and $ F(x) : R^N \to R^M $ with $ M \geq N $, then the nonlinear least squares problem is

$$ \min_x F(x)^T F(x) $$

While $ F(x)^T F(x) \to R $, and hence this problem could technically use any nonlinear optimizer, it is useful to exploit the structure of the problem

In particular, the Jacobian of $ F(x) $, can be used to approximate the Hessian of the objective

As with most nonlinear optimization problems, the benefits will typically become evident only when analytical or automatic differentiation is possible

If $ M = N $ and we know a root $ F(x^*) = 0 $ to the system of equations exists, then NLS is the defacto method for solving large systems of equations

An implementation of NLS is given in LeastSquaresOptim.jl

From the documentation

In [25]:

using LeastSquaresOptim
function rosenbrock(x)
    [1 - x[1], 100 * (x[2]-x[1]^2)]
end
LeastSquaresOptim.optimize(rosenbrock, zeros(2), Dogleg())

┌ Info: Recompiling stale cache file /Users/arnavsood/.julia/compiled/v1.0/LeastSquaresOptim/hdNWC.ji for LeastSquaresOptim [0fc2ff8b-aaa3-5acd-a817-1944a5e08891]
└ @ Base loading.jl:1184

Out[25]:

Results of Optimization Algorithm
 * Algorithm: Dogleg
 * Minimizer: [1.0,1.0]
 * Sum of squares at Minimum: 0.000000
 * Iterations: 51
 * Convergence: true
 * |x - x'| < 1.0e-08: false
 * |f(x) - f(x')| / |f(x)| < 1.0e-08: true
 * |g(x)| < 1.0e-08: false
 * Function Calls: 52
 * Gradient Calls: 36
 * Multiplication Calls: 159

Note: Because there is a name clash between Optim.jl and this package, to use both we need to qualify the use of the optimize function (i.e. LeastSquaresOptim.optimize)

Here, by default it will use AD with ForwardDiff.jl to calculate the Jacobian, but you could also provide your own calculation of the Jacobian (analytical or using finite differences) and/or calculate the function in-place

In [26]:

function rosenbrock_f!(out, x)
    out[1] = 1 - x[1]
    out[2] = 100 * (x[2]-x[1]^2)
end
LeastSquaresOptim.optimize!(LeastSquaresProblem(x = zeros(2), f! = rosenbrock_f!, output_length = 2))

# if you want to use gradient
function rosenbrock_g!(J, x)
    J[1, 1] = -1
    J[1, 2] = 0
    J[2, 1] = -200 * x[1]
    J[2, 2] = 100
end
LeastSquaresOptim.optimize!(LeastSquaresProblem(x = zeros(2), f! = rosenbrock_f!, g! = rosenbrock_g!, output_length = 2))

Out[26]:

Results of Optimization Algorithm
 * Algorithm: Dogleg
 * Minimizer: [1.0,1.0]
 * Sum of squares at Minimum: 0.000000
 * Iterations: 51
 * Convergence: true
 * |x - x'| < 1.0e-08: false
 * |f(x) - f(x')| / |f(x)| < 1.0e-08: true
 * |g(x)| < 1.0e-08: false
 * Function Calls: 52
 * Gradient Calls: 36
 * Multiplication Calls: 159

Exercises¶

Exercise 1¶

Watch this video from one of Julia’s creators on automatic differentiation