In [2]:
using PyPlot, PyCall
using AIBECS, WorldOceanAtlasTools
using LinearAlgebra

The F-1 algorithm



Benoît Pasquier and François Primeau
University of California, Irvine

Paper: The F-1 algorithm for efficient computation of the Hessian matrix of an objective function defined implicitly by the solution of a steady-state problem (under review)

Code: F1Method.jl (Julia package — check it out on GitHub!)

F-1 algorithm – Outline

  1. Motivation & Context
  2. Autodifferentiation
  3. What is the F-1 algorithm?
  4. Benchmarks

As we go through, I will demo some Julia code!

Motivation

For parameter optimization and parameter sensitivity estimation!

The AIBECS builds global marine steady-state biogeochemistry models in just a few commands (see yesterday's CCRC seminar),

and the F-1 algorithm facilitates the optimization of their biogeochemical parameters.

But the context of the F-1 algorithm is more general...

Context

Steady-state equation

$$\frac{\partial \boldsymbol{x}}{\partial t} = \boldsymbol{F}(\boldsymbol{x},\boldsymbol{p}) = 0$$

for some state $\boldsymbol{x}$ (size $n \sim 400\,000$) and parameters $\boldsymbol{p}$ (size $m \sim 10$).

Constrained optimization problem

$$\left\{\begin{aligned} \mathop{\textrm{minimize}}_{\boldsymbol{x}, \boldsymbol{p}} & & f(\boldsymbol{x}, \boldsymbol{p}) \\ \textrm{subject to} & & \boldsymbol{F}(\boldsymbol{x}, \boldsymbol{p}) = 0 \end{aligned}\right.$$
In [3]:
function constrained_optimization_plot()
    figure(figsize=(10,6))
    # Toy objective f(x,p) and toy steady-state solution s(p) (one state, one parameter)
    f(x,p) = (x - 1)^2 + (p - 1)^2 + 1
    s(p) = 1 + 0.9atan(p - 0.3)
    xs, ps = -1.2:0.1:3, -1.2:0.1:3.2
    # Surface and contours of the objective over the (x,p) plane
    plt = plot_surface(xs, ps, [f(x,p) for p in ps, x in xs], alpha = 0.5, cmap=:viridis_r)
    gca(projection="3d")
    P = repeat(reshape(ps, length(ps), 1), outer = [1, length(xs)])
    X = repeat(reshape(xs, 1, length(xs)), outer = [length(ps), 1])
    contour(X, P, [f(x,p) for p in ps, x in xs], levels = 2:6, colors="black", alpha=0.5, linewidths=0.5)
    # The constraint F(x,p) = 0, i.e., the curve x = s(p)
    plot([s(p) for p in ps], ps)
    legend(("\$F(x,p) = 0\$",))
    xlabel("state \$x\$"); ylabel("parameters \$p\$"); zlabel("objective \$f(x,p)\$")
    xticks(unique(round.(xs))); yticks(unique(round.(ps))); zticks(2:6)
    return plt, f, s, xs, ps
end
constrained_optimization_plot()
Out[3]:
(PyObject <mpl_toolkits.mplot3d.art3d.Poly3DCollection object at 0x140778470>, getfield(Main, Symbol("#f#5"))(), getfield(Main, Symbol("#s#6"))(), -1.2:0.1:3.0, -1.2:0.1:3.2)

Unconstrained optimization along the manifold of steady-state solutions.

$$\mathop{\textrm{minimize}}_\boldsymbol{p} \hat{f}(\boldsymbol{p})$$

where $$\hat{f}(\boldsymbol{p}) \equiv f\big(\boldsymbol{s}(\boldsymbol{p}), \boldsymbol{p}\big)$$ is the objective function.

And $\boldsymbol{s}(\boldsymbol{p})$ is the steady-state solution for parameters $\boldsymbol{p}$, i.e., such that

$$\boldsymbol{F}\left(\boldsymbol{s}(\boldsymbol{p}),\boldsymbol{p}\right) = 0$$
In [4]:
function unconstrained_optimization_plot()
    plt, f, s, xs, ps = constrained_optimization_plot()
    plot([s(p) for p in ps], ps, [f(s(p),p) for p in ps], color="black", linewidth=3)
    plot(maximum(xs) * ones(length(ps)), ps, [f(s(p),p) for p in ps], color="red")
    legend(("\$x=s(p) \\Longleftrightarrow F(x,p)=0\$","\$f(x,p)\$ with \$x = s(p)\$", "\$\\hat{f}(p) = f(s(p),p)\$"))
    for p in ps[1:22:end]
        plot(s(p) * [1,1], p * [1,1], f(s(p),p) * [0,1], color="black", alpha = 0.3, linestyle="--")
        plot([s(p), maximum(xs)], p * [1,1], f(s(p),p) * [1,1], color="black", alpha = 0.3, linestyle="--", marker="o")
    end
    return plt
end
unconstrained_optimization_plot()
Out[4]:
PyObject <mpl_toolkits.mplot3d.art3d.Poly3DCollection object at 0x1413973c8>

Two nested iterative algorithms

Inner solver with Newton step

$$\Delta \boldsymbol{x} \equiv - \left[\nabla_\boldsymbol{x} \boldsymbol{F}(\boldsymbol{x},\boldsymbol{p})\right]^{-1} \boldsymbol{F}(\boldsymbol{x},\boldsymbol{p})$$

Outer optimizer with Newton step

$$\Delta\boldsymbol{p} \equiv - \left[\nabla^2\hat{f}(\boldsymbol{p})\right]^{-1}\nabla \hat{f}(\boldsymbol{p})$$

which requires the Hessian of the objective function,

$$\nabla^2 \hat{f}(\boldsymbol{p})$$
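As a rough sketch (illustrative only, not the actual AIBECS or F1Method.jl code), the two nested Newton iterations might look like this in Julia, assuming a plain-vector state x and parameters p and generic functions F, ∇ₓF, ∇f̂, and ∇²f̂:

using LinearAlgebra

# Inner solver: Newton iterations on x ↦ F(x,p) for fixed parameters p
function inner_solve(F, ∇ₓF, x, p; tol=1e-10, maxiter=50)
    for _ in 1:maxiter
        x -= factorize(∇ₓF(x, p)) \ F(x, p)   # Δx = -[∇ₓF(x,p)]⁻¹ F(x,p)
        norm(F(x, p)) < tol && break
    end
    return x
end

# Outer optimizer: Newton iterations on p, each requiring the Hessian ∇²f̂(p)
function outer_optimize(∇f̂, ∇²f̂, p; tol=1e-8, maxiter=20)
    for _ in 1:maxiter
        p -= ∇²f̂(p) \ vec(∇f̂(p))              # Δp = -[∇²f̂(p)]⁻¹ ∇f̂(p)
        norm(∇f̂(p)) < tol && break
    end
    return p
end

Every outer iteration needs ∇²f̂(p), and each evaluation of f̂ and its derivatives hides a full inner solve, which is what makes the Hessian the costly piece.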

Autodifferentiation

Take the Taylor expansion of $\hat{f}(\boldsymbol{p})$ in the $h\boldsymbol{e}_j$ direction for a given $h$:

$$\hat{f}(\boldsymbol{p} + h \boldsymbol{e}_j) = \hat{f}(\boldsymbol{p}) + h \underbrace{\nabla\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j}_{?} + \frac{h^2}{2} \, \left[\boldsymbol{e}_j^\mathsf{T} \, \nabla^2\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j\right] + \ldots$$

A standard approach is to use finite differences:

$$\nabla\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j = \frac{\hat{f}(\boldsymbol{p} + h \boldsymbol{e}_j) - \hat{f}(\boldsymbol{p})}{h} + \mathcal{O}(h)$$

But it suffers from truncation and round-off errors!

A better approach is to use complex numbers!
Taylor-expand in the $ih\boldsymbol{e}_j$ direction: $$\hat{f}(\boldsymbol{p} + i h \boldsymbol{e}_j) = \hat{f}(\boldsymbol{p}) + i h \underbrace{\nabla\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j}_{?} - \frac{h^2}{2} \, \left[\boldsymbol{e}_j^\mathsf{T} \, \nabla^2\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j\right] + \ldots$$

Taking the imaginary part yields faster convergence and eliminates the subtractive round-off error:

$$\nabla\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j = \Im\left[\frac{\hat{f}(\boldsymbol{p} + i h \boldsymbol{e}_j)}{h}\right] + \mathcal{O}(h^2)$$

In [5]:
𝑓(x) = cos(x^2) + exp(x)
∇𝑓(x) = -2x * sin(x^2) + exp(x)
finite_differences(f, x, h) = (f(x + h) - f(x)) / h
centered_differences(f, x, h) = (f(x + h) - f(x - h)) / 2h
complex_step_method(f, x, h) = imag(f(x + im * h)) / h
relative_error(m, f, ∇f, x, h) = Float64(abs(BigFloat(m(f, x, h)) - ∇f(BigFloat(x))) / abs(∇f(x)))
x, step_sizes = 2.0, 10 .^ (-20:0.02:0)
numerical_schemes = [finite_differences, centered_differences, complex_step_method]
plot(step_sizes, [relative_error(m, 𝑓, ∇𝑓, x, h) for h in step_sizes, m in numerical_schemes])
loglog(), legend(string.(numerical_schemes)), xlabel("step size, \$h\$"), ylabel("Relative Error, \$\\frac{|\\bullet - \\nabla f(x)|}{|\\nabla f(x)|}\$")
title("There are better alternatives to finite differences")
Out[5]:
PyObject Text(0.5, 1, 'There are better alternatives to finite differences')

An even better approach is to use dual numbers!

Define $\varepsilon \ne 0$ s.t. $\varepsilon^2 = 0$, then the complete Taylor expansion in the $\varepsilon \boldsymbol{e}_j$ direction is

$$\hat{f}(\boldsymbol{p} + \varepsilon \boldsymbol{e}_j) = \hat{f}(\boldsymbol{p}) + \varepsilon \underbrace{\nabla\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j}_{?}$$

Hence, first derivatives are given exactly by

$$\nabla\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j = \mathfrak{D}\left[\hat{f}(\boldsymbol{p} + \varepsilon \boldsymbol{e}_j)\right]$$

where $\mathfrak{D}$ is the dual part (the $\varepsilon$ coefficient).

The dual number $\varepsilon$ behaves like an infinitesimal and it gives very accurate derivatives!

In [6]:
using DualNumbers
dual_step_method(f, x, h) = dualpart(f(x + ε))
push!(numerical_schemes, dual_step_method)
plot(step_sizes, [relative_error(m, 𝑓, ∇𝑓, x, h) for h in step_sizes, m in numerical_schemes])
loglog(), legend(string.(numerical_schemes)), xlabel("step size, \$h\$"), ylabel("Relative Error, \$\\frac{|\\bullet - \\nabla f(x)|}{|\\nabla f(x)|}\$")
title("There are even better alternatives to complex-step differentiation")
Out[6]:
PyObject Text(0.5, 1, 'There are even better alternatives to complex-step differentiation')

The gradient can thus be computed in $m$ dual evaluations of the objective function, via

$$\nabla\hat{f}(\boldsymbol{p}) = \mathfrak{D} \left( \left[\begin{matrix} \hat{f}(\boldsymbol{p} + \varepsilon \boldsymbol{e}_1) \\ \hat{f}(\boldsymbol{p} + \varepsilon \boldsymbol{e}_2) \\ \vdots \\ \hat{f}(\boldsymbol{p} + \varepsilon \boldsymbol{e}_{m}) \end{matrix} \right]^\mathsf{T} \right)$$

where $m$ is the number of parameters.
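For a vector of parameters this is one dual evaluation per parameter. A minimal sketch (the names dual_gradient and e are mine, not part of DualNumbers.jl):

using DualNumbers

e(j, m) = [i == j ? 1.0 : 0.0 for i in 1:m]   # j-th canonical basis vector

# Gradient of a scalar function f̂ of a vector p via m dual evaluations
dual_gradient(f̂, p) = [dualpart(f̂(p + ε * e(j, length(p)))) for j in 1:length(p)]'   # 1×m, like ∇f̂(p)

# Example: the gradient of p ↦ ∑ pᵢ² is 2pᵀ
dual_gradient(p -> sum(p .* p), [1.0, 2.0, 3.0])   # → [2.0 4.0 6.0]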

For second derivatives, we can use hyperdual numbers.
Let $\varepsilon_1$ and $\varepsilon_2$ be the hyperdual units defined by $\varepsilon_1^2 = \varepsilon_2^2 = 0$ but $\varepsilon_1 \varepsilon_2 \ne 0$.

Let $\boldsymbol{p}_{jk} = \boldsymbol{p} + \varepsilon_1 \boldsymbol{e}_j + \varepsilon_2 \boldsymbol{e}_k$, for which the Taylor expansion of $\hat{f}$ is

$$\hat{f}(\boldsymbol{p}_{jk}) = \hat{f}(\boldsymbol{p}) + \varepsilon_1 \nabla\hat{f}(\boldsymbol{p}) \boldsymbol{e}_j + \varepsilon_2 \nabla\hat{f}(\boldsymbol{p}) \boldsymbol{e}_k + \varepsilon_1 \varepsilon_2 \underbrace{\boldsymbol{e}_j^\mathsf{T} \nabla^2\hat{f}(\boldsymbol{p}) \boldsymbol{e}_k}_{?}$$

Taking the "hyperdual" part gives the second derivative:

$$\boldsymbol{e}_j^\mathsf{T} \nabla^2\hat{f}(\boldsymbol{p}) \boldsymbol{e}_k = \mathfrak{H}\left[\hat{f}(\boldsymbol{p}_{jk})\right]$$

where $\mathfrak{H}$ stands for hyperdual part (the $\varepsilon_1 \varepsilon_2$ coefficient).

Hyperdual steps also give very accurate derivatives!

In [7]:
∇²𝑓(x) = -2sin(x^2) - 4x^2 * cos(x^2) + exp(x)
using HyperDualNumbers
finite_differences_2(f, x, h) = (f(x + h) - 2f(x) + f(x - h)) / h^2
hyperdual_step_method(f, x, h) = ε₁ε₂part(f(x + ε₁ + ε₂))
numerical_schemes_2 = [finite_differences_2, hyperdual_step_method]
plot(step_sizes, [relative_error(m, 𝑓, ∇²𝑓, x, h) for h in step_sizes, m in numerical_schemes_2])
loglog(), legend(string.(numerical_schemes_2)), xlabel("step size, \$h\$"), ylabel("Relative Error, \$\\frac{|\\bullet - \\nabla^2 f(x)|}{|\\nabla^2 f(x)|}\$")
title("HyperDual Numbers for second derivatives")
Out[7]:
PyObject Text(0.5, 1, 'HyperDual Numbers for second derivatives')

Autodifferentiating through an iterative solver is expensive

The Hessian of $\hat{f}$ can be computed in $\frac{m(m+1)}{2}$ hyperdual evaluations:

$$\nabla^2\hat{f}(\boldsymbol{p}) = \mathfrak{H} \left( \left[\begin{matrix} \hat{f}(\boldsymbol{p}_{11}) & \hat{f}(\boldsymbol{p}_{12}) & \cdots & \hat{f}(\boldsymbol{p}_{1m}) \\ \hat{f}(\boldsymbol{p}_{12}) & \hat{f}(\boldsymbol{p}_{22}) & \cdots & \hat{f}(\boldsymbol{p}_{2m}) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{f}(\boldsymbol{p}_{1m}) & \hat{f}(\boldsymbol{p}_{2m}) & \cdots & \hat{f}(\boldsymbol{p}_{mm}) \end{matrix} \right] \right)$$

But this requires $\frac{m(m+1)}{2}$ calls to the inner solver, which in turn require hyperdual factorizations and fine-tuned non-real tolerances!
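A brute-force version of that computation would look like the sketch below (the names hyperdual_hessian and e are mine); the catch is that when $\hat{f}$ is defined through the inner solver, every single $\hat{f}(\boldsymbol{p}_{jk})$ call hides a full hyperdual-valued solve:

using HyperDualNumbers

e(j, m) = [i == j ? 1.0 : 0.0 for i in 1:m]   # j-th canonical basis vector

# Dense Hessian of a scalar function f̂ of a vector p,
# via m(m+1)/2 hyperdual evaluations (upper triangle, then symmetry)
function hyperdual_hessian(f̂, p)
    m = length(p)
    H = zeros(m, m)
    for j in 1:m, k in j:m
        H[j, k] = H[k, j] = ε₁ε₂part(f̂(p + ε₁ * e(j, m) + ε₂ * e(k, m)))
    end
    return H
end

hyperdual_hessian(p -> sum(p .* p), [1.0, 2.0])   # → [2.0 0.0; 0.0 2.0]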

What is the F-1 algorithm?

What does it do?

The F-1 algorithm allows you to calculate the gradient and Hessian of an objective function, $\hat{f}(\boldsymbol{p})$, defined implicitly by the solution of a steady-state problem.

How does it work?

It leverages analytical shortcuts, combined with dual and hyperdual numbers, to avoid calls to the inner solver and unnecessary factorizations.

Analytical gradient

Differentiate the objective function $\hat{f}(\boldsymbol{p}) = f\left(\boldsymbol{s}(\boldsymbol{p}), \boldsymbol{p}\right)$:

$$\color{forestgreen}{\underbrace{\nabla\hat{f}(\boldsymbol{p})}_{1 \times m}} = \color{royalblue}{\underbrace{\nabla_\boldsymbol{x}f(\boldsymbol{s}, \boldsymbol{p})_{\vphantom{\boldsymbol{p}}}}_{1 \times n}} \, \color{red}{\underbrace{\nabla \boldsymbol{s}(\boldsymbol{p})_{\vphantom{\boldsymbol{p}}}}_{n \times m}} + \color{DarkOrchid}{\underbrace{\nabla_\boldsymbol{p} f(\boldsymbol{s}, \boldsymbol{p})}_{1 \times m}}$$

Differentiate the steady-state equation, $\boldsymbol{F}\left(\boldsymbol{s}(\boldsymbol{p}),\boldsymbol{p}\right)=0$:

$$\color{royalblue}{\underbrace{\mathbf{A}_{\vphantom{\boldsymbol{p}}}}_{n \times n}} \, \color{red}{\underbrace{\nabla\boldsymbol{s}(\boldsymbol{p})_{\vphantom{\boldsymbol{p}}}}_{n \times m}} + \color{forestgreen}{\underbrace{\nabla_\boldsymbol{p} \boldsymbol{F}(\boldsymbol{s}, \boldsymbol{p})}_{n\times m}} = 0$$

where $\mathbf{A} \equiv \nabla_\boldsymbol{x}\boldsymbol{F}(\boldsymbol{s},\boldsymbol{p})$ is the Jacobian of the steady-state system (a large, sparse matrix)
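In code, this means the whole gradient costs one factorization of $\mathbf{A}$ plus $m$ forward and back substitutions. A minimal sketch (not the F1Method.jl source; here ∇ₚF is an assumed function returning the $n \times m$ matrix $\nabla_\boldsymbol{p}\boldsymbol{F}(\boldsymbol{s},\boldsymbol{p})$):

using LinearAlgebra

# Analytical gradient of f̂ evaluated at the steady state s = s(p)
function analytical_gradient(s, p, ∇ₓf, ∇ₚf, ∇ₓF, ∇ₚF)
    A = factorize(∇ₓF(s, p))          # the single (expensive) Jacobian factorization
    ∇s = A \ -∇ₚF(s, p)               # n×m: solves A ∇s = -∇ₚF (m substitutions)
    return ∇ₓf(s, p) * ∇s + ∇ₚf(s, p) # 1×m gradient ∇f̂(p)
end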

Analytical Hessian

Differentiate $\hat{f}(\boldsymbol{p}) = f\left(\boldsymbol{s}(\boldsymbol{p}), \boldsymbol{p}\right)$ twice:

$$\begin{split} \nabla^2 \hat{f}(\boldsymbol{p}) &= \nabla_{\boldsymbol{x}\boldsymbol{x}}f(\boldsymbol{s}, \boldsymbol{p}) \, (\nabla\boldsymbol{s} \otimes \nabla\boldsymbol{s}) + \nabla_{\boldsymbol{x}\boldsymbol{p}}f(\boldsymbol{s}, \boldsymbol{p}) \, (\nabla\boldsymbol{s} \otimes \mathbf{I}_\boldsymbol{p}) \\ &\quad+ \nabla_{\boldsymbol{p}\boldsymbol{x}}f(\boldsymbol{s}, \boldsymbol{p}) \, (\mathbf{I}_\boldsymbol{p} \otimes \nabla\boldsymbol{s}) + \nabla_{\boldsymbol{p}\boldsymbol{p}}f(\boldsymbol{s}, \boldsymbol{p}) \, (\mathbf{I}_\boldsymbol{p} \otimes \mathbf{I}_\boldsymbol{p}) \\ &\quad+ \nabla_\boldsymbol{x}f(\boldsymbol{s}, \boldsymbol{p}) \, \nabla^2\boldsymbol{s}, \end{split}$$

Differentiate the steady-state equation, $\boldsymbol{F}\left(\boldsymbol{s}(\boldsymbol{p}),\boldsymbol{p}\right)=0$, twice:

$$\begin{split} 0 & = \nabla_{\boldsymbol{x}\boldsymbol{x}}\boldsymbol{F}(\boldsymbol{s}, \boldsymbol{p}) \, (\nabla\boldsymbol{s} \otimes \nabla\boldsymbol{s}) + \nabla_{\boldsymbol{x}\boldsymbol{p}}\boldsymbol{F}(\boldsymbol{s}, \boldsymbol{p}) \, (\nabla\boldsymbol{s} \otimes \mathbf{I}_\boldsymbol{p})\\ & \quad + \nabla_{\boldsymbol{p}\boldsymbol{x}}\boldsymbol{F}(\boldsymbol{s}, \boldsymbol{p}) \, (\mathbf{I}_\boldsymbol{p} \otimes \nabla\boldsymbol{s}) + \nabla_{\boldsymbol{p}\boldsymbol{p}}\boldsymbol{F}(\boldsymbol{s}, \boldsymbol{p}) \, (\mathbf{I}_\boldsymbol{p} \otimes \mathbf{I}_\boldsymbol{p}) \\ & \quad + \mathbf{A} \, \nabla^2\boldsymbol{s}. \end{split}$$

(Tensor notation of Manton [2012])

F-1 Gradient and Hessian

  1. Find the steady state solution $\boldsymbol{s}(\boldsymbol{p})$

  2. Factorize the Jacobian $\mathbf{A} = \nabla_\boldsymbol{x} \boldsymbol{F}\left(\boldsymbol{s}(\boldsymbol{p}), \boldsymbol{p}\right)$ ($1$ factorization)

  3. Compute $\nabla\boldsymbol{s}(\boldsymbol{p}) = -\mathbf{A}^{-1} \nabla_\boldsymbol{p} \boldsymbol{F}(\boldsymbol{s}, \boldsymbol{p})$ ($m$ forward and back substitutions)

  4. Compute the gradient

    $$\nabla\hat{f}(\boldsymbol{p}) = \nabla_\boldsymbol{x}f(\boldsymbol{s}, \boldsymbol{p}) \, \nabla \boldsymbol{s}(\boldsymbol{p}) + \nabla_\boldsymbol{p} f(\boldsymbol{s}, \boldsymbol{p})$$

  5. Compute the Hessian ($1$ forward and back substitution)

    $$[\nabla^2\hat{f}(\boldsymbol{p})]_{jk} = \mathfrak{H}\big[f(\boldsymbol{x}_{jk}, \boldsymbol{p}_{jk})\big] - \mathfrak{H}\big[\boldsymbol{F}(\boldsymbol{x}_{jk}, \boldsymbol{p}_{jk})^\mathsf{T}\big] \, \mathbf{A}^{-\mathsf{T}} \,\nabla_\boldsymbol{x}f(\boldsymbol{s},\boldsymbol{p})^\mathsf{T}$$

    where $\boldsymbol{x}_{jk} \equiv \boldsymbol{s} + \varepsilon_1 \nabla\boldsymbol{s} \, \boldsymbol{e}_j +\varepsilon_2 \nabla\boldsymbol{s} \, \boldsymbol{e}_k$
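Putting steps 1–5 together, a condensed sketch of the whole computation might look like this (solve_steady_state, ∇ₚF, and e are assumed/illustrative names, p is treated as a plain vector, and the actual implementation lives in F1Method.jl):

using LinearAlgebra, HyperDualNumbers

e(j, m) = [i == j ? 1.0 : 0.0 for i in 1:m]   # j-th canonical basis vector

function F1_gradient_and_hessian(f, F, ∇ₓf, ∇ₚf, ∇ₓF, ∇ₚF, x₀, p)
    s = solve_steady_state(F, ∇ₓF, x₀, p)     # 1. inner solver (e.g., Newton / CTKAlg)
    A = factorize(∇ₓF(s, p))                  # 2. the single factorization
    ∇s = A \ -∇ₚF(s, p)                       # 3. n×m (m forward/back substitutions)
    ∇f̂ = ∇ₓf(s, p) * ∇s + ∇ₚf(s, p)           # 4. gradient (1×m)
    u = A' \ vec(∇ₓf(s, p))                   # 5. u = A⁻ᵀ ∇ₓf(s,p)ᵀ (1 extra substitution)
    m = length(p)
    ∇²f̂ = zeros(m, m)
    for j in 1:m, k in j:m                    #    m(m+1)/2 cheap hyperdual evaluations
        xjk = s + ε₁ * ∇s[:, j] + ε₂ * ∇s[:, k]
        pjk = p + ε₁ * e(j, m) + ε₂ * e(k, m)
        ∇²f̂[j, k] = ∇²f̂[k, j] = ε₁ε₂part(f(xjk, pjk)) - ε₁ε₂part.(F(xjk, pjk))' * u
    end
    return ∇f̂, ∇²f̂
end

The crucial difference with the brute-force hyperdual Hessian above is that f and F here are plain, cheap function evaluations: the inner solver is never called inside the loop, and the only factorization is the single real-valued one in step 2.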

Demo

Let us first quickly build a model using the AIBECS

This will create $\boldsymbol{F}(\boldsymbol{x},\boldsymbol{p})$ and $f(\boldsymbol{x},\boldsymbol{p})$ and some derivatives, such that we have an expensive objective function, $\hat{f}(\boldsymbol{p})$, defined implicitly by the solution $\boldsymbol{s}(\boldsymbol{p})$.

Below I will skip the details of the model but I will run the code that generates F, ∇ₓF, f, and ∇ₓf.

Simple phosphorus-cycling model:

- Dissolved inorganic phosphorus (DIP)
- Particulate organic phosphorus (POP)

In [8]:
const wet3D, grd, T_OCIM = OCIM0.load()
T_DIP(p) = T_OCIM
const ztop = vector_of_top_depths(wet3D, grd) .|> ustrip
const iwet, v = findall(vec(wet3D)), vector_of_volumes(wet3D, grd) .|> ustrip
const nb, z = length(iwet), grd.depth_3D[iwet] .|> ustrip
const DIV, Iabove = buildDIV(wet3D, iwet, grd), buildIabove(wet3D, iwet)
const S₀, S′ = buildPFD(ones(nb), DIV, Iabove), buildPFD(ztop, DIV, Iabove)
T_POP(p) = p.w₀ * S₀ + p.w′ * S′
function G_DIP!(dx, DIP, POP, p)
    τ, k, z₀, κ, xgeo, τgeo = p.τ, p.k, p.z₀, p.κ, p.xgeo, p.τgeo
    # geological restoring - uptake (euphotic zone only) + remineralization
    dx .= @. (xgeo - DIP) / τgeo - (DIP ≥ 0) / τ * DIP^2 / (DIP + k) * (z < z₀) + κ * POP
end
function G_POP!(dx, DIP, POP, p)
    τ, k, z₀, κ = p.τ, p.k, p.z₀, p.κ
    # uptake (euphotic zone only) - remineralization
    dx .= @. (DIP ≥ 0) / τ * DIP^2 / (DIP + k) * (z < z₀) - κ * POP
end
const iDIP = 1:nb
const iPOP = nb .+ iDIP
t = empty_parameter_table()
add_parameter!(t, :xgeo, 2.17u"mmol/m^3", optimizable = true)
add_parameter!(t, :τgeo, 1.0u"Myr")
add_parameter!(t, :k, 5.0u"μmol/m^3", optimizable = true)
add_parameter!(t, :z₀, 80.0u"m")
add_parameter!(t, :w₀, 0.5u"m/d", optimizable = true)
add_parameter!(t, :w′, 0.1u"1/d", optimizable = true)
add_parameter!(t, :κ, 0.3u"1/d", optimizable = true)
add_parameter!(t, :τ, 100.0u"d", optimizable = true)
initialize_Parameters_type(t, "Pcycle_Parameters")
p = Pcycle_Parameters()
F!, ∇ₓF = inplace_state_function_and_Jacobian((T_DIP,T_POP), (G_DIP!,G_POP!), nb)
x = p.xgeo * ones(2nb)
const μDIPobs3D, σ²DIPobs3D = WorldOceanAtlasTools.fit_to_grid(grd, 2018, "phosphate", "annual", "1°", "an")
const μDIPobs, σ²DIPobs = μDIPobs3D[iwet], σ²DIPobs3D[iwet]
const μx, σ²x = (μDIPobs, missing), (σ²DIPobs, missing)
const ωs, ωp = [1.0, 0.0], 1e-4
f, ∇ₓf, ∇ₚf = generate_objective_and_derivatives(ωs, μx, σ²x, v, ωp, mean_obs(p), variance_obs(p))
F(x::Vector{Tx}, p::Pcycle_Parameters{Tp}) where {Tx,Tp} = F!(Vector{promote_type(Tx,Tp)}(undef,length(x)),x,p)
Loading OCIM0.1 ✔
┌ Info: You are about to use OCIM0.1 model.
│ If you use it for research, please cite:
│ 
│ - Primeau, F. W., Holzer, M., and DeVries, T. (2013), Southern Ocean nutrient trapping and the efficiency of the biological pump, J. Geophys. Res. Oceans, 118, 2547–2564, doi:10.1002/jgrc.20181.
│ - DeVries, T. and F. Primeau, 2011: Dynamically and Observationally Constrained Estimates of Water-Mass Distributions and Ages in the Global Ocean. J. Phys. Oceanogr., 41, 2381–2401, doi:10.1175/JPO-D-10-05011.1
│ 
│ You can find the corresponding BibTeX entries in the CITATION.bib file
│ at the root of the AIBECS.jl package repository.
│ (Look for the "DeVries_Primeau_2011" and "Primeau_etal_2013" keys.)
└ @ AIBECS.OCIM0 /Users/benoitpasquier/.julia/dev/AIBECS/src/OCIM0.jl:53
Registering WOA data with DataDeps
  Reading NetCDF file
  Rearranging data
  Filtering data
  Averaging data over each grid box
  Setting μ = 0 and σ² = ∞ where no obs
  Setting a realistic minimum for σ²
Out[8]:
F (generic function with 1 method)

Tracer equations:

$$\left[\frac{\partial}{\partial t} + \nabla \cdot (\boldsymbol{u} + \mathbf{K}\cdot\nabla )\right] x_\mathsf{DIP} = -U(x_\mathsf{DIP}) + R(x_\mathsf{POP})$$

$$\left[\frac{\partial}{\partial t} + \nabla \cdot \boldsymbol{w}\right] x_\mathsf{POP} = U(x_\mathsf{DIP}) - R(x_\mathsf{POP})$$
  • DIP→POP: $U=\frac{x_\mathsf{DIP}}{\tau} \, \frac{x_\mathsf{DIP}}{x_\mathsf{DIP} + k} \, (z < z_0)$

  • POP→DIP: $R = \kappa \, x_\mathsf{POP}$

6 optimizable parameters (plus 2 fixed)

In [9]:
p
Out[9]:
  xgeo = 2.17e+00 [mmol m⁻³] 
  τgeo = 1.00e+00 [Myr] (fixed)
     k = 5.00e+00 [μmol m⁻³] 
    z₀ = 8.00e+01 [m] (fixed)
    w₀ = 5.00e-01 [m d⁻¹] 
    w′ = 1.00e-01 [d⁻¹] 
     κ = 3.00e-01 [d⁻¹] 
     τ = 1.00e+02 [d] 
Pcycle_Parameters{Float64}
In [10]:
p[:]
Out[10]:
6-element Array{Float64,1}:
 0.00217              
 4.9999999999999996e-6
 5.787037037037037e-6 
 1.1574074074074074e-6
 3.472222222222222e-6 
 8.64e6               

Using the F-1 algorithm is easy

In [11]:
using F1Method
mem = F1Method.initialize_mem(x, p) 
objective(p) = F1Method.objective(f, F, ∇ₓF,      mem, p, CTKAlg(); preprint=" ")
gradient(p)  =  F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, p, CTKAlg(); preprint=" ")
hessian(p)   =   F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, p, CTKAlg(); preprint=" ")
Out[11]:
hessian (generic function with 1 method)

That's it, we have defined functions that "autodifferentiate" through the iterative solver!

Testing the F-1 algorithm

We simply call the objective,

In [12]:
objective(p)
 (No initial Jacobian factors fed to Newton solver)
 Solving F(x) = 0 (using Shamanskii Method)
 │   iteration     |F(x)|   |δx|/|x|   Jac age   fac age
 │       0        5.5e-06                               
 │       1        4.1e-12    7.7e-01        1         1    
 │       2        2.7e-13    1.0e-05        2         2    
 │       3        2.2e-14    3.7e-07        3         3    
 │       4        2.0e-15    3.2e-08        4         4    
 │       5        2.0e-16    3.0e-09        5         5    
 │       6        2.1e-17    3.1e-10        6         6    
 │       7        2.3e-18    3.3e-11        7         7    
 │       8        2.6e-19    4.3e-12        8         8    
 │       9        3.0e-20    3.5e-12        9         9    
 └─> Newton has converged, |x|/|F(x)| = 1e+12 years
Out[12]:
0.009662688019579478

the gradient,

In [13]:
gradient(p)
Out[13]:
1×6 Array{Float64,2}:
 -15.9279  9.05772  -37.5944  -9661.29  3326.29  1.13229e-10

and the Hessian,

In [14]:
hessian(p)
Out[14]:
6×6 Array{Float64,2}:
     1.93635e5    -7994.37        …  -4.39396e6   -1.04921e-6 
 -7994.37             3.78521e6       1.59453e6    9.90002e-7 
     1.32924e5   -98274.9             7.01703e5   -3.56705e-6 
     1.25173e7       -4.29221e6       2.42006e9   -0.000106045
    -4.39396e6        1.59453e6      -1.75386e9    4.12933e-5 
    -1.04921e-6       9.90002e-7  …   4.12933e-5   2.87856e-17

Now let's do it again with updated parameters, this time also recording the time spent: for the objective,

In [15]:
@time objective(1.1p)
 (No initial Jacobian factors fed to Newton solver)
 Solving F(x) = 0 (using Shamanskii Method)
 │   iteration     |F(x)|   |δx|/|x|   Jac age   fac age
 │       0        4.7e-09                               
 │       1        3.8e-12    9.9e-02        1         1    
 │       2        2.9e-13    4.9e-06        2         2    
 │       3        3.0e-14    4.5e-07        3         3    
 │       4        3.3e-15    4.8e-08        4         4    
 │       5        3.7e-16    5.4e-09        5         5    
 │       6        4.3e-17    6.2e-10        6         6    
 │       7        4.9e-18    7.1e-11        7         7    
 │       8        5.6e-19    8.1e-12        8         8    
 │       9        6.4e-20    1.9e-12        9         9    
 │      10        7.4e-21    7.9e-13       10        10    
 └─> Newton has converged, |x|/|F(x)| = 4.5e+12 years
 38.230818 seconds (98.39 k allocations: 3.182 GiB, 5.85% gc time)
Out[15]:
0.010696151016861694

the gradient,

In [16]:
@time gradient(1.1p)
 32.789312 seconds (2.02 k allocations: 3.037 GiB, 6.97% gc time)
Out[16]:
1×6 Array{Float64,2}:
 25.0791  9.91269  -8.38374  -6268.7  2153.61  -6.97202e-11

and the Hessian,

In [17]:
@time hessian(1.1p)
 18.919829 seconds (5.66 k allocations: 8.197 GiB, 42.01% gc time)
Out[17]:
6×6 Array{Float64,2}:
     1.92817e5   -7210.06         1.37981e5   …  -4.75529e6    -1.06793e-6 
 -7210.06            2.6767e6    -1.14666e5       2.31281e6     1.09749e-6 
     1.37981e5      -1.14666e5    2.66979e6      -1.3653e7     -7.11517e-6 
     1.3576e7       -6.36511e6    5.19961e7       3.78458e8    -0.000387782
    -4.75529e6       2.31281e6   -1.3653e7       -6.57363e8     0.000141119
    -1.06793e-6      1.09749e-6  -7.11517e-6  …   0.000141119   7.76363e-17

Factorizations are the bottleneck

In [18]:
@time factorize(∇ₓF(mem.s, mem.p))
 24.525396 seconds (530.16 k allocations: 2.183 GiB, 2.74% gc time)
Out[18]:
UMFPACK LU Factorization of a (382338, 382338) sparse matrix
Ptr{Nothing} @0x00007febc64858f0
In [19]:
length(p), length(p)^2 * 20.313170 * u"s" |> u"minute"
Out[19]:
(6, 12.187902000000001 minute)

Benchmark the F-1 algorithm
(for a full optimization run)

| Algorithm | Definition | Factorizations |
|-----------|------------|----------------|
| F-1 | F-1 algorithm | $\mathcal{O}(1)$ |
| DUAL | Dual step for Hessian + Analytical gradient | $\mathcal{O}(m)$ |
| COMPLEX | Complex step for Hessian + Analytical gradient | $\mathcal{O}(m)$ |
| FD1 | Finite differences for Hessian + Analytical gradient | $\mathcal{O}(m)$ |
| HYPER | Hyperdual step for Hessian and dual step for gradient | $\mathcal{O}(m^2)$ |
| FD2 | Finite differences for Hessian and gradient | $\mathcal{O}(m^2)$ |

Computation time for a full optimization run

Partition of computation time

Conclusions

The F-1 algorithm is

  • easy to use and understand
  • accurate (machine-precision)
  • fast!

Check it out on GitHub at briochemc/F1Method.jl!