In [2]:
using PyPlot, PyCall
using AIBECS, WorldOceanAtlasTools
using LinearAlgebra


# The F-1 algorithm

Benoît Pasquier and François Primeau
University of California, Irvine

Paper: The F-1 algorithm for efficient computation of the Hessian matrix of an objective function defined implicitly by the solution of a steady-state problem (under review)

Code: F1Method.jl (Julia package — check it out on GitHub!)

## F-1 algorithm – Outline

1. Motivation & Context
2. Autodifferentiation
3. What is the F-1 algorithm?
4. Benchmarks

As we go through, I will demo some Julia code!

# Motivation

For parameter optimization and parameter sensitivity estimation!

The AIBECS, for building global marine steady-state biogeochemistry models in just a few commands (see yesterday's CCRC seminar),

and the F-1 algorithm, to facilitate the optimization of their biogeochemical parameters.

But the context of the F-1 algorithm is more general...

# Context

$$\frac{\partial \boldsymbol{x}}{\partial t} = \boldsymbol{F}(\boldsymbol{x},\boldsymbol{p}) = 0$$

for some state $\boldsymbol{x}$ (size $n \sim 400\,000$) and parameters $\boldsymbol{p}$ (size $m \sim 10$).

Constrained optimization problem

$$\left\{\begin{aligned} \mathop{\textrm{minimize}}_{\boldsymbol{x}, \boldsymbol{p}} & & f(\boldsymbol{x}, \boldsymbol{p}) \\ \textrm{subject to} & & \boldsymbol{F}(\boldsymbol{x}, \boldsymbol{p}) = 0 \end{aligned}\right.$$
In [3]:
function constrained_optimization_plot()
    figure(figsize=(10,6))
    f(x,p) = (x - 1)^2 + (p - 1)^2 + 1
    s(p) = 1 + 0.9atan(p - 0.3)
    xs, ps = -1.2:0.1:3, -1.2:0.1:3.2
    plt = plot_surface(xs, ps, [f(x,p) for p in ps, x in xs], alpha = 0.5, cmap=:viridis_r)
    gca(projection="3d")
    P = repeat(reshape(ps, length(ps), 1), outer = [1, length(xs)])
    X = repeat(reshape(xs, 1, length(xs)), outer = [length(ps), 1])
    contour(X, P, [f(x,p) for p in ps, x in xs], levels = 2:6, colors="black", alpha=0.5, linewidths=0.5)
    plot([s(p) for p in ps], ps)
    legend(("\$F(x,p) = 0\$",))
    xlabel("state \$x\$"); ylabel("parameters \$p\$"); zlabel("objective \$f(x,p)\$")
    xticks(unique(round.(xs))); yticks(unique(round.(ps))); zticks(2:6)
    return plt, f, s, xs, ps
end
constrained_optimization_plot()

Out[3]:
(PyObject <mpl_toolkits.mplot3d.art3d.Poly3DCollection object at 0x140778470>, getfield(Main, Symbol("#f#5"))(), getfield(Main, Symbol("#s#6"))(), -1.2:0.1:3.0, -1.2:0.1:3.2)

Unconstrained optimization along the manifold of steady-state solutions.

$$\mathop{\textrm{minimize}}_\boldsymbol{p} \hat{f}(\boldsymbol{p})$$

where $\hat{f}(\boldsymbol{p}) \equiv f\big(\boldsymbol{s}(\boldsymbol{p}), \boldsymbol{p}\big)$ is the objective function.

And $\boldsymbol{s}(\boldsymbol{p})$ is the steady-state solution for parameters $\boldsymbol{p}$, i.e., such that

$$\boldsymbol{F}\left(\boldsymbol{s}(\boldsymbol{p}),\boldsymbol{p}\right) = 0$$
In [4]:
function unconstrained_optimization_plot()
    plt, f, s, xs, ps = constrained_optimization_plot()
    plot([s(p) for p in ps], ps, [f(s(p),p) for p in ps], color="black", linewidth=3)
    plot(maximum(xs) * ones(length(ps)), ps, [f(s(p),p) for p in ps], color="red")
    legend(("\$x=s(p) \\Longleftrightarrow F(x,p)=0\$","\$f(x,p)\$ with \$x = s(p)\$", "\$\\hat{f}(p) = f(s(p),p)\$"))
    for p in ps[1:22:end]
        plot(s(p) * [1,1], p * [1,1], f(s(p),p) * [0,1], color="black", alpha = 0.3, linestyle="--")
        plot([s(p), maximum(xs)], p * [1,1], f(s(p),p) * [1,1], color="black", alpha = 0.3, linestyle="--", marker="o")
    end
    return plt
end
unconstrained_optimization_plot()

Out[4]:
PyObject <mpl_toolkits.mplot3d.art3d.Poly3DCollection object at 0x1413973c8>

Two nested iterative algorithms

Inner solver with Newton step

$$\Delta \boldsymbol{x} \equiv - \left[\nabla_\boldsymbol{x} \boldsymbol{F}(\boldsymbol{x},\boldsymbol{p})\right]^{-1} \boldsymbol{F}(\boldsymbol{x},\boldsymbol{p})$$

Outer optimizer with Newton step

$$\Delta\boldsymbol{p} \equiv - \left[\nabla^2\hat{f}(\boldsymbol{p})\right]^{-1}\nabla \hat{f}(\boldsymbol{p})$$

which requires the Hessian of the objective function,

$$\nabla^2 \hat{f}(\boldsymbol{p})$$
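To make the nesting concrete, here is a minimal sketch of the two Newton iterations for a small dense problem; the functions `F`, `∇ₓF`, `∇f̂`, and `∇²f̂` are hypothetical user-supplied stand-ins, not the AIBECS or F1Method API.

```julia
using LinearAlgebra

# Inner solver: Newton iteration on the state x for fixed parameters p,
# using the step Δx = -[∇ₓF(x,p)]⁻¹ F(x,p).
function inner_solve(F, ∇ₓF, x, p; tol=1e-12, maxiter=50)
    for _ in 1:maxiter
        norm(F(x, p)) < tol && break
        x -= ∇ₓF(x, p) \ F(x, p)
    end
    return x
end

# Outer optimizer: Newton iteration on the parameters p,
# using the step Δp = -[∇²f̂(p)]⁻¹ ∇f̂(p), which needs the Hessian of f̂.
function outer_optimize(∇f̂, ∇²f̂, p; tol=1e-8, maxiter=50)
    for _ in 1:maxiter
        g = vec(∇f̂(p))
        norm(g) < tol && break
        p -= ∇²f̂(p) \ g
    end
    return p
end
```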

# Autodifferentiation

Take the Taylor expansion of $\hat{f}(\boldsymbol{p})$ in the $h\boldsymbol{e}_j$ direction for a given $h$:

$$\hat{f}(\boldsymbol{p} + h \boldsymbol{e}_j) = \hat{f}(\boldsymbol{p}) + h \underbrace{\nabla\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j}_{?} + \frac{h^2}{2} \, \left[\boldsymbol{e}_j^\mathsf{T} \, \nabla^2\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j\right] + \ldots$$

A standard solution is to use finite differences:

$$\nabla\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j = \frac{\hat{f}(\boldsymbol{p} + h \boldsymbol{e}_j) - \hat{f}(\boldsymbol{p})}{h} + \mathcal{O}(h)$$

But finite differences suffer from truncation and round-off errors!

A better solution is to use complex numbers!

Taylor-expand in the $ih\boldsymbol{e}_j$ direction:

$$\hat{f}(\boldsymbol{p} + i h \boldsymbol{e}_j) = \hat{f}(\boldsymbol{p}) + i h \underbrace{\nabla\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j}_{?} - \frac{h^2}{2} \, \left[\boldsymbol{e}_j^\mathsf{T} \, \nabla^2\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j\right] + \ldots$$

Taking the imaginary part involves no subtraction, so there is no round-off (cancellation) error, and the truncation error converges faster:

$$\nabla\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j = \Im\left[\frac{\hat{f}(\boldsymbol{p} + i h \boldsymbol{e}_j)}{h}\right] + \mathcal{O}(h^2)$$

In [5]:
𝑓(x) = cos(x^2) + exp(x)
∇𝑓(x) = -2x * sin(x^2) + exp(x)
finite_differences(f, x, h) = (f(x + h) - f(x)) / h
centered_differences(f, x, h) = (f(x + h) - f(x - h)) / 2h
complex_step_method(f, x, h) = imag(f(x + im * h)) / h
# relative error against the exact derivative, evaluated in BigFloat to avoid round-off in the reference
relative_error(m, f, ∇f, x, h) = Float64(abs(BigFloat(m(f, x, h)) - ∇f(BigFloat(x))) / abs(∇f(x)))
x, step_sizes = 2.0, 10 .^ (-20:0.02:0)
numerical_schemes = [finite_differences, centered_differences, complex_step_method]
plot(step_sizes, [relative_error(m, 𝑓, ∇𝑓, x, h) for h in step_sizes, m in numerical_schemes])
loglog(), legend(string.(numerical_schemes)), xlabel("step size, \$h\$"), ylabel("Relative Error, \$\\frac{|\\bullet - \\nabla f(x)|}{|\\nabla f(x)|}\$")
title("There are better alternatives to finite differences")

Out[5]:
PyObject Text(0.5, 1, 'There are better alternatives to finite differences')

An even better solution is to use dual numbers!

Define $\varepsilon \ne 0$ s.t. $\varepsilon^2 = 0$, then the complete Taylor expansion in the $\varepsilon \boldsymbol{e}_j$ direction is

$$\hat{f}(\boldsymbol{p} + \varepsilon \boldsymbol{e}_j) = \hat{f}(\boldsymbol{p}) + \varepsilon \underbrace{\nabla\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j}_{?}$$

Hence, first derivatives are given exactly by

$$\nabla\hat{f}(\boldsymbol{p}) \, \boldsymbol{e}_j = \mathfrak{D}\left[\hat{f}(\boldsymbol{p} + \varepsilon \boldsymbol{e}_j)\right]$$

where $\mathfrak{D}$ is the dual part (the $\varepsilon$ coefficient).

The dual number $\varepsilon$ behaves like an infinitesimal and gives very accurate derivatives!

In [6]:
using DualNumbers
dual_step_method(f, x, h) = dualpart(f(x + ε))   # h is unused; kept so all schemes share the same signature
push!(numerical_schemes, dual_step_method)
plot(step_sizes, [relative_error(m, 𝑓, ∇𝑓, x, h) for h in step_sizes, m in numerical_schemes])
loglog(), legend(string.(numerical_schemes)), xlabel("step size, \$h\$"), ylabel("Relative Error, \$\\frac{|\\bullet - \\nabla f(x)|}{|\\nabla f(x)|}\$")
title("There are even better alternatives to complex-step differentiation")

Out[6]:
PyObject Text(0.5, 1, 'There are even better alternatives to complex-step differentiation')

The gradient of the objective function can be computed in $m$ dual evaluations of the objective function, via

$$\nabla\hat{f}(\boldsymbol{p}) = \mathfrak{D} \left( \left[\begin{matrix} \hat{f}(\boldsymbol{p} + \varepsilon \boldsymbol{e}_1) \\ \hat{f}(\boldsymbol{p} + \varepsilon \boldsymbol{e}_2) \\ \vdots \\ \hat{f}(\boldsymbol{p} + \varepsilon \boldsymbol{e}_{m}) \end{matrix} \right]^\mathsf{T} \right)$$

where $m$ is the number of parameters.
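As an illustration, here is a minimal sketch of this formula with DualNumbers.jl on a toy objective (the function `f̂` below is a hypothetical example, not the model's implicitly defined objective):

```julia
using DualNumbers

# Toy objective with m = length(p) parameters (hypothetical example).
f̂(p) = sum(exp.(p)) + p[1] * p[2]^2

# Gradient via m dual evaluations: entry j is the dual part of f̂(p + ε eⱼ).
function dual_gradient(f̂, p)
    m = length(p)
    eⱼ(j) = [i == j ? 1.0 : 0.0 for i in 1:m]   # Cartesian basis vector
    return [dualpart(f̂(p + ε * eⱼ(j))) for j in 1:m]
end

dual_gradient(f̂, [0.1, 0.2, 0.3])   # ≈ [exp(0.1) + 0.04, exp(0.2) + 0.04, exp(0.3)]
```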

For second derivatives, we can use hyperdual numbers.
Let $\varepsilon_1$ and $\varepsilon_2$ be the hyperdual units defined by $\varepsilon_1^2 = \varepsilon_2^2 = 0$ but $\varepsilon_1 \varepsilon_2 \ne 0$.

Let $\boldsymbol{p}_{jk} = \boldsymbol{p} + \varepsilon_1 \boldsymbol{e}_j + \varepsilon_2 \boldsymbol{e}_k$, for which the Taylor expansion of $\hat{f}$ is

$$\hat{f}(\boldsymbol{p}_{jk}) = \hat{f}(\boldsymbol{p}) + \varepsilon_1 \nabla\hat{f}(\boldsymbol{p}) \boldsymbol{e}_j + \varepsilon_2 \nabla\hat{f}(\boldsymbol{p}) \boldsymbol{e}_k + \varepsilon_1 \varepsilon_2 \underbrace{\boldsymbol{e}_j^\mathsf{T} \nabla^2\hat{f}(\boldsymbol{p}) \boldsymbol{e}_k}_{?}$$

Taking the "hyperdual" part gives the second derivative:

$$\boldsymbol{e}_j^\mathsf{T} \nabla^2\hat{f}(\boldsymbol{p}) \boldsymbol{e}_k = \mathfrak{H}\left[\hat{f}(\boldsymbol{p}_{jk})\right]$$

where $\mathfrak{H}$ stands for hyperdual part (the $\varepsilon_1 \varepsilon_2$ coefficient).

Hyperdual steps also give very accurate derivatives!

In [7]:
∇²𝑓(x) = -2sin(x^2) - 4x^2 * cos(x^2) + exp(x)
using HyperDualNumbers
finite_differences_2(f, x, h) = (f(x + h) - 2f(x) + f(x - h)) / h^2
hyperdual_step_method(f, x, h) = ε₁ε₂part(f(x + ε₁ + ε₂))
numerical_schemes_2 = [finite_differences_2, hyperdual_step_method]
plot(step_sizes, [relative_error(m, 𝑓, ∇²𝑓, x, h) for h in step_sizes, m in numerical_schemes_2])
loglog(), legend(string.(numerical_schemes_2)), xlabel("step size, \$h\$"), ylabel("Relative Error, \$\\frac{|\\bullet - \\nabla^2 f(x)|}{|\\nabla^2 f(x)|}\$")
title("HyperDual Numbers for second derivatives")

Out[7]:
PyObject Text(0.5, 1, 'HyperDual Numbers for second derivatives')

### Autodifferentiating through an iterative solver is expensive

The Hessian of $\hat{f}$ can be computed in $\frac{m(m+1)}{2}$ hyperdual evaluations:

$$\nabla^2\hat{f}(\boldsymbol{p}) = \mathfrak{H} \left( \left[\begin{matrix} \hat{f}(\boldsymbol{p}_{11}) & \hat{f}(\boldsymbol{p}_{12}) & \cdots & \hat{f}(\boldsymbol{p}_{1m}) \\ \hat{f}(\boldsymbol{p}_{12}) & \hat{f}(\boldsymbol{p}_{22}) & \cdots & \hat{f}(\boldsymbol{p}_{2m}) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{f}(\boldsymbol{p}_{1m}) & \hat{f}(\boldsymbol{p}_{2m}) & \cdots & \hat{f}(\boldsymbol{p}_{mm}) \end{matrix} \right] \right)$$

But this requires $\frac{m(m+1)}{2}$ calls to the inner solver, which in turn require hyperdual factorizations and fine-tuned non-real tolerances!
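For illustration only, here is a sketch of the formula above with HyperDualNumbers.jl on a toy objective that is cheap to evaluate directly; the function `f̂` below is a hypothetical stand-in, not an objective defined through an inner solver:

```julia
using HyperDualNumbers

# Toy objective (hypothetical example, cheap to evaluate directly).
f̂(p) = sum(exp.(p)) + p[1] * p[2]^2

# Hessian via m(m+1)/2 hyperdual evaluations: entry (j,k) is the ε₁ε₂ part of f̂(pⱼₖ).
function hyperdual_hessian(f̂, p)
    m = length(p)
    eⱼ(j) = [i == j ? 1.0 : 0.0 for i in 1:m]
    H = zeros(m, m)
    for j in 1:m, k in j:m               # symmetry: only the upper triangle is evaluated
        pⱼₖ = p + ε₁ * eⱼ(j) + ε₂ * eⱼ(k)
        H[j, k] = H[k, j] = ε₁ε₂part(f̂(pⱼₖ))
    end
    return H
end

hyperdual_hessian(f̂, [0.1, 0.2, 0.3])
```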

# What is the F-1 algorithm?

### What does it do?

The F-1 algorithm allows you to calculate the gradient and Hessian of an objective function, $\hat{f}(\boldsymbol{p})$, defined implicitly by the solution of a steady-state problem.

### How does it work?

It leverages analytical shortcuts, combined with dual and hyperdual numbers, to avoid calls to the inner solver and unnecessary factorizations.

Differentiate the objective function $\hat{f}(\boldsymbol{p}) = f\left(\boldsymbol{s}(\boldsymbol{p}), \boldsymbol{p}\right)$:

$$\color{forestgreen}{\underbrace{\nabla\hat{f}(\boldsymbol{p})}_{1 \times m}} = \color{royalblue}{\underbrace{\nabla_\boldsymbol{x}f(\boldsymbol{s}, \boldsymbol{p})_{\vphantom{\boldsymbol{p}}}}_{1 \times n}} \, \color{red}{\underbrace{\nabla \boldsymbol{s}(\boldsymbol{p})_{\vphantom{\boldsymbol{p}}}}_{n \times m}} + \color{DarkOrchid}{\underbrace{\nabla_\boldsymbol{p} f(\boldsymbol{s}, \boldsymbol{p})}_{1 \times m}}$$

Differentiate the steady-state equation, $\boldsymbol{F}\left(\boldsymbol{s}(\boldsymbol{p}),\boldsymbol{p}\right)=0$:

$$\color{royalblue}{\underbrace{\mathbf{A}_{\vphantom{\boldsymbol{p}}}}_{n \times n}} \, \color{red}{\underbrace{\nabla\boldsymbol{s}(\boldsymbol{p})_{\vphantom{\boldsymbol{p}}}}_{n \times m}} + \color{forestgreen}{\underbrace{\nabla_\boldsymbol{p} \boldsymbol{F}(\boldsymbol{s}, \boldsymbol{p})}_{n\times m}} = 0$$

where $\mathbf{A} \equiv \nabla_\boldsymbol{x}\boldsymbol{F}(\boldsymbol{s},\boldsymbol{p})$ is the Jacobian of the steady-state system (a large, sparse matrix)

### Analytical Hessian

Differentiate $\hat{f}(\boldsymbol{p}) = f\left(\boldsymbol{s}(\boldsymbol{p}), \boldsymbol{p}\right)$ twice:

$$\begin{split} \nabla^2 \hat{f}(\boldsymbol{p}) &= \nabla_{\boldsymbol{x}\boldsymbol{x}}f(\boldsymbol{s}, \boldsymbol{p}) \, (\nabla\boldsymbol{s} \otimes \nabla\boldsymbol{s}) + \nabla_{\boldsymbol{x}\boldsymbol{p}}f(\boldsymbol{s}, \boldsymbol{p}) \, (\nabla\boldsymbol{s} \otimes \mathbf{I}_\boldsymbol{p}) \\ &\quad+ \nabla_{\boldsymbol{p}\boldsymbol{x}}f(\boldsymbol{s}, \boldsymbol{p}) \, (\mathbf{I}_\boldsymbol{p} \otimes \nabla\boldsymbol{s}) + \nabla_{\boldsymbol{p}\boldsymbol{p}}f(\boldsymbol{s}, \boldsymbol{p}) \, (\mathbf{I}_\boldsymbol{p} \otimes \mathbf{I}_\boldsymbol{p}) \\ &\quad+ \nabla_\boldsymbol{x}f(\boldsymbol{s}, \boldsymbol{p}) \, \nabla^2\boldsymbol{s}, \end{split}$$

Differentiate the steady-state equation, $\boldsymbol{F}\left(\boldsymbol{s}(\boldsymbol{p}),\boldsymbol{p}\right)=0$, twice:

$$\begin{split} 0 & = \nabla_{\boldsymbol{x}\boldsymbol{x}}\boldsymbol{F}(\boldsymbol{s}, \boldsymbol{p}) \, (\nabla\boldsymbol{s} \otimes \nabla\boldsymbol{s}) + \nabla_{\boldsymbol{x}\boldsymbol{p}}\boldsymbol{F}(\boldsymbol{s}, \boldsymbol{p}) \, (\nabla\boldsymbol{s} \otimes \mathbf{I}_\boldsymbol{p})\\ & \quad + \nabla_{\boldsymbol{p}\boldsymbol{x}}\boldsymbol{F}(\boldsymbol{s}, \boldsymbol{p}) \, (\mathbf{I}_\boldsymbol{p} \otimes \nabla\boldsymbol{s}) + \nabla_{\boldsymbol{p}\boldsymbol{p}}\boldsymbol{F}(\boldsymbol{s}, \boldsymbol{p}) \, (\mathbf{I}_\boldsymbol{p} \otimes \mathbf{I}_\boldsymbol{p}) \\ & \quad + \mathbf{A} \, \nabla^2\boldsymbol{s}. \end{split}$$

(Tensor notation of Manton [2012])

The F-1 algorithm then proceeds in five steps:

1. Find the steady-state solution $\boldsymbol{s}(\boldsymbol{p})$

2. Factorize the Jacobian $\mathbf{A} = \nabla_\boldsymbol{x} \boldsymbol{F}\left(\boldsymbol{s}(\boldsymbol{p}), \boldsymbol{p}\right)$ ($1$ factorization)

3. Compute $\nabla\boldsymbol{s}(\boldsymbol{p}) = -\mathbf{A}^{-1} \nabla_\boldsymbol{p} \boldsymbol{F}(\boldsymbol{s}, \boldsymbol{p})$ ($m$ forward and back substitutions)

4. Compute the gradient

$$\nabla\hat{f}(\boldsymbol{p}) = \nabla_\boldsymbol{x}f(\boldsymbol{s}, \boldsymbol{p}) \, \nabla \boldsymbol{s}(\boldsymbol{p}) + \nabla_\boldsymbol{p} f(\boldsymbol{s}, \boldsymbol{p})$$

5. Compute the Hessian ($1$ forward and back substitution)

$$[\nabla^2\hat{f}(\boldsymbol{p})]_{jk} = \mathfrak{H}\big[f(\boldsymbol{x}_{jk}, \boldsymbol{p}_{jk})\big] - \mathfrak{H}\big[\boldsymbol{F}(\boldsymbol{x}_{jk}, \boldsymbol{p}_{jk})^\mathsf{T}\big] \, \mathbf{A}^{-\mathsf{T}} \,\nabla_\boldsymbol{x}f(\boldsymbol{s},\boldsymbol{p})^\mathsf{T}$$

where $\boldsymbol{x}_{jk} \equiv \boldsymbol{s} + \varepsilon_1 \nabla\boldsymbol{s} \, \boldsymbol{e}_j +\varepsilon_2 \nabla\boldsymbol{s} \, \boldsymbol{e}_k$
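To fix ideas, here is a minimal sketch of steps 1–4 on a toy problem with dense matrices and derivatives known in closed form; all functions below are hypothetical stand-ins, not the model built later in this notebook.

```julia
using LinearAlgebra

# Hypothetical toy stand-ins for F, f, and their partial derivatives (n = m = 2).
F(x, p)   = [x[1] - p[1]^2, x[2] - p[1] * p[2]]
∇ₓF(x, p) = Matrix{Float64}(I, 2, 2)           # Jacobian A (trivially I here)
∇ₚF(x, p) = [-2p[1] 0.0; -p[2] -p[1]]
f(x, p)   = 0.5 * sum(abs2, x) + 0.5 * sum(abs2, p)
∇ₓf(x, p) = x'
∇ₚf(x, p) = p'

p = [1.0, 2.0]
s = [p[1]^2, p[1] * p[2]]          # 1. steady-state solution (known in closed form here)
A = factorize(∇ₓF(s, p))           # 2. one factorization of the Jacobian
∇s = -(A \ ∇ₚF(s, p))              # 3. m forward and back substitutions
∇f̂ = ∇ₓf(s, p) * ∇s + ∇ₚf(s, p)    # 4. gradient of the objective
```

Step 5 then reuses the same factorization: a single extra forward and back substitution with $\mathbf{A}^\mathsf{T}$, combined with hyperdual evaluations of $f$ and $\boldsymbol{F}$ at $(\boldsymbol{x}_{jk}, \boldsymbol{p}_{jk})$, as in the formula above; F1Method.jl packages all five steps.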

# Demo

Let us first quickly build a model using the AIBECS

This will create $\boldsymbol{F}(\boldsymbol{x},\boldsymbol{p})$ and $f(\boldsymbol{x},\boldsymbol{p})$ and some derivatives, such that we have an expensive objective function, $\hat{f}(\boldsymbol{p})$, defined implicitly by the solution $\boldsymbol{s}(\boldsymbol{p})$.

Below I will skip the details of the model but I will run the code that generates F, ∇ₓF, f, and ∇ₓf.

### Simple phosphorus-cycling model:

- Dissolved inorganic phosphorus (DIP)
- Particulate organic phosphorus (POP)

In [8]:
const wet3D, grd, T_OCIM = OCIM0.load()
T_DIP(p) = T_OCIM
const ztop = vector_of_top_depths(wet3D, grd) .|> ustrip
const iwet, v = findall(vec(wet3D)), vector_of_volumes(wet3D, grd) .|> ustrip
const nb, z = length(iwet), grd.depth_3D[iwet] .|> ustrip
const DIV, Iabove = buildDIV(wet3D, iwet, grd), buildIabove(wet3D, iwet)
const S₀, S′ = buildPFD(ones(nb), DIV, Iabove), buildPFD(ztop, DIV, Iabove)
T_POP(p) = p.w₀ * S₀ + p.w′ * S′
function G_DIP!(dx, DIP, POP, p)
    τ, k, z₀, κ, xgeo, τgeo = p.τ, p.k, p.z₀, p.κ, p.xgeo, p.τgeo
    dx .= @. (xgeo - DIP) / τgeo - (DIP ≥ 0) / τ * DIP^2 / (DIP + k) * (z ≤ z₀) + κ * POP
end
function G_POP!(dx, DIP, POP, p)
    τ, k, z₀, κ = p.τ, p.k, p.z₀, p.κ
    dx .= @. (DIP ≥ 0) / τ * DIP^2 / (DIP + k) * (z ≤ z₀) - κ * POP
end
const iDIP = 1:nb
const iPOP = nb .+ iDIP
t = empty_parameter_table()
add_parameter!(t, :xgeo, 2.17u"mmol/m^3", optimizable = true)
add_parameter!(t, :τgeo, 1.0u"Myr")
add_parameter!(t, :k, 5.0u"μmol/m^3", optimizable = true)
add_parameter!(t, :z₀, 80.0u"m")
add_parameter!(t, :w₀, 0.5u"m/d", optimizable = true)
add_parameter!(t, :w′, 0.1u"1/d", optimizable = true)
add_parameter!(t, :κ, 0.3u"1/d", optimizable = true)
add_parameter!(t, :τ, 100.0u"d", optimizable = true)
initialize_Parameters_type(t, "Pcycle_Parameters")
p = Pcycle_Parameters()
F!, ∇ₓF = inplace_state_function_and_Jacobian((T_DIP,T_POP), (G_DIP!,G_POP!), nb)
x = p.xgeo * ones(2nb)
const μDIPobs3D, σ²DIPobs3D = WorldOceanAtlasTools.fit_to_grid(grd, 2018, "phosphate", "annual", "1°", "an")
const μDIPobs, σ²DIPobs = μDIPobs3D[iwet], σ²DIPobs3D[iwet]
const μx, σ²x = (μDIPobs, missing), (σ²DIPobs, missing)
const ωs, ωp = [1.0, 0.0], 1e-4
f, ∇ₓf, ∇ₚf = generate_objective_and_derivatives(ωs, μx, σ²x, v, ωp, mean_obs(p), variance_obs(p))
F(x::Vector{Tx}, p::Pcycle_Parameters{Tp}) where {Tx,Tp} = F!(Vector{promote_type(Tx,Tp)}(undef,length(x)),x,p)

Loading OCIM0.1 ✔

┌ Info: You are about to use OCIM0.1 model.
│ If you use it for research, please cite:
│
│ - Primeau, F. W., Holzer, M., and DeVries, T. (2013), Southern Ocean nutrient trapping and the efficiency of the biological pump, J. Geophys. Res. Oceans, 118, 2547–2564, doi:10.1002/jgrc.20181.
│ - DeVries, T. and F. Primeau, 2011: Dynamically and Observationally Constrained Estimates of Water-Mass Distributions and Ages in the Global Ocean. J. Phys. Oceanogr., 41, 2381–2401, doi:10.1175/JPO-D-10-05011.1
│
│ You can find the corresponding BibTeX entries in the CITATION.bib file
│ at the root of the AIBECS.jl package repository.
│ (Look for the "DeVries_Primeau_2011" and "Primeau_etal_2013" keys.)
└ @ AIBECS.OCIM0 /Users/benoitpasquier/.julia/dev/AIBECS/src/OCIM0.jl:53

Registering WOA data with DataDeps
Rearranging data
Filtering data
Averaging data over each grid box
Setting μ = 0 and σ² = ∞ where no obs
Setting a realistic minimum for σ²

Out[8]:
F (generic function with 1 method)

Tracer equations:

$$\left[\frac{\partial}{\partial t} + \nabla \cdot (\boldsymbol{u} + \mathbf{K}\cdot\nabla )\right] x_\mathsf{DIP} = -U(x_\mathsf{DIP}) + R(x_\mathsf{POP})$$

$$\left[\frac{\partial}{\partial t} + \nabla \cdot \boldsymbol{w}\right] x_\mathsf{POP} = U(x_\mathsf{DIP}) - R(x_\mathsf{POP})$$
- DIP→POP: $U=\frac{x_\mathsf{DIP}}{\tau} \, \frac{x_\mathsf{DIP}}{x_\mathsf{DIP} + k} \, (z < z_0)$
- POP→DIP: $R = \kappa \, x_\mathsf{POP}$

### 6 optimizable parameters

In [9]:
p

Out[9]:
  xgeo = 2.17e+00 [mmol m⁻³]
τgeo = 1.00e+00 [Myr] (fixed)
k = 5.00e+00 [μmol m⁻³]
z₀ = 8.00e+01 [m] (fixed)
w₀ = 5.00e-01 [m d⁻¹]
w′ = 1.00e-01 [d⁻¹]
κ = 3.00e-01 [d⁻¹]
τ = 1.00e+02 [d]

Pcycle_Parameters{Float64}

In [10]:
p[:]

Out[10]:
6-element Array{Float64,1}:
0.00217
4.9999999999999996e-6
5.787037037037037e-6
1.1574074074074074e-6
3.472222222222222e-6
8.64e6               

# Using the F-1 algorithm is easy

In [11]:
using F1Method
mem = F1Method.initialize_mem(x, p)
objective(p) = F1Method.objective(f, F, ∇ₓF,      mem, p, CTKAlg(); preprint=" ")
gradient(p)  =  F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, p, CTKAlg(); preprint=" ")
hessian(p)   =   F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, p, CTKAlg(); preprint=" ")

Out[11]:
hessian (generic function with 1 method)

That's it, we have defined functions that "autodifferentiate" through the iterative solver!

# Testing the F-1 algorithm

We simply call the objective,

In [12]:
objective(p)

 (No initial Jacobian factors fed to Newton solver)
Solving F(x) = 0 (using Shamanskii Method)
│   iteration     |F(x)|   |δx|/|x|   Jac age   fac age
│       0        5.5e-06
│       1        4.1e-12    7.7e-01        1         1
│       2        2.7e-13    1.0e-05        2         2
│       3        2.2e-14    3.7e-07        3         3
│       4        2.0e-15    3.2e-08        4         4
│       5        2.0e-16    3.0e-09        5         5
│       6        2.1e-17    3.1e-10        6         6
│       7        2.3e-18    3.3e-11        7         7
│       8        2.6e-19    4.3e-12        8         8
│       9        3.0e-20    3.5e-12        9         9
└─> Newton has converged, |x|/|F(x)| = 1e+12 years

Out[12]:
0.009662688019579478

the gradient,

In [13]:
gradient(p)

Out[13]:
1×6 Array{Float64,2}:
-15.9279  9.05772  -37.5944  -9661.29  3326.29  1.13229e-10

and the Hessian,

In [14]:
hessian(p)

Out[14]:
6×6 Array{Float64,2}:
1.93635e5    -7994.37        …  -4.39396e6   -1.04921e-6
-7994.37             3.78521e6       1.59453e6    9.90002e-7
1.32924e5   -98274.9             7.01703e5   -3.56705e-6
1.25173e7       -4.29221e6       2.42006e9   -0.000106045
-4.39396e6        1.59453e6      -1.75386e9    4.12933e-5
-1.04921e-6       9.90002e-7  …   4.12933e-5   2.87856e-17

And do it again after updating the parameters, this time also recording the time spent, for the objective,

In [15]:
@time objective(1.1p)

 (No initial Jacobian factors fed to Newton solver)
Solving F(x) = 0 (using Shamanskii Method)
│   iteration     |F(x)|   |δx|/|x|   Jac age   fac age
│       0        4.7e-09
│       1        3.8e-12    9.9e-02        1         1
│       2        2.9e-13    4.9e-06        2         2
│       3        3.0e-14    4.5e-07        3         3
│       4        3.3e-15    4.8e-08        4         4
│       5        3.7e-16    5.4e-09        5         5
│       6        4.3e-17    6.2e-10        6         6
│       7        4.9e-18    7.1e-11        7         7
│       8        5.6e-19    8.1e-12        8         8
│       9        6.4e-20    1.9e-12        9         9
│      10        7.4e-21    7.9e-13       10        10
└─> Newton has converged, |x|/|F(x)| = 4.5e+12 years
38.230818 seconds (98.39 k allocations: 3.182 GiB, 5.85% gc time)

Out[15]:
0.010696151016861694

the gradient,

In [16]:
@time gradient(1.1p)

 32.789312 seconds (2.02 k allocations: 3.037 GiB, 6.97% gc time)

Out[16]:
1×6 Array{Float64,2}:
25.0791  9.91269  -8.38374  -6268.7  2153.61  -6.97202e-11

and the Hessian,

In [17]:
@time hessian(1.1p)

 18.919829 seconds (5.66 k allocations: 8.197 GiB, 42.01% gc time)

Out[17]:
6×6 Array{Float64,2}:
1.92817e5   -7210.06         1.37981e5   …  -4.75529e6    -1.06793e-6
-7210.06            2.6767e6    -1.14666e5       2.31281e6     1.09749e-6
1.37981e5      -1.14666e5    2.66979e6      -1.3653e7     -7.11517e-6
1.3576e7       -6.36511e6    5.19961e7       3.78458e8    -0.000387782
-4.75529e6       2.31281e6   -1.3653e7       -6.57363e8     0.000141119
-1.06793e-6      1.09749e-6  -7.11517e-6  …   0.000141119   7.76363e-17

Factorizations are the bottleneck

In [18]:
@time factorize(∇ₓF(mem.s, mem.p))

 24.525396 seconds (530.16 k allocations: 2.183 GiB, 2.74% gc time)

Out[18]:
UMFPACK LU Factorization of a (382338, 382338) sparse matrix
Ptr{Nothing} @0x00007febc64858f0
With $m = 6$ parameters, a method requiring $m^2$ factorizations (at roughly 20 s each) would spend the following just on factorizations:

In [19]:
length(p), length(p)^2 * 20.313170 * u"s" |> u"minute"

Out[19]:
(6, 12.187902000000001 minute)

# Benchmark the F-1 algorithm (for a full optimization run)

| Algorithm | Definition | Factorizations |
|-----------|------------|----------------|
| F-1 | F-1 algorithm | $\mathcal{O}(1)$ |
| DUAL | Dual step for Hessian + Analytical gradient | $\mathcal{O}(m)$ |
| COMPLEX | Complex step for Hessian + Analytical gradient | $\mathcal{O}(m)$ |
| FD1 | Finite differences for Hessian + Analytical gradient | $\mathcal{O}(m)$ |
| HYPER | Hyperdual step for Hessian and dual step for gradient | $\mathcal{O}(m^2)$ |
| FD2 | Finite differences for Hessian and gradient | $\mathcal{O}(m^2)$ |

# Conclusions

The F-1 algorithm is

- easy to use and understand
- accurate (machine precision)
- fast!

Check it out on GitHub at briochemc/F1Method.jl!