---
jupytext:
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.15.2
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Lecture 17: Quasi-Newton methods

## Approximation of the derivative

-   The formula $x^{(i+1)} = x^{(i)} - \frac{f(x^{(i)})}{f'(x^{(i)})}$ requires that we are able to compute an expression for the derivative of $f(x)$.

-   This may not always be possible:

    -   the function $f(x)$ may be a "black box";
    -   the formula for $f(x)$ may be known but may be difficult for us to differentiate.

-   We can modify Newton's method by simply approximating $f'(x^{(i)})$, rather like we approximated $y'(t)$ when solving a differential equation.

### Approximation of $f'(x)$

-   Recall that $f'(x) = \lim_{\mathrm{d}x \to 0} \frac{f(x + \mathrm{d}x) - f(x)}{\mathrm{d}x}$.

-   Hence we can choose a small value for $\mathrm{d}x$ (how small?) and approximate:

    $$
    f'(x) \approx \frac{f(x + \mathrm{d}x) - f(x)}{\mathrm{d}x}.
    $$

-   The modified Newton method then becomes

    $$
    x^{(i+1)} = x^{(i)} - \frac{\mathrm{d}x \times f(x^{(i)})}{f(x^{(i)} + \mathrm{d}x) - f(x^{(i)})}.
    $$
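
-   As a quick illustration, one step of this update can be written directly in Python. This is a minimal sketch: the helper name `modified_newton_step` and the test function are our own, not from the lecture.

```{code-cell} ipython3
def modified_newton_step(f, x, dx):
    """One Newton step with f'(x) replaced by a forward difference."""
    return x - dx * f(x) / (f(x + dx) - f(x))


# one step towards the positive root of f(x) = x^2 - 2, starting from x = 1
print(modified_newton_step(lambda x: x**2 - 2, 1.0, 1.0e-8))
```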

### The choice of $\mathrm{d}x$

-   Note that the computational cost of calculating each iterate is about the same as for Newton's method.

-   What is not obvious, however, is how small the value of $\mathrm{d}x$ should be chosen.

-   In theory (exact arithmetic) the smaller the choice of $\mathrm{d}x$ the better the approximation of the derivative.

-   In practice, however, we know that subtracting two nearly equal values can lead to significant rounding errors when using floating point arithmetic.

-   Consider the following example...

## Problems with floating point arithmetic

-   When $f(x) = x^3$ then $f'(x) = 3 x^2$.

-   Hence, at $x_0 = 1$, $f(x_0) = 1$ and $f'(x_0) = 3$.

-   Consider what happens when we approximate this with python, using finite values for $\mathrm{d}x$.

```{code-cell} ipython3
:tags: [remove-input]

import pandas as pd


def f(x):
    return x**3


def df(x):
    return 3 * x**2


x = 1

headers = ["dx", "approx", "abs error", "rel error"]
data = []

# approximate f'(1) with progressively smaller values of dx
for e in range(4, 18, 2):
    dx = 10**-e
    approx = (f(x + dx) - f(x)) / dx
    exact = df(x)
    abs_error = abs(exact - approx)
    rel_error = abs_error / exact

    data.append([dx, approx, abs_error, rel_error])

# tabulate the results (a separate name avoids shadowing the function df)
table = pd.DataFrame(data, columns=headers)
table.style.format(
    formatter={"dx": "{:e}", "approx": "{:f}", "abs error": "{:e}", "rel error": "{:e}"}
).hide(axis="index").set_caption(
    "Simple approximation of a derivative using floating point arithmetic"
)
```

## Modified Newton's method

-   Recall the definition of machine precision/unit roundoff from Lecture 3.

-   The modified Newton method uses $\mathrm{d}x = \sqrt{\text{eps}}$.

-   For double precision we have

```{code-cell} ipython3
import numpy as np

# unit roundoff for double precision, and the recommended increment
eps = np.finfo(float).eps
dx = np.sqrt(eps)

# forward-difference approximation of f'(1) for f(x) = x^3 (exact value 3)
x0 = 1.0
df_approx = ((x0 + dx) ** 3 - x0**3) / dx
abs_error = abs(df_approx - 3)
rel_error = abs_error / 3

print(f"{dx=}\n{df_approx=}\n{abs_error=}\n{rel_error=}")
```

### Examples

- The example to compute the square root of 2 to a tolerance of $10^{-12}$ with starting value $1$ gives $x^* \approx 1.4142135623731$ after 5 iterations.

- The naca0012 example starting at 1 with tolerance $10^{-4}$ gives $x^* \approx 0.76579$ after 2 iterations.

- The naca0012 example starting at 1 with tolerance $10^{-5}$ gives $x^* \approx 0.765239$ after 3 iterations.

- The naca0012 example starting at 0.1 with tolerance $10^{-4}$ gives $x^* \approx 0.03386$ after 5 iterations.

In each case the performance is almost identical to the Newton method.
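
The square root example can be reproduced with a few lines of Python. This is a minimal sketch, assuming convergence is declared once $|f(x)|$ falls below the tolerance; the function name `modified_newton` is our own.

```{code-cell} ipython3
import numpy as np


def modified_newton(f, x, tol):
    """Newton's method with f'(x) replaced by a forward difference, dx = sqrt(eps)."""
    dx = np.sqrt(np.finfo(float).eps)
    iterations = 0
    while abs(f(x)) > tol:
        # forward-difference approximation of f'(x) inside the Newton update
        x = x - dx * f(x) / (f(x + dx) - f(x))
        iterations += 1
    return x, iterations


# the square root of 2 is the positive root of f(x) = x^2 - 2
x_star, its = modified_newton(lambda x: x**2 - 2, 1.0, 1.0e-12)
print(f"{x_star=} after {its} iterations")
```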

## The secant method

-   If we choose $\mathrm{d}x = x^{(i-1)} - x^{(i)}$, then we get

    $$
    f'(x^{(i)}) \approx \frac{f(x^{(i)} + \mathrm{d}x) - f(x^{(i)})}{\mathrm{d}x}
    = \frac{f(x^{(i)}) - f(x^{(i-1)})}{x^{(i)} - x^{(i-1)}}.
    $$

-   Newton's method then becomes

    $$
    x^{(i+1)} = x^{(i)} - f(x^{(i)}) \frac{x^{(i)} - x^{(i-1)}}{f(x^{(i)}) - f(x^{(i-1)})}
    \left(\approx x^{(i)} - \frac{f(x^{(i)})}{f'(x^{(i)})} \right).
    $$

-   Note that the main advantage of this approach over the modified Newton approximation above is that only one new evaluation of $f(x)$ is required per iteration (apart from the very first iteration), as the sketch below shows.
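
This is a minimal sketch, again assuming convergence is declared once $|f(x)|$ falls below the tolerance; the name `secant` is our own. It stores the two most recent function values so that each step costs a single new evaluation of $f(x)$.

```{code-cell} ipython3
def secant(f, x0, x1, tol):
    """Secant method: Newton's method with a finite-difference slope."""
    f0, f1 = f(x0), f(x1)  # the only two evaluations needed to start
    iterations = 0
    while abs(f1) > tol:
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        f0, f1 = f1, f(x1)  # one new evaluation of f per iteration
        iterations += 1
    return x1, iterations


# compute sqrt(2) from the starting values 1 and 1.5
x_star, its = secant(lambda x: x**2 - 2, 1.0, 1.5, 1.0e-4)
print(f"{x_star=} after {its} iterations")
```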

### Pros and cons

Advantages:

-   $f'(x)$ is not required;
-   only one new evaluation of $f(x)$ per iteration;
-   still converges almost as quickly as Newton's method.

Disadvantages:

-   *two* starting values, $x^{(0)}$ and $x^{(1)}$, are required;
-   may require one more iteration than exact Newton (but iterations are cheaper);
-   like Newton's method the iteration can fail to converge for some starting iterates.

### Examples

- The example to compute the square root of 2 starting from 1 and 1.5 to a tolerance of $10^{-4}$ gives the solution as $x^* \approx 1.4142$ after 3 iterations.

- The compound interest example starting from 100 and 120 to a tolerance of 0.1 gives the solution as $x^* \approx 235.9$ after 6 iterations.

- The naca0012 example starting from 1 and 0.9 to a tolerance of $10^{-4}$ gives the solution as $x^* \approx 0.7653$ after 3 iterations.

- The naca0012 example starting from 0 and 0.1 to a tolerance of $10^{-4}$ gives the solution as $x^* \approx 0.0339$ after 5 iterations.

## Further reading

- Wikipedia: [Quasi-Newton method](https://en.wikipedia.org/wiki/Quasi-Newton_method)
- Wikipedia: [Secant method](https://en.wikipedia.org/wiki/Secant_method)
- `scipy`: Optimization and root finding [`scipy.optimize`](https://docs.scipy.org/doc/scipy/reference/optimize.html)
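
For practical work, `scipy.optimize.newton` covers both variants: if the `fprime` argument is omitted, it falls back to the secant method. A small usage sketch:

```{code-cell} ipython3
from scipy.optimize import newton

# with no fprime supplied, scipy.optimize.newton uses the secant method
root = newton(lambda x: x**2 - 2, x0=1.0)
print(root)
```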

The [slides used in the lecture](./lec17_.ipynb) are also available.