This is a tutorial notebook for the `MultiNorm` class from the `multinorm` Python package (GitHub).

- A quick tutorial introduction (~ 10 min read)
- Explain and illustrate some of the theory / statistical aspects
- Hands-on introduction to working with the `MultiNorm` class

Suggestions:

- Read this tutorial and judge if `multinorm` is useful for you.
- If yes, install it with `pip install multinorm`, download this notebook, and execute / play with these examples on your machine.
- Learn more at https://multinorm.readthedocs.io (complete description of features and theory).

We will use pandas, so after reading this you will be a data scientist! In the future we will probably change the example model to a deep neural network and use tensorflow, because likelihood fitting is so 1990s.

- TODO: What is MVN (theory)?
- https://en.wikipedia.org/wiki/Multivariate_normal_distribution
- TODO: What is `MultiNorm` (code)?
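
For reference, the density of an $n$-dimensional multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ is given by the standard textbook expression below. The symbols $n$, $\mu$ and $\Sigma$ are introduced here for illustration and are not taken from the `multinorm` code.

$$
f(x) = \frac{1}{\sqrt{(2\pi)^n \det \Sigma}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)
$$

In the terms used in this notebook, $\mu$ plays the role of `mean` and $\Sigma$ the role of `cov`.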

We will be using `scipy.optimize.curve_fit` to fit a model to data and obtain an example covariance matrix. Note that most analysis problems are not a simple least-squares curve fit, but involve custom data and a custom likelihood. How to obtain parameter error and covariance estimates with other popular Python packages is described in the `multinorm` documentation.

Let's start by importing everything we'll use in this notebook.

In [1]:

```
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.optimize
from multinorm import MultiNorm
```

As mentioned in the introduction, the kind of analysis we are considering here involves a **model** with **parameters**, as well as some **data** and a **likelihood function**.

We will use the Rat42 dataset from the NIST Standard Reference Database. To quote from the dataset description:

This model and data are an example of fitting sigmoidal growth curves taken from Ratkowsky (1983). The response variable is pasture yield, and the predictor variable is growing time.

Did I hear you say

Why? pasture yield? WTF?

Well, the data is small (9 observations of x and y), and the model is simple (one line in Python) and has 3 parameters, which is nice for illustrating properties of the correlation matrix and the multivariate normal distribution. And the fit uses a least-squares likelihood, which is built into `scipy` and available in pretty much any fitting package, or can be implemented in one line of Python.

In [2]:

```
data = pd.DataFrame({
    "x": [9, 14, 21, 28, 42, 57, 63, 70, 79],
    "y": [8.93, 10.80, 18.59, 22.33, 39.35, 56.11, 61.73, 64.62, 67.08],
})
data.T
```

Out[2]:

And having a DOI (http://dx.doi.org/10.18434/T43G6C) and expected reference results for the analysis is also nice. This is the result we should find:

In [3]:

```
expected = pd.DataFrame.from_records(
    [
        ("b1", 7.2462237576e01, 1.7340283401e00),
        ("b2", 2.6180768402e00, 8.8295217536e-02),
        ("b3", 6.7359200066e-02, 3.4465663377e-03),
    ],
    columns=["par", "value", "error"],
    index="par",
)
expected
```

Out[3]:

Our model is a sigmoidal growth curve that takes just one line of Python.

In [4]:

```
def model(x, b1, b2, b3):
    # Sigmoidal growth curve: b1 is the asymptote, b2 and b3 set offset and rate
    return b1 / (1 + np.exp(b2 - b3 * x))
```
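
The least-squares statistic mentioned earlier can be sketched in a few lines. This is a minimal illustration, not what `scipy` does internally: the helper name `sum_of_squares` is made up for this example, and the data and model are repeated so that the block runs on its own.

```python
import numpy as np

# Rat42 data, repeated here so the example is self-contained
x = np.array([9, 14, 21, 28, 42, 57, 63, 70, 79], dtype=float)
y = np.array([8.93, 10.80, 18.59, 22.33, 39.35, 56.11, 61.73, 64.62, 67.08])


def model(x, b1, b2, b3):
    return b1 / (1 + np.exp(b2 - b3 * x))


def sum_of_squares(pars):
    # Hypothetical helper: the "one line" least-squares fit statistic
    return np.sum((y - model(x, *pars)) ** 2)


# Compare the statistic at a rough starting guess and at the NIST reference values
stat_start = sum_of_squares((100, 1, 0.1))
stat_ref = sum_of_squares((72.462237576, 2.6180768402, 0.067359200066))
print(stat_start, stat_ref)
```

A fitting routine searches for the parameter values that make this number as small as possible, so the statistic at the reference values should be far below the one at the starting guess.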

If we fit the model to the data, we obtain a parameter estimate (`popt`, the "optimal" or "best" parameter values) and a covariance matrix (`pcov`).

In [5]:

```
p0 = (100, 1, 0.1)
popt, pcov = scipy.optimize.curve_fit(
    f=model,
    xdata=data["x"],
    ydata=data["y"],
    p0=p0,
)
```

Let's make a plot to see the best-fit model and the data:

In [6]:

```
plt.plot(data['x'], data["y"], 'o')
x = np.linspace(0, 100)
plt.plot(x, model(x, *popt));
```

Finally, this is where `MultiNorm` comes in!

In [7]:

```
from multinorm import MultiNorm
```

In [8]:

```
mn = MultiNorm(mean=popt, cov=pcov, names=["b1", "b2", "b3"])
```

In [9]:

```
mn
```

Out[9]:

`MultiNorm` is very simple: it just stores the `mean` vector and `cov` matrix and exposes them as pandas Series and DataFrame objects, with `name` as index.

Of course the `mean` and `cov` are still available as NumPy arrays, using the `.values` property of the pandas objects.
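
To illustrate what "pandas objects with names as index" means in practice, here is a small sketch using plain pandas stand-ins rather than `MultiNorm` itself. The numbers are made up for illustration and are not fit results.

```python
import numpy as np
import pandas as pd

names = ["b1", "b2", "b3"]

# Made-up values standing in for a fitted mean vector and covariance matrix
mean = pd.Series([72.5, 2.62, 0.067], index=names, name="mean")
cov = pd.DataFrame(
    [
        [3.0, 0.1, -0.003],
        [0.1, 0.008, 0.0002],
        [-0.003, 0.0002, 0.00001],
    ],
    index=names,
    columns=names,
)

# Label-based access to single entries
b1_mean = mean["b1"]
b1_b2_cov = cov.loc["b1", "b2"]

# The underlying NumPy arrays
mean_array = mean.values
cov_array = cov.values
print(type(mean_array), cov_array.shape)
```

Label-based access avoids having to remember which array position belongs to which parameter.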

In [10]:

```
mn.names
```

Out[10]:

In [11]:

```
mn.mean
```

Out[11]:

In [12]:

```
mn.cov
```

Out[12]:

In the printout of `mn` above, we saw that in addition to the `names`, `mean` and `cov` we passed in, another property `error` exists.

This is the vector of "1 sigma" parameter error estimates:

In [13]:

```
mn.error
```

Out[13]:

Usually you would state your measurement giving the estimated value and error for each parameter:

`b1 = 72.46 +/- 1.73`

`b2 = 2.618 +/- 0.088`

`b3 = 0.0674 +/- 0.0034`
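
Such lines can be produced programmatically. The sketch below uses the NIST reference values quoted earlier and a hypothetical helper, `format_measurement`, that rounds value and error to the same number of decimals so that the error keeps two significant digits. That rounding convention is an assumption of this example, not something prescribed by `multinorm`.

```python
import math

# NIST reference values and errors for Rat42, as quoted earlier in this notebook
reference = {
    "b1": (72.462237576, 1.7340283401),
    "b2": (2.6180768402, 0.088295217536),
    "b3": (0.067359200066, 0.0034465663377),
}


def format_measurement(name, value, error, digits=2):
    # Hypothetical helper: choose the number of decimals such that
    # the error is shown with `digits` significant digits
    decimals = max(0, digits - 1 - math.floor(math.log10(error)))
    return f"{name} = {value:.{decimals}f} +/- {error:.{decimals}f}"


lines = [format_measurement(name, *pair) for name, pair in reference.items()]
for line in lines:
    print(line)
```

With two significant digits in the error, `b1` comes out with one decimal fewer than in the hand-written line above. Either convention is fine as long as value and error use the same number of decimals.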

The relationship between the covariance matrix and the errors is very simple: the diagonal entries of the covariance matrix contain the squared errors, so you can compute the errors as the square root of the diagonal entries:

In [14]:

```
np.sqrt(pcov.diagonal())
```

Out[14]:

Possibly the property you'll use most is `parameters`, which is a `pandas.DataFrame` with `name` as index, and columns that contain `mean` and `error` for all parameters. It has a nice table printout, and you can use the common pandas methods to select rows, columns and entries.

In [15]:

```
mn.parameters
```

Out[15]:

In [16]:

```
pars = mn.parameters
# Access parameter info as Series by index or name
pars.iloc[0]
pars.loc["b1"]
# All the power of pandas at your fingertips
# Just as an example:
pars["mean"].to_frame().T
```

Out[16]:

tbd

In [17]:

```
mn.correlation
```

Out[17]:

In [18]:

```
mn.correlation.loc["b1", "b3"]
```

Out[18]:

In [19]:

```
mn.correlation["b3"].to_frame()
```

Out[19]:
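
For orientation, a correlation matrix is conventionally the covariance matrix with each entry divided by the product of the two corresponding errors. The sketch below shows this with NumPy on a made-up covariance matrix. Whether `MultiNorm.correlation` is computed in exactly this way is an assumption.

```python
import numpy as np

# Made-up symmetric, positive definite covariance matrix (not a fit result)
cov = np.array(
    [
        [4.0, 1.2, 0.0],
        [1.2, 1.0, -0.3],
        [0.0, -0.3, 0.25],
    ]
)

# Errors are the square roots of the diagonal entries
error = np.sqrt(np.diag(cov))

# correlation[i, j] = cov[i, j] / (error[i] * error[j])
correlation = cov / np.outer(error, error)
print(correlation)
```

The diagonal of the result is 1 by construction, and every off-diagonal entry lies between -1 and 1, which makes correlations easier to compare across parameters with very different scales, such as `b1` and `b3` here.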

TODO: plotting ellipses with MPL is horrible. Add a plot helper method!

In [20]:

```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ellipse = mn.fix("b3").to_matplotlib_ellipse()
ax.add_artist(ellipse)
ax.set_xlabel("b1")
ax.set_ylabel("b2")
# ax.set_xlim(7, 13)
# ax.set_ylim(17, 23)
ax.grid(True)
# A single point needs an explicit marker to be visible
ax.plot(mn.mean["b1"], mn.mean["b2"], "o")
plt.tight_layout()
plt.show()
```
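
For readers curious what goes into such an ellipse, here is a minimal sketch of how the "1 sigma" ellipse of a two-dimensional normal distribution can be derived from its covariance matrix via an eigendecomposition. The covariance values are made up, and this is not necessarily how `to_matplotlib_ellipse` is implemented.

```python
import numpy as np

# Made-up 2 x 2 covariance matrix for two parameters (not a fit result)
cov = np.array([[4.0, 1.5], [1.5, 1.0]])

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)

# The semi-axes of the 1 sigma ellipse are the square roots of the eigenvalues
semi_minor, semi_major = np.sqrt(eigvals)

# The major axis points along the eigenvector of the largest eigenvalue;
# its sign is arbitrary, so the angle is only defined modulo 180 degrees
major_axis = eigvecs[:, 1]
angle_deg = np.degrees(np.arctan2(major_axis[1], major_axis[0]))

print(semi_major, semi_minor, angle_deg)
```

Note that `matplotlib.patches.Ellipse` expects the full width and height, that is twice the semi-axes, together with the rotation angle in degrees.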

Let us recap the most important points concerning multivariate normal distributions and the `MultiNorm` class.

To learn more: https://multinorm.readthedocs.io

- TODO: MVN are ...
- TODO: the `MultiNorm` class ...
- TODO
