```python
# WARNING: advised to install a specific version, e.g. tensorwaves==0.1.2
%pip install -q tensorwaves[doc,jax,pwa,viz] IPython
```
```python
import os

STATIC_WEB_PAGE = {"EXECUTE_NB", "READTHEDOCS"}.intersection(os.environ)
```
```{autolink-concat}
```
One of the {class}`.Estimator`s in the {mod}`.estimator` module is {class}`.ChiSquared`. This estimator is useful if you have a set of $n$ measured values $\mathbf{y}=\left\{y_1, y_2, \dots, y_n\right\}$ for $m$-dimensional data points $\mathbf{x}=\left\{x_{j,1}, x_{j,2}, \dots, x_{j,n}\right\}$, with $j\in\left\{1, \dots, m\right\}$.
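Concretely, for a model $f_{\mathbf{p}}$ with parameters $\mathbf{p}$, a $\chi^2$ estimator returns the sum of squared deviations between the model predictions and the measured values (written here in its standard, unweighted form; a per-point weight $w_i$ can multiply each term):

$$
\chi^2(\mathbf{p}) = \sum_{i=1}^n \left(f_{\mathbf{p}}(\mathbf{x}_i) - y_i\right)^2.
$$

Minimizing this value over $\mathbf{p}$ yields the best-fit parameters.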
To illustrate how to fit an expression to some data sample $\mathbf{x},\mathbf{y}$, we'll generate 'observed' values $\mathbf{y}$ for the following 1-dimensional polynomial.
```python
import sympy as sp

a, b, c, x = sp.symbols("a b c x")
expression = a + b * x + c * x**2
expression
```
From this expression, we create a {class}`.ParametrizedFunction`, where the symbols $a,b,c$ are interpreted as its parameters. The values here are chosen arbitrarily to generate the $\mathbf{y}$-values in the next step.
```python
from tensorwaves.function.sympy import create_parametrized_function

function = create_parametrized_function(
    expression,
    parameters={a: 17, b: -2, c: -0.8},
    backend="jax",
)
```
Next, we uniformly generate data points $\mathbf{x}$ over a line segment. The corresponding values for $\mathbf{y}$ are computed with the polynomial function above and smeared with a small, normally distributed offset.
```python
import numpy as np


def smear_gaussian(array, sigma, rng):
    return array + rng.normal(scale=sigma, size=len(array))


rng = np.random.default_rng(seed=0)
sample_size = 500
x_min, x_max = -5, +5
x_values = rng.uniform(x_min, x_max, sample_size)
data = {"x": x_values}
observed_y = function(data)
observed_y = smear_gaussian(observed_y, sigma=5, rng=rng)
```
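Since the model is a plain polynomial in one variable, an ordinary least-squares fit with NumPy offers a quick, backend-free cross-check on such a sample. The sketch below regenerates the same data (the duplication keeps it self-contained) and should recover parameters close to $(a, b, c) = (17, -2, -0.8)$:

```python
import numpy as np


def smear_gaussian(array, sigma, rng):
    return array + rng.normal(scale=sigma, size=len(array))


# Regenerate the same sample as above
rng = np.random.default_rng(seed=0)
x_values = rng.uniform(-5, +5, 500)
y_true = 17 - 2 * x_values - 0.8 * x_values**2
observed_y = smear_gaussian(y_true, sigma=5, rng=rng)

# np.polyfit returns coefficients from highest degree to lowest
c_fit, b_fit, a_fit = np.polyfit(x_values, observed_y, deg=2)
```

In the notebook itself, calling `np.polyfit` directly on the existing `x_values` and `observed_y` arrays gives the same check.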
To make the fit a bit more interesting, we give the {attr}`~.ParametrizedFunction.parameters` different values than the ones used to generate the $\mathbf{y}$-values.
```python
original_parameters = function.parameters
initial_parameters = {"a": -25, "b": 1.5, "c": 2.6}
function.update_parameters(initial_parameters)
```
```python
%config InlineBackend.figure_formats = ['svg']
import matplotlib.pyplot as plt


def compare_model(function, x_values, observed_y):
    _, ax = plt.subplots(figsize=(8, 4))
    linear_domain = {"x": np.linspace(x_min, x_max, 100)}
    ax.plot(
        linear_domain["x"],
        function(linear_domain),
        c="red",
        linewidth=3,
        label="fit model",
    )
    ax.scatter(x_values, observed_y, s=2, label="generated data")
    ax.set_xlabel("$x$")
    ax.set_ylabel("$y$")
    ax.set_ylim([-30, 50])
    ax.legend(loc="upper left")
    plt.show()
```
```python
compare_model(function, x_values, observed_y)
```
Finally, we construct a {class}`.ChiSquared` estimator and use it to optimize the {class}`.ParametrizedFunction` with regard to the $\mathbf{x},\mathbf{y}$ data points.
```python
from tensorwaves.estimator import ChiSquared
from tensorwaves.optimizer import Minuit2

estimator = ChiSquared(function, data, observed_y, backend="jax")
optimizer = Minuit2()
fit_result = optimizer.optimize(estimator, initial_parameters)
fit_result
```
```python
assert fit_result.minimum_valid
```
The optimized parameters in the {class}`.FitResult` are indeed comparable to the original parameter values with which the data was generated:
```python
original_parameters
```
```python
compare_model(function, x_values, observed_y)
```
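As a final, backend-free sanity check (a standalone sketch that regenerates the same sample with plain NumPy), the residuals of the generating polynomial should be consistent with the smearing width $\sigma = 5$, that is, a reduced $\chi^2$ close to one:

```python
import numpy as np


def smear_gaussian(array, sigma, rng):
    return array + rng.normal(scale=sigma, size=len(array))


# Regenerate the same sample as in the notebook
rng = np.random.default_rng(seed=0)
x_values = rng.uniform(-5, +5, 500)
y_true = 17 - 2 * x_values - 0.8 * x_values**2
observed_y = smear_gaussian(y_true, sigma=5, rng=rng)

# Residuals with respect to the generating polynomial,
# normalized by the known smearing width
sigma = 5
residuals = observed_y - y_true
reduced_chi2 = np.sum((residuals / sigma) ** 2) / len(residuals)
```

A value far above one would indicate that the model underfits the data (or that the assumed $\sigma$ is too small); a value far below one would suggest overfitting.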