Author: Yannick Copin y.copin@ipnl.in2p3.fr $ \renewcommand{\d}{\mathrm{d}} \renewcommand{\v}[1]{\boldsymbol{#1}} % Vector \renewcommand{\m}[1]{\mathsf{#1}} % Matrix \DeclareMathOperator{\Var}{Var} \DeclareMathOperator{\Cov}{Cov} \DeclareMathOperator{\argmin}{argmin} $
Abstract: When computing the mean value of a measurement sample, the standard, supposedly optimal inverse-variance weighted mean usually neglects the presence of an intrinsic dispersion. Conversely, various methods have been used to estimate the intrinsic dispersion a posteriori from the $\chi^2$ statistic. I present unified maximum-likelihood estimates of the mean and intrinsic dispersion.
Changelog: 21/12/28: Ported to Python 3.x, iminuit 2.x
Online versions:
import numpy as N
print("Numpy", N.__version__)
import scipy as S
print("Scipy", S.__version__)
import matplotlib.pyplot as P
print("Matplotlib", P.matplotlib.__version__)
import iminuit
print("iminuit v", iminuit.__version__)
!date
Numpy 1.21.4
Scipy 1.7.2
Matplotlib 3.5.0
iminuit v 2.8.4
mar. 28 déc. 2021 19:16:45 CET
%matplotlib inline
P.rcParams['figure.figsize'] = 12, 9
import scipy.stats as SS
import scipy.optimize as SO
Let $\v{x}$ be a collection of independent observations drawn from an (unknown) normal population $\cal{N}(\mu, \sigma^2)$, with (known) normal observational errors $\d\v{x}$. The underlying distribution of each observation $x_i$ is therefore normal $\cal{N}(\mu, \sigma^2 + \d x_i^2)$, and the likelihood $\cal{L}$ of the observations is defined as: $$\begin{align} \cal{L} &= \prod_i \frac{1}{\sqrt{2\pi(\sigma^2 + \d x_i^2)}}\exp\left(-\frac{1}{2}\frac{(x_i - \mu)^2}{\sigma^2 + \d x_i^2}\right) \\ -2\ln\cal{L} &\equiv \sum_i \frac{(x_i - \mu)^2}{\sigma^2 + \d x_i^2} + \sum_i \ln(\sigma^2 + \d x_i^2) \\ &= \chi^2(\mu,\sigma) + \delta\chi^2(\sigma) \\ &= \chi_T^2(\mu,\sigma) \end{align}$$ Minimizing the $\chi^2$ with respect to $\mu$ leads to the traditional variance-weighted maximum-likelihood (ML) estimate of $\mu$: $$ \frac{\partial\chi^2}{\partial\mu} = 0 \quad\text{for}\quad \hat{\mu} = \frac{1}{\sum_i w_i} \sum_i w_i x_i \quad\text{with}\quad w_i = \frac{1}{\sigma^2 + \d x_i^2} $$ where the variance weighting includes the yet-unknown intrinsic dispersion $\sigma$. Unfortunately, the expression cannot be minimized analytically with respect to $\sigma$, but the numerical minimization is easy: $$ (\hat{\mu},\hat{\sigma}) = \argmin_{\mu,\sigma} \chi_T^2(\mu,\sigma) $$
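For a fixed $\sigma$, the weighted mean is thus available in closed form. The following helper is a minimal sketch of the formula above (`mu_hat` is my own illustrative name, not used in the rest of this notebook):

def mu_hat(sig, x, dx):
    """Closed-form variance-weighted mean for a *given* intrinsic dispersion sig."""
    w = 1 / (sig**2 + dx**2)        # Inverse total-variance weights
    return N.sum(w * x) / N.sum(w)  # Weighted average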
Note that in the error-free case ($\d\v{x}=\v{0}$), one correctly retrieves: $$\begin{align} \hat{\mu} &= \frac{1}{N}\sum_i x_i \\ \hat{\sigma}^2 &= \frac{1}{N}\sum_i (x_i - \hat{\mu})^2 \end{align}$$ the latter being the (biased) ML-estimator of the variance of $\v{x}$ (see Oliphant, 2006).
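As a quick numerical check of the error-free limit (my addition, on a hypothetical test sample `z`), minimizing $-2\ln\cal{L}$ with $\d\v{x}=\v{0}$ does recover the sample mean and the *biased* sample standard deviation:

rng = N.random.default_rng(0)                     # Hypothetical test sample
z = rng.normal(loc=1., scale=2., size=1000)
neg2lnL = lambda p: N.sum((z - p[0])**2 / p[1]**2) + len(z) * N.log(p[1]**2)
res0 = SO.minimize(neg2lnL, (z.mean(), z.std()))
print(res0.x)                                     # ML estimates (mu, sig)
print(z.mean(), z.std(ddof=0))                    # Sample mean and *biased* stddev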
Note furthermore that the partial $\chi^2$ as defined above properly follows a $\chi^2$-distribution with $N-2$ degrees of freedom, and therefore provides a way to assess the goodness of fit of the model, in the present case the validity of the errors $\d\v{x}$. In contrast, the total $\chi_T^2$ which is minimized does not follow a $\chi^2$-distribution.
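In practice the test boils down to the following sketch, assuming data `x`, `dx` and fitted values `mu_ml`, `sig_ml` (hypothetical names for the ML estimates):

c1 = N.sum((x - mu_ml)**2 / (dx**2 + sig_ml**2))  # Partial chi2 at the ML point
p = SS.distributions.chi2.sf(c1, len(x) - 2)      # Goodness-of-fit p-value for N - 2 DoF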
def chi2(mu, sig, x, dx):
    return N.sum((x - mu)**2 / (dx**2 + sig**2))  # Standard chi2

def delta_chi2(sig, dx):
    return N.sum(N.log(dx**2 + sig**2))  # Complementary term

def chi2total(mu, sig, x, dx):
    return chi2(mu, sig, x, dx) + delta_chi2(sig, dx)
For normally distributed $\mathcal{N}(\mu,\sigma^2)$ measurements $\v{x}$ with correlated errors described by covariance matrix $\m{\Sigma}$, the likelihood can be written from the total covariance matrix $\m{\Sigma} + \m{1}\sigma^2$: $$\begin{align} \mathcal{L} &= (2\pi)^{-N/2}|\m{\Sigma} + \m{1}\sigma^2|^{-1/2}\,\exp\left(-\frac{1}{2}(\v{x}-\mu)^{T}(\m{\Sigma} + \m{1}\sigma^2)^{-1}(\v{x}-\mu)\right), \\ -2\ln\mathcal{L} &\equiv (\v{x}-\mu)^{T}(\m{\Sigma} + \m{1}\sigma^2)^{-1}(\v{x}-\mu) + \ln|\m{\Sigma} + \m{1}\sigma^2| \\ &= \chi^2(\mu,\sigma) + \delta\chi^2(\sigma) \\ &= \chi_T^2(\mu,\sigma) \end{align}$$
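The correlated case is not implemented in this notebook; the following is a minimal sketch of $\chi_T^2(\mu,\sigma)$ for a given covariance matrix (`chi2total_cov` is my own illustrative name), using a log-determinant and a linear solve for numerical stability:

def chi2total_cov(mu, sig, x, cov):
    """Total chi2 for correlated errors described by covariance matrix cov."""
    total = cov + N.eye(len(x)) * sig**2  # Sigma + 1 sig^2
    r = x - mu                            # Residual vector
    _, logdet = N.linalg.slogdet(total)   # ln|Sigma + 1 sig^2|
    return r @ N.linalg.solve(total, r) + logdet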
The x dataset originates from a single null value (µ=0, no intrinsic dispersion: σ=0), observed with log-normally distributed errors dx.
The y dataset originates from a normal population (µ=0, intrinsic dispersion σ=2), observed with the same log-normally distributed errors dx.
n = 500 # Sample size
mu0 = 0 # Intrinsic mean
sig0 = 2 # Intrinsic scatter
N.random.seed(1234)
dx = N.sqrt(N.random.lognormal(sigma=1, size=n)) # Log-normally distributed uncertainties
x = N.random.normal(loc=mu0, scale=dx) # x: No intrinsic dispersion: mu=mu0, sig=0
y = x + N.random.randn(n)*sig0 # y: Gaussian uncertainties + intrinsic dispersion
fig, ax = P.subplots(1, 1)
ax.errorbar(N.arange(n), x, yerr=dx, fmt='b.', label='x: no intrinsic dispersion')
ax.errorbar(N.arange(n), y, yerr=dx, fmt='r.', label='y: x + intrinsic dispersion')
#ax.hist(x, bins='auto', orientation='horizontal', histtype='stepfilled', color='b', alpha=0.5)
#ax.hist(y, bins='auto', orientation='horizontal', histtype='step', color='r')
ax.legend();
mup = N.average(y)                # Sample mean
var = N.var(y, ddof=1)            # Sample variance
dvar2 = 2 * var**2 / len(y)       # Variance of the sample variance in normal approximation
std = N.sqrt(var)                 # Sample stddev
dmup = std / N.sqrt(len(y))       # Error on sample mean
dstd = N.sqrt(dvar2) / (2 * std)  # Error on stddev in normal approximation
print(f"Plain: µ = {mup:.3f} ± {dmup:.3f}, σ = {std:.3f} ± {dstd:.3f}")
Plain: µ = 0.093 ± 0.111, σ = 2.472 ± 0.078
muw, sumw = N.average(y, weights=dx**(-2), returned=True)  # Inverse-variance weighted mean (no intrinsic dispersion)
dmuw = sumw**(-0.5)                                        # Uncertainty on weighted mean
chi2w = chi2(muw, 0, y, dx)
pw = SS.distributions.chi2.sf(chi2w, len(x) - 1) # SS.chisqprob deprecated in v0.17.0
print(f"Weighted, NO intrinsic: µ = {muw:.3f} ± {dmuw:.3f} (χ²/DoF={chi2w:.2f}/{len(x)-1}, p={pw:.1%})")
Weighted, NO intrinsic: µ = 0.148 ± 0.035 (χ²/DoF=3602.66/499, p=0.0%)
mus = N.linspace(-1, +1, 21)  # Grid in µ
sigs = N.linspace(0, 4, 41)   # Grid in σ
chi2ts = N.array([ [ chi2(mu, sig, y, dx) for mu in mus ] for sig in sigs ])  # χ² term
chi2ds = N.array([ [ delta_chi2(sig, dx) for mu in mus ] for sig in sigs ])   # δχ² term (µ-independent)
chi2s = chi2ts + chi2ds       # Total χ²
jmin, imin = N.unravel_index(chi2s.argmin(), chi2s.shape)
print(f"Minimum: χ² = {chi2s.min():.2f} at µ = {mus[imin]:.3f}, σ = {sigs[jmin]:.3f}")
Minimum: χ² = 1381.09 at µ = 0.100, σ = 2.100
res = SO.minimize(lambda p: chi2total(p[0], p[1], y, dx), (y.mean(), y.std()), tol=1e-3)
mu, sig = res.x
dmu, dsig = N.sqrt((2 * res.hess_inv).diagonal())  # Covariance ≈ 2 × inverse Hessian, since the objective is -2lnL
print(f"Scipy: total χ² = {res.fun:.2f} at µ = {mu:.3f} ± {dmu:.3f}, σ = {sig:.3f} ± {dsig:.3f}")
Scipy: total χ² = 1380.98 at µ = 0.067 ± 0.108, σ = 2.087 ± 0.084
Note: the iminuit.Minuit interface changed between versions 1.x and 2.x; however, the minimize interface (introduced in v1.3) remained unchanged.
minuit = iminuit.Minuit(lambda mu, sig: chi2total(mu, sig, y, dx), mu=y.mean(), sig=y.std())
minuit.errordef = iminuit.Minuit.LEAST_SQUARES
minuit.migrad()
| Migrad | | | | |
|---|---|---|---|---|
| FCN = 1381 | Nfcn = 31 | | | |
| EDM = 1.59e-06 (Goal: 0.0002) | | | | |
| Valid Minimum | No Parameters at limit | | | |
| Below EDM threshold (goal x 10) | Below call limit | | | |
| Covariance | Hesse ok | Accurate | Pos. def. | Not forced |
| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
|---|---|---|---|---|---|---|---|---|
| 0 | mu | 0.07 | 0.11 | | | | | |
| 1 | sig | 2.09 | 0.08 | | | | | |
| | mu | sig |
|---|---|---|
| mu | 0.0112 | 2.67e-05 (0.003) |
| sig | 2.67e-05 (0.003) | 0.00713 |
mu, sig = minuit.values['mu'], abs(minuit.values['sig'])  # σ enters only through σ², hence the abs
dmu, dsig = minuit.errors['mu'], minuit.errors['sig']
print(f"Minuit.migrad: total χ² = {minuit.fval:.2f} at µ = {mu:.3f} ± {dmu:.3f}, σ = {sig:.3f} ± {dsig:.3f}")
Minuit.migrad: total χ² = 1380.98 at µ = 0.067 ± 0.106, σ = 2.088 ± 0.084
res = iminuit.minimize(lambda p: chi2total(p[0], p[1], y, dx), (y.mean(), y.std()))  # tol=1e-3
mu, sig = res.x
dmu, dsig = N.sqrt((2 * res.hess_inv).diagonal())  # Covariance ≈ 2 × inverse Hessian
print(f"Minuit.minimize: total χ² = {res.fun:.2f} at µ = {mu:.3f} ± {dmu:.3f}, σ = {sig:.3f} ± {dsig:.3f}")
Minuit.minimize: total χ² = 1380.98 at µ = 0.067 ± 0.106, σ = 2.088 ± 0.084
def musig(x, dx, verbose=False):
    """
    Compute maximum-likelihood estimate of mean and intrinsic dispersion.

    Warning: does not support masked arrays nor invalid measurement errors.
    """

    def chi2(mu, sig, x, dx):
        return N.sum((x - mu)**2 / (dx**2 + sig**2))  # Standard chi2

    def delta_chi2(sig, dx):
        return N.sum(N.log(dx**2 + sig**2))  # Complementary term

    def chi2total(mu, sig, x, dx):
        return chi2(mu, sig, x, dx) + delta_chi2(sig, dx)

    # Initial guess
    mu = x.mean()
    sig = x.std()

    minuit = iminuit.Minuit(lambda mu, sig: chi2total(mu, sig, x, dx), mu=mu, sig=sig)
    minuit.errordef = iminuit.Minuit.LEAST_SQUARES
    minuit.migrad()  # Minimization
    if not minuit.valid:
        raise ValueError("Convergence issue.")

    mu, sig = minuit.values['mu'], abs(minuit.values['sig'])  # σ enters only through σ²
    dmu, dsig = minuit.errors['mu'], minuit.errors['sig']

    if verbose:
        fval = minuit.fval                     # Minimal total chi2
        c1 = chi2(mu, sig, x, dx)              # Traditional chi2
        c2 = delta_chi2(sig, dx)               # Complementary term
        dof = len(x) - 2                       # DoF
        p = SS.distributions.chi2.sf(c1, dof)  # p-value of *traditional* chi2
        print(f"Minuit: χ² = {c1:.2f}, δχ² = {c2:.2f}"
              f" at µ = {mu:.3f} ± {dmu:.3f}, σ = {sig:.3f} ± {dsig:.3f} (DoF={dof}, p={p:.1%})")

    return (mu, dmu), (sig, dsig)
fig, ax = P.subplots(1, 1)
cnt = ax.contourf(mus, sigs, chi2ts/(n-2),
cmap='gist_heat_r', levels=N.arange(11)/2, extend='max')
cntf = ax.contour(mus, sigs, chi2s/(n-2),
cmap='gray', levels=N.arange(2.7,3.4,0.1), extend='max')
ax.clabel(cntf, fmt='%.1f')
ax.set(xlabel="µ", ylabel="σ")
cbf = fig.colorbar(cntf)
cbf.ax.set_ylabel('Total χ²/DoF')
cb = fig.colorbar(cnt)
cb.ax.set_ylabel('χ²/DoF')
ax.plot([mu0], [sig0], 'y*', ms=10, label=f"Truth: µ={mu0:.1f}, σ={sig0:.1f}")
ax.errorbar([mup], [std], xerr=[dmup], yerr=[dstd], fmt='rs', ecolor='r',
label=f"Plain: µ={mup:.3f}±{dmup:.3f}, σ={std:.3f}±{dstd:.3f}")
(mu, dmu), (sig, dsig) = musig(y, dx, verbose=True)
ax.errorbar([mu], [sig], xerr=[dmu], yerr=[dsig], fmt='g*', ecolor='g',
label=f"MLE: µ={mu:.3f}±{dmu:.3f}, σ={sig:.3f}±{dsig:.3f}")
ax.legend(loc=2, numpoints=1, frameon=False);
Minuit: χ² = 505.80, δχ² = 875.18 at µ = 0.067 ± 0.106, σ = 2.088 ± 0.084 (DoF=498, p=39.5%)
chi2ts = N.array([ [ chi2(mu, sig, x, dx) for mu in mus ] for sig in sigs ])  # χ² term for the x dataset
chi2ds = N.array([ [ delta_chi2(sig, dx) for mu in mus ] for sig in sigs ])   # δχ² term (µ-independent)
chi2s = chi2ts + chi2ds
fig, ax = P.subplots(1, 1)
cnt = ax.contourf(mus, sigs, chi2ts/(n-2),
cmap='gist_heat_r', levels=N.arange(0, 2.1, 0.2), extend='max')
cntf = ax.contour(mus, sigs, chi2s/(n-2),
cmap='gray', levels=N.arange(0.8, 2.1, 0.2), extend='max')
ax.clabel(cntf, fmt='%.1f')
ax.set(xlabel="µ", ylabel="σ")
cbf = fig.colorbar(cntf)
cbf.ax.set_ylabel('Total χ²/DoF')
cb = fig.colorbar(cnt)
cb.ax.set_ylabel('χ²/DoF')
ax.plot([mu0], [0], 'y*', ms=10, label=f"Truth: µ={mu0:.1f}, σ={0:.1f}")
mup = N.average(x)                # Sample mean
var = N.var(x, ddof=1)            # Sample variance
dvar2 = 2 * var**2 / len(x)       # Variance of the sample variance in normal approximation
std = N.sqrt(var)                 # Sample stddev
dmup = std / N.sqrt(len(x))       # Error on sample mean
dstd = N.sqrt(dvar2) / (2 * std)  # Error on stddev in normal approximation
ax.errorbar([mup], [std], xerr=[dmup], yerr=[dstd], fmt='rs', ecolor='r',
label=f"Plain: µ={mup:.3f}±{dmup:.3f}, σ={std:.3f}±{dstd:.3f}")
(mu, dmu), (sig, dsig) = musig(x, dx, verbose=True)
ax.errorbar([mu], [sig], xerr=[dmu], yerr=[dsig], fmt='g*', ecolor='g',
label=f"MLE: µ={mu:.3f}±{dmu:.3f}, σ={sig:.3f}±{dsig:.3f}")
ax.legend(loc=2, numpoints=1, frameon=False);
Minuit: χ² = 486.71, δχ² = 4.87 at µ = -0.001 ± 0.035, σ = 0.000 ± 0.106 (DoF=498, p=63.3%)