Author: Antonino Ingargiola - Date: 07/20/2014
In this notebook we fit a TCSPC histogram with a model function that includes an exponential decay, an instrument response function (IRF) and an uncorrelated background. The IRF is considered fixed (i.e. measured) and is used to deconvolve the data. This kind of fit is commonly performed on experimental fluorescence lifetime curves.
For illustrative purposes, the TCSPC data is simulated on the fly by drawing random exponential waiting times, adding a random number from the IRF distribution and adding a fraction of uncorrelated events (background). These events are binned to obtain a realistic TCSPC histogram. This procedure ensures a proper simulation of the Poisson statistics in each bin.
The fitting procedure can use either non-linear least squares or maximum likelihood minimization. In the latter case, the log-likelihood function derived from Poisson statistics is maximized.
Note: This notebook uses the fitting package lmfit, which makes it easy to set constraints on the parameters (i.e. fixed, bounded, etc.). It also performs rigorous error analysis and makes it easy to compare different minimization algorithms (other than least squares).
%matplotlib inline
import numpy as np
import scipy
import scipy.stats
import scipy.special  # used below for erfc in exgauss
from scipy.optimize import minimize
import matplotlib.pyplot as plt
from matplotlib.pyplot import hist, plot
import lmfit
from lmfit import Parameters, report_fit
time_step_s = (60e-9/4096) # time step in seconds (S.I.)
time_step_ns = time_step_s*1e9 # time step in nano-seconds
time_nbins = 2048 # number of time bins
time_idx = np.arange(time_nbins) # time axis in index units
time_ns = time_idx*time_step_ns # time axis in nano-seconds
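As a quick orientation (an illustrative check, not part of the original notebook): the full scale of this time axis is time_nbins * time_step_ns, i.e. 30 ns.
# Full-scale range of the TCSPC histogram in nanoseconds
print('Full-scale: %.1f ns' % (time_nbins * time_step_ns))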
Let's define the exponentially modified Gaussian distribution, representing the convolution of a Gaussian and an exponential. This function is used to model the IRF.
$$f(x;\mu,\sigma,\lambda) = \frac{\lambda}{2} e^{\frac{\lambda}{2} (2 \mu + \lambda \sigma^2 - 2 x)} \operatorname{erfc} \left(\frac{\mu + \lambda \sigma^2 - x}{ \sqrt{2} \sigma}\right)$$

def exgauss(x, mu, sig, tau):
    # Note: the function is parametrized by tau = 1/lambda
    lam = 1./tau
    return 0.5*lam * np.exp(0.5*lam * (2*mu + lam*(sig**2) - 2*x)) * \
           scipy.special.erfc((mu + lam*(sig**2) - x)/(np.sqrt(2)*sig))
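As a sanity check (an illustrative snippet, not part of the original analysis, with arbitrary parameter values), the function should integrate to ~1 since it is a PDF:
# Numerical check that exgauss is normalized
x_check = np.linspace(-1, 20, 100000)
area = np.trapz(exgauss(x_check, mu=0.2, sig=0.03, tau=0.1), x_check)
assert np.allclose(area, 1, rtol=1e-3)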
We now compute the IRF using exgauss as a PDF. The following parameters have been fitted from an experimental IRF:
irf_sig = 0.033
irf_mu = 5*irf_sig
irf_lam = 0.1
x_irf = np.arange(0, (irf_lam*15 + irf_mu)/time_step_ns)
p_irf = exgauss(x_irf, irf_mu/time_step_ns, irf_sig/time_step_ns,
                irf_lam/time_step_ns)
x_irf.size
114
plot(x_irf, p_irf)
plt.yscale('log')
irf_mu/time_step_ns
11.264000000000001
assert np.allclose(p_irf.sum(), 1)
Finally we need to draw random numbers from the IRF's PDF. To do so, we create a discrete random distribution with scipy:
irf = scipy.stats.rv_discrete(name='irf', values=(x_irf, p_irf))
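Note that rv_discrete works in time-bin (index) units; drawn samples can be converted to nanoseconds by multiplying by the bin width (a small usage sketch):
# Samples are bin indices; convert to nanoseconds with the bin width
samples_ns = irf.rvs(size=5) * time_step_ns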
fig, ax = plt.subplots(1, 2, figsize=(12, 4))
ax[0].plot(x_irf, irf.pmf(x_irf));
ax[0].set_title('IRF PDF')
ax[1].hist(irf.rvs(size=200), bins=x_irf, histtype='step')
ax[1].set_title('Random samples from the IRF distribution');
ax[0].set_yscale('log')
#ax[1].set_yscale('log')
assert (p_irf == irf.pmf(x_irf)).all()
np.random.seed(2)
num_samples = int(5e5)
tau = 2e-9                  # fluorescence lifetime in seconds
baseline_fraction = 0.03    # fraction of uncorrelated background events
offset = 2e-9               # decay start time in seconds
# Exponential waiting times (in time-bin units), shifted by the offset
sample_decay = np.random.exponential(scale=tau/time_step_s,
                                     size=num_samples) + offset/time_step_s
# Add the IRF jitter to each photon
sample_decay += irf.rvs(size=num_samples)
# Uncorrelated background, uniform over the full time range
sample_baseline = np.random.randint(low=0, high=time_nbins,
                                    size=int(baseline_fraction*num_samples))
sample_tot = np.hstack((sample_decay, sample_baseline))
decay_hist, bins = np.histogram(sample_tot, bins=np.arange(time_nbins + 1))
baseline_true = baseline_fraction*num_samples / time_nbins
baseline_true
7.32421875
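A quick bookkeeping check (illustrative): samples falling beyond the last bin are dropped by np.histogram, so the histogram can contain slightly fewer events than were simulated.
# Events outside [0, time_nbins] are not counted in decay_hist
print('Simulated events:', sample_tot.size)
print('Histogrammed events:', decay_hist.sum())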
hist_sim_decay, _, _ = hist(sample_tot, bins=np.arange(time_nbins+1), histtype='step');
plt.yscale('log')
(hist_sim_decay > 0).all()
False
We use a single-exponential model convolved with the IRF model, plus a constant baseline:
def model(x, tau, ampl, baseline, offset, irf_pdf=irf.pmf(x_irf)):
    """
    Model function used to fit the data.
    """
    y = np.zeros(x.size)
    pos_range = (x - offset) >= 0
    # Exponential decay starting at `offset`
    y[pos_range] = ampl*np.exp(-(x[pos_range] - offset)/tau)
    # Convolve the decay with the (fixed) IRF and add the baseline
    z = np.convolve(y, irf_pdf, mode='same')
    z += baseline
    return z
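Before fitting, we can smoke-test the model function with plausible values (an illustrative check; the numbers are arbitrary):
# Evaluate the model on the full time axis and check the output
y_test = model(time_ns, tau=2.0, ampl=700, baseline=7, offset=2.8)
assert y_test.shape == time_ns.shape
assert (y_test >= 7).all()  # the baseline is a lower bound of the model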
def residuals(params, x, y, weights):
    """
    Returns the array of residuals for the current parameters.
    """
    tau = params['tau'].value
    baseline = params['baseline'].value
    offset = params['offset'].value
    ampl = params['ampl'].value
    return (y - model(x, tau, ampl, baseline, offset))*weights
def plot_fit(time, ydata, params, yscale='log', zoom_origin=False):
    """
    Function to plot data and model function.
    """
    plot(time, ydata, marker='.')
    plot(time, model(time, **{k: v.value for k, v in params.items()}),
         color='r', lw=2.5, alpha=0.7)
    if yscale == 'log':
        plt.yscale('log')
        plt.ylim(1)
    if zoom_origin:
        # Zoom on a 100-bin window centered on the decay start
        dt = time[1] - time[0]
        t0 = params['offset'].value
        plt.xlim(t0 - 50*dt, t0 + 50*dt)
Log-likelihood function:

$$p(k) = \frac{\lambda^k}{\Gamma(k+1)}\exp(-\lambda)$$

$$\log p(k) = k \log \lambda - \log\Gamma(k+1) - \lambda$$

In this context, for each TCSPC bin $i$, we have a Poisson rate $\lambda_i$ that is a function of the model parameters:

$$\lambda_i = \lambda(t_i; \tau, A, t_0, b)$$

and a number of counts $y_i$, the measured TCSPC bin counts. Dropping the parameter-independent term $\log\Gamma(y_i + 1)$, maximizing the log-likelihood is equivalent to minimizing $\sum_i (\lambda_i - y_i \log \lambda_i)$, which is what the function below computes.
def loglike(params, x, ydata):
    # Negative log-likelihood (up to a constant) for Poisson-distributed counts
    tau, baseline, offset, ampl = params
    ymodel = model(x, tau, ampl, baseline, offset)
    return (ymodel - ydata*np.log(ymodel)).sum()
def from_params(params):
    return [params[k].value for k in ('tau', 'baseline', 'offset', 'ampl')]

def to_params(p, params):
    for v, k in zip(p, ('tau', 'baseline', 'offset', 'ampl')):
        params[k].value = v
    return params
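These helpers translate between lmfit's Parameters and the plain sequence expected by scipy.optimize. A small usage sketch (with made-up values):
# Round-trip: Parameters -> list -> Parameters
demo = Parameters()
for name, val in zip(('tau', 'baseline', 'offset', 'ampl'), (2.0, 7.0, 2.8, 700.0)):
    demo.add(name, value=val)
p = from_params(demo)                 # [2.0, 7.0, 2.8, 700.0]
assert from_params(to_params(p, demo)) == p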
Weights for proper scaling of the residuals according to the Poisson standard deviation; bins with zero counts are assigned the weight expected for the true baseline level:
weights = 1./np.sqrt(decay_hist)
weights[decay_hist == 0] = 1./np.sqrt(baseline_true)
(decay_hist == 0).any()
True
params = Parameters()
params.add('tau', value=2.8)
params.add('baseline', value=5)
params.add('offset', value=2.814, vary=False)
params.add('ampl', value=700)
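As mentioned in the introduction, lmfit makes it easy to constrain parameters. For instance, positivity bounds could be added like this (a sketch, not used in the fits below):
# Illustrative: the same parameters with positivity bounds
params_bounded = Parameters()
params_bounded.add('tau', value=2.8, min=0)
params_bounded.add('baseline', value=5, min=0)
params_bounded.add('offset', value=2.814, vary=False)
params_bounded.add('ampl', value=700, min=0)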
Check the model with the initial parameters:
plot_fit(time_ns, decay_hist, params)
Now we fit the parameters. To converge more easily to the global minimum, we first perform a minimization with the Nelder-Mead method (a.k.a. downhill simplex). After that, we perform a second minimization with non-linear least squares in order to obtain error estimates.
fit_res = lmfit.minimize(residuals, params, args=(time_ns, decay_hist, weights), method='nelder')
report_fit(fit_res.params)
print('Reduced Chi-square: %.3f' % fit_res.redchi)
[[Variables]]
    ampl:       3691.373  initial = 700
    baseline:   6.317692  initial = 5
    offset:     2.814  (fixed)
    tau:        1.988625  initial = 2.8
[[Correlations]] (unreported correlations are < 0.100)
Reduced Chi-square: 1.177
fit_res = lmfit.minimize(residuals, params, args=(time_ns, decay_hist, weights), method='leastsq')
report_fit(fit_res.params)
print('Reduced Chi-square: %.3f' % fit_res.redchi)
[[Variables]]
    ampl:       3691.374 +/- 8.272421 (0.22%)  initial = 3691.373
    baseline:   6.317692 +/- 0.08213102 (1.30%)  initial = 6.317692
    offset:     2.814  (fixed)
    tau:        1.988625 +/- 0.003367729 (0.17%)  initial = 1.988625
[[Correlations]] (unreported correlations are < 0.100)
    C(ampl, tau)       = -0.722
    C(baseline, tau)   = -0.236
    C(ampl, baseline)  =  0.109
Reduced Chi-square: 1.177
plot_fit(time_ns, decay_hist, params, zoom_origin=False)
Now we compute the confidence intervals by sampling the reduced $\chi^2$:
ci = lmfit.conf_interval(fit_res)
lmfit.printfuncs.report_ci(ci)
              99.70%     95.00%     67.40%      0.00%     67.40%     95.00%     99.70%
tau          1.97862    1.98195    1.98535    1.98863    1.99191    1.99532    1.99873
baseline     6.07364    6.15659    6.23696    6.33703    6.39842    6.47878    6.56170
ampl      3666.73961 3675.19567 3683.35125 3697.35225 3699.40887 3707.58804 3716.00500
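By default, three confidence levels are reported; different levels can be requested through conf_interval's sigmas argument (a sketch, assuming the lmfit version used here):
# Illustrative: request only the 1-sigma and 2-sigma confidence levels
ci2 = lmfit.conf_interval(fit_res, sigmas=(0.674, 0.95))
lmfit.printfuncs.report_ci(ci2)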
To find the ML estimate we minimize the negative log-likelihood defined above with scipy.optimize.minimize (note that, unlike the previous fits, this minimization varies all four parameters, including the offset):
res = minimize(loglike, x0=from_params(params), args=(time_ns, decay_hist),
               method='Nelder-Mead')
params = to_params(res.x, params)
report_fit(params)
[[Variables]]
    ampl:       3650.818 +/- 8.272421 (0.23%)  initial = 3691.373
    baseline:   7.279041 +/- 0.08213102 (1.13%)  initial = 6.317692
    offset:     2.80642  (fixed)
    tau:        2.005321 +/- 0.003367729 (0.17%)  initial = 1.988625
[[Correlations]] (unreported correlations are < 0.100)
    C(ampl, tau)       = -0.722
    C(baseline, tau)   = -0.236
    C(ampl, baseline)  =  0.109
plot_fit(time_ns, decay_hist, params, zoom_origin=False)
Note that the ML method gives a better estimate of the true baseline value (baseline_true = 7.324), while the fitted lifetime is not significantly different (in this case). Also note that the uncertainties reported above are simply carried over from the previous least-squares fit: the scipy minimization does not update them.
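To make the comparison concrete, we can print the two estimates side by side (illustrative; params now holds the ML values):
# Compare the ML baseline estimate with the simulation ground truth
print('True baseline: %.3f' % baseline_true)
print('ML baseline:   %.3f' % params['baseline'].value)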