This notebook is part of smFRET burst analysis software FRETBursts.
In this notebook shows how to fit a FRET histogram. For a complete tutorial on burst analysis see [FRETBursts - us-ALEX smFRET burst analysis](FRETBursts - us-ALEX smFRET burst analysis.ipynb).
from fretbursts import *
- Optimized (cython) burst search loaded. - Optimized (cython) photon counting loaded. -------------------------------------------------------------- You are running FRETBursts (version 0.7+46.ge31fadb.dirty). If you use this software please cite the following paper: FRETBursts: An Open Source Toolkit for Analysis of Freely-Diffusing Single-Molecule FRET Ingargiola et al. (2016). http://dx.doi.org/10.1371/journal.pone.0160716 --------------------------------------------------------------
sns = init_notebook(apionly=True)
import lmfit
print('lmfit version:', lmfit.__version__)
lmfit version: 1.0.3
# Tweak here matplotlib style
import matplotlib as mpl
mpl.rcParams['font.sans-serif'].insert(0, 'Arial')
mpl.rcParams['font.size'] = 12
%config InlineBackend.figure_format = 'retina'
url = 'http://files.figshare.com/2182601/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5'
download_file(url, save_dir='./data')
full_fname = "./data/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5"
d = loader.photon_hdf5(full_fname)
loader.alex_apply_period(d)
d.calc_bg(bg.exp_fit, time_s=1000, tail_min_us=(800, 4000, 1500, 1000, 3000))
d.burst_search(L=10, m=10, F=6)
ds = d.select_bursts(select_bursts.size, add_naa=True, th1=30)
URL: http://files.figshare.com/2182601/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5 File: 0023uLRpitc_NTP_20dT_0.5GndCl.hdf5 File already on disk: /home/paul/Disk/Python/OpenSMFS/FRETBursts_notebooks/notebooks/data/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5 Delete it to re-download. # Total photons (after ALEX selection): 2,259,522 # D photons in D+A excitation periods: 721,537 # A photons in D+A excitation periods: 1,537,985 # D+A photons in D excitation period: 1,434,842 # D+A photons in A excitation period: 824,680 - Calculating BG rates ... get bg th arrays Channel 0 [DONE] - Performing burst search (verbose=False) ...[DONE] - Calculating burst periods ...[DONE] - Counting D and A ph and calculating FRET ... - Applying background correction. [DONE Counting D/A]
We start defining the model. Here we choose a 3-Gaussian model:
model = mfit.factory_three_gaussians()
model.print_param_hints()
Name Value Min Max Vary Expr p1_amplitude 1 0.01 inf True p1_center 0 0 1 True p1_fwhm nan -inf inf True 2.3548200*p1_sigma p1_height nan -inf inf True 0.3989423*p1_amplitude/max(1e-15, p1_sigma) p1_sigma 0.05 0.02 0.2 True p2_amplitude 1 0.01 inf True p2_center 0.5 0 1 True p2_fwhm nan -inf inf True 2.3548200*p2_sigma p2_height nan -inf inf True 0.3989423*p2_amplitude/max(1e-15, p2_sigma) p2_sigma 0.05 0.02 0.2 True p3_amplitude 1 0.01 inf True p3_center 1 0 1 True p3_fwhm nan -inf inf True 2.3548200*p3_sigma p3_height nan -inf inf True 0.3989423*p3_amplitude/max(1e-15, p3_sigma) p3_sigma 0.05 0.02 0.2 True
The previsou cell prints all the model parameters.
Each parameters has an initial value and bounds (min, max).
The column vary
tells if a parameter is varied during the fit
(if False the parameter is fixed).
Parameters with an expression (Expr
column) are not free but
the are computed as a function of other parameters.
We can modify the paramenters constrains as follows:
model.set_param_hint('p1_center', value=0.1, min=-0.1, max=0.3)
model.set_param_hint('p2_center', value=0.4, min=0.3, max=0.7)
model.set_param_hint('p2_sigma', value=0.04, min=0.02, max=0.18)
model.set_param_hint('p3_center', value=0.85, min=0.7, max=1.1)
Then, we fit and plot the model:
E_fitter = bext.bursts_fitter(ds, 'E', binwidth=0.03)
E_fitter.fit_histogram(model=model, pdf=False, method='nelder')
E_fitter.fit_histogram(model=model, pdf=False, method='leastsq')
dplot(ds, hist_fret, show_model=True, pdf=False);
# dplot(ds, hist_fret, show_model=True, pdf=False, figsize=(6, 4.5));
# plt.xlim(-0.1, 1.1)
# plt.savefig('fret_hist_fit.png', bbox_inches='tight', dpi=200, transparent=False)
The results are in E_fitter
:
res = E_fitter.fit_res[0]
res.params.pretty_print()
Name Value Min Max Stderr Vary Expr Brute_Step p1_amplitude 26.43 0.01 inf 2.665 True None None p1_center 0.1304 -0.1 0.3 0.003665 True None None p1_fwhm 0.1247 -inf inf 0.009604 False 2.3548200*p1_sigma None p1_height 199 -inf inf 12.7 False 0.3989423*p1_amplitude/max(1e-15, p1_sigma) None p1_sigma 0.05296 0.02 0.2 0.004079 True None None p2_amplitude 97.73 0.01 inf 5.766 True None None p2_center 0.42 0.3 0.7 0.006083 True None None p2_fwhm 0.3394 -inf inf 0.02357 False 2.3548200*p2_sigma None p2_height 270.5 -inf inf 6.868 False 0.3989423*p2_amplitude/max(1e-15, p2_sigma) None p2_sigma 0.1441 0.02 0.18 0.01001 True None None p3_amplitude 46.61 0.01 inf 4.2 True None None p3_center 0.8268 0.7 1.1 0.01187 True None None p3_fwhm 0.2946 -inf inf 0.02508 False 2.3548200*p3_sigma None p3_height 148.6 -inf inf 6.991 False 0.3989423*p3_amplitude/max(1e-15, p3_sigma) None p3_sigma 0.1251 0.02 0.2 0.01065 True None None
To get a dictionary of values:
res.values
{'p1_amplitude': 26.425112571102087, 'p1_center': 0.13042117805610684, 'p1_sigma': 0.052962724398903344, 'p2_amplitude': 97.72510948676253, 'p2_center': 0.4199740995810512, 'p2_sigma': 0.14414401587742645, 'p3_amplitude': 46.61031832136763, 'p3_center': 0.8267695386960199, 'p3_sigma': 0.12509313036967762, 'p1_fwhm': 0.12471768266902558, 'p1_height': 199.04744906009154, 'p2_fwhm': 0.3394332114684814, 'p2_height': 270.47033280627744, 'p3_fwhm': 0.2945718052571243, 'p3_height': 148.647871708916}
This is the startndard lmfit's fit report:
print(res.fit_report(min_correl=0.5))
[[Model]] ((Model(gaussian, prefix='p1_') + Model(gaussian, prefix='p2_')) + Model(gaussian, prefix='p3_')) [[Fit Statistics]] # fitting method = leastsq # function evals = 154 # data points = 46 # variables = 9 chi-square = 7950.00085 reduced chi-square = 214.864888 Akaike info crit = 255.005152 Bayesian info crit = 271.462925 [[Variables]] p1_amplitude: 26.4251126 +/- 2.66475790 (10.08%) (init = 1) p1_center: 0.13042118 +/- 0.00366511 (2.81%) (init = 0.1) p1_sigma: 0.05296272 +/- 0.00407857 (7.70%) (init = 0.05) p2_amplitude: 97.7251095 +/- 5.76640353 (5.90%) (init = 1) p2_center: 0.41997410 +/- 0.00608253 (1.45%) (init = 0.4) p2_sigma: 0.14414402 +/- 0.01001133 (6.95%) (init = 0.04) p3_amplitude: 46.6103183 +/- 4.20009275 (9.01%) (init = 1) p3_center: 0.82676954 +/- 0.01187091 (1.44%) (init = 0.85) p3_sigma: 0.12509313 +/- 0.01064950 (8.51%) (init = 0.05) p1_fwhm: 0.12471768 +/- 0.00960429 (7.70%) == '2.3548200*p1_sigma' p1_height: 199.047449 +/- 12.7008735 (6.38%) == '0.3989423*p1_amplitude/max(1e-15, p1_sigma)' p2_fwhm: 0.33943321 +/- 0.02357489 (6.95%) == '2.3548200*p2_sigma' p2_height: 270.470333 +/- 6.86806312 (2.54%) == '0.3989423*p2_amplitude/max(1e-15, p2_sigma)' p3_fwhm: 0.29457181 +/- 0.02507766 (8.51%) == '2.3548200*p3_sigma' p3_height: 148.647872 +/- 6.99134733 (4.70%) == '0.3989423*p3_amplitude/max(1e-15, p3_sigma)' [[Correlations]] (unreported correlations are < 0.500) C(p2_amplitude, p2_sigma) = 0.935 C(p3_amplitude, p3_sigma) = 0.857 C(p2_amplitude, p3_amplitude) = -0.806 C(p1_amplitude, p2_sigma) = -0.792 C(p2_sigma, p3_amplitude) = -0.789 C(p2_amplitude, p3_center) = 0.779 C(p1_amplitude, p1_sigma) = 0.774 C(p2_sigma, p3_center) = 0.768 C(p1_amplitude, p2_amplitude) = -0.762 C(p3_amplitude, p3_center) = -0.731 C(p2_amplitude, p3_sigma) = -0.713 C(p2_sigma, p3_sigma) = -0.665 C(p3_center, p3_sigma) = -0.660 C(p1_sigma, p2_sigma) = -0.554 C(p2_center, p3_sigma) = -0.553 C(p1_sigma, p2_amplitude) = -0.547 C(p2_center, p3_amplitude) = -0.534 C(p1_amplitude, p3_amplitude) = 0.531 C(p1_amplitude, p3_center) = -0.518 C(p2_center, p3_center) = 0.511
The previous cell reports error ranges computed from the covariance matrix. More accurare confidence intervals can be obtained with:
ci = res.conf_interval()
lmfit.report_ci(ci)
/home/paul/anaconda3/envs/Py38/lib/python3.8/site-packages/lmfit/confidence.py:317: UserWarning: Bound reached with prob(p2_sigma=0.18) = 0.992812065990089 < max(sigmas) warn(errmsg)
99.73% 95.45% 68.27% _BEST_ 68.27% 95.45% 99.73% p1_amplitude: -9.30991 -6.10472 -3.00234 26.42511 +2.97612 +6.05976 +9.43949 p1_center : -0.01193 -0.00771 -0.00382 0.13042 +0.00397 +0.00835 +0.01354 p1_sigma : -0.01374 -0.00894 -0.00442 0.05296 +0.00456 +0.00953 +0.01531 p2_amplitude: -22.75290 -14.16482 -6.83120 97.72511 +6.80352 +13.90510 +21.30745 p2_center : -0.02278 -0.01400 -0.00664 0.41997 +0.00641 +0.01298 +0.02009 p2_sigma : -0.03326 -0.02232 -0.01138 0.14414 +0.01224 +0.02572 +inf p3_amplitude: -13.74698 -9.15839 -4.65290 46.61032 +5.09442 +11.07792 +18.71141 p3_center : -0.05671 -0.03303 -0.01497 0.82677 +0.01330 +0.02587 +0.03840 p3_sigma : -0.03201 -0.02169 -0.01125 0.12509 +0.01292 +0.02877 +0.04959
It is convenient to put the fit results in a DataFrame for further analysis.
A dataframe of fitted parameters is already in E_fitter
:
E_fitter.params
p1_amplitude | p1_center | p1_fwhm | p1_height | p1_sigma | p2_amplitude | p2_center | p2_fwhm | p2_height | p2_sigma | p3_amplitude | p3_center | p3_fwhm | p3_height | p3_sigma | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 26.425113 | 0.130421 | 0.124718 | 199.047449 | 0.052963 | 97.725109 | 0.419974 | 0.339433 | 270.470333 | 0.144144 | 46.610318 | 0.82677 | 0.294572 | 148.647872 | 0.125093 |
With pybroom
we can get a "tidy" DataFrame
with more complete fit results:
import pybroom as br
df = br.tidy(res)
df
name | value | min | max | vary | expr | stderr | init_value | |
---|---|---|---|---|---|---|---|---|
0 | p1_amplitude | 26.425113 | 0.01 | inf | True | None | 2.664758 | 1.00 |
1 | p1_center | 0.130421 | -0.10 | 0.3 | True | None | 0.003665 | 0.10 |
2 | p1_fwhm | 0.124718 | -inf | inf | False | 2.3548200*p1_sigma | 0.009604 | NaN |
3 | p1_height | 199.047449 | -inf | inf | False | 0.3989423*p1_amplitude/max(1e-15, p1_sigma) | 12.700874 | NaN |
4 | p1_sigma | 0.052963 | 0.02 | 0.2 | True | None | 0.004079 | 0.05 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
10 | p3_amplitude | 46.610318 | 0.01 | inf | True | None | 4.200093 | 1.00 |
11 | p3_center | 0.826770 | 0.70 | 1.1 | True | None | 0.011871 | 0.85 |
12 | p3_fwhm | 0.294572 | -inf | inf | False | 2.3548200*p3_sigma | 0.025078 | NaN |
13 | p3_height | 148.647872 | -inf | inf | False | 0.3989423*p3_amplitude/max(1e-15, p3_sigma) | 6.991347 | NaN |
14 | p3_sigma | 0.125093 | 0.02 | 0.2 | True | None | 0.010650 | 0.05 |
15 rows × 8 columns
Now, for example, we can easily select parameters by name:
df.loc[df.name.str.contains('center')]
name | value | min | max | vary | expr | stderr | init_value | |
---|---|---|---|---|---|---|---|---|
1 | p1_center | 0.130421 | -0.1 | 0.3 | True | None | 0.003665 | 0.10 |
6 | p2_center | 0.419974 | 0.3 | 0.7 | True | None | 0.006083 | 0.40 |
11 | p3_center | 0.826770 | 0.7 | 1.1 | True | None | 0.011871 | 0.85 |
df.loc[df.name.str.contains('sigma')]
name | value | min | max | vary | expr | stderr | init_value | |
---|---|---|---|---|---|---|---|---|
4 | p1_sigma | 0.052963 | 0.02 | 0.20 | True | None | 0.004079 | 0.05 |
9 | p2_sigma | 0.144144 | 0.02 | 0.18 | True | None | 0.010011 | 0.04 |
14 | p3_sigma | 0.125093 | 0.02 | 0.20 | True | None | 0.010650 | 0.05 |