FRETBursts - μs-ALEX smFRET burst analysis

This notebook is part of a tutorial series for the FRETBursts burst analysis software.

In this notebook, we present a typical FRETBursts workflow for μs-ALEX smFRET burst analysis. Briefly, we show how to perform background estimation, burst search, burst selection, compute FRET histograms and ALEX histograms, do sub-population selection and finally, FRET efficiency fit.


If you are new to the notebook interface see First Steps before continuing.

Before running the notebook, you can click on menu Cell -> All Output -> Clear to clear all previous output. This will avoid mixing output from current execution and the previously saved one.

Loading the software

We start by loading FRETBursts:

In [1]:
from fretbursts import *
 - Optimized (cython) burst search loaded.
 - Optimized (cython) photon counting loaded.
--------------------------------------------------------------
 You are running FRETBursts (version 0.5.7+0.g7267863.dirty).

 If you use this software please cite the following paper:

   FRETBursts: An Open Source Toolkit for Analysis of Freely-Diffusing Single-Molecule FRET
   Ingargiola et al. (2016). http://dx.doi.org/10.1371/journal.pone.0160716 

--------------------------------------------------------------
Thanks in advance for remembering to cite FRETBursts in publications or presentations!

Note that FRETBursts version string tells you the exact FRETBursts version (and revision) in use. Storing the version in the notebook helps with reproducibility and tracking software regressions.

Next, we initialize the default plot style for the notebook (under the hood it uses seaborn):

In [2]:
sns = init_notebook()

Note that the previous command has no output. Finally, we print the version of some dependencies:

In [3]:
import lmfit; lmfit.__version__
Out[3]:
'0.9.5'
In [4]:
import phconvert; phconvert.__version__
Out[4]:
'0.7.1'

Downloading the data file

The full list of smFRET measurements used in the FRETBursts tutorials can be found on Figshare.

This is the file we will download:

In [5]:
url = 'http://files.figshare.com/2182601/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5'
You can change the url variable above to download your own data file. This is useful if you are executing FRETBursts online and you want to use your own data file. See First Steps.

Here, we download the data file and put it in a folder named data, inside the notebook folder:

In [6]:
download_file(url, save_dir='./data')
URL:  http://files.figshare.com/2182601/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5
File: 0023uLRpitc_NTP_20dT_0.5GndCl.hdf5
 
File already on disk: /Users/anto/src/FRETBursts/notebooks/data/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5 
Delete it to re-download.

NOTE: If you modified the url variable providing an invalid URL the previous command will fail. In this case, edit the cell containing the url and re-try the download.

Selecting the data file

Use one of the following 2 methods to select a data file.

Option 1: Paste the file-name

Here, we can directly define the file name to be loaded:

In [7]:
filename = "./data/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5"
filename
Out[7]:
'./data/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5'

Now filename contains the path of the file you just selected. Run again the previous cell to select a new file. In a following cell we will check if the file actually exists.

When running the notebook online and using your own data file, make sure to modify the previous cell.
See First Steps.

Option 2: Use an "Open File" dialog

Alternatively, you can select a data file with an "Open File" windows. Note that, since this only works in a local installation, the next commands are commented (so nothing will happen when running the cell).

If you want to try the File Dialog, you need to remove the # signs:

In [8]:
# filename = OpenFileDialog()
# filename

Check that the data file exists

Let's check that the file exists:

In [9]:
import os
if os.path.isfile(filename):
    print("Perfect, file found!")
else:
    print("Sorry, file:\n%s not found" % filename)
Perfect, file found!

Load the selected file

We can finally load the data and store it in a variable called d:

In [10]:
d = loader.photon_hdf5(filename)

If you don't get any message, the file is loaded successfully.

We can also set the 3 correction coefficients:

  • leakage or bleed-through: leakage
  • direct excitation: dir_ex (ALEX-only)
  • gamma-factor gamma
In [11]:
d.leakage = 0.11
d.dir_ex = 0.04
d.gamma = 1.

NOTE: at any later moment, after burst search, a simple reassignment of these coefficient will update the burst data with the new correction values.

μs-ALEX parameters

At this point, timestamps and detectors numbers are contained in the ph_times_t and det_t attributes of d. Let's print them:

In [12]:
d.ph_times_t, d.det_t
Out[12]:
([array([     146847,      188045,      294124, ..., 47999863658,
         47999877783, 47999955353])],
 [array([0, 1, 1, ..., 1, 1, 0], dtype=uint32)])

We need to define some ALEX parameters:

In [13]:
d.add(det_donor_accept = (0, 1), 
      alex_period = 4000,
      offset = 700,
      D_ON = (2180, 3900), 
      A_ON = (200, 1800))

Here the parameters are:

  • det_donor_accept: donor and acceptor channels
  • alex_period: length of excitation period (in timestamps units)
  • D_ON and A_ON: donor and acceptor excitation windows
  • offset: the offset between the start of alternation and start of timestamping (see also Definition of alternation periods).

To check that the above parameters are correct, we need to plot the histogram of timestamps (modulo the alternation period) and superimpose the two excitation period definitions to it:

In [14]:
bpl.plot_alternation_hist(d)

If the previous alternation histogram looks correct, the corresponding definitions of the excitation periods can be applied to the data using the following command:

In [15]:
loader.alex_apply_period(d)
# Total photons (after ALEX selection):   2,259,522
#  D  photons in D+A excitation periods:    721,537
#  A  photons in D+A excitation periods:  1,537,985
# D+A photons in  D  excitation period:   1,434,842
# D+A photons in  A  excitation period:     824,680

If the previous histogram does not look right, the parameters in the d.add(...) cell can be modified and checked by running the histogram plot cell until everything looks fine. Don't forget to apply the parameters with loader.usalex_apply_period(d) as a last step.

NOTE: After applying the ALEX parameters a new array of timestamps containing only photons inside the excitation periods is created (name d.ph_times_m). To save memory, by default, the old timestamps array (d.ph_times_t) is deleted. Therefore, in the following, when we talk about all-photon selection we always refer to all photons inside both excitation periods.

Measurement infos

The entire measurement data is now stored in the variable d. Printing it will give a compact representation containing the file-name and additional parameters:

In [16]:
d
Out[16]:
data_0023uLRpitc_NTP_20dT_0.5GndCl G1.000 Lk11.000 dir4.0

To check the measurement duration (in seconds) run:

In [17]:
d.time_max
Out[17]:
599.99943316249994

Plotting basics

In this section basic concepts of plotting with FRETBursts using the timetrace plot as an example.

To plot a timetrace of the measurement we use:

In [18]:
dplot(d, timetrace);

Here, dplot is a generic wrapper (the same for all plots) that takes care of setting up the figure, title and axis (in the multispot case dplot creates multi-panel plot). The second argument, timetrace, is the actual plot function. All the eventual additional arguments passed to dplot are, in turn, passed to the plot function (e.g. timetrace).

If we look at the documentation for timetrace function we notice that it accepts a long list of arguments. In python, when an argument is not specified, it will take the default value specified in the function definition (see previus link).

As an example, to change the bin size (i.e. duration) of the timetrace histogram, we can look up in the timetrace documentation and find that the argument we need to modify is binwidth (we can also see that the default value is 0.001 seconds). We can then re-plot the timetrace using a bin of 0.5 ms:

In [19]:
dplot(d, timetrace, binwidth=0.5e-3);

The timetrace is computed between tmin and tmax (by default 0 and 200s), but as you can see is displayed only between 0 an 1 second, just because these are the default x-axis limits. The axis limits can be changes by using the standard matplotlib command plt.xlim(). On the other hand, to change the range where the timetrace is computed, we pass the additional arguments tmin and tmax as follows:

In [20]:
dplot(d, timetrace, binwidth=0.5e-3, tmin=50, tmax=150)
plt.xlim(51, 52);

When using FRETBursts in a notebook, all plots are static by default. This is because we use the so called inline backend of matplotlib. If you want to manipulate figures interactively, you can switch to the interactive notebook backend with:

%matplotlib notebook

to go back to inline use:

%matplotlib inline

NOTE: Currently, the notebook backend is incompatible with the QT backend. If in a session you activate the notebook backend, then switching to the QT backend requires restarting the notebook. Conversely, you can switch between inline and notebook or between inline and qt4 backends in the same session wihtou issues.

See also:

Background estimation

As a first step of the analysis, we need to estimate the background. The assumption is that the background is a Poisson process and therefore the corresponding inter photon delays are exponentially distributed. Since the background can change during the measurement, a new estimation is computed every time_s seconds (this time is called the background period).

The inter photon delay distribution contains both single-molecule signal and background, the latter being the only one we are interested in and the former being in general much shorter. Therefore, a threshold is needed to discriminate between the exponential tail and the single-molecule peak.

Choosing a threshold and fitting the exponential tail are two different problems. FRETBursts provides several ways to specify the minimum threshold and different functions to fit the exponential tail.

We will go over three different methods in increasing order of complexity (and recommendability)

For more information see:

Single threshold

Let start with a standard Maximum Likelihood (ML) background fit with a minimum tail threshold of 500μs:

In [21]:
d.calc_bg(bg.exp_fit, time_s=1000, tail_min_us=500)
 - Calculating BG rates ... [DONE]

We can look at how the fit looks with:

In [22]:
dplot(d, hist_bg, show_fit=True)
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x1042f54e0>

Note that the fits are not very good. This is understandable because we used a single threshold for all the photon streams, each one having a quite different background.

Multiple thresholds

To improve the fit, we can try specifying a threshold for each channel. This method is bit ad-hoc but it may work well when the thresholds are properly choosen.

In [23]:
d.calc_bg(bg.exp_fit, time_s=1000, tail_min_us=(800, 4000, 1500, 1000, 3000))
 - Calculating BG rates ... [DONE]
In [24]:
dplot(d, hist_bg, show_fit=True);

For ALEX measurements, the tuple passed to tail_min_us in order to define the thresholds needs to contain
5 values corresponding the 5 distinct photon streams (the all-photon stream

  • the 4 base alternation streams). The order of these 5 values need to match the order of photon streams in the Data.ph_streams attribute:
In [25]:
d.ph_streams
Out[25]:
[Ph_sel(Dex='DAem', Aex='DAem'),
 Ph_sel(Dex='Dem', Aex=None),
 Ph_sel(Dex='Aem', Aex=None),
 Ph_sel(Dex=None, Aex='Dem'),
 Ph_sel(Dex=None, Aex='Aem')]

Automatic threshold

Finally, is possible to let FRETBursts infer the threshold automatically with:

In [26]:
d.calc_bg(bg.exp_fit, time_s=1000, tail_min_us='auto', F_bg=1.7)
 - Calculating BG rates ... [DONE]

Which results in the following fit plot:

In [27]:
dplot(d, hist_bg, show_fit=True);

Under the hood, this method estimates the threshold automatically according to this formula:

threshold_auto = F_bg / coarse_background_rate

where F_bg is the fit function input argument (by default 1.7) and coarse_background_rate is an initial background estimation performed with fixed threshold. This method is concemptually an iterative method to compute the threshold that is stopped after the second iteration (this is usually more than enough for accurate estimates).

Of the three methods here described, the latter is the recommended one since it works well and without user intervention in a wide range of experimental conditions.

Background timetrace

It is a good practice to monitor background rates as a function of time. Here, we compute background in adjacent 30s windows (called background periods) and plot the estimated rates as a function of time.

In [28]:
d.calc_bg(bg.exp_fit, time_s=30, tail_min_us='auto', F_bg=1.7)
 - Calculating BG rates ... [DONE]
In [29]:
dplot(d, timetrace_bg)
Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x1246163c8>

Getting the background rates

The background rates are stored in Data() attributes bg, bg_dd, bg_ad, bg_aa. These contain all the fitted background rates for each channel and period.

We can also get the average background for each channel:

In [30]:
d.rate_m, d.rate_dd, d.rate_ad, d.rate_da, d.rate_aa
Out[30]:
([2212.6520253589342],
 [592.2024909931622],
 [994.39404281480734],
 [74.737849340892041],
 [564.86449968607053])

Burst analysis

The first step of burst analysis is the burst search.

We will use the sliding-window algorithm on all photons. Note that "all photons", as mentioned before, means all photons selected in the alternation histogram. An important variation compared to the classical sliding-windows is that the threshold-rate for burst start is computed as a function of the background and changes when the background changes during the measurement.

To perform a burst search evaluating the photon rate with 10 photons (m=10), and selecting a minimum rate 6 times larger than the background rate (F=6) calculated with all photons (default):

In [31]:
d.burst_search(L=10, m=10, F=6)
 - Performing burst search (verbose=False) ...[DONE]
 - Calculating burst periods ...[DONE]
 - Counting D and A ph and calculating FRET ... 
   - Applying background correction.
   - Applying leakage correction.
   - Applying direct excitation correction.
   [DONE Counting D/A]

The previous command performs the burst search, corrects the bursts sizes for background, spectral leakage and direct excitation, and computes $\gamma$-corrected FRET and Stoichiometry.

See the burst_search documentation for more details.

We can plot the resulting FRET histogram using the following command:

In [32]:
dplot(d, hist_fret);

All pre-defined plots follow this pattern: call the generic dplot() function, passing 2 parameters:

  • the measurement data (d in this case)
  • the plot function (hist_fret)

In some case we can add other optional parameters to tweak the plot.

All plot functions start with hist_ for histograms, scatter_ for scatter-plots or timetrace_ for plots as a function of measurement time. You can use autocompletion to find all plot function or you can look in bursts_plot.py where all plot functions are defined.

Instead of hist_fret we can use hist_fret_kde to add a KDE overlay. Also, we can plot a weighted histogram by passing an additional parameter weights:

In [33]:
dplot(d, hist_fret, show_kde=True);
dplot(d, hist_fret, show_kde=True, weights='size');
 - Overwriting the old E_fitter object with the new weights.

You can experiment with different weighting schema (for all supported weights see get_weigths() function in fret_fit.py).

Burst selection

When we performed the burst search, we specified L=10 without explaining what this parameter means. L is traditionally the minimum size (number of photons) for a burst: smaller bursts will be rejected. By setting L=m (10 in this case) we are deciding to not discard any burst (because the smallest detected burst has at least m counts).

Selecting the bursts in a second step, by applying a minimum burst size criterion, results in a more accurate and unbiased selection.

For example, we can select bursts with more than 30 photons (after background, gamma, leakage and direct excitation corrections) and store the result in a new Data() variable ds:

In [34]:
ds = d.select_bursts(select_bursts.size, th1=30)

By defaults the burst size includes donor and acceptor photons during donor excitation. To add acceptor photons during acceptor excitation (naa), we add the parameter add_naa=True:

In [35]:
ds = d.select_bursts(select_bursts.size, add_naa=True, th1=30)

Similar to plot functions, all selection functions are defined in select_bursts.py and you can access them by typing select_bursts. and using the TAB key for autocompletion.

See also:

To replot the FRET histogram after selection (note that now we are passing ds to the plot function):

In [36]:
dplot(ds, hist_fret);

Note how the histogram exhibits much more clearly defined peaks after burst selection.

Histogram fitting and plotting style

Under the hood the previous hist_fret plot creates a MultiFitter object for $E$ values. This object, stored as ds.E_fitter, operates on multi-channel data and computes the histogram, KDE and can fit the histogram with a model (lmfit.Model).

Now, just for illustration purposes, we fit the previous histogram with 3 Gaussians, using the already created ds.E_fitter object:

In [37]:
ds.E_fitter.fit_histogram(mfit.factory_three_gaussians(), verbose=False)
In [38]:
dplot(ds, hist_fret, show_model=True);

The bin width can be changed with binwidth argument. Alternatively, an arbitrary array of bin edges can be passed in bins (overriding binwidth).

We can customize the appearance of this plot (type hist_fret? for the complete set of arguments).

For example to change from a bar plot to a line-plot we use the hist_style argument:

In [39]:
dplot(ds, hist_fret, show_model=True, hist_style='line')
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x120a132e8>

We can customize the line-plot, bar-plot, the model plot and the KDE plot by passing dictionaries with matplotlib style. The name of the arguments are:

  • hist_plot_style: style for the histogram line-plot
  • hist_bar_style: style for the histogram bar-plot
  • model_plot_style: style for the model plot
  • kde_plot_style: style for the KDE plot

As an example:

In [40]:
dplot(ds, hist_fret, show_model=True, hist_style='bar', show_kde=True,
      kde_plot_style = dict(linewidth=5, color='orange', alpha=0.6),
      hist_plot_style = dict(linewidth=3, markersize=8, color='b', alpha=0.6))
plt.legend();

Other plots

Similarly, we can plot the burst size using all photons (type hist_size? to learn about all plot options):

In [41]:
dplot(ds, hist_size, add_naa=True);

Or plot the burst size histogram for the different components:

In [42]:
dplot(ds, hist_size_all);

NOTE: The previous plot may generate a benign warning due to the presence of zeroes when switching to log scale. Just ignore it.

A scatterplot of Size vs FRET is created by:

In [43]:
dplot(ds, scatter_fret_nd_na)
xlim(-1, 2)
Out[43]:
(-1, 2)

Study of different populations

Removing multi-mers

We can further select only bursts smaller than 300 photons to get rid of multi-molecule events:

In [44]:
ds2 = ds.select_bursts(select_bursts.size, th2=300)

and superimpose the two histograms before and after selection to see the difference:

In [45]:
ax = dplot(ds2, hist_fret, hist_style='bar', show_kde=True, 
              hist_bar_style = dict(facecolor='r', alpha=0.5, 
                                    label='Hist. no large bursts'),
              kde_plot_style = dict(lw=3, color='m', 
                                    label='KDE no large bursts'))
dplot(ds, hist_fret, ax=ax, hist_style='bar', show_kde=True,
      hist_bar_style = dict(label='Hist. with large bursts'),
      kde_plot_style = dict(lw=3, label='KDE with large bursts'))
plt.legend();

NOTE: It is not necessarily true that bursts with more that 300 photons represents multiple molecules. To asses the valididty of this assumption it can be useful to plot the peak count rates in each burst. See hist_burst_phrate for this kind of plot.

Fit and plot peak positions

We can find the KDE peak position in a range (let say 0.2 ... 0.6):

In [46]:
ds.E_fitter.find_kde_max(np.r_[0:1:0.0002], xmin=0.2, xmax=0.6)

and plot it with show_kde_peak=True, we also use show_fit_value=True to show a box with the fitted value:

In [47]:
dplot(ds, hist_fret, hist_style='line', 
      show_fit_value=True, 
      show_kde=True, show_kde_peak=True);

Instead of using the KDE, we can use the peak position as fitted from a gaussian model.

In [48]:
ds.E_fitter.fit_histogram(mfit.factory_three_gaussians(), verbose=False)

To select which peak to show we use fit_from='p1_center':

In [49]:
dplot(ds, hist_fret, hist_style='line', 
      show_fit_value=True, fit_from='p2_center', 
      show_model=True);

The string 'p2_center' is the name of the parameter of the gaussian fit that we want to show in the text box. To see all the parameters of the model we look in:

In [50]:
ds.E_fitter.params  # <-- pandas DataFrame, one row per channel
Out[50]:
p1_amplitude p1_center p1_sigma p2_amplitude p2_center p2_sigma p3_amplitude p3_center p3_sigma
0 0.235592 0.830905 0.119578 0.629798 0.357341 0.178132 0.135013 0.0368783 0.0616214

ALEX plots

We can create a simple E-S scatter plot with scatter_alex:

In [51]:
dplot(ds, scatter_alex, figsize=(4,4), mew=1, ms=4, mec='black', color='purple');
/Users/anto/miniconda3/lib/python3.5/site-packages/matplotlib/cbook.py:2644: UserWarning: Saw kwargs ['ms', 'markersize'] which are all aliases for 'markersize'.  Kept value from 'markersize'
  seen=seen, canon=canonical, used=seen[-1]))
/Users/anto/miniconda3/lib/python3.5/site-packages/matplotlib/cbook.py:2644: UserWarning: Saw kwargs ['mew', 'markeredgewidth'] which are all aliases for 'markeredgewidth'.  Kept value from 'markeredgewidth'
  seen=seen, canon=canonical, used=seen[-1]))
/Users/anto/miniconda3/lib/python3.5/site-packages/matplotlib/cbook.py:2644: UserWarning: Saw kwargs ['mec', 'markeredgecolor'] which are all aliases for 'markeredgecolor'.  Kept value from 'markeredgecolor'
  seen=seen, canon=canonical, used=seen[-1]))