This notebook is part of PyBroMo, a Python-based single-molecule Brownian-motion diffusion simulator that simulates confocal smFRET experiments. You can find the full list of notebooks in Usage Examples.
In this notebook it is assumed that you have already generated a series of timestamps for different emission levels and different background rates. You can do this by running the notebook: 2. Generate timestamps - Parallel
Once we have simulated timestamps for different emission rates (and possibly for different background rates) it is easy to obtain smFRET data. We just need to assign donor timestamps and acceptor timestamps. The FRET efficiency will be a function of the ratio between the max acceptor emission and the max donor emission. Calling this ratio $k$:

$$k = \mathrm{\frac{acceptor\;rate}{donor\;rate}}$$

we can compute the simulated FRET efficiency $E$ as:

$$ E = \frac{k}{k+1} $$

So if, for example, you want a FRET efficiency of 20% you need a $k$ of 0.25:
0.2/(1-0.2)
0.25
Therefore you need two timestamp files, one with a max emission rate of 200 kcps and one with 50 kcps (so that k = 50/200 = 0.25). Make sure you have already generated these timestamps (if not, go back to the previous notebook). In this example you simply need to use the 200 kcps rate for the donor timestamps and the 50 kcps rate for the acceptor timestamps.
Note: You can simulate arbitrarily high emission rates; however, as a rule of thumb for a good green-red FRET pair (for example ATTO550 and ATTO647N), you should not go above 200-300 kcps as the total max emission rate (donor + acceptor). YMMV.
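The relation between the rate ratio and the FRET efficiency can be checked with a small helper function (hypothetical, not part of PyBroMo):

```python
# Hypothetical helper: compute the simulated FRET efficiency E
# from the donor and acceptor max emission rates.
def fret_efficiency(donor_rate, acceptor_rate):
    k = acceptor_rate / float(donor_rate)  # k = acceptor rate / donor rate
    return k / (k + 1.)                    # E = k / (k + 1)

print(fret_efficiency(200, 50))  # k = 0.25 -> E = 0.2
```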
%run -i load_bromo.py
/home/anto/Documents/ucla/src/brownian
All the available timestamps should be in the folder SIM_PH_DIR (defined in path_def.py in the PyBroMo folder):
SIM_PH_DIR
'/home/anto/Documents/ucla/src/data/sim/brownian/ph_times/'
#ls $SIM_PH_DIR/*ID1+2+3+4+5+6*
Now we need to specify which simulation to load, providing the simulation parameters and the particle IDs that are fused in the timestamp file. For convenience we save these parameters in a dictionary called pybromo_ts_params:
pybromo_ts_params = dict(
    ID = '1+2+3+4+5+6',
    t_tot = '480',
    num_p = '30',
    pM = '64',
    t_step = 0.5e-6,
    D = 1.2e-11,
    dir_ = SIM_PH_DIR,
)
The previous parameters do not change most of the time. What you probably want to change often are the emission and background rates, in order to perform comparisons. Let's specify the emission and background rates (in kHz) for both the donor and the acceptor channel:
pybromo_ts_params.update(
    d_em_kHz = 20.,
    a_em_kHz = 180.,
    d_bg_kHz = 6,
    a_bg_kHz = 3,
)
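As a quick sanity check on these values (plain arithmetic, using the same E-vs-k relation as above):

```python
# With d_em_kHz = 20 and a_em_kHz = 180 the rate ratio is
# k = 180/20 = 9, so E = k/(k+1) = 0.9, i.e. 90% FRET.
k = 180. / 20.
print(k / (k + 1.))  # 0.9
```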
Now we use a helper function to obtain the timestamp filenames. As a side effect, the simulated FRET efficiency is printed and returned:
fname_d, fname_a, name, clk_p, E_sim = lu.get_bromo_fnames_da(**pybromo_ts_params)
Simulated FRET value: 90.0% D: EM 0020 BG 06.0 A: EM 0180 BG 03.0 ph_times_480s_D1.2e-11_30P_64pM_step0.5us_ID1+2+3+4+5+6_EM0020kHz_BG06.0kHz.npy ph_times_480s_D1.2e-11_30P_64pM_step0.5us_ID1+2+3+4+5+6_EM0180kHz_BG03.0kHz.npy
We finally try to load the timestamps:
ph_times_d = np.load(fname_d)
ph_times_a = np.load(fname_a)
If the previous command fails, you either specified wrong parameters (check the ID, t_tot, num_p, pM, etc.) or a wrong folder for the timestamps (check dir_). Check whether the filenames in fname_d and fname_a really exist.
The timestamps are float64 and represent the absolute time in seconds from the beginning of the simulated experiment. You can convert the array to integers (int64) and assign a clock period in order to have data that looks more like timestamps generated by FPGA hardware:
ph_times, a_em = merge_DA_ph_times(ph_times_d, ph_times_a)
clk_p = t_step/32. # with t_step=0.5us -> 15.625 ns
ph_times_int = (ph_times/clk_p).astype('int64')
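The same conversion on a toy array (synthetic values, just to illustrate the scaling):

```python
import numpy as np

# Convert float timestamps (seconds) to integer clock counts,
# using the same clock period as in the notebook above.
t_step = 0.5e-6
clk_p = t_step / 32.                  # clock period in seconds
ph = np.array([1e-3, 2.5e-3])         # toy timestamps in seconds
ph_int = (ph / clk_p).astype('int64')
# ph_int now holds, for each timestamp, the number of elapsed clock cycles
```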
Timestamps are NumPy arrays that can be saved in any conceivable format. Here I provide some examples.
First, we save the two arrays ph_times_d and ph_times_a into a MATLAB file:
from scipy.io import savemat
data_to_save = dict(timestamps_donor=ph_times_d,
timestamps_acceptor=ph_times_a)
savemat('timestamps_matlab_example', data_to_save, oned_as='row')
Even if not very efficient, it can sometimes be handy to save an array to a text file. We use the NumPy array method tofile(), which can save in both binary and ASCII format. To save in ASCII, just specify a separator (sep); the format is the standard Python (or C) string format for numbers:
ph_times_d.tofile('timestamps_donor.txt', sep='\n', format="%15e")
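The text file can be read back with np.fromfile using the same separator. A quick round-trip check on a toy array (hypothetical file name):

```python
import numpy as np

# Round-trip: write a small array as ASCII, one value per line,
# then read it back with the same separator.
a = np.array([1.5, 2.25, 3.0])
a.tofile('toy_times.txt', sep='\n', format='%.15e')
b = np.fromfile('toy_times.txt', sep='\n')
print((a == b).all())  # True: these values survive the text round-trip
```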
For a more flexible function see savetxt().
Let's see an example of how to save (and load) an array in an arbitrary binary format. We use the NumPy method tofile(), which saves the array in the same representation it currently has in memory. So we just need to convert the array in memory and then save it to a file. Converting an array's memory format and/or its declared type is explained pretty well in the NumPy docs.
Here, let's say we want to save the array ph_times_int. We can check the type and byte order by asking the array itself:
ph_times_int.dtype
dtype('int64')
ph_times_int.dtype.byteorder
'='
So it is an int64 array with native byte order '=' (the same as the CPU). You can safely assume that a modern CPU is little-endian, but if you are really paranoid you can check:
import sys
sys.byteorder
'little'
Ok. Now, as an example, we want to convert this array to big-endian int32:
# Convert to int32
ph_times_int32 = ph_times_int.astype('int32')
# Swap the bytes in memory to change endianess (byteswap)
# and change also the "declared" byte order (newbyteorder)
ph_times_int32_bigendian = ph_times_int32.byteswap().newbyteorder()
print ph_times_int32_bigendian.dtype
>i4
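Note that the cast and the byte swap can also be done in a single step with astype, which accepts a dtype with an explicit byte order (a toy check, using the same two timestamp values shown later in this notebook):

```python
import numpy as np

# astype with an explicit-byte-order dtype performs the int64 -> int32
# cast and the byte swap together.
a = np.array([1856, 4544], dtype='int64')  # toy data
b = a.astype('>i4')
print(b.dtype)      # big-endian int32
print(b.tobytes())  # raw bytes, most significant byte first
```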
We can finally save the array:
ph_times_int32_bigendian.tofile('times_bigend_int32.dat')
new_ph_times = np.fromfile('times_bigend_int32.dat', dtype='>i4')
(new_ph_times == ph_times_int32_bigendian).all()
True
From the previous command we see that all the elements of the two arrays are equal.
METHOD 2 You may not trust fromfile: it may be able to load the array just because fromfile is the twin method of tofile (we are being a bit paranoid here, but let's check anyway). So here we load the binary data and build an array from the raw byte stream:
with open('times_bigend_int32.dat', 'rb') as f:
    data = f.read()
new_ph_times = np.ndarray(len(data)/4, dtype='>i4', buffer=data)
(new_ph_times == ph_times_int32_bigendian).all()
True
Again, all the elements of the two arrays are equal.
METHOD 3 Still not convinced? Let's take a look at the raw data (first two timestamps):
print new_ph_times[0], hex(new_ph_times[0])
print new_ph_times[1], hex(new_ph_times[1])
1856 0x740 4544 0x11c0
In data we loaded the byte stream that was saved in the file. Here we convert each byte to an integer (ord) and then to hexadecimal (hex):
[hex(ord(d)) for d in data[:8]]
['0x0', '0x0', '0x7', '0x40', '0x0', '0x0', '0x11', '0xc0']
You can see that the first two timestamps correspond to what we have in new_ph_times
and the order is big-endian (most significant byte first).
from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/custom2.css", "r").read()
    return HTML(styles)
css_styling()