This notebook is part of PyBroMo, a Python-based single-molecule Brownian motion diffusion simulator that simulates confocal smFRET experiments. You can find the full list of notebooks in Usage Examples.
In this notebook we load simulations previously saved on a cluster and generate timestamps for different emission and background rates.
Follow the steps in HOWTO CLUSTER SETUP.txt to configure and start an IPython cluster. Another txt file explains how to remotely manage Windows machines (so you can start the engines without physically going to each remote PC).
After the cluster is started we can test it:
from IPython.parallel import Client
from IPython.utils.path import get_ipython_dir
ipython_dir = get_ipython_dir()
PROFILE = 'parallel'
rc = Client(ipython_dir+'/profile_%s/security/ipcontroller-client.json' % PROFILE)
dview = rc[:]
dview.block = True
rc.ids
[0, 1, 2, 3, 4, 5, 6, 7]
The previous command returns a list of integers: the IDs of all the connected engines (remote or local).
Run the initialization script for the simulation. This just sets the correct folder and loads the Brownian motion functions on the local computer:
%run -i load_bromo.py
/home/anto/Documents/ucla/src/brownian
Prepare the engines (change folder, define a unique eid, load PyBroMo software):
%px %reset # Not needed on the first execution
# Send a variable containing the engine ID to each engine
dview.scatter('eid', rc.ids, flatten=True)
%px eid
Out[0:1277]: 0
Out[1:1277]: 1
Out[2:1277]: 2
Out[3:1277]: 3
Out[4:1277]: 4
Out[5:1277]: 5
Out[6:1277]: 6
Out[7:1277]: 7
%%px
import os
if os.name == 'posix':
BROWN_DIR = "/home/anto/Documents/ucla/src/brownian/"
elif os.name == 'nt':
BROWN_DIR = r"C:/Data/Antonio/software/Dropbox/brownian/"
%%px
%cd $BROWN_DIR
[stdout:0] C:\Data\Antonio\software\Dropbox\brownian
[stdout:1] C:\Data\Antonio\software\Dropbox\brownian
[stdout:2] C:\Data\Antonio\software\Dropbox\brownian
[stdout:3] C:\Data\Antonio\software\Dropbox\brownian
[stdout:4] C:\Data\Antonio\software\Dropbox\brownian
[stdout:5] C:\Data\Antonio\software\Dropbox\brownian
[stdout:6] C:\Data\Antonio\software\Dropbox\brownian
[stdout:7] C:\Data\Antonio\software\Dropbox\brownian
%px run -i brownian.py
Here we load into the remote engines the simulation containing the emission array. We let the remote engines simulate the timestamps, then we transfer the timestamps locally and merge them into a single timestamp array.
We start by loading on each engine a simulation with ID=1 that contains the string t_max10.0s (so we select only the 10 s simulations):
%%px
S = load_sim_id(ID=1, glob_str="*t_max10.0s*", dir_=SIM_DATA_DIR)
[stdout:0] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID0-1.pickle
[stdout:1] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID1-1.pickle
[stdout:2] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID2-1.pickle
[stdout:3] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID3-1.pickle
[stdout:4] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID4-1.pickle
[stdout:5] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID5-1.pickle
[stdout:6] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID6-1.pickle
[stdout:7] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID7-1.pickle
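The loader `load_sim_id` is part of PyBroMo; its internals are not shown here. As a rough, hypothetical sketch of what a glob-based loader with this signature might look like (the file-name convention `_ID<n>-` is inferred from the output above):

```python
import glob
import os
import pickle

def load_sim_id(ID, glob_str, dir_):
    """Load a pickled simulation whose file name contains '_ID<ID>-'.

    Sketch: search `dir_` for files matching `glob_str` and keep
    the single file tagged with the requested simulation ID.
    """
    pattern = os.path.join(dir_, glob_str)
    matches = [f for f in glob.glob(pattern) if '_ID%d-' % ID in f]
    assert len(matches) == 1, "Expected exactly one match, got %d" % len(matches)
    print('Loaded: %s' % matches[0])
    with open(matches[0], 'rb') as f:
        return pickle.load(f)
```

This is only an illustration of the selection logic; the real PyBroMo function may differ in details such as error handling and the pickle protocol.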
Then we generate the timestamps remotely and merge them into the local arrays ph_times_d and ph_times_a:
run -i brownian.py
ph_times_d, t_tot, sim_name = parallel_gen_timetag(dview, max_em_rate=1e6, bg_rate=2e3)
ph_times_a, t_tot, sim_name = parallel_gen_timetag(dview, max_em_rate=5e5, bg_rate=4e3)
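The actual `parallel_gen_timetag` distributes the work across the engines; the core idea, though, is to turn the simulated emission trace into photon timestamps via Poisson statistics. A minimal serial sketch of that idea (the function name `gen_timetag` and its internals are an assumption, not PyBroMo's implementation):

```python
import numpy as np

def gen_timetag(emission, max_em_rate, bg_rate, t_step=0.5e-6, rs=None):
    """Serial sketch of timestamp generation from an emission trace.

    `emission` is a 1D array of normalized intensities (0..1), one value
    per simulation step of duration `t_step`. For each step we draw a
    Poisson photon count with rate emission*max_em_rate + bg_rate and
    emit that many timestamps at the step time.
    """
    rs = np.random.RandomState() if rs is None else rs
    rates = emission * max_em_rate + bg_rate    # photons/s at each step
    counts = rs.poisson(rates * t_step)         # photons in each step
    times = np.arange(emission.size) * t_step   # time of each step (s)
    return np.repeat(times, counts)             # one entry per photon
```

The parallel version scatters the emission array across engines, runs this kind of loop remotely, and gathers the pieces; the per-step Poisson draw is the part illustrated here.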
As a last, optional step we change the data format. We merge the two arrays ph_times_d and ph_times_a into a single array ph_times and use a boolean mask (array of booleans) a_em to mark whether each timestamp comes from the acceptor (True) or from the donor (False).
ph_times, a_em = merge_DA_ph_times(ph_times_d, ph_times_a)
and test that the two representations are equivalent:
(ph_times_d == ph_times[~a_em]).all()
(ph_times_a == ph_times[a_em]).all()
ph_times.size == ph_times_d.size+ph_times_a.size == a_em.size
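`merge_DA_ph_times` is a PyBroMo helper; a minimal sketch consistent with the equivalence checks above (the internals are an assumption) is to concatenate the two arrays together with a boolean mask and sort both by time:

```python
import numpy as np

def merge_DA_ph_times(ph_times_d, ph_times_a):
    """Merge donor and acceptor timestamps, keeping an acceptor mask.

    Returns the time-sorted union of the two arrays plus a boolean
    array that is True where the timestamp came from the acceptor.
    """
    ph_times = np.hstack([ph_times_d, ph_times_a])
    a_em = np.hstack([np.zeros(ph_times_d.size, dtype=bool),
                      np.ones(ph_times_a.size, dtype=bool)])
    order = ph_times.argsort(kind='mergesort')  # stable sort keeps mask aligned
    return ph_times[order], a_em[order]
```

The stable sort guarantees that `ph_times[~a_em]` and `ph_times[a_em]` reproduce the (sorted) input arrays, which is exactly what the three checks above verify.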
In this example we have generated a FRET efficiency of:
k = 5e5/1e6
E = k/(k+1)
print "%.1f%%" % (E*100)
33.3%
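The relation used above, in function form: with k the ratio of the acceptor to the donor max emission rate, E = k/(k+1), which is the same as rate_a/(rate_a + rate_d). A small helper (hypothetical name, not part of PyBroMo):

```python
def fret_from_rates(em_rate_a, em_rate_d):
    """FRET efficiency implied by the two max emission rates.

    With k = em_rate_a/em_rate_d:  E = k/(k + 1) = em_rate_a/(em_rate_a + em_rate_d)
    """
    k = float(em_rate_a) / em_rate_d
    return k / (k + 1)
```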
In order to simulate different FRET values we need different emission-rate levels. Furthermore, it is important to check different background rates in order to understand how the background influences the burst analysis. Therefore we need to simulate several background rates for each emission rate.
The number of cases can easily get out of hand if we choose too many values. In practice, around 100 cases (for example 10 emission levels and 10 background rates) can be simulated in about 20 minutes on a quad-core desktop running 4 engines.
Here we choose the values to simulate for the max emission rate and background rate:
import numpy as np
EM = np.r_[10,20:201:20, 190]*1e3
#EM = np.r_[90,99,101,110]*1e3
print EM
[  10000.   20000.   40000.   60000.   80000.  100000.  120000.  140000.  160000.  180000.  200000.  190000.]
BG = np.r_[1:7,8]*1e3
print BG
[ 1000. 2000. 3000. 4000. 5000. 6000. 8000.]
Graphical check of the values:
print EM[None,:].shape, BG.shape
(1, 12) (7,)
# Compute any possible k and E
M = np.dot(EM[:,None],1/EM[None,:])
k = np.sort(M.ravel())
E = k/(k+1)
M.shape
(12, 12)
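As a sanity check one can also count the distinct E values directly (a sketch, assuming the uncommented EM grid defined above):

```python
import numpy as np

EM = np.r_[10, 20:201:20, 190] * 1e3       # max emission rates (Hz), as above
M = np.dot(EM[:, None], 1 / EM[None, :])   # every pairwise rate ratio k
k = np.sort(M.ravel())
E = k / (k + 1)                            # FRET efficiency for each ratio
n_unique = np.unique(np.round(E, 9)).size  # distinct E values (up to rounding)
```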
%pylab inline
Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['box', 'rc']
`%pylab --no-import-all` prevents importing * from pylab and numpy
plot(E, '.')
title("Check how many E values are possible");
Some helper functions to generate file names:
def gen_tot_sim_name(Sim_names):
    """Concatenate with '+' the IDs of the strings in `Sim_names`.

    Assumes all the strings are identical except for a one-digit ID.
    Example: gen_tot_sim_name(['_ID0','_ID1']) returns '_ID0+1'.
    """
    IDs = []
    for s in Sim_names:
        n = s.find('_ID') + 3
        IDs.append(s[n])
    name = Sim_names[0][:n] + '+'.join(IDs) + Sim_names[0][n+1:]
    return name
gen_tot_sim_name(['pippo_ID0_pluto','pippo_ID1_pluto', 'pippo_ID2_pluto'])
'pippo_ID0+1+2_pluto'
def gen_ph_times_name(em, bg, sim_name, t_tot):
    """Generate a file name for a timestamp array."""
    EM_str = "%04d" % (em/1e3)
    BG_str = "%04.1f" % (bg/1e3)
    fname = "ph_times_{t}s_{sim}_EM{em}kHz_BG{bg}kHz.npy".format(
            em=EM_str, bg=BG_str, t=t_tot, sim=sim_name)
    return fname
gen_ph_times_name(5e5, 3e3, 'pippo', 10.)
'ph_times_10.0s_pippo_EM0500kHz_BG03.0kHz.npy'
And now the "big loop" that generates timestamps for multiple emission and background rates. This will also merge the different simulation IDs:
#%qtconsole
%%timeit -n1 -r1
IDs = [1, 2, 3, 4, 5, 6]
for bg in BG:
    for em in EM:
        ph_list = []
        t_tot_tot = 0
        Sim_name = []
        for ID in IDs:
            dview['ID'] = ID
            %px S = load_sim_id(ID=ID, glob_str="*t_max10.0s*", dir_=SIM_DATA_DIR)
            ph, t_tot, sim_name = parallel_gen_timetag(dview,
                                                       max_em_rate=em, bg_rate=bg)
            ph_list.append(ph)
            t_tot_tot += t_tot
            Sim_name.append(sim_name)
        # Use last sim `t_tot` to set time_block
        ph_times = merge_ph_times(ph_list, time_block=t_tot)
        tot_sim_name = gen_tot_sim_name(Sim_name)
        fname = gen_ph_times_name(em, bg, tot_sim_name, t_tot_tot)
        ph_times.dump(SIM_PH_DIR+fname)
        # Quick sanity check of the merged timestamps (before deleting them)
        plt.hist(ph_times, bins=arange(0, 0.5, 1e-3), histtype='step');
        del ph_times
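`merge_ph_times` concatenates the timestamp arrays of simulations run in sequence. A minimal sketch of what it might do, assuming each simulation covers the same duration `time_block` (the internals are an assumption, not PyBroMo's implementation):

```python
import numpy as np

def merge_ph_times(ph_times_list, time_block):
    """Merge timestamp arrays from simulations run back-to-back.

    Each array in `ph_times_list` spans [0, time_block); the i-th array
    is shifted by i*time_block so the merged array covers the total time.
    """
    offsets = np.arange(len(ph_times_list)) * time_block
    return np.hstack([ph + off for ph, off in zip(ph_times_list, offsets)])
```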
This is a quick example. For a better description see the next notebook: 3. Generate and export smFRET data
Specify here the parameters of the simulation to load:
d_EM_kHz = 400.
d_BG_kHz = 4
a_EM_kHz = 200
a_BG_kHz = 2
FRET_val = 1.*a_EM_kHz/(a_EM_kHz+d_EM_kHz)
print "Simulated FRET value: %.2f" % FRET_val
# These are used for the file name
ID = '0+1+2'
t_tot = '0.6'
# This is metadata associated with the timestamps
t_step = 0.5e-6
clk_p = t_step/32. # with t_step=0.5us -> 15.625 ns
Here we compute the file names from the parameters:
d_EM_kHz_str = "%04d" % d_EM_kHz
a_EM_kHz_str = "%04d" % a_EM_kHz
d_BG_kHz_str = "%04.1f" % d_BG_kHz
a_BG_kHz_str = "%04.1f" % a_BG_kHz
print "D: EM %s BG %s "%(d_EM_kHz_str,d_BG_kHz_str)
print "A: EM %s BG %s "%(a_EM_kHz_str,a_BG_kHz_str)
fname_d = "ph_times_{t_tot}s_D1.2e-11_10P_21pM_step0.5us_ID{ID}_EM{em}kHz_BG{bg}kHz.npy".format(em=d_EM_kHz_str, bg=d_BG_kHz_str, t_tot=t_tot, ID=ID)
fname_a = "ph_times_{t_tot}s_D1.2e-11_10P_21pM_step0.5us_ID{ID}_EM{em}kHz_BG{bg}kHz.npy".format(em=a_EM_kHz_str, bg=a_BG_kHz_str, t_tot=t_tot, ID=ID)
print fname_d
print fname_a
SIM_PH_DIR
Load the timestamps arrays:
ph_times_d = numpy.load(SIM_PH_DIR+fname_d)
ph_times_a = numpy.load(SIM_PH_DIR+fname_a)
ph_times, a_em = merge_DA_ph_times(ph_times_d, ph_times_a)
ph_times_int = (ph_times/clk_p).astype('int64')
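The division by clk_p truncates each timestamp to an integer number of clock periods. A quick check of the truncation error bound (the timestamp values here are hypothetical examples, not simulation output):

```python
import numpy as np

t_step = 0.5e-6
clk_p = t_step / 32.                           # 15.625 ns clock period
ph_times = np.array([0.0, 1.23e-3, 0.4999999]) # example timestamps (s)
ph_times_int = (ph_times / clk_p).astype('int64')
# Truncation error of the integer representation is below one clock period
err = ph_times - ph_times_int * clk_p
```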
Now the timestamp arrays can be analyzed. For example, you can perform a burst search and build a FRET histogram.
from IPython.core.display import HTML
def css_styling():
styles = open("./styles/custom2.css", "r").read()
return HTML(styles)
css_styling()