This notebook is part of PyBroMo, a Python-based single-molecule Brownian motion diffusion simulator that simulates confocal smFRET experiments. You can find the full list of notebooks in Usage Examples.
In this notebook we load simulations previously saved on a cluster and generate timestamps for different emission and background rates.
Follow the steps in HOWTO CLUSTER SETUP.txt to configure and start an IPython cluster. Another txt file explains how to remotely manage Windows machines (so you can start the engines without physically going to each remote PC).
After the cluster is started we can test it:
from IPython.parallel import Client
from IPython.utils.path import get_ipython_dir
ipython_dir = get_ipython_dir()
PROFILE = 'parallel'
rc = Client(ipython_dir+'/profile_%s/security/ipcontroller-client.json' % PROFILE)
dview = rc[:]
dview.block = True
rc.ids
[0, 1, 2, 3, 4, 5, 6, 7]
The previous command returns a list of integers: the IDs of all the connected engines (remote or local).
Run the initialization script for the simulation. This just sets the correct folder and loads the Brownian motion functions on the local computer:
%run -i load_bromo.py
/home/anto/Documents/ucla/src/brownian
Prepare the engines (change folder, define a unique eid, load PyBroMo software):
%px %reset # Not needed on the first execution
# Send a variable containing the engine ID to each engine
dview.scatter('eid', rc.ids, flatten=True)
%px eid
Out[0:1277]: 0
Out[1:1277]: 1
Out[2:1277]: 2
Out[3:1277]: 3
Out[4:1277]: 4
Out[5:1277]: 5
Out[6:1277]: 6
Out[7:1277]: 7
%%px
import os
if os.name == 'posix':
BROWN_DIR = "/home/anto/Documents/ucla/src/brownian/"
elif os.name == 'nt':
BROWN_DIR = r"C:/Data/Antonio/software/Dropbox/brownian/"
%%px
%cd $BROWN_DIR
[stdout:0] C:\Data\Antonio\software\Dropbox\brownian
[stdout:1] C:\Data\Antonio\software\Dropbox\brownian
[stdout:2] C:\Data\Antonio\software\Dropbox\brownian
[stdout:3] C:\Data\Antonio\software\Dropbox\brownian
[stdout:4] C:\Data\Antonio\software\Dropbox\brownian
[stdout:5] C:\Data\Antonio\software\Dropbox\brownian
[stdout:6] C:\Data\Antonio\software\Dropbox\brownian
[stdout:7] C:\Data\Antonio\software\Dropbox\brownian
%px run -i brownian.py
Here we load into the remote engines the simulation containing the emission array. We let the remote engines simulate the timestamps, then we transfer the timestamps locally and merge them into a single timestamp array.
We start by loading on each engine a simulation with ID=1 that contains the string t_max10.0s (so we select only the 10 s simulations):
%%px
S = load_sim_id(ID=1, glob_str="*t_max10.0s*", dir_=SIM_DATA_DIR)
[stdout:0] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID0-1.pickle
[stdout:1] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID1-1.pickle
[stdout:2] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID2-1.pickle
[stdout:3] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID3-1.pickle
[stdout:4] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID4-1.pickle
[stdout:5] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID5-1.pickle
[stdout:6] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID6-1.pickle
[stdout:7] Loaded: C:/Data/Antonio/data/sim/brownian/objects\bromo_sim_D1.2e-11_30P_64pM_step0.5us_t_max10.0s_ID7-1.pickle
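The loader `load_sim_id` is part of PyBroMo; its internals are not shown here. As a rough, hypothetical sketch of what a glob-based loader with this signature might look like (the file-name convention `_ID<n>-` is inferred from the output above):

```python
import glob
import os
import pickle

def load_sim_id(ID, glob_str, dir_):
    """Load a pickled simulation whose file name contains '_ID<ID>-'.

    Sketch: search `dir_` for files matching `glob_str` and keep
    the single file tagged with the requested simulation ID.
    """
    pattern = os.path.join(dir_, glob_str)
    matches = [f for f in glob.glob(pattern) if '_ID%d-' % ID in f]
    assert len(matches) == 1, "Expected exactly one match, got %d" % len(matches)
    print('Loaded: %s' % matches[0])
    with open(matches[0], 'rb') as f:
        return pickle.load(f)
```

This is only an illustration of the selection logic; the real PyBroMo function may differ in details such as error handling and the pickle protocol.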
Then we generate the timestamps remotely and merge them into the local arrays ph_times_d and ph_times_a:
run -i brownian.py
ph_times_d, t_tot, sim_name = parallel_gen_timetag(dview, max_em_rate=1e6, bg_rate=2e3)
ph_times_a, t_tot, sim_name = parallel_gen_timetag(dview, max_em_rate=5e5, bg_rate=4e3)
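The actual `parallel_gen_timetag` distributes the work across the engines; the core idea, though, is to turn the simulated emission trace into photon timestamps via Poisson statistics. A minimal serial sketch of that idea (the function name `gen_timetag` and its internals are an assumption, not PyBroMo's implementation):

```python
import numpy as np

def gen_timetag(emission, max_em_rate, bg_rate, t_step=0.5e-6, rs=None):
    """Serial sketch of timestamp generation from an emission trace.

    `emission` is a 1D array of normalized intensities (0..1), one value
    per simulation step of duration `t_step`. For each step we draw a
    Poisson photon count with rate emission*max_em_rate + bg_rate and
    emit that many timestamps at the step time.
    """
    rs = np.random.RandomState() if rs is None else rs
    rates = emission * max_em_rate + bg_rate    # photons/s at each step
    counts = rs.poisson(rates * t_step)         # photons in each step
    times = np.arange(emission.size) * t_step   # time of each step (s)
    return np.repeat(times, counts)             # one entry per photon
```

The parallel version scatters the emission array across engines, runs this kind of loop remotely, and gathers the pieces; the per-step Poisson draw is the part illustrated here.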
As a last, optional step we change the data format. We merge the two arrays ph_times_d and ph_times_a into a single array ph_times and use a boolean mask (array of booleans) a_em to mark whether each timestamp comes from the acceptor (True) or from the donor (False).
ph_times, a_em = merge_DA_ph_times(ph_times_d, ph_times_a)
and test that the two representations are equivalent:
(ph_times_d == ph_times[~a_em]).all()
(ph_times_a == ph_times[a_em]).all()
ph_times.size == ph_times_d.size+ph_times_a.size == a_em.size
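`merge_DA_ph_times` is a PyBroMo helper; a minimal sketch consistent with the equivalence checks above (the internals are an assumption) is to concatenate the two arrays together with a boolean mask and sort both by time:

```python
import numpy as np

def merge_DA_ph_times(ph_times_d, ph_times_a):
    """Merge donor and acceptor timestamps, keeping an acceptor mask.

    Returns the time-sorted union of the two arrays plus a boolean
    array that is True where the timestamp came from the acceptor.
    """
    ph_times = np.hstack([ph_times_d, ph_times_a])
    a_em = np.hstack([np.zeros(ph_times_d.size, dtype=bool),
                      np.ones(ph_times_a.size, dtype=bool)])
    order = ph_times.argsort(kind='mergesort')  # stable sort keeps mask aligned
    return ph_times[order], a_em[order]
```

The stable sort guarantees that `ph_times[~a_em]` and `ph_times[a_em]` reproduce the (sorted) input arrays, which is exactly what the three checks above verify.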
In this example we have generated a FRET efficiency of:
k = 5e5/1e6
E = k/(k+1)
print "%.1f%%" % (E*100)
33.3%
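The relation used above, in function form: with k the ratio of the acceptor to the donor max emission rate, E = k/(k+1), which is the same as rate_a/(rate_a + rate_d). A small helper (hypothetical name, not part of PyBroMo):

```python
def fret_from_rates(em_rate_a, em_rate_d):
    """FRET efficiency implied by the two max emission rates.

    With k = em_rate_a/em_rate_d:  E = k/(k + 1) = em_rate_a/(em_rate_a + em_rate_d)
    """
    k = float(em_rate_a) / em_rate_d
    return k / (k + 1)
```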
In order to simulate different FRET values we need different emission-rate levels. Furthermore, it is important to check different background rates in order to understand how the background influences the burst analysis. Therefore we need to simulate several background rates for each emission rate.
The number of cases can easily get out of hand if we choose too many values. In practice, around 100 cases (for example 10 emission levels and 10 background rates) can be simulated in about 20 minutes on a quad-core desktop running 4 engines.
Here we choose the values to simulate for the max emission rate and background rate:
import numpy as np
EM = np.r_[10,20:201:20, 190]*1e3
#EM = np.r_[90,99,101,110]*1e3
print EM
[  10000.   20000.   40000.   60000.   80000.  100000.  120000.  140000.  160000.  180000.  200000.  190000.]
BG = np.r_[1:7,8]*1e3
print BG
[ 1000. 2000. 3000. 4000. 5000. 6000. 8000.]
Graphical check of the values:
print EM[None,:].shape, BG.shape
(1, 12) (7,)
# Compute any possible k and E
M = np.dot(EM[:,None],1/EM[None,:])
k = np.sort(M.ravel())
E = k/(k+1)
M.shape
(12, 12)
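As a sanity check one can also count the distinct E values directly (a sketch, assuming the uncommented EM grid defined above):

```python
import numpy as np

EM = np.r_[10, 20:201:20, 190] * 1e3       # max emission rates (Hz), as above
M = np.dot(EM[:, None], 1 / EM[None, :])   # every pairwise rate ratio k
k = np.sort(M.ravel())
E = k / (k + 1)                            # FRET efficiency for each ratio
n_unique = np.unique(np.round(E, 9)).size  # distinct E values (up to rounding)
```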
%pylab inline
Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['box', 'rc']
`%pylab --no-import-all` prevents importing * from pylab and numpy
plot(E, '.')
title("Check how many E values are possible");
Some helper functions to generate file names:
def gen_tot_sim_name(Sim_names):
    """Concatenate with '+' the IDs of the strings in `Sim_names`.

    Assumes all the strings are identical except for a one-digit ID.
    Example: gen_tot_sim_name(['_ID0','_ID1']) returns '_ID0+1'.
    """
    IDs = []
    for s in Sim_names:
        n = s.find('_ID') + 3
        IDs.append(s[n])
    name = Sim_names[0][:n] + '+'.join(IDs) + Sim_names[0][n+1:]
    return name
gen_tot_sim_name(['pippo_ID0_pluto','pippo_ID1_pluto', 'pippo_ID2_pluto'])
'pippo_ID0+1+2_pluto'
def gen_ph_times_name(em, bg, sim_name, t_tot):
    """Generate a file name for a timestamp array."""
    EM_str = "%04d" % (em/1e3)
    BG_str = "%04.1f" % (bg/1e3)
    fname = "ph_times_{t}s_{sim}_EM{em}kHz_BG{bg}kHz.npy".format(
            em=EM_str, bg=BG_str, t=t_tot, sim=sim_name)
    return fname
gen_ph_times_name(5e5, 3e3, 'pippo', 10.)
'ph_times_10.0s_pippo_EM0500kHz_BG03.0kHz.npy'
And now the "big loop" that generates timestamps for multiple emission and background rates. This will also merge the different simulation IDs:
#%qtconsole
%%timeit -n1 -r1
IDs = [1, 2, 3, 4, 5, 6]
for bg in BG:
    for em in EM:
        ph_list = []
        t_tot_tot = 0
        Sim_name = []
        for ID in IDs:
            dview['ID'] = ID
            %px S = load_sim_id(ID=ID, glob_str="*t_max10.0s*", dir_=SIM_DATA_DIR)
            ph, t_tot, sim_name = parallel_gen_timetag(dview,
                                                       max_em_rate=em, bg_rate=bg)
            ph_list.append(ph)
            t_tot_tot += t_tot
            Sim_name.append(sim_name)
        # Use last sim `t_tot` to set time_block
        ph_times = merge_ph_times(ph_list, time_block=t_tot)
        tot_sim_name = gen_tot_sim_name(Sim_name)
        fname = gen_ph_times_name(em, bg, tot_sim_name, t_tot_tot)
        ph_times.dump(SIM_PH_DIR+fname)
        # Quick sanity check of the merged timestamps (before deleting them)
        plt.hist(ph_times, bins=arange(0, 0.5, 1e-3), histtype='step');
        del ph_times
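`merge_ph_times` concatenates the timestamp arrays of simulations run in sequence. A minimal sketch of what it might do, assuming each simulation covers the same duration `time_block` (the internals are an assumption, not PyBroMo's implementation):

```python
import numpy as np

def merge_ph_times(ph_times_list, time_block):
    """Merge timestamp arrays from simulations run back-to-back.

    Each array in `ph_times_list` spans [0, time_block); the i-th array
    is shifted by i*time_block so the merged array covers the total time.
    """
    offsets = np.arange(len(ph_times_list)) * time_block
    return np.hstack([ph + off for ph, off in zip(ph_times_list, offsets)])
```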
This is a quick example. For a better description see the next notebook: 3. Generate and export smFRET data
Specify here the parameters of the simulation to load:
d_EM_kHz = 400.
d_BG_kHz = 4
a_EM_kHz = 200
a_BG_kHz = 2
FRET_val = 1.*a_EM_kHz/(a_EM_kHz+d_EM_kHz)
print "Simulated FRET value: %.2f" % FRET_val
# These are used for the file name
ID = '0+1+2'
t_tot = '0.6'
# This is metadata associated with the timestamps
t_step = 0.5e-6
clk_p = t_step/32. # with t_step=0.5us -> 15.625 ns
Here we compute the file names from the parameters:
d_EM_kHz_str = "%04d" % d_EM_kHz
a_EM_kHz_str = "%04d" % a_EM_kHz
d_BG_kHz_str = "%04.1f" % d_BG_kHz
a_BG_kHz_str = "%04.1f" % a_BG_kHz
print "D: EM %s BG %s "%(d_EM_kHz_str,d_BG_kHz_str)
print "A: EM %s BG %s "%(a_EM_kHz_str,a_BG_kHz_str)
fname_d = "ph_times_{t_tot}s_D1.2e-11_10P_21pM_step0.5us_ID{ID}_EM{em}kHz_BG{bg}kHz.npy".format(em=d_EM_kHz_str, bg=d_BG_kHz_str, t_tot=t_tot, ID=ID)
fname_a = "ph_times_{t_tot}s_D1.2e-11_10P_21pM_step0.5us_ID{ID}_EM{em}kHz_BG{bg}kHz.npy".format(em=a_EM_kHz_str, bg=a_BG_kHz_str, t_tot=t_tot, ID=ID)
print fname_d
print fname_a
SIM_PH_DIR
Load the timestamps arrays:
ph_times_d = numpy.load(SIM_PH_DIR+fname_d)
ph_times_a = numpy.load(SIM_PH_DIR+fname_a)
ph_times, a_em = merge_DA_ph_times(ph_times_d, ph_times_a)
ph_times_int = (ph_times/clk_p).astype('int64')
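The division by clk_p truncates each timestamp to an integer number of clock periods. A quick check of the truncation error bound (the timestamp values here are hypothetical examples, not simulation output):

```python
import numpy as np

t_step = 0.5e-6
clk_p = t_step / 32.                           # 15.625 ns clock period
ph_times = np.array([0.0, 1.23e-3, 0.4999999]) # example timestamps (s)
ph_times_int = (ph_times / clk_p).astype('int64')
# Truncation error of the integer representation is below one clock period
err = ph_times - ph_times_int * clk_p
```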
Now the timestamp arrays can be analyzed. For example, you can perform a burst search and build a FRET histogram.
from IPython.core.display import HTML
def css_styling():
styles = open("./styles/custom2.css", "r").read()
return HTML(styles)
css_styling()