PairID
to extract PSA data¶Here we will use the convenience class PSAIdentifier
to extract Hausdorff pair analysis data generated by PSA.
%matplotlib inline
%load_ext autoreload
%autoreload 2
# Suppress FutureWarning about element-wise comparison to None
# Occurs when calling PSA plotting functions
import warnings
warnings.filterwarnings('ignore')
PSA
using MDAnalysis
¶from MDAnalysis import Universe
from MDAnalysis.analysis.psa import PSAnalysis
from pair_id import PairID
Initialize lists for the methods on which to perform PSA. PSA will be performed for four different simulations methods with three runs for each: DIMS, FRODA, rTMD-F, and rTMD-S. Also initialize a PSAIdentifier
object to keep track of the data corresponding to comparisons between pairs of simulations.
method_names = ['DIMS','FRODA','rTMD-F','rTMD-S']
labels = [] # Heat map labels (not plotted in this example)
simulations = [] # List of simulation topology/trajectory filename pairs
universes = [] # List of MDAnalysis Universes representing simulations
For each method, get the topology and each of three total trajectories (per method). Each simulation is represented as a (topology, trajectory)
pair of file names, which is appended to a master list of simulations.
for method in method_names:
# Note: DIMS uses the PSF topology format
topname = 'top.psf' if 'DIMS' in method or 'TMD' in method else 'top.pdb'
pathname = 'fitted_psa.dcd'
method_dir = 'methods/{}'.format(method)
if method is not 'LinInt':
for run in xrange(1, 4): # 3 runs per method
run_dir = '{}/{:03n}'.format(method_dir, run)
topology = '{}/{}'.format(method_dir, topname)
trajectory = '{}/{}'.format(run_dir, pathname)
labels.append(method + '(' + str(run) + ')')
simulations.append((topology, trajectory))
else: # only one LinInt trajectory
topology = '{}/{}'.format(method_dir, topname)
trajectory = '{}/{}'.format(method_dir, pathname)
labels.append(method)
simulations.append((topology, trajectory))
Generate a list of universes from the list of simulations.
for sim in simulations:
universes.append(Universe(*sim))
Initialize a PSA comparison from the universe list using a C$_\alpha$ trajectory representation.
psa_hpa = PSAnalysis(universes, path_select='name CA', labels=labels)
Generate PSA Path
s from the trajectories
psa_hpa.generate_paths()
Perform a Hausdorff pairs analysis on all of the Path
s
psa_hpa.run_pairs_analysis(hausdorff_pairs=True, neighbors=True)
identifier = PairID()
for name in method_names:
identifier.add_sim(name, [1,2,3])
Get the PSA ID for the second DIMS simulation (DIMS 2) and third rTMD-F simulation (rTMD-F 3)
pid = identifier.get_pair_id('DIMS 2', 'rTMD-F 3')
Get the indices of the frames for the DIMS 2 and rTMD-F 3 paths corresponding to the Hausdorff pair
psa_hpa.hausdorff_pairs[pid]
{'distance': 1.8656396, 'frames': (48, 97)}
psa_hpa.hausdorff_pairs[pid]['frames']
(48, 97)
Get the rmsd separating the Hausdorff pair (this is the Hausdorff distance!)
psa_hpa.hausdorff_pairs[pid]['distance']
1.8656396
Get the indices of the nearest neighbor frames for the DIMS 2 and rTMD-F 3 paths
psa_hpa.nearest_neighbors[pid]['frames']
(array([ 3, 3, 7, 8, 13, 13, 17, 17, 19, 22, 22, 22, 28, 29, 30, 36, 36, 36, 37, 42, 42, 43, 49, 51, 52, 56, 57, 57, 57, 57, 63, 63, 68, 68, 68, 69, 76, 80, 80, 83, 84, 84, 89, 90, 90, 91, 91, 93, 97, 102, 102, 105, 105, 109, 110, 110, 115, 116, 116, 116, 116, 116, 117, 125, 127, 127, 127, 129, 129, 129, 129, 130, 129, 133, 133, 133, 133, 133, 133, 134, 133, 133, 137, 138, 134, 138, 138, 138, 138, 139, 138, 139]), array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 10, 10, 10, 10, 10, 10, 12, 12, 14, 14, 14, 14, 15, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 22, 22, 26, 29, 29, 29, 29, 29, 29, 29, 29, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 36, 37, 37, 42, 42, 42, 42, 42, 42, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 45, 52, 52, 54, 58, 59, 61, 61, 61, 61, 61, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 89, 89, 89, 89, 89, 89, 89, 89, 89, 91, 91, 90, 90, 90, 90, 90, 90, 90, 91, 91]))
Get the nearest neighbor rmsds for the paths
psa_hpa.nearest_neighbors[pid]['distances']
(array([ 0.61047804, 0.72486514, 0.78072226, 0.83813328, 0.93167013, 1.01148224, 1.06094921, 1.11567676, 1.15830064, 1.18572509, 1.19752169, 1.24201703, 1.28473079, 1.3059119 , 1.32284486, 1.34145117, 1.36143172, 1.36866331, 1.45567334, 1.54358423, 1.5964545 , 1.63075888, 1.64360261, 1.68257856, 1.72496045, 1.71841514, 1.71702886, 1.72773194, 1.76599729, 1.72124612, 1.77787733, 1.76189709, 1.75216496, 1.74139595, 1.7261765 , 1.73064542, 1.76115847, 1.7780937 , 1.8280741 , 1.81312251, 1.81970263, 1.79710293, 1.77227151, 1.7905525 , 1.76451468, 1.77083576, 1.80586505, 1.83057499, 1.86563957, 1.85736096, 1.86288309, 1.82903266, 1.7895925 , 1.78329408, 1.74680781, 1.72502816, 1.72101748, 1.66805542, 1.63964784, 1.60171819, 1.57724261, 1.5326587 , 1.55232775, 1.51061881, 1.47200549, 1.44320154, 1.38229811, 1.31344652, 1.28408015, 1.24469936, 1.19692671, 1.17440617, 1.11309803, 1.0645541 , 1.02498984, 0.99263138, 0.93730855, 0.91037399, 0.86864704, 0.82913756, 0.79435873, 0.76110363, 0.73129284, 0.68911695, 0.64772373, 0.60788435, 0.58289891, 0.55806595, 0.52546072, 0.4935227 , 0.4636465 , 0.45959264], dtype=float32), array([ 0.63109344, 0.62719834, 0.61601019, 0.61047804, 0.62155843, 0.63865268, 0.66227907, 0.67063749, 0.69626302, 0.72733957, 0.75623101, 0.7901932 , 0.83100504, 0.85794991, 0.88361055, 0.90336204, 0.92149943, 0.93565702, 0.97083294, 0.99374866, 1.03374887, 1.06611073, 1.08965945, 1.13067162, 1.17012823, 1.20783472, 1.22551477, 1.23616135, 1.23949754, 1.25503623, 1.27730811, 1.29305518, 1.31983471, 1.32525945, 1.33354521, 1.33722591, 1.33787072, 1.35226214, 1.3751452 , 1.39359438, 1.39420414, 1.39986932, 1.40463161, 1.41709375, 1.4386462 , 1.45734608, 1.47568536, 1.49810481, 1.51763892, 1.53081644, 1.55413353, 1.57935131, 1.6103493 , 1.65109909, 1.68054557, 1.69345903, 1.7044816 , 1.71702886, 1.72613454, 1.72539341, 1.72467172, 1.72730267, 1.72880113, 1.72190535, 1.73856974, 1.75418186, 1.7464515 , 1.73684514, 1.7261765 , 1.72830319, 1.73841536, 1.74597847, 1.74871075, 1.74822044, 1.75148296, 1.74700916, 1.74772978, 1.75728726, 1.77193272, 1.77165914, 1.7767179 , 1.78658187, 1.79900873, 1.78998554, 1.78027713, 1.78107965, 1.77923036, 1.77341914, 1.7738812 , 1.76738906, 1.76451468, 1.76607394, 1.77978122, 1.7792269 , 1.79973316, 1.80040133, 1.80270863, 1.81065071, 1.82419837, 1.83073819, 1.83595932, 1.82261348, 1.80349576, 1.79456735, 1.77086103, 1.74088705, 1.70968091, 1.68400657, 1.67217898, 1.64538348, 1.62183571, 1.56865823, 1.50714755, 1.4620024 , 1.39992738, 1.34150422, 1.28506386, 1.23358619, 1.1904099 , 1.13146091, 1.08869743, 1.04396403, 0.98481226, 0.92711079, 0.8842231 , 0.83658326, 0.77844107, 0.72915876, 0.68647271, 0.63805151, 0.60030681, 0.56835544, 0.54370099, 0.50965744, 0.48632449, 0.47754329, 0.47594741, 0.46904811, 0.45977202, 0.45959264], dtype=float32))
Display the pandas
DataFrame
containing the set of simulations
df = identifier.data
df
Sim ID | ||
---|---|---|
Name | Run ID | |
DIMS | 1 | 0 |
2 | 1 | |
3 | 2 | |
FRODA | 1 | 3 |
2 | 4 | |
3 | 5 | |
rTMD-F | 1 | 6 |
2 | 7 | |
3 | 8 | |
rTMD-S | 1 | 9 |
2 | 10 | |
3 | 11 |
Get the simulation IDs for DIMS simulations 1, 2, and 3
df.loc[('DIMS',[1,2,3]), 'Sim ID']
Name Run ID DIMS 1 0 2 1 3 2 Name: Sim ID, dtype: int64