PyRosettaCluster Tutorial 3 demonstrates how multiple tasks (specified by kwargs) may be run several times (specified by nstruct). Additionally, user-provided PyRosetta protocols may yield or return multiple Pose or PackedPose objects to be efficiently parallelized on the user's compute resources.
Warning: This notebook uses pyrosetta.distributed.viewer code, which runs in Jupyter Notebook and might not run if you're using Jupyter Lab.
Note: This Jupyter notebook uses parallelization and is not meant to be executed within a Google Colab environment.
Note: This Jupyter notebook requires the PyRosetta distributed layer, which is obtained by building PyRosetta with the --serialization flag or by installing PyRosetta from the RosettaCommons conda channel. Please see Chapter 16.00 for setup instructions.
Note: This Jupyter notebook is intended to be run within Jupyter Lab, but may still be run as a standalone Jupyter notebook.
import bz2
import glob
import logging
import os
import pyrosetta
import pyrosetta.distributed.io as io
import pyrosetta.distributed.viewer as viewer
from pyrosetta.distributed.cluster import PyRosettaCluster
logging.basicConfig(level=logging.INFO)
dask
See Tutorial 1A for review:
Inject client code here, then run the cell:
if not os.getenv("DEBUG"):
    from dask.distributed import Client
    client = Client("tcp://127.0.0.1:40329")
else:
    client = None
client
Pose or PackedPose objects:
PyRosettaCluster automatically passes returned or yielded Pose or PackedPose objects through the user-provided PyRosetta protocols. If a protocol produces n poses, the subsequent protocol runs n times, once for each pose. By default, the Pose and PackedPose objects returned by the final protocol are written to disk.
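The fan-out rule described above can be illustrated with a toy model in plain Python. This is only a sketch of the counting behavior, not PyRosettaCluster's scheduler: `run_chain`, `make_three`, and `make_one` are hypothetical names, and strings stand in for Pose objects.

```python
def run_chain(protocols, inputs):
    """Toy model: every output of one protocol becomes an input of the next."""
    current = list(inputs)
    for protocol in protocols:
        # Each input is processed once; all outputs are pooled for the next stage.
        current = [out for item in current for out in protocol(item)]
    return current


def make_three(label):
    # Stand-in for a protocol that produces 3 poses per input pose.
    return [f"{label}.{i}" for i in range(3)]


def make_one(label):
    # Stand-in for a protocol that produces 1 pose per input pose.
    return [label + "*"]


final = run_chain([make_three, make_one, make_three], ["pose"])
print(len(final))  # 9 = 1 input x 3 x 1 x 3
```

In this model the second stage runs three times (once per output of the first stage) and the third stage also runs three times, so nine labels reach the end of the chain.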
Multiple Pose and PackedPose objects may be yielded iteratively, or returned in a list or tuple:
To yield multiple poses:
for _ in range(n_results):
    yield backrub(ppose.pose.clone())
Note: yield does not add the yielded object to the queue for parallelization until all objects are yielded.
To return multiple poses in a list:
return list_of_poses
To return multiple poses in a tuple:
return pose1, pose2, pose3
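As a rough illustration of why the three styles are interchangeable, the sketch below collects the results of a generator, a list-returning function, and a tuple-returning function into the same list. The helper `as_list` is hypothetical and is not part of PyRosettaCluster; strings stand in for poses.

```python
import inspect


def as_list(result):
    """Hypothetical helper: collect a protocol's result into a plain list."""
    if result is None:
        return []
    if inspect.isgenerator(result) or isinstance(result, (list, tuple)):
        # A generator is exhausted here, so every yielded object is collected
        # before anything is handed on.
        return list(result)
    return [result]


def yields_three(label):
    for i in range(3):
        yield f"{label}-{i}"


def returns_list(label):
    return [f"{label}-{i}" for i in range(3)]


def returns_tuple(label):
    return f"{label}-0", f"{label}-1", f"{label}-2"


collected = [as_list(f("pose")) for f in (yields_three, returns_list, returns_tuple)]
print(collected[0] == collected[1] == collected[2])  # True
```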
def protocol1(packed_pose_in=None, **kwargs):
    """
    Performs backrub on a `PackedPose` object, which may be (a) input
    to the function or (b) accessed through the 's' `kwargs` keyword
    argument.

    Args:
        packed_pose_in: A `PackedPose` object. Optional.
        **kwargs: PyRosettaCluster keyword arguments.

    Yields:
        Multiple `PackedPose` objects.
    """
    import pyrosetta
    import pyrosetta.distributed.io as io
    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts

    # When run as the first protocol there is no input pose, so load it from disk.
    if packed_pose_in is None:
        packed_pose_in = io.pose_from_file(kwargs["s"])

    xml = """
        <ROSETTASCRIPTS>
          <MOVERS>
            <Backrub name="backrub" pivot_residues="22A,23A,24A,25A,26A,27A"/>
          </MOVERS>
          <PROTOCOLS>
            <Add mover="backrub"/>
          </PROTOCOLS>
        </ROSETTASCRIPTS>
        """
    backrub = rosetta_scripts.SingleoutputRosettaScriptsTask(xml)

    # Yield several independent backrub results, each from a fresh clone of the input.
    n_results = 3
    for _ in range(n_results):
        yield backrub(packed_pose_in.pose.clone())
def protocol2(packed_pose_in, **kwargs):
    """
    Performs sequence design using 'ALLAAxc' resfile command on input
    `kwargs['resnums']` residue numbers on the input `PackedPose` object.

    Args:
        packed_pose_in: A `PackedPose` object to be designed.
        **kwargs: PyRosettaCluster keyword arguments.

    Returns:
        A `PackedPose` object that has been designed.
    """
    import pyrosetta
    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts

    xml = """
        <ROSETTASCRIPTS>
          <RESIDUE_SELECTORS>
            <Index name="my_resnums" resnums="{resnums}" />
            <Not name="not_my_resnums" selector="my_resnums" />
          </RESIDUE_SELECTORS>
          <TASKOPERATIONS>
            <ResfileCommandOperation name="design" command="ALLAAxc" residue_selector="my_resnums"/>
            <OperateOnResidueSubset name="prevent_repacking" selector="not_my_resnums">
              <PreventRepackingRLT/>
            </OperateOnResidueSubset>
          </TASKOPERATIONS>
          <MOVERS>
            <PackRotamersMover name="design_mover" task_operations="design,prevent_repacking"/>
          </MOVERS>
          <PROTOCOLS>
            <Add mover="design_mover"/>
          </PROTOCOLS>
        </ROSETTASCRIPTS>
        """.format(resnums=kwargs["resnums"])

    return rosetta_scripts.SingleoutputRosettaScriptsTask(xml)(packed_pose_in.pose.clone())
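Because the XML in protocol2 is built with str.format, it can be useful to check the substituted string before handing it to RosettaScripts. The sketch below uses only the Python standard library on an abbreviated copy of the template; the variable names `template`, `example_kwargs`, and `index` are illustrative, and parsing with ElementTree only checks that the XML is well formed, not that Rosetta accepts it.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the protocol2 template, kept only for illustration.
template = """
<ROSETTASCRIPTS>
  <RESIDUE_SELECTORS>
    <Index name="my_resnums" resnums="{resnums}" />
    <Not name="not_my_resnums" selector="my_resnums" />
  </RESIDUE_SELECTORS>
</ROSETTASCRIPTS>
"""

# Stand-in for the kwargs dictionary that a task would provide.
example_kwargs = {"resnums": "22A"}

xml = template.format(resnums=example_kwargs["resnums"]).strip()
root = ET.fromstring(xml)  # raises ParseError if the XML is malformed
index = root.find("./RESIDUE_SELECTORS/Index")
print(index.get("resnums"))  # 22A
```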
kwargs:
Returning a list of dictionaries or yielding dictionaries allows the user to run through the chain of user-provided PyRosetta protocols multiple times with different inputs, and the unique kwargs can be accessed within each user-provided PyRosetta protocol.
dict_of_options = {
    "-out:level": "300",
    "-multithreading:total_threads": "1",
}
def create_tasks():
    for resnum in range(22, 28):
        yield {
            "options": "-ex1",
            "extra_options": dict_of_options,
            "set_logging_handler": "interactive",
            "s": os.path.join(os.getcwd(), "inputs", "1QYS.pdb"),
            "resnums": str(resnum) + "A",
        }
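To see what the task generator produces, one can simply expand it into a list. The sketch below uses a trimmed copy of create_tasks (here named `create_tasks_sketch`, with the PyRosetta option keys omitted for brevity) and prints the residue label that each task carries.

```python
import os


def create_tasks_sketch():
    # Trimmed copy of create_tasks: only the keys read by the protocols are kept.
    for resnum in range(22, 28):
        yield {
            "s": os.path.join(os.getcwd(), "inputs", "1QYS.pdb"),
            "resnums": str(resnum) + "A",
        }


tasks = list(create_tasks_sketch())
resnum_labels = [task["resnums"] for task in tasks]
print(len(tasks))     # 6
print(resnum_labels)  # ['22A', '23A', '24A', '25A', '26A', '27A']
```

Each dictionary is one task, so six tasks are generated, one per designed residue position.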
distribute():
We will also use the PyRosettaCluster nstruct attribute, which is an int object specifying the number of repeats of the first user-provided PyRosetta protocol.
if not os.getenv("DEBUG"):
    output_path = os.path.join(os.getcwd(), "outputs_3")
    PyRosettaCluster(
        tasks=create_tasks,
        client=client,
        scratch_dir=output_path,
        output_path=output_path,
        nstruct=2,
    ).distribute(protocols=[protocol1, protocol2, protocol1])
INFO:pyrosetta.distributed:maybe_init performing pyrosetta initialization: {'options': '-run:constant_seed 1 -multithreading:total_threads 1', 'extra_options': '-mute all', 'silent': True}
INFO:pyrosetta.rosetta:Found rosetta database at: /shared/home/jklima/.conda/envs/jupyterlab/lib/python3.7/site-packages/pyrosetta/database; using it....
INFO:pyrosetta.rosetta:PyRosetta-4 2020 [Rosetta PyRosetta4.conda.linux.cxx11thread.serialization.CentOS.python37.Release 2020.15+release.3121c734db02d2b62dd1974dcb8daface3f50057 2020-04-10T09:29:24] retrieved from: http://www.pyrosetta.org (C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
While jobs are running, you may monitor their progress using the dask dashboard diagnostics within Jupyter Lab!
Initially there are 12 simulations running in parallel: 6 tasks from the create_tasks generator, with each task executed using nstruct=2. After protocol1 runs to completion, more PackedPose objects are added to the queue. After protocol2 runs to completion, more PackedPose objects are added to the queue. The process continues until all tasks are run through the chain of user-provided PyRosetta protocols.
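The growth of the queue can be tallied stage by stage. The sketch below is plain arithmetic based on the numbers in this tutorial (6 tasks, nstruct=2, and 3, 1, and 3 poses produced per run of protocol1, protocol2, and protocol1); the variable names are illustrative, and the code does not model when workers actually pick up jobs.

```python
n_tasks = 6                        # dictionaries yielded by create_tasks
nstruct = 2                        # repeats of the first protocol per task
outputs_per_protocol = [3, 1, 3]   # protocol1, protocol2, protocol1

stages = []
runs = n_tasks * nstruct           # 12 initial runs of the first protocol
for position, n_out in enumerate(outputs_per_protocol, start=1):
    produced = runs * n_out        # poses handed on to the next stage
    stages.append((runs, produced))
    print(f"protocol {position}: {runs} runs -> {produced} poses")
    runs = produced                # each produced pose is one run of the next protocol

# protocol 1: 12 runs -> 36 poses
# protocol 2: 36 runs -> 36 poses
# protocol 3: 36 runs -> 108 poses
```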
Gather output decoys from disk into memory:
if not os.getenv("DEBUG"):
    results = glob.glob(os.path.join(output_path, "decoys", "*", "*.pdb.bz2"))
    packed_poses = []
    for i, bz2file in enumerate(results, start=1):
        with open(bz2file, "rb") as f:
            packed_poses.append(io.pose_from_pdbstring(bz2.decompress(f.read()).decode()))
        logging.info("Percent done loading: {0:0.1f} %".format((i * 100.) / len(results)))
INFO:root:Percent done loading: 0.9 %
INFO:root:Percent done loading: 1.9 %
INFO:root:Percent done loading: 2.8 %
INFO:root:Percent done loading: 3.7 %
INFO:root:Percent done loading: 4.6 %
INFO:root:Percent done loading: 5.6 %
INFO:root:Percent done loading: 6.5 %
INFO:root:Percent done loading: 7.4 %
INFO:root:Percent done loading: 8.3 %
INFO:root:Percent done loading: 9.3 %
INFO:root:Percent done loading: 10.2 %
INFO:root:Percent done loading: 11.1 %
INFO:root:Percent done loading: 12.0 %
INFO:root:Percent done loading: 13.0 %
INFO:root:Percent done loading: 13.9 %
INFO:root:Percent done loading: 14.8 %
INFO:root:Percent done loading: 15.7 %
INFO:root:Percent done loading: 16.7 %
INFO:root:Percent done loading: 17.6 %
INFO:root:Percent done loading: 18.5 %
INFO:root:Percent done loading: 19.4 %
INFO:root:Percent done loading: 20.4 %
INFO:root:Percent done loading: 21.3 %
INFO:root:Percent done loading: 22.2 %
INFO:root:Percent done loading: 23.1 %
INFO:root:Percent done loading: 24.1 %
INFO:root:Percent done loading: 25.0 %
INFO:root:Percent done loading: 25.9 %
INFO:root:Percent done loading: 26.9 %
INFO:root:Percent done loading: 27.8 %
INFO:root:Percent done loading: 28.7 %
INFO:root:Percent done loading: 29.6 %
INFO:root:Percent done loading: 30.6 %
INFO:root:Percent done loading: 31.5 %
INFO:root:Percent done loading: 32.4 %
INFO:root:Percent done loading: 33.3 %
INFO:root:Percent done loading: 34.3 %
INFO:root:Percent done loading: 35.2 %
INFO:root:Percent done loading: 36.1 %
INFO:root:Percent done loading: 37.0 %
INFO:root:Percent done loading: 38.0 %
INFO:root:Percent done loading: 38.9 %
INFO:root:Percent done loading: 39.8 %
INFO:root:Percent done loading: 40.7 %
INFO:root:Percent done loading: 41.7 %
INFO:root:Percent done loading: 42.6 %
INFO:root:Percent done loading: 43.5 %
INFO:root:Percent done loading: 44.4 %
INFO:root:Percent done loading: 45.4 %
INFO:root:Percent done loading: 46.3 %
INFO:root:Percent done loading: 47.2 %
INFO:root:Percent done loading: 48.1 %
INFO:root:Percent done loading: 49.1 %
INFO:root:Percent done loading: 50.0 %
INFO:root:Percent done loading: 50.9 %
INFO:root:Percent done loading: 51.9 %
INFO:root:Percent done loading: 52.8 %
INFO:root:Percent done loading: 53.7 %
INFO:root:Percent done loading: 54.6 %
INFO:root:Percent done loading: 55.6 %
INFO:root:Percent done loading: 56.5 %
INFO:root:Percent done loading: 57.4 %
INFO:root:Percent done loading: 58.3 %
INFO:root:Percent done loading: 59.3 %
INFO:root:Percent done loading: 60.2 %
INFO:root:Percent done loading: 61.1 %
INFO:root:Percent done loading: 62.0 %
INFO:root:Percent done loading: 63.0 %
INFO:root:Percent done loading: 63.9 %
INFO:root:Percent done loading: 64.8 %
INFO:root:Percent done loading: 65.7 %
INFO:root:Percent done loading: 66.7 %
INFO:root:Percent done loading: 67.6 %
INFO:root:Percent done loading: 68.5 %
INFO:root:Percent done loading: 69.4 %
INFO:root:Percent done loading: 70.4 %
INFO:root:Percent done loading: 71.3 %
INFO:root:Percent done loading: 72.2 %
INFO:root:Percent done loading: 73.1 %
INFO:root:Percent done loading: 74.1 %
INFO:root:Percent done loading: 75.0 %
INFO:root:Percent done loading: 75.9 %
INFO:root:Percent done loading: 76.9 %
INFO:root:Percent done loading: 77.8 %
INFO:root:Percent done loading: 78.7 %
INFO:root:Percent done loading: 79.6 %
INFO:root:Percent done loading: 80.6 %
INFO:root:Percent done loading: 81.5 %
INFO:root:Percent done loading: 82.4 %
INFO:root:Percent done loading: 83.3 %
INFO:root:Percent done loading: 84.3 %
INFO:root:Percent done loading: 85.2 %
INFO:root:Percent done loading: 86.1 %
INFO:root:Percent done loading: 87.0 %
INFO:root:Percent done loading: 88.0 %
INFO:root:Percent done loading: 88.9 %
INFO:root:Percent done loading: 89.8 %
INFO:root:Percent done loading: 90.7 %
INFO:root:Percent done loading: 91.7 %
INFO:root:Percent done loading: 92.6 %
INFO:root:Percent done loading: 93.5 %
INFO:root:Percent done loading: 94.4 %
INFO:root:Percent done loading: 95.4 %
INFO:root:Percent done loading: 96.3 %
INFO:root:Percent done loading: 97.2 %
INFO:root:Percent done loading: 98.1 %
INFO:root:Percent done loading: 99.1 %
INFO:root:Percent done loading: 100.0 %
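The file-handling part of the gathering step can be tried without PyRosetta. The sketch below writes one placeholder file into a temporary directory laid out like the glob pattern above and reads it back with the standard-library bz2 module. The subdirectory name "0000", the file name, and the file contents are made up for illustration; in the tutorial the decompressed text would then be passed to io.pose_from_pdbstring.

```python
import bz2
import glob
import os
import tempfile

placeholder_pdb = "REMARK placeholder decoy, not a real structure\nEND\n"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout matching the pattern <output_path>/decoys/*/*.pdb.bz2
    decoy_dir = os.path.join(tmp, "decoys", "0000")
    os.makedirs(decoy_dir)
    with bz2.open(os.path.join(decoy_dir, "example.pdb.bz2"), "wt") as f:
        f.write(placeholder_pdb)

    found = glob.glob(os.path.join(tmp, "decoys", "*", "*.pdb.bz2"))
    loaded = []
    for path in found:
        # bz2.open in text mode decompresses and decodes in one step.
        with bz2.open(path, "rt") as f:
            loaded.append(f.read())

print(len(loaded))  # 1
```

Reading with bz2.open in text mode is equivalent to opening the file in binary mode and calling bz2.decompress(...).decode() on its contents, as the tutorial cell does.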
Your designed Top7 (PDB ID: 1QYS) decoys are visualized below with residue numbers designed during the simulation shown.
There are 108 resulting decoys: 6 (kwargs) x 2 (nstruct) x 3 (protocol1) x 1 (protocol2) x 3 (protocol1)
if not os.getenv("DEBUG"):
    assert 6 * 2 * 3 * 1 * 3 == len(results)
if not os.getenv("DEBUG"):
    resis = pyrosetta.rosetta.core.select.residue_selector.ResidueIndexSelector("22A,23A,24A,25A,26A,27A")
    view = viewer.init(packed_poses, window_size=(800, 600))
    view.add(viewer.setStyle())
    view.add(viewer.setStyle(residue_selector=resis, colorscheme="whiteCarbon", radius=0.35))
    view.add(viewer.setHydrogenBonds())
    view.add(viewer.setHydrogens(polar_only=True))
    view()
You have successfully run a multiple-protocol PyRosetta trajectory with PyRosettaCluster!