This notebook contains material from PyRosetta; content is available on GitHub.

Running Rosetta in Parallel

Notes

The following notebooks contain examples of how to run Rosetta in parallel, either locally or on a compute cluster, through the pyrosetta.distributed package that is now included with PyRosetta.

The first three notebooks contain two high-level examples of parallel PyRosetta through a Rosetta-PyData integration using the pyrosetta.distributed namespace.

Setup

For Chapter 16, Running PyRosetta in Parallel, you will need a specific version of PyRosetta that is built for parallelization: the serialization build. Apart from building it manually from the Rosetta C++ source code, the usual way to obtain it is through a conda environment.

A conda environment lets you run code against specific versions of its required packages. Instead of being installed globally, those packages are installed into a local virtual environment that you can activate whenever you wish. This is extremely useful when some packages require specific versions of other packages, as is the case for some of the Rosetta distributed code.

conda needs your PyRosetta username and password in order to download the package.
To provide them, create a file in your home directory called ~/.condarc. The file should look like:

channels:
  - https://USERNAME:PASSWORD@conda.graylab.jhu.edu
  - defaults

Replace USERNAME and PASSWORD with the credentials you were given when you obtained access to PyRosetta.

If you already have this file, edit it instead of overwriting it; the Python cell below only creates the file when it does not already exist.

Using python:

In [4]:
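import os

# Replace these placeholders with the credentials you were given for PyRosetta.
USERNAME = "USERNAME"
PASSWORD = "PASSWORD"

# expanduser("~") also works where the HOME variable is not set (e.g. Windows).
condarc = os.path.join(os.path.expanduser("~"), ".condarc")

# Only create the file if it is missing, so an existing configuration is never overwritten.
if not os.path.exists(condarc):
    with open(condarc, "w") as f:
        f.write("channels:\n")
        f.write("  - https://{}:{}@conda.graylab.jhu.edu\n".format(USERNAME, PASSWORD))
        f.write("  - defaults\n")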

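Alternatively, using bash (these commands append to the file, so run them only if ~/.condarc does not already define a channels list):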

echo "channels:" >> $HOME/.condarc
echo "  - https://USERNAME:PASSWORD@conda.graylab.jhu.edu" >> $HOME/.condarc
echo "  - defaults" >> $HOME/.condarc
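
If ~/.condarc already exists with its own channels list, it has to be edited rather than recreated. The following is a minimal sketch of one way to do that on the file's text. The helper add_channel is hypothetical (it is not part of conda or PyRosetta), and it assumes the channels are written one per line in block style, as in the example file above. It operates on a string, so nothing is written to disk until you choose to save the result.

```python
def add_channel(condarc_text, channel_url):
    """Return condarc_text with channel_url listed first under 'channels:'.

    Hypothetical helper for illustration; handles only block-style lists
    (one '- item' per line).
    """
    lines = condarc_text.splitlines()

    # Nothing to do if the channel is already listed.
    if any(line.strip() == "- " + channel_url for line in lines):
        return condarc_text

    for i, line in enumerate(lines):
        if line.startswith("channels:"):
            if line.rstrip() != "channels:":
                # e.g. "channels: [defaults]" -- too ambiguous to edit safely.
                raise ValueError("inline channels list; edit the file by hand")
            # Reuse the indentation of the existing first item, if any.
            indent = "  "
            if i + 1 < len(lines) and lines[i + 1].lstrip().startswith("- "):
                nxt = lines[i + 1]
                indent = nxt[: len(nxt) - len(nxt.lstrip())]
            lines.insert(i + 1, indent + "- " + channel_url)
            break
    else:
        # No channels key yet: add a new block at the end.
        lines += ["channels:", "  - " + channel_url, "  - defaults"]

    return "\n".join(lines) + "\n"


# Usage on an in-memory example (placeholders, not real credentials).
existing = "channels:\n  - defaults\n"
print(add_channel(existing, "https://USERNAME:PASSWORD@conda.graylab.jhu.edu"))
```

Placing the new channel first matters because conda consults the channels in the order they are listed.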

Create the conda environment with the provided environment.yml file:

conda env create -f environment.yml

then activate your environment:

conda activate PyRosetta.notebooks

Each time you wish to work in this environment, run conda activate PyRosetta.notebooks to activate it (the environment itself only needs to be created once). You may wish to add this command to your shell startup file so that it runs automatically.

For your new conda environment to show up as a kernel option in Jupyter, you may have to register your custom kernel with Jupyter:

python -m ipykernel install --user --name PyRosetta.notebooks

Installed kernels are listed with:

jupyter kernelspec list

NOTE: When using a notebook with this environment, the Python kernel must be set to this environment. In the following notebooks this is done for you, but if you wish to use the environment in other notebooks, you must change the kernel manually. In the Jupyter menu, Kernel sits between Cell and Widgets; choose 'Change Kernel'. This is also how you switch between Python 2 and Python 3, or run a kernel from any other conda environment installed on your computer.
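
To confirm from inside a notebook that the kernel really is running from this environment, one option is to inspect the interpreter's installation prefix. The sketch below is illustrative only: the function names are hypothetical, and it assumes the environment was created in conda's default location, where the environment directory is named after the environment.

```python
import os
import sys

EXPECTED_ENV = "PyRosetta.notebooks"  # environment name used in this chapter


def env_name_from_prefix(prefix):
    """Return the last path component of an interpreter prefix."""
    return os.path.basename(os.path.normpath(prefix))


def kernel_matches(expected=EXPECTED_ENV, prefix=None):
    """True if the interpreter prefix ends in the expected environment name."""
    if prefix is None:
        prefix = sys.prefix  # prefix of the interpreter running this kernel
    return env_name_from_prefix(prefix) == expected


if not kernel_matches():
    print("Kernel is running from", sys.prefix,
          "but the", EXPECTED_ENV, "environment was expected")
```

If the message is printed, change the kernel as described in the note above and run the cell again.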

Citation

Citation for the PyData integration notebooks:

Integration of the Rosetta Suite with the Python Software Stack via reproducible packaging and core programming interfaces for distributed simulation

Alexander S. Ford, Brian D. Weitzner, Christopher D. Bahl

Manual

Documentation for the pyrosetta.distributed namespace can be found here: https://nbviewer.jupyter.org/github/proteininnovation/Rosetta-PyData_Integration/blob/master/distributed_overview.ipynb

Chapter contributors:

  • Jason C. Klima (University of Washington; Lyell Immunopharma)
  • Brian Weitzner (University of Washington; Lyell Immunopharma)
  • Jared Adolf-Bryfogle (Scripps; Institute for Protein Innovation)