This notebook contains material from PyRosetta; content is available on GitHub.

Refinement Protocol

The entire standard Rosetta refinement protocol, similar to that presented in Bradley, Misura, & Baker 2005, is available as a Mover. Note that the protocol can require ~40 minutes for a 100-residue protein.

sfxn = get_fa_scorefxn()
pose = pose_from_pdb("1YY8.clean.pdb")
relax = pyrosetta.rosetta.protocols.relax.ClassicRelax()
relax.set_scorefxn(sfxn)
relax.apply(pose)

Note that this protocol is deprecated and has been for quite some time; use FastRelax() instead. FastRelax still takes quite a while to run. Replace ClassicRelax() with FastRelax() and run it now. You will see the FastRelax mover used in many tutorials from here on. FastRelax with constraints on each atom is useful for bringing a crystal structure into the Rosetta energy function, and FastRelax can also be used for flexible-backbone design. These uses will be covered in due time.

In [2]:
# Notebook setup
import sys
if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.mount_pyrosetta_install()
    print ("Notebook is set for PyRosetta use in Colab.  Have fun!")

from pyrosetta import *
from pyrosetta.teaching import *
init()
core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Rosetta version: PyRosetta4.Release.python36.mac r208 2019.04+release.fd666910a5e fd666910a5edac957383b32b3b4c9d10020f34c1 http://www.pyrosetta.org 2019-01-22T15:55:37
core.init: command: PyRosetta -ex1 -ex2aro -database /Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database
core.init: 'RNG device' seed mode, using '/dev/urandom', seed=-1509889871 seed_offset=0 real_seed=-1509889871
core.init.random: RandomGenerator:init: Normal mode, seed=-1509889871 RG_type=mt19937
/Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/teaching.py:13: UserWarning: Import of 'rosetta' as a top-level module is deprecated and may be removed in 2018, import via 'pyrosetta.rosetta'.
  from rosetta.core.scoring import *

Make sure you are in the directory with the pdb files:

cd google_drive/My\ Drive/student-notebooks/

In [ ]:
### BEGIN SOLUTION
import os  # needed for os.getenv below

sfxn = get_score_function()
pose = pose_from_pdb("inputs/1YY8.clean.pdb")
relax = pyrosetta.rosetta.protocols.relax.FastRelax()
relax.set_scorefxn(sfxn)

# Skip the long-running relax step during tests (when DEBUG is set)
if not os.getenv("DEBUG"):
    relax.apply(pose)

### END SOLUTION
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/AccStrength.csv
core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 696 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 1.07793 seconds.
basic.io.database: Database file opened: scoring/score_functions/rama/fd/all.ramaProb
basic.io.database: Database file opened: scoring/score_functions/rama/fd/prepro.ramaProb
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.all.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.gly.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.pro.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.valile.txt
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA_n
core.scoring.P_AA: shapovalov_lib::shap_p_aa_pp_smooth_level of 1( aka low_smooth ) got activated.
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/shapovalov/10deg/kappa131/a20.prop
core.import_pose.import_pose: File 'inputs/1YY8.clean.pdb' automatically determined to be of type PDB
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue ARG 18
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue ARG 18
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NE  on residue ARG 18
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CZ  on residue ARG 18
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NH1 on residue ARG 18
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NH2 on residue ARG 18
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue GLN:NtermProteinFull 214
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue GLN:NtermProteinFull 214
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OE1 on residue GLN:NtermProteinFull 214
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NE2 on residue GLN:NtermProteinFull 214
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue ARG 452
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue ARG 452
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NE  on residue ARG 452
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CZ  on residue ARG 452
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NH1 on residue ARG 452
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NH2 on residue ARG 452
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue GLN:NtermProteinFull 648
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue GLN:NtermProteinFull 648
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OE1 on residue GLN:NtermProteinFull 648
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NE2 on residue GLN:NtermProteinFull 648
core.conformation.Conformation: Found disulfide between residues 23 88
core.conformation.Conformation: current variant for 23 CYS
core.conformation.Conformation: current variant for 88 CYS
core.conformation.Conformation: current variant for 23 CYD
core.conformation.Conformation: current variant for 88 CYD
core.conformation.Conformation: Found disulfide between residues 134 194
core.conformation.Conformation: current variant for 134 CYS
core.conformation.Conformation: current variant for 194 CYS
core.conformation.Conformation: current variant for 134 CYD
core.conformation.Conformation: current variant for 194 CYD
core.conformation.Conformation: Found disulfide between residues 235 308
core.conformation.Conformation: current variant for 235 CYS
core.conformation.Conformation: current variant for 308 CYS
core.conformation.Conformation: current variant for 235 CYD
core.conformation.Conformation: current variant for 308 CYD
core.conformation.Conformation: Found disulfide between residues 359 415
core.conformation.Conformation: current variant for 359 CYS
core.conformation.Conformation: current variant for 415 CYS
core.conformation.Conformation: current variant for 359 CYD
core.conformation.Conformation: current variant for 415 CYD
core.conformation.Conformation: Found disulfide between residues 457 522
core.conformation.Conformation: current variant for 457 CYS
core.conformation.Conformation: current variant for 522 CYS
core.conformation.Conformation: current variant for 457 CYD
core.conformation.Conformation: current variant for 522 CYD
core.conformation.Conformation: Found disulfide between residues 568 628
core.conformation.Conformation: current variant for 568 CYS
core.conformation.Conformation: current variant for 628 CYS
core.conformation.Conformation: current variant for 568 CYD
core.conformation.Conformation: current variant for 628 CYD
core.conformation.Conformation: Found disulfide between residues 669 742
core.conformation.Conformation: current variant for 669 CYS
core.conformation.Conformation: current variant for 742 CYS
core.conformation.Conformation: current variant for 669 CYD
core.conformation.Conformation: current variant for 742 CYD
core.conformation.Conformation: Found disulfide between residues 793 849
core.conformation.Conformation: current variant for 793 CYS
core.conformation.Conformation: current variant for 849 CYS
core.conformation.Conformation: current variant for 793 CYD
core.conformation.Conformation: current variant for 849 CYD
core.pack.pack_missing_sidechains: packing residue number 18 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 214 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 452 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 648 because of missing atom number 6 atom name  CG
core.pack.task: Packer task: initialize from command line()
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
basic.io.database: Database file opened: scoring/score_functions/elec_cp_reps.dat
core.scoring.elec.util: Read 40 countpair representative atoms
core.pack.dunbrack.RotamerLibrary: shapovalov_lib_fixes_enable option is true.
core.pack.dunbrack.RotamerLibrary: shapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.
core.pack.dunbrack.RotamerLibrary: Binary rotamer library selected: /Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin
core.pack.dunbrack.RotamerLibrary: Using Dunbrack library binary file '/Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.
core.pack.dunbrack.RotamerLibrary: Dunbrack 2010 library took 0.475769 seconds to load from binary
core.pack.pack_rotamers: built 85 rotamers at 4 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
protocols.relax.ClassicRelax: Setting up default relax setting
protocols.relax.ClassicRelax: 
protocols.relax.ClassicRelax: 
protocols.relax.ClassicRelax: ===================================================================
protocols.relax.ClassicRelax:    Stage 1
protocols.relax.ClassicRelax:    Ramping repulsives with 8 outer cycles and 1 inner cycles
core.pack.task: Packer task: initialize from command line()
core.pack.pack_rotamers: built 33948 rotamers at 868 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
core.pack.interaction_graph.interaction_graph_factory: High IG memory usage (>25 MB). If this becomes an issue, consider using a different interaction graph type.

Programming Exercises

  1. Use the Mover constructs to create a complex folding algorithm. Create a program to do the following:
    1. Five small moves
    2. Minimize
    3. Five shear moves
    4. Minimize
    5. Monte Carlo Metropolis criterion
    6. Repeat steps 1–5 100 times
    7. Repeat steps 1–6 five times, each time decreasing the magnitude of the small and shear moves, from 25° to 5° in 5° increments.

Sketch a flowchart, and submit both the flowchart and your code.
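As a starting point for the flowchart, the nesting of the loops can be sketched in plain Python. This is only a minimal sketch of the control flow, not a PyRosetta solution: `run_schedule` and the `toy_*` functions are hypothetical stand-ins (a list of angles in degrees with a cosine energy) for the pose, score function, and the small, shear, and minimization movers.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: accept downhill moves, uphill with prob exp(-dE/kT)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / kT)

def run_schedule(state, energy, small_move, shear_move, minimize, kT=1.0,
                 angles=(25, 20, 15, 10, 5), cycles=100, moves_per_step=5, seed=0):
    """Nested loops for the exercise; all callables are supplied by the caller."""
    rng = random.Random(seed)
    current, current_e = state, energy(state)
    best, best_e = current, current_e
    for max_angle in angles:                 # step 7: 25, 20, 15, 10, 5 degrees
        for _ in range(cycles):              # step 6: repeat steps 1-5
            trial = current
            for _ in range(moves_per_step):  # step 1: five small moves
                trial = small_move(trial, max_angle, rng)
            trial = minimize(trial)          # step 2
            for _ in range(moves_per_step):  # step 3: five shear moves
                trial = shear_move(trial, max_angle, rng)
            trial = minimize(trial)          # step 4
            trial_e = energy(trial)
            if metropolis_accept(trial_e - current_e, kT, rng):  # step 5
                current, current_e = trial, trial_e
                if current_e < best_e:
                    best, best_e = current, current_e
    return best, best_e

# Toy stand-ins (assumptions for illustration only, not Rosetta movers).
def toy_energy(angles):
    return sum(1.0 - math.cos(math.radians(a)) for a in angles)

def toy_small(angles, max_angle, rng):
    new = list(angles)
    new[rng.randrange(len(new))] += rng.uniform(-max_angle, max_angle)
    return new

def toy_shear(angles, max_angle, rng):
    # Perturb two neighboring angles by equal and opposite amounts.
    new = list(angles)
    i = rng.randrange(len(new) - 1)
    delta = rng.uniform(-max_angle, max_angle)
    new[i] += delta
    new[i + 1] -= delta
    return new

def toy_minimize(angles):
    # One crude downhill step toward the nearest energy minimum.
    return [a - 10.0 * math.sin(math.radians(a)) for a in angles]

start = [90.0, -120.0, 60.0, 150.0]
best, best_e = run_schedule(start, toy_energy, toy_small, toy_shear, toy_minimize)
print("start energy:", toy_energy(start), "best energy:", best_e)
```

In your own program, the three toy callables would be replaced by the corresponding movers applied to a pose, and the Metropolis step by whatever Monte Carlo construct you used in the earlier workshops.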

In [ ]:
 
  2. Ab initio folding algorithm. Based on the Monte Carlo energy optimization algorithm from Workshop #4, write a complete program that will fold a protein. A suggested algorithm involves preliminary low-resolution modifications by fragment insertion (first 9-mers, then 3-mers), followed by high-resolution refinement using small, shear, and minimization movers. Output both your low-resolution intermediate structure and the final refined, high-resolution decoy.

    Test your code by attempting to fold domain 2 of the RecA protein (the last 60 amino acid residues of PDB ID 2REB). How do your results compare with the crystal structure? (Consider both your low-resolution and high-resolution results.) If your lowest-energy conformation differs from the native structure, explain why in terms of the limitations of the computational approach.

    Bonus: After using the PyMOL_Mover or PyMOL_Observer to record the trajectory, export the frames and tie them together to create an animation. Search the Internet for “PyMOL animation” for additional tools and tips. Animated GIF files are probably the best quality; MPEG and QuickTime formats are also popular and widely compatible and uploadable to YouTube.

In [ ]:
 

Thought Questions

  1. With $kT$ = 1, what change in propensity, as reflected in the rama score component, gives a small move a 50% chance of being accepted?
  2. How would you test whether an algorithm is effective? That is, what kinds of measures can you use? What can you vary within an algorithm to make it more effective?
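
When working through the first question, it may help to tabulate the Metropolis acceptance probability for a few energy changes. The helper below is a minimal sketch under the standard Metropolis criterion; the name `metropolis_probability` is hypothetical and is not part of PyRosetta.

```python
import math

def metropolis_probability(delta_e, kT=1.0):
    """Probability of accepting a move whose score changes by delta_e."""
    if delta_e <= 0:
        return 1.0                      # downhill moves are always accepted
    return math.exp(-delta_e / kT)      # uphill moves: Boltzmann factor

# Tabulate a few score changes at kT = 1.
for delta_e in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(f"dE = {delta_e:+.1f}  P(accept) = {metropolis_probability(delta_e):.3f}")
```

Varying `kT` in the call shows how temperature changes the tolerance for uphill moves, which is one of the parameters you can tune when thinking about the second question.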
In [ ]: