Keywords: centroid, SwitchResidueTypeSetMover(), create_score_function(), score3, fa_standard, ScoreFunction(), set_weight(), read_fragment_file(), ClassicFragmentMover()
# Notebook setup
import sys
if 'google.colab' in sys.modules:
!pip install pyrosettacolabsetup
import pyrosettacolabsetup
pyrosettacolabsetup.setup()
print ("Notebook is set for PyRosetta use in Colab. Have fun!")
from pyrosetta import *
from pyrosetta.teaching import *
init()
Make sure you are in the directory with the pdb files:
cd google_drive/My\ Drive/student-notebooks/
Following the treatment of Simons et al. (1999), Rosetta can score a protein conformation using a low-resolution representation. This will make the energy calculation faster.
Load chain A of Ras, a protein from a the previous workshop 3. Also calculate the full-atom energy of the pose.
pose = pyrosetta.pose_from_pdb("6Q21_A.pdb")
sfxn = pyrosetta.get_score_function()
sfxn(pose)
### BEGIN SOLUTION
pose = pyrosetta.pose_from_pdb("inputs/6Q21_A.pdb")
sfxn = pyrosetta.get_score_function()
sfxn(pose)
### END SOLUTION
core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set. Created 696 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 1.19396 seconds.
core.import_pose.import_pose: File '6Q21_A.pdb' automatically determined to be of type PDB
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/AccStrength.csv
basic.io.database: Database file opened: scoring/score_functions/rama/fd/all.ramaProb
basic.io.database: Database file opened: scoring/score_functions/rama/fd/prepro.ramaProb
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.all.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.gly.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.pro.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.valile.txt
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA_n
core.scoring.P_AA: shapovalov_lib::shap_p_aa_pp_smooth_level of 1( aka low_smooth ) got activated.
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/shapovalov/10deg/kappa131/a20.prop
basic.io.database: Database file opened: scoring/score_functions/elec_cp_reps.dat
core.scoring.elec.util: Read 40 countpair representative atoms
core.pack.dunbrack.RotamerLibrary: shapovalov_lib_fixes_enable option is true.
core.pack.dunbrack.RotamerLibrary: shapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.
core.pack.dunbrack.RotamerLibrary: Binary rotamer library selected: /Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin
core.pack.dunbrack.RotamerLibrary: Using Dunbrack library binary file '/Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.
core.pack.dunbrack.RotamerLibrary: Dunbrack 2010 library took 0.456743 seconds to load from binary
1215.729069796814
Question: Print residue 5. Note the number of atoms and coordinates of residue 5.
print(pose.residue(5))
### BEGIN SOLUTION
print(pose.residue(5))
### END SOLUTION
Residue 5: LYS (LYS, K): Base: LYS Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE METALBINDING SIDECHAIN_AMINE ALPHA_AA L_AA Variant types: Main-chain atoms: N CA C Backbone atoms: N CA C O H HA Side-chain atoms: CB CG CD CE NZ 1HB 2HB 1HG 2HG 1HD 2HD 1HE 2HE 1HZ 2HZ 3HZ Atom Coordinates: N : 20.315, 43.835, 78.015 CA : 20.418, 42.863, 79.118 C : 19.697, 43.46, 80.329 O : 20.096, 44.486, 80.897 CB : 21.858, 42.487, 79.491 CG : 22.791, 42.176, 78.316 CD : 22.406, 40.943, 77.485 CE : 23.009, 40.932, 76.075 NZ : 22.748, 42.169, 75.307 H : 21.0493, 44.5172, 77.8902 HA : 19.9193, 41.9417, 78.815 1HB : 22.3125, 43.3019, 80.0551 2HB : 21.8492, 41.6078, 80.1356 1HG : 22.8124, 43.0262, 77.6332 2HG : 23.8008, 42.0064, 78.6884 1HD : 22.7418, 40.0399, 77.9965 2HD : 21.3219, 40.8985, 77.3807 1HE : 24.088, 40.801, 76.1421 2HE : 22.5982, 40.0953, 75.5101 1HZ : 23.1708, 42.0938, 74.3926 2HZ : 21.751, 42.2999, 75.2069 3HZ : 23.1434, 42.9592, 75.7961 Mirrored relative to coordinates in ResidueType: FALSE
Now, convert the pose
to the centroid form by using a SwitchResidueTypeSetMover
object and the apply method:
switch = SwitchResidueTypeSetMover("centroid")
switch.apply(pose)
print(pose.residue(5))
Question: How many atoms are now in residue 5? How is this different than before switching it into centroid mode?
### BEGIN SOLUTION
switch = SwitchResidueTypeSetMover("centroid")
switch.apply(pose)
print(pose.residue(5))
### END SOLUTION
core.chemical.GlobalResidueTypeSet: Finished initializing centroid residue type set. Created 62 residue types core.chemical.GlobalResidueTypeSet: Total time to initialize 0.058948 seconds. Residue 5: LYS (LYS, K): Base: LYS Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE SIDECHAIN_AMINE ALPHA_AA L_AA Variant types: Main-chain atoms: N CA C Backbone atoms: N CA C O H Side-chain atoms: CB CEN Atom Coordinates: N : 20.315, 43.835, 78.015 CA : 20.418, 42.863, 79.118 C : 19.697, 43.46, 80.329 O : 20.096, 44.486, 80.897 CB : 21.8754, 42.543, 79.454 CEN: 23.4957, 41.1851, 79.3707 H : 21.0493, 44.5172, 77.8902 Mirrored relative to coordinates in ResidueType: FALSE
Score the new, centroid-based pose by creating and using the standard centroid score function "score3".
cen_sfxn = pyrosetta.create_score_function("score3")
cen_sfxn(pose)
Question: What is the new total score? What scoring terms are included in "score3" (print
the cen_sfxn
)? Do these match Simons?
### BEGIN SOLUTION
cen_sfxn = pyrosetta.create_score_function("score3")
cen_sfxn(pose)
### END SOLUTION
basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/env_log.txt basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/cbeta_den.txt basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/pair_log.txt basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/cenpack_log.txt basic.io.database: Database file opened: scoring/score_functions/SecondaryStructurePotential/phi.theta.36.HS.resmooth basic.io.database: Database file opened: scoring/score_functions/SecondaryStructurePotential/phi.theta.36.SS.resmooth
-4.509358254597608
Convert the pose
back to all-atom form by using another switch object, SwitchResidueTypeSetMover("fa_standard")
.
fa_switch = SwitchResidueTypeSetMover("fa_standard")
fa_switch.apply(pose)
print(pose.residue(5))
Question: Confirm that you have all the atoms back. Are the atoms in the same coordinate position as before?
### BEGIN SOLUTION
fa_switch = SwitchResidueTypeSetMover("fa_standard")
fa_switch.apply(pose)
print(pose.residue(5))
### END SOLUTION
Residue 5: LYS (LYS, K): Base: LYS Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE METALBINDING SIDECHAIN_AMINE ALPHA_AA L_AA Variant types: Main-chain atoms: N CA C Backbone atoms: N CA C O H HA Side-chain atoms: CB CG CD CE NZ 1HB 2HB 1HG 2HG 1HD 2HD 1HE 2HE 1HZ 2HZ 3HZ Atom Coordinates: N : 20.315, 43.835, 78.015 CA : 20.418, 42.863, 79.118 C : 19.697, 43.46, 80.329 O : 20.096, 44.486, 80.897 CB : 21.8754, 42.5429, 79.4539 CG : 22.8944, 43.244, 78.5655 CD : 22.2113, 44.1202, 77.5262 CE : 20.6967, 44.0574, 77.6573 NZ : 20.2706, 43.1587, 78.7642 H : 21.0493, 44.5172, 77.8902 HA : 19.9306, 41.9373, 78.8101 1HB : 22.0814, 42.8255, 80.4867 2HB : 22.0409, 41.4686, 79.3693 1HG : 23.5468, 43.8654, 79.1801 2HG : 23.5056, 42.4999, 78.0558 1HD : 22.5361, 45.154, 77.6512 2HD : 22.4937, 43.7874, 76.5274 1HE : 20.3065, 45.0564, 77.8461 2HE : 20.2655, 43.6934, 76.7248 1HZ : 19.2619, 43.1443, 78.8176 2HZ : 20.6116, 42.2236, 78.5898 3HZ : 20.6485, 43.497, 79.6375 Mirrored relative to coordinates in ResidueType: FALSE
Go back and adjust your folding algorithm to use centroid mode. Create a ScoreFunction
that uses only van der Waals (fa_atr
and fa_rep
) and hbond_sr_bb
energy score terms.
Question: How much faster does your program run?
polyA = pyrosetta.pose_from_sequence('A' * 10)
polyA.pdb_info().name("polyA")
# Apply the SwitchResidueTypeSetMover to the pose polyA
### BEGIN SOLUTION
switch = SwitchResidueTypeSetMover("centroid")
switch.apply(polyA)
### END SOLUTION
/Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/teaching.py:13: UserWarning: Import of 'rosetta' as a top-level module is deprecated and may be removed in 2018, import via 'pyrosetta.rosetta'. from rosetta.core.scoring import *
# Create new score function with only VDW and hbond_sr_bb energy score terms.
### BEGIN SOLUTION
cen_sfxn = ScoreFunction()
cen_sfxn.set_weight(fa_atr, 1.0)
cen_sfxn.set_weight(fa_rep, 1.0)
cen_sfxn.set_weight(hbond_sr_bb, 1.0)
### END SOLUTION
# Use the basic_folding function in the previous chapter,
# overwrite your scoring subroutine, and run the program.
Movers
¶Not counting the PyMOLMover
, which is a special case, SwitchResidueTypeSetMover
is the first example we have seen of a Mover
class in PyRosetta. Every Mover
object in PyRosetta has been designed to apply specific and complex changes (or “moves”) to a pose
. Every Mover
must be “constructed” and have any options set before being applied to a pose
with the apply()
method. SwitchResidueTypeSetMover
has a relatively simple construction with only the single option "centroid"
. (Some Movers
, as we shall see, require no options and are programmed to operate with default values).
Look at the provided 3mer.frags
fragments. These fragments are generated from the Robetta server (http://robetta.bakerlab.org/fragmentsubmit.jsp) for a given sequence. You should see sets of three-lines describing each fragment.
Questions: For the first fragment, which PDB file does it come from? Is this fragment helical, sheet, in a loop, or a combination? What are the φ, ψ, and ω angles of the middle residue of the first fragment window?
Create a new subroutine in your folding code for an alternate random move based upon a “fragment insertion”. A fragment insertion is the replacement of the torsion angles for a set of consecutive residues with new torsion angles pulled at random from a fragment library file. Prior to calling the subroutine, load the set of fragments from the fragment file:
from pyrosetta.rosetta.core.fragment import *
fragset = ConstantLengthFragSet(3)
fragset.read_fragment_file("3mer.frags")
### BEGIN SOLUTION
from pyrosetta.rosetta.core.fragment import *
fragset = ConstantLengthFragSet(3)
fragset.read_fragment_file("inputs/3mer.frags")
### END SOLUTION
Next, we will construct another Mover
object — this time a FragmentMover
— using the above fragment set and a MoveMap
object as options. A MoveMap
specifies which degrees of freedom are allowed to change in the pose
when the Mover
is applied (in this case, all backbone torsion angles):
from pyrosetta.rosetta.protocols.simple_moves import ClassicFragmentMover
movemap = MoveMap()
movemap.set_bb(True)
mover_3mer = ClassicFragmentMover(fragset, movemap)
### BEGIN SOLUTION
from pyrosetta.rosetta.protocols.simple_moves import ClassicFragmentMover
movemap = MoveMap()
movemap.set_bb(True)
mover_3mer = ClassicFragmentMover(fragset, movemap)
### END SOLUTION
Note that when a MoveMap is constructed, all degrees of freedom are set to False initially. If you still have a PyMOL_Mover instantiated, you can quickly visualize which degrees of freedom will be allowed by sending your move map to PyMOL with
test_pose = pyrosetta.pose_from_sequence("RFPMMSTFKVLLCGAVLSRIDAG")
pmm.apply(test_pose)
pmm.send_movemap(test_pose, movemap)
### BEGIN SOLUTION
test_pose = pyrosetta.pose_from_sequence("RFPMMSTFKVLLCGAVLSRIDAG")
pmm = PyMOLMover()
pmm.apply(test_pose)
pmm.send_movemap(test_pose, movemap)
### END SOLUTION
Each time this mover is applied, it will select a random 3-mer window and insert only the backbone torsion angles from a random matching fragment in the fragment set. Here is an example using the above test_pose
:
mover_3mer.apply(test_pose)
pmm.apply(test_pose)
### BEGIN SOLUTION
mover_3mer.apply(test_pose)
pmm.apply(test_pose)
### END SOLUTION
Question: When you change your random move in your poly-alanine folding algorithm to a fragment insertion, how much faster is your protocol? Does it converge to a protein-like conformation more quickly?
Fold a 10-mer poly-alanine using 100 independent trajectories, using any variant of the folding algorithm that you like. (A trajectory is a path through the conformation space traveled during the calculation. The end result of each independent trajectory is called a “decoy”. Given enough sampling, the lowest energy decoy may correspond to the global minimum.) Create a Ramachandran plot using the lowest-scoring conformations (decoys) from all 100 independent trajectories. Repeat this for a 10-mer poly-glycine. How do the plots differ? Compare with the plots in Richardson’s article.
Test your folding program’s ability to predict a real fold from scratch. Choose a small protein to keep the computation time down, such as Hox-B1 homeobox protein (1B72) or RecA (2REB). How many iterations and how many independent trajectories do you need to run to find a good structure?
Modify your folding program to include a simulated annealing temperature schedule, decaying exponentially from kT = 100 to kT = 0.1 over the course of the search. Again, fold a test protein. Does this approach work better?
Modify your folding program to remove the Metropolis criterion and instead accept trial moves only when the energy decreases. Plot energy vs. iteration and examine the final output structures from multiple runs. How is the convergence and performance affected? Why?
[Introductory] What are the limitations of these types of folding algorithms?
[Advanced] How might you design an intermediate-resolution representation of side chains that has more detail than the centroid approach yet is faster than the full-atom approach? Which types of residues would most benefit from this type of representation?