This notebook contains material from PyRosetta; content is available on Github.

Low-Res Scoring and Fragments

Keywords: centroid, SwitchResidueTypeSetMover(), create_score_function(), score3, fa_standard, ScoreFunction(), set_weight(), read_fragment_file(), ClassicFragmentMover()

In [ ]:
# Notebook setup
import sys
if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.mount_pyrosetta_install()
    print ("Notebook is set for PyRosetta use in Colab.  Have fun!")

from pyrosetta import *
from pyrosetta.teaching import *
init()

Make sure you are in the directory with the pdb files:

cd google_drive/My\ Drive/student-notebooks/

Low-Resolution (Centroid) Scoring

Following the treatment of Simons et al. (1999), Rosetta can score a protein conformation using a low-resolution representation. This will make the energy calculation faster.

Load chain A of Ras, a protein from a the previous workshop 3. Also calculate the full-atom energy of the pose.

pose = pyrosetta.pose_from_pdb("6Q21_A.pdb")
sfxn = pyrosetta.get_score_function()
sfxn(pose)
In [7]:
### BEGIN SOLUTION
pose = pyrosetta.pose_from_pdb("inputs/6Q21_A.pdb")
sfxn = pyrosetta.get_score_function()
sfxn(pose)
### END SOLUTION
core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 696 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 1.19396 seconds.
core.import_pose.import_pose: File '6Q21_A.pdb' automatically determined to be of type PDB
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/AccStrength.csv
basic.io.database: Database file opened: scoring/score_functions/rama/fd/all.ramaProb
basic.io.database: Database file opened: scoring/score_functions/rama/fd/prepro.ramaProb
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.all.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.gly.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.pro.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.valile.txt
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA_n
core.scoring.P_AA: shapovalov_lib::shap_p_aa_pp_smooth_level of 1( aka low_smooth ) got activated.
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/shapovalov/10deg/kappa131/a20.prop
basic.io.database: Database file opened: scoring/score_functions/elec_cp_reps.dat
core.scoring.elec.util: Read 40 countpair representative atoms
core.pack.dunbrack.RotamerLibrary: shapovalov_lib_fixes_enable option is true.
core.pack.dunbrack.RotamerLibrary: shapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.
core.pack.dunbrack.RotamerLibrary: Binary rotamer library selected: /Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin
core.pack.dunbrack.RotamerLibrary: Using Dunbrack library binary file '/Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.
core.pack.dunbrack.RotamerLibrary: Dunbrack 2010 library took 0.456743 seconds to load from binary
Out[7]:
1215.729069796814

Question: Print residue 5. Note the number of atoms and coordinates of residue 5.

print(pose.residue(5))
In [8]:
### BEGIN SOLUTION
print(pose.residue(5))
### END SOLUTION
Residue 5: LYS (LYS, K):
Base: LYS
 Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE METALBINDING SIDECHAIN_AMINE ALPHA_AA L_AA
 Variant types:
 Main-chain atoms:  N    CA   C  
 Backbone atoms:    N    CA   C    O    H    HA 
 Side-chain atoms:  CB   CG   CD   CE   NZ  1HB  2HB  1HG  2HG  1HD  2HD  1HE  2HE  1HZ  2HZ  3HZ 
Atom Coordinates:
   N  : 20.315, 43.835, 78.015
   CA : 20.418, 42.863, 79.118
   C  : 19.697, 43.46, 80.329
   O  : 20.096, 44.486, 80.897
   CB : 21.858, 42.487, 79.491
   CG : 22.791, 42.176, 78.316
   CD : 22.406, 40.943, 77.485
   CE : 23.009, 40.932, 76.075
   NZ : 22.748, 42.169, 75.307
   H  : 21.0493, 44.5172, 77.8902
   HA : 19.9193, 41.9417, 78.815
  1HB : 22.3125, 43.3019, 80.0551
  2HB : 21.8492, 41.6078, 80.1356
  1HG : 22.8124, 43.0262, 77.6332
  2HG : 23.8008, 42.0064, 78.6884
  1HD : 22.7418, 40.0399, 77.9965
  2HD : 21.3219, 40.8985, 77.3807
  1HE : 24.088, 40.801, 76.1421
  2HE : 22.5982, 40.0953, 75.5101
  1HZ : 23.1708, 42.0938, 74.3926
  2HZ : 21.751, 42.2999, 75.2069
  3HZ : 23.1434, 42.9592, 75.7961
Mirrored relative to coordinates in ResidueType: FALSE

SwitchResidueTypeSetMover

Now, convert the pose to the centroid form by using a SwitchResidueTypeSetMover object and the apply method:

switch = SwitchResidueTypeSetMover("centroid")
switch.apply(pose)
print(pose.residue(5))

Question: How many atoms are now in residue 5? How is this different than before switching it into centroid mode?

In [9]:
### BEGIN SOLUTION
switch = SwitchResidueTypeSetMover("centroid")
switch.apply(pose)
print(pose.residue(5))
### END SOLUTION
core.chemical.GlobalResidueTypeSet: Finished initializing centroid residue type set.  Created 62 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 0.058948 seconds.
Residue 5: LYS (LYS, K):
Base: LYS
 Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE SIDECHAIN_AMINE ALPHA_AA L_AA
 Variant types:
 Main-chain atoms:  N    CA   C  
 Backbone atoms:    N    CA   C    O    H  
 Side-chain atoms:  CB   CEN
Atom Coordinates:
   N  : 20.315, 43.835, 78.015
   CA : 20.418, 42.863, 79.118
   C  : 19.697, 43.46, 80.329
   O  : 20.096, 44.486, 80.897
   CB : 21.8754, 42.543, 79.454
   CEN: 23.4957, 41.1851, 79.3707
   H  : 21.0493, 44.5172, 77.8902
Mirrored relative to coordinates in ResidueType: FALSE

Score the new, centroid-based pose by creating and using the standard centroid score function "score3".

cen_sfxn = pyrosetta.create_score_function("score3")
cen_sfxn(pose)

Question: What is the new total score? What scoring terms are included in "score3" (print the cen_sfxn)? Do these match Simons?

In [10]:
### BEGIN SOLUTION
cen_sfxn = pyrosetta.create_score_function("score3")
cen_sfxn(pose)
### END SOLUTION
basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/env_log.txt
basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/cbeta_den.txt
basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/pair_log.txt
basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/cenpack_log.txt
basic.io.database: Database file opened: scoring/score_functions/SecondaryStructurePotential/phi.theta.36.HS.resmooth
basic.io.database: Database file opened: scoring/score_functions/SecondaryStructurePotential/phi.theta.36.SS.resmooth
Out[10]:
-4.509358254597608

Convert the pose back to all-atom form by using another switch object, SwitchResidueTypeSetMover("fa_standard").

fa_switch = SwitchResidueTypeSetMover("fa_standard")
fa_switch.apply(pose)
print(pose.residue(5))

Question: Confirm that you have all the atoms back. Are the atoms in the same coordinate position as before?

In [11]:
### BEGIN SOLUTION
fa_switch = SwitchResidueTypeSetMover("fa_standard")
fa_switch.apply(pose)
print(pose.residue(5))
### END SOLUTION
Residue 5: LYS (LYS, K):
Base: LYS
 Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE METALBINDING SIDECHAIN_AMINE ALPHA_AA L_AA
 Variant types:
 Main-chain atoms:  N    CA   C  
 Backbone atoms:    N    CA   C    O    H    HA 
 Side-chain atoms:  CB   CG   CD   CE   NZ  1HB  2HB  1HG  2HG  1HD  2HD  1HE  2HE  1HZ  2HZ  3HZ 
Atom Coordinates:
   N  : 20.315, 43.835, 78.015
   CA : 20.418, 42.863, 79.118
   C  : 19.697, 43.46, 80.329
   O  : 20.096, 44.486, 80.897
   CB : 21.8754, 42.5429, 79.4539
   CG : 22.8944, 43.244, 78.5655
   CD : 22.2113, 44.1202, 77.5262
   CE : 20.6967, 44.0574, 77.6573
   NZ : 20.2706, 43.1587, 78.7642
   H  : 21.0493, 44.5172, 77.8902
   HA : 19.9306, 41.9373, 78.8101
  1HB : 22.0814, 42.8255, 80.4867
  2HB : 22.0409, 41.4686, 79.3693
  1HG : 23.5468, 43.8654, 79.1801
  2HG : 23.5056, 42.4999, 78.0558
  1HD : 22.5361, 45.154, 77.6512
  2HD : 22.4937, 43.7874, 76.5274
  1HE : 20.3065, 45.0564, 77.8461
  2HE : 20.2655, 43.6934, 76.7248
  1HZ : 19.2619, 43.1443, 78.8176
  2HZ : 20.6116, 42.2236, 78.5898
  3HZ : 20.6485, 43.497, 79.6375
Mirrored relative to coordinates in ResidueType: FALSE

Exercise 1: Centroid Folding Algorithm

Go back and adjust your folding algorithm to use centroid mode. Create a ScoreFunction that uses only van der Waals (fa_atr and fa_rep) and hbond_sr_bb energy score terms.

Question: How much faster does your program run?

In [12]:
polyA = pyrosetta.pose_from_sequence('A' * 10)
polyA.pdb_info().name("polyA")

# Apply the SwitchResidueTypeSetMover to the pose polyA
### BEGIN SOLUTION
switch = SwitchResidueTypeSetMover("centroid")
switch.apply(polyA)
### END SOLUTION
/Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/teaching.py:13: UserWarning: Import of 'rosetta' as a top-level module is deprecated and may be removed in 2018, import via 'pyrosetta.rosetta'.
  from rosetta.core.scoring import *
In [13]:
# Create new score function with only VDW and hbond_sr_bb energy score terms.
### BEGIN SOLUTION
cen_sfxn = ScoreFunction()
cen_sfxn.set_weight(fa_atr, 1.0)
cen_sfxn.set_weight(fa_rep, 1.0)
cen_sfxn.set_weight(hbond_sr_bb, 1.0)
### END SOLUTION
In [14]:
# Use the basic_folding function in the previous chapter,
# overwrite your scoring subroutine, and run the program.

Note about Movers

Not counting the PyMOLMover, which is a special case, SwitchResidueTypeSetMover is the first example we have seen of a Mover class in PyRosetta. Every Mover object in PyRosetta has been designed to apply specific and complex changes (or “moves”) to a pose. Every Mover must be “constructed” and have any options set before being applied to a pose with the apply() method. SwitchResidueTypeSetMover has a relatively simple construction with only the single option "centroid". (Some Movers, as we shall see, require no options and are programmed to operate with default values).

Protein Fragments

Look at the provided 3mer.frags fragments. These fragments are generated from the Robetta server (http://robetta.bakerlab.org/fragmentsubmit.jsp) for a given sequence. You should see sets of three-lines describing each fragment.

Questions: For the first fragment, which PDB file does it come from? Is this fragment helical, sheet, in a loop, or a combination? What are the φ, ψ, and ω angles of the middle residue of the first fragment window?

In [ ]:
 

Create a new subroutine in your folding code for an alternate random move based upon a “fragment insertion”. A fragment insertion is the replacement of the torsion angles for a set of consecutive residues with new torsion angles pulled at random from a fragment library file. Prior to calling the subroutine, load the set of fragments from the fragment file:

from pyrosetta.rosetta.core.fragment import *
fragset = ConstantLengthFragSet(3)
fragset.read_fragment_file("3mer.frags")
In [15]:
### BEGIN SOLUTION
from pyrosetta.rosetta.core.fragment import *
fragset = ConstantLengthFragSet(3)
fragset.read_fragment_file("inputs/3mer.frags")
### END SOLUTION

Using FragmentMover and MoveMap

Next, we will construct another Mover object — this time a FragmentMover — using the above fragment set and a MoveMap object as options. A MoveMap specifies which degrees of freedom are allowed to change in the pose when the Mover is applied (in this case, all backbone torsion angles):

from pyrosetta.rosetta.protocols.simple_moves import ClassicFragmentMover
movemap = MoveMap()
movemap.set_bb(True)
mover_3mer = ClassicFragmentMover(fragset, movemap)
In [17]:
### BEGIN SOLUTION
from pyrosetta.rosetta.protocols.simple_moves import ClassicFragmentMover
movemap = MoveMap()
movemap.set_bb(True)
mover_3mer = ClassicFragmentMover(fragset, movemap)
### END SOLUTION

Note that when a MoveMap is constructed, all degrees of freedom are set to False initially. If you still have a PyMOL_Mover instantiated, you can quickly visualize which degrees of freedom will be allowed by sending your move map to PyMOL with

test_pose = pyrosetta.pose_from_sequence("RFPMMSTFKVLLCGAVLSRIDAG")
pmm.apply(test_pose)
pmm.send_movemap(test_pose, movemap)
In [20]:
### BEGIN SOLUTION
test_pose = pyrosetta.pose_from_sequence("RFPMMSTFKVLLCGAVLSRIDAG")
pmm = PyMOLMover()
pmm.apply(test_pose)
pmm.send_movemap(test_pose, movemap)
### END SOLUTION

Each time this mover is applied, it will select a random 3-mer window and insert only the backbone torsion angles from a random matching fragment in the fragment set. Here is an example using the above test_pose:

mover_3mer.apply(test_pose)
pmm.apply(test_pose)
In [21]:
### BEGIN SOLUTION
mover_3mer.apply(test_pose)
pmm.apply(test_pose)
### END SOLUTION

Exercise 2: Fragment Folding Algorithm

Question: When you change your random move in your poly-alanine folding algorithm to a fragment insertion, how much faster is your protocol? Does it converge to a protein-like conformation more quickly?

In [ ]:
 

Programming Exercises

  • Fold a 10-mer poly-alanine using 100 independent trajectories, using any variant of the folding algorithm that you like. (A trajectory is a path through the conformation space traveled during the calculation. The end result of each independent trajectory is called a “decoy”. Given enough sampling, the lowest energy decoy may correspond to the global minimum.) Create a Ramachandran plot using the lowest-scoring conformations (decoys) from all 100 independent trajectories. Repeat this for a 10-mer poly-glycine. How do the plots differ? Compare with the plots in Richardson’s article.
  • Test your folding program’s ability to predict a real fold from scratch. Choose a small protein to keep the computation time down, such as Hox-B1 homeobox protein (1B72) or RecA (2REB). How many iterations and how many independent trajectories do you need to run to find a good structure?
  • Modify your folding program to include a simulated annealing temperature schedule, decaying exponentially from kT = 100 to kT = 0.1 over the course of the search. Again, fold a test protein. Does this approach work better? Modify your folding program to remove the Metropolis criterion and instead accept trial moves only when the energy decreases. Plot energy vs. iteration and examine the final output structures from multiple runs. How is the convergence and performance affected? Why?

Thought Questions

  • [Introductory] What are the limitations of these types of folding algorithms?
  • [Advanced] How might you design an intermediate-resolution representation of side chains that has more detail than the centroid approach yet is faster than the full-atom approach? Which types of residues would most benefit from this type of representation?
In [ ]: