This notebook contains material from PyRosetta; content is available on Github.

Basic Folding Algorithm

Keywords: pose_from_sequence(), random move, scoring move, Metropolis, assign(), Pose()

In [ ]:
# Notebook setup
import sys
if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.mount_pyrosetta_install()
    print ("Notebook is set for PyRosetta use in Colab.  Have fun!")

from pyrosetta import *
from pyrosetta.teaching import *
init()

Building the Pose

In this workshop, you will be folding a 10 residue protein by building a simple de novo folding algorithm. Start by initializing PyRosetta as usual.

Create a simple poly-alanine pose with 10 residues for testing your folding algorithm. Store the pose in a variable called "polyA."

In [35]:
### BEGIN SOLUTION
polyA = pyrosetta.pose_from_sequence('A' * 10)
### END SOLUTION

polyA.pdb_info().name("polyA")

Question: Check the backbone dihedrals of a few residues (except the first and last) using the .phi() and .psi() methods in Pose. What are the values of $\phi$ and $\psi$ dihedrals? You should see ideal bond lengths and angles, but the dihedrals may not be as realistic.

In [36]:
### BEGIN SOLUTION
print("phi: %i" %polyA.phi(9))
print("psi: %i" %polyA.psi(9))
### END SOLUTION
phi: 180
psi: 180

OPTIONAL: We may want to visualize folding as it happens. Before starting with the folding protocol, instantiate a PyMOL mover and use a UNIQUE port number between 10,000 and 65,536. We will retain history in order to view the entire folding process by utilizing the .keep_history() method. Make sure it says PyMOL <---> PyRosetta link started! on its command line.

In [37]:
pmm = PyMOLMover()
pmm.keep_history(True)

Use the PyMOL mover to view the polyA Pose. You should see a long thread-like structure in PyMOL.

In [38]:
pmm.apply(polyA)

Building A Basic de Novo Folding Algorithm

Now, write a program that implements a Monte Carlo algorithm to optimize the protein conformation. You can do this here in the notebook, or you may use a code editor to write a .py file and execute in a Python or iPython shell.

Our main program will include 100 iterations of making a random trial move, scoring the protein, and accepting/rejecting the move. Therefore, we can break this algorithm down into three smaller subroutines: random, score, and decision.

Step 1: Random Move

For the random trial move, write a subroutine to choose one residue at random using random.randint() and then randomly perturb either the φ or ψ angles by a random number chosen from a Gaussian distribution. Use the Python built-in function random.gauss() from the random library with a mean of the current angle and a standard deviation of 25°. After changing the torsion angle, use pmm.apply(polyA) to update the structure in PyMOL.

In [39]:
import math
import random

def randTrial(your_pose):
### BEGIN SOLUTION
    randNum = random.randint(2, your_pose.total_residue())
    currPhi = your_pose.phi(randNum)
    currPsi = your_pose.psi(randNum)
    newPhi = random.gauss(currPhi, 25)
    newPsi = random.gauss(currPsi, 25)
    your_pose.set_phi(randNum,newPhi) 
    your_pose.set_psi(randNum,newPsi)
    pmm.apply(your_pose)
### END SOLUTION
    return your_pose

Step 2: Scoring Move

For the scoring step, we need to create a scoring function and make a subroutine that simply returns the numerical energy score of the pose.

In [40]:
sfxn = get_fa_scorefxn()

def score(your_pose):
    ### BEGIN SOLUTION
    return sfxn(your_pose)
    ### END SOLUTION
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015

Step 3: Accepting/Rejecting Move

For the decision step, we need to make a subroutine that either accepts or rejects the new conformatuon based on the Metropolis criterion. The Metropolis criterion has a probability of accepting a move as $P = \exp( -\Delta G / kT )$. When $ΔE ≥ 0$, the Metropolis criterion probability of accepting the move is $P = \exp( -\Delta G / kT )$. When $ΔE < 0$, the Metropolis criterion probability of accepting the move is $P = 1$. Use $kT = 1$ Rosetta Energy Unit (REU).

In [41]:
def decision(before_pose, after_pose):
    ### BEGIN SOLUTION
    E = score(after_pose) - score(before_pose)
    if E < 0:
        return after_pose
    elif random.uniform(0, 1) >= math.exp(-E/1):
        return before_pose
    else:
        return after_pose
    ### END SOLUTION

Step 4: Execution

Now we can put these three subroutines together in our main program! Write a loop in the main program so that it performs 100 iterations of: making a random trial move, scoring the protein, and accepting/rejecting the move.

After each iteration of the search, output the current pose energy and the lowest energy ever observed. The final output of this program should be the lowest energy conformation that is achieved at any point during the simulation. Be sure to use low_pose.assign(pose) rather than low_pose = pose, since the latter will only copy a pointer to the original pose.

In [43]:
def basic_folding(your_pose):
    """Your basic folding algorithm that completes 100 Monte-Carlo iterations on a given pose"""
    
    lowest_pose = Pose() # Create an empty pose for tracking the lowest energy pose.
    
    ### BEGIN SOLUTION
    for i in range(100):
        if i == 0:
            lowest_pose.assign(your_pose)
            
        before_pose = Pose()
        before_pose.assign(your_pose) # keep track of pose before random move

        after_pose = Pose()
        after_pose.assign(randTrial(your_pose)) # do random move and store the pose
        
        your_pose.assign(decision(before_pose, after_pose)) # keep the new pose or old pose
        
        if score(your_pose) < score(lowest_pose): # updating lowest pose
            lowest_pose.assign(your_pose)
        
        print("Iteration # %i" %i) # output   
        print("Current pose score: %1.3f" %score(your_pose)) # output
        print("Lowest pose score: %1.3f" %score(lowest_pose)) # output
        
    ### END SOLUTION
    return lowest_pose

Finally, output the last pose and the lowest-scoring pose observed and view them in PyMOL. Plot the energy and lowest-energy observed vs. cycle number. What are the energies of the initial, last, and lowest-scoring pose? Is your program working? Has it converged to a good solution?

In [44]:
basic_folding(polyA)
Iteration # 0
Current pose score: 29.926
Lowest pose score: 29.926
Iteration # 1
Current pose score: 29.926
Lowest pose score: 29.926
Iteration # 2
Current pose score: 27.155
Lowest pose score: 27.155
Iteration # 3
Current pose score: 26.881
Lowest pose score: 26.881
Iteration # 4
Current pose score: 26.881
Lowest pose score: 26.881
Iteration # 5
Current pose score: 26.881
Lowest pose score: 26.881
Iteration # 6
Current pose score: 25.102
Lowest pose score: 25.102
Iteration # 7
Current pose score: 26.531
Lowest pose score: 25.102
Iteration # 8
Current pose score: 26.531
Lowest pose score: 25.102
Iteration # 9
Current pose score: 26.116
Lowest pose score: 25.102
Iteration # 10
Current pose score: 26.116
Lowest pose score: 25.102
Iteration # 11
Current pose score: 25.488
Lowest pose score: 25.102
Iteration # 12
Current pose score: 24.199
Lowest pose score: 24.199
Iteration # 13
Current pose score: 24.442
Lowest pose score: 24.199
Iteration # 14
Current pose score: 24.442
Lowest pose score: 24.199
Iteration # 15
Current pose score: 24.442
Lowest pose score: 24.199
Iteration # 16
Current pose score: 22.996
Lowest pose score: 22.996
Iteration # 17
Current pose score: 22.849
Lowest pose score: 22.849
Iteration # 18
Current pose score: 22.849
Lowest pose score: 22.849
Iteration # 19
Current pose score: 22.849
Lowest pose score: 22.849
Iteration # 20
Current pose score: 26.195
Lowest pose score: 22.849
Iteration # 21
Current pose score: 26.195
Lowest pose score: 22.849
Iteration # 22
Current pose score: 25.703
Lowest pose score: 22.849
Iteration # 23
Current pose score: 25.703
Lowest pose score: 22.849
Iteration # 24
Current pose score: 23.290
Lowest pose score: 22.849
Iteration # 25
Current pose score: 23.100
Lowest pose score: 22.849
Iteration # 26
Current pose score: 23.472
Lowest pose score: 22.849
Iteration # 27
Current pose score: 23.472
Lowest pose score: 22.849
Iteration # 28
Current pose score: 23.472
Lowest pose score: 22.849
Iteration # 29
Current pose score: 21.248
Lowest pose score: 21.248
Iteration # 30
Current pose score: 21.664
Lowest pose score: 21.248
Iteration # 31
Current pose score: 20.803
Lowest pose score: 20.803
Iteration # 32
Current pose score: 20.803
Lowest pose score: 20.803
Iteration # 33
Current pose score: 20.803
Lowest pose score: 20.803
Iteration # 34
Current pose score: 20.803
Lowest pose score: 20.803
Iteration # 35
Current pose score: 21.181
Lowest pose score: 20.803
Iteration # 36
Current pose score: 21.181
Lowest pose score: 20.803
Iteration # 37
Current pose score: 21.181
Lowest pose score: 20.803
Iteration # 38
Current pose score: 21.181
Lowest pose score: 20.803
Iteration # 39
Current pose score: 20.994
Lowest pose score: 20.803
Iteration # 40
Current pose score: 20.994
Lowest pose score: 20.803
Iteration # 41
Current pose score: 20.994
Lowest pose score: 20.803
Iteration # 42
Current pose score: 21.001
Lowest pose score: 20.803
Iteration # 43
Current pose score: 21.001
Lowest pose score: 20.803
Iteration # 44
Current pose score: 21.001
Lowest pose score: 20.803
Iteration # 45
Current pose score: 18.952
Lowest pose score: 18.952
Iteration # 46
Current pose score: 18.959
Lowest pose score: 18.952
Iteration # 47
Current pose score: 18.959
Lowest pose score: 18.952
Iteration # 48
Current pose score: 18.959
Lowest pose score: 18.952
Iteration # 49
Current pose score: 18.959
Lowest pose score: 18.952
Iteration # 50
Current pose score: 18.959
Lowest pose score: 18.952
Iteration # 51
Current pose score: 18.959
Lowest pose score: 18.952
Iteration # 52
Current pose score: 18.844
Lowest pose score: 18.844
Iteration # 53
Current pose score: 19.462
Lowest pose score: 18.844
Iteration # 54
Current pose score: 16.719
Lowest pose score: 16.719
Iteration # 55
Current pose score: 16.719
Lowest pose score: 16.719
Iteration # 56
Current pose score: 16.719
Lowest pose score: 16.719
Iteration # 57
Current pose score: 16.591
Lowest pose score: 16.591
Iteration # 58
Current pose score: 16.591
Lowest pose score: 16.591
Iteration # 59
Current pose score: 16.284
Lowest pose score: 16.284
Iteration # 60
Current pose score: 15.937
Lowest pose score: 15.937
Iteration # 61
Current pose score: 15.937
Lowest pose score: 15.937
Iteration # 62
Current pose score: 16.357
Lowest pose score: 15.937
Iteration # 63
Current pose score: 16.357
Lowest pose score: 15.937
Iteration # 64
Current pose score: 16.357
Lowest pose score: 15.937
Iteration # 65
Current pose score: 17.403
Lowest pose score: 15.937
Iteration # 66
Current pose score: 17.386
Lowest pose score: 15.937
Iteration # 67
Current pose score: 17.373
Lowest pose score: 15.937
Iteration # 68
Current pose score: 16.537
Lowest pose score: 15.937
Iteration # 69
Current pose score: 16.537
Lowest pose score: 15.937
Iteration # 70
Current pose score: 16.760
Lowest pose score: 15.937
Iteration # 71
Current pose score: 16.760
Lowest pose score: 15.937
Iteration # 72
Current pose score: 15.756
Lowest pose score: 15.756
Iteration # 73
Current pose score: 15.756
Lowest pose score: 15.756
Iteration # 74
Current pose score: 15.756
Lowest pose score: 15.756
Iteration # 75
Current pose score: 15.881
Lowest pose score: 15.756
Iteration # 76
Current pose score: 15.881
Lowest pose score: 15.756
Iteration # 77
Current pose score: 15.881
Lowest pose score: 15.756
Iteration # 78
Current pose score: 16.610
Lowest pose score: 15.756
Iteration # 79
Current pose score: 15.449
Lowest pose score: 15.449
Iteration # 80
Current pose score: 14.957
Lowest pose score: 14.957
Iteration # 81
Current pose score: 14.957
Lowest pose score: 14.957
Iteration # 82
Current pose score: 14.957
Lowest pose score: 14.957
Iteration # 83
Current pose score: 14.957
Lowest pose score: 14.957
Iteration # 84
Current pose score: 14.957
Lowest pose score: 14.957
Iteration # 85
Current pose score: 14.986
Lowest pose score: 14.957
Iteration # 86
Current pose score: 15.092
Lowest pose score: 14.957
Iteration # 87
Current pose score: 15.092
Lowest pose score: 14.957
Iteration # 88
Current pose score: 17.306
Lowest pose score: 14.957
Iteration # 89
Current pose score: 17.306
Lowest pose score: 14.957
Iteration # 90
Current pose score: 17.306
Lowest pose score: 14.957
Iteration # 91
Current pose score: 17.077
Lowest pose score: 14.957
Iteration # 92
Current pose score: 17.077
Lowest pose score: 14.957
Iteration # 93
Current pose score: 17.077
Lowest pose score: 14.957
Iteration # 94
Current pose score: 17.077
Lowest pose score: 14.957
Iteration # 95
Current pose score: 17.077
Lowest pose score: 14.957
Iteration # 96
Current pose score: 17.097
Lowest pose score: 14.957
Iteration # 97
Current pose score: 18.420
Lowest pose score: 14.957
Iteration # 98
Current pose score: 18.398
Lowest pose score: 14.957
Iteration # 99
Current pose score: 18.247
Lowest pose score: 14.957
Out[44]:
<pyrosetta.rosetta.core.pose.Pose at 0x129b6fc38>

Here's an example of the PyMOL view:

In [1]:
from IPython.display import Image
Image('./Media/folding.gif',width='300')
Out[1]:
<IPython.core.display.Image object>

Exercise 1: Comparing to Alpha Helices

Using the program you wrote for Workshop #2, force the $A_{10}$ sequence into an ideal α-helix.

Questions: Does this helical structure have a lower score than that produced by your folding algorithm above? What does this mean about your sampling or discrimination?

Exercise 2: Optimizing Algorithm

Since your program is a stochastic search algorithm, it may not produce an ideal structure consistently, so try running the simulation multiple times or with a different number of cycles (if necessary). Using a kT of 1, your program may need to make up to 500,000 iterations.

In [ ]: