This notebook contains material from PyRosetta; content is available on GitHub.

Pose Basics

Keywords: pose_from_pdb(), sequence(), cleanATOM, annotated_sequence()

In this lab, we will practice working with the Pose class in PyRosetta. We will load a protein from a PDB file, use the Pose class to learn about the geometry of the protein, make changes to this geometry, and visualize the changes with PyMOL and PyRosetta's PyMOLMover.

In the corresponding Pose lab on the PyRosetta website, you will find various useful commands for interrogating poses; these may come in handy for the exercises.

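PyRosetta Installation: The first cell below sets up PyRosetta when the notebook runs in Google Colab. The two lines after it (from pyrosetta import * and init()) load the PyRosetta library and its database files. If this does not work, please notify the professor or the TA.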

In [5]:
# Notebook setup
import sys
if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.mount_pyrosetta_install()
    print("Notebook is set for PyRosetta use in Colab.  Have fun!")
In [1]:
from pyrosetta import *
init()
PyRosetta-4 2019 [Rosetta PyRosetta4.Release.python36.mac 2019.33+release.1e60c63beb532fd475f0f704d68d462b8af2a977 2019-08-09T15:19:57] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Reading fconfig.../Users/jadolfbr/.rosetta/flags/common
core.init: 
core.init: 
core.init: Rosetta version: PyRosetta4.Release.python36.mac r230 2019.33+release.1e60c63beb5 1e60c63beb532fd475f0f704d68d462b8af2a977 http://www.pyrosetta.org 2019-08-09T15:19:57
core.init: command: PyRosetta -ex1 -ex2aro -database /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyrosetta-2019.33+release.1e60c63beb5-py3.6-macosx-10.6-intel.egg/pyrosetta/database
basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=1859671084 seed_offset=0 real_seed=1859671084
basic.random.init_random_generator: RandomGenerator:init: Normal mode, seed=1859671084 RG_type=mt19937

Loading in a PDB File

The Protein Data Bank (PDB) format is a text file format for describing 3D molecular structures and associated information. Rosetta can both read and write PDB files. Besides PDB, Rosetta also works with other file formats such as MMTF and mmCIF.
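To make the "text file format" point concrete, here is a minimal sketch that slices a single ATOM record by its fixed column positions. The sample line is made up for illustration (it is not taken from 5tj3), and the helper name parse_atom_line is hypothetical; PyRosetta does this parsing for you, so this is only meant to show what such a line contains.

```python
def parse_atom_line(line):
    """Slice one ATOM/HETATM record by its fixed column positions."""
    return {
        "record": line[0:6].strip(),       # columns 1-6: record name
        "serial": int(line[6:11]),         # columns 7-11: atom serial number
        "atom_name": line[12:16].strip(),  # columns 13-16: atom name
        "res_name": line[17:20].strip(),   # columns 18-20: residue name
        "chain": line[21],                 # column 22: chain identifier
        "res_seq": int(line[22:26]),       # columns 23-26: residue number
        # columns 31-38, 39-46, 47-54: x, y, z coordinates in Angstroms
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


# A made-up record, laid out in the standard fixed-width columns.
sample = "ATOM      1  N   ASN A  24      10.000  20.000  30.000  1.00 20.00           N"
atom = parse_atom_line(sample)
print(atom["res_name"], atom["chain"], atom["res_seq"], atom["xyz"])
```

Because the format is column-based rather than whitespace-delimited, slicing by position is more reliable than calling split() on the line.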

We will spend some time today looking at the crystal structure of the protein PafA (PDB ID: 5tj3) using PyRosetta. PafA is an alkaline phosphatase, which removes a phosphate group from a phosphate monoester. In this structure, a modified amino acid, phosphothreonine, is used to mimic the substrate in the active site. Let's load in this structure with PyRosetta (make sure that you have the PDB file located in your current directory):

cd google_drive/My\ Drive/student-notebooks/

pose = pose_from_pdb("5tj3.pdb")

Here we load the PDB file with the pose_from_pdb function. Alternatively, we can fetch the structure from the internet with pose_from_rcsb("5TJ3").

In [2]:
### BEGIN SOLUTION
pose = pose_from_pdb("inputs/5tj3.pdb")
### END SOLUTION
core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 980 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 0.940273 seconds.
core.import_pose.import_pose: File 'inputs/5tj3.pdb' automatically determined to be of type PDB
core.io.pdb.pdb_reader: Parsing 4225 .pdb records with unknown format to search for Rosetta-specific comments.
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue LYS 233
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue LYS 233
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE  on residue LYS 233
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NZ  on residue LYS 233
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue ASP 350
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OD1 on residue ASP 350
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OD2 on residue ASP 350
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue LYS 353
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue LYS 353
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE  on residue LYS 353
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NZ  on residue LYS 353
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue GLU 354
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue GLU 354
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OE1 on residue GLU 354
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OE2 on residue GLU 354
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue LYS 382
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue LYS 382
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE  on residue LYS 382
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NZ  on residue LYS 382
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue TYR 454
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD1 on residue TYR 454
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD2 on residue TYR 454
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE1 on residue TYR 454
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE2 on residue TYR 454
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CZ  on residue TYR 454
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OH  on residue TYR 454
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OXT on residue GLY:CtermProteinFull 520
core.pack.pack_missing_sidechains: packing residue number 233 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 350 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 353 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 354 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 382 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 454 because of missing atom number 6 atom name  CG
core.pack.task: Packer task: initialize from command line()
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/AccStrength.csv
basic.io.database: Database file opened: scoring/score_functions/rama/fd/all.ramaProb
basic.io.database: Database file opened: scoring/score_functions/rama/fd/prepro.ramaProb
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.all.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.gly.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.pro.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.valile.txt
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA_n
core.scoring.P_AA: shapovalov_lib::shap_p_aa_pp_smooth_level of 1( aka low_smooth ) got activated.
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/shapovalov/10deg/kappa131/a20.prop
basic.io.database: Database file opened: scoring/score_functions/elec_cp_reps.dat
core.scoring.elec.util: Read 40 countpair representative atoms
core.pack.dunbrack.RotamerLibrary: shapovalov_lib_fixes_enable option is true.
core.pack.dunbrack.RotamerLibrary: shapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.
core.pack.dunbrack.RotamerLibrary: Binary rotamer library selected: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyrosetta-2019.33+release.1e60c63beb5-py3.6-macosx-10.6-intel.egg/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin
core.pack.dunbrack.RotamerLibrary: Using Dunbrack library binary file '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyrosetta-2019.33+release.1e60c63beb5-py3.6-macosx-10.6-intel.egg/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.
core.pack.dunbrack.RotamerLibrary: Dunbrack 2010 library took 0.239879 seconds to load from binary
core.pack.pack_rotamers: built 96 rotamers at 6 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph

What is a Pose?

The Pose class includes various types of information that describe a structure. Some of the core components include the Energies, PDBInfo, and Conformation. See the Rosetta3 paper to learn more: https://www.sciencedirect.com/science/article/pii/B9780123812704000196

As an example, let's use our pose to look at the sequence of 5TJ3: pose.sequence()

In [3]:
# print out the sequence of the pose
### BEGIN SOLUTION
pose.sequence()
### END SOLUTION
Out[3]:
'NAVPRPKLVVGLVVDQMRWDYLYRYYSKYGEGGFKRMLNTGYSLNNVHIDYVPTVTAIGHTSIFTGSVPSIHGIAGNDWYDKELGKSVYCTSDETVQPVGTTSNSVGQHSPRNLWSTTVTDQLGLATNFTSKVVGVSLKDRASILPAGHNPTGAFWFDDTTGKFITSTYYTKELPKWVNDFNNKNVPAQLVANGWNTLLPINQYTESSEDNVEWEGLLGSKKTPTFPYTDLAKDYEAKKGLIRTTPFGNTLTLQMADAAIDGNQMGVDDITDFLTVNLASTDYVGHNFGPNSIEVEDTYLRLDRDLADFFNNLDKKVGKGNYLVFLSADHGAAHSVGFMQAHKMPTGFFDMKKEMNAKLKQKFGADNIIAAAMNYQVYFDRKVLADSKLELDDVRDYVMTELKKEPSVLYVLSTDEIWESSIPEPIKSRVINGYNWKRSGDIQIISKDGYLSAYSKKGTTHSVWNSYDSHIPLLFMGWGIKQGESNQPYHMTDIAPTVSSLLKIQFPSGAVGKPITEVIGZZZZ'

Sometimes PDB files do not conform to the standard and need to be cleaned before PyRosetta can load them. One way to make loading more robust is to keep only the ATOM lines of the PDB file. You can do this by hand, or use the cleanATOM function in pyrosetta.toolbox to achieve the same:

In [3]:
from pyrosetta.toolbox import cleanATOM
cleanATOM("inputs/5tj3.pdb")

cleanATOM writes a cleaned file, 5tj3.clean.pdb, alongside the input. Let's load this into PyRosetta as well:

In [5]:
pose_clean = pose_from_pdb("inputs/5tj3.clean.pdb")
core.import_pose.import_pose: File '5tj3.clean.pdb' automatically determined to be of type PDB
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue LYS 232
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue LYS 232
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE  on residue LYS 232
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NZ  on residue LYS 232
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue ASP 349
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OD1 on residue ASP 349
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OD2 on residue ASP 349
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue LYS 352
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue LYS 352
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE  on residue LYS 352
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NZ  on residue LYS 352
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue GLU 353
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue GLU 353
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OE1 on residue GLU 353
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OE2 on residue GLU 353
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue LYS 381
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD  on residue LYS 381
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE  on residue LYS 381
core.conformation.Conformation: [ WARNING ] missing heavyatom:  NZ  on residue LYS 381
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CG  on residue TYR 453
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD1 on residue TYR 453
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CD2 on residue TYR 453
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE1 on residue TYR 453
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CE2 on residue TYR 453
core.conformation.Conformation: [ WARNING ] missing heavyatom:  CZ  on residue TYR 453
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OH  on residue TYR 453
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OXT on residue GLY:CtermProteinFull 519
core.pack.pack_missing_sidechains: packing residue number 232 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 349 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 352 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 353 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 381 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 453 because of missing atom number 6 atom name  CG
core.pack.task: Packer task: initialize from command line()
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.pack.pack_rotamers: built 90 rotamers at 6 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
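To illustrate what "only include the ATOM lines" means, here is a minimal sketch of such a filter in plain Python. It approximates the idea and is not the actual implementation of cleanATOM. The function name keep_atom_records is hypothetical and the records are made up. Note that HETATM records, which hold ligands, ions, waters, and modified residues, do not survive the filter.

```python
def keep_atom_records(pdb_text):
    """Return only the lines of a PDB file whose record name is ATOM."""
    lines = [line for line in pdb_text.splitlines() if line.startswith("ATOM")]
    return "\n".join(lines)


# Made-up records for illustration; not taken from 5tj3.
sample = "\n".join([
    "HEADER    MADE-UP EXAMPLE",
    "ATOM      1  N   ASN A  24      10.000  20.000  30.000  1.00 20.00           N",
    "ATOM      2  CA  ASN A  24      11.000  21.000  31.000  1.00 20.00           C",
    "HETATM    3 ZN    ZN A 601      12.000  22.000  32.000  1.00 20.00          ZN",
    "HETATM    4  O   HOH A 701      13.000  23.000  33.000  1.00 20.00           O",
    "END",
])

cleaned = keep_atom_records(sample)
print(cleaned)
```

A filter like this is a blunt tool: it makes loading more robust, but anything stored outside ATOM records is silently dropped.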

In our case, the PDB file for 5tj3 loaded fine without cleaning. In fact, cleaning it with cleanATOM removed some residues. How does the sequence of pose_clean differ from the sequence of the original pose?

In [6]:
# print out the sequence of the pose_clean
### BEGIN SOLUTION
pose_clean.sequence()
### END SOLUTION
Out[6]:
'NAVPRPKLVVGLVVDQMRWDYLYRYYSKYGEGGFKRMLNTGYSLNNVHIDYVPTVAIGHTSIFTGSVPSIHGIAGNDWYDKELGKSVYCTSDETVQPVGTTSNSVGQHSPRNLWSTTVTDQLGLATNFTSKVVGVSLKDRASILPAGHNPTGAFWFDDTTGKFITSTYYTKELPKWVNDFNNKNVPAQLVANGWNTLLPINQYTESSEDNVEWEGLLGSKKTPTFPYTDLAKDYEAKKGLIRTTPFGNTLTLQMADAAIDGNQMGVDDITDFLTVNLASTDYVGHNFGPNSIEVEDTYLRLDRDLADFFNNLDKKVGKGNYLVFLSADHGAAHSVGFMQAHKMPTGFFDMKKEMNAKLKQKFGADNIIAAAMNYQVYFDRKVLADSKLELDDVRDYVMTELKKEPSVLYVLSTDEIWESSIPEPIKSRVINGYNWKRSGDIQIISKDGYLSAYSKKGTTHSVWNSYDSHIPLLFMGWGIKQGESNQPYHMTDIAPTVSSLLKIQFPSGAVGKPITEVIG'

With the annotated_sequence method below, we can see the differences in more detail. Note that non-canonical amino acids and HETATM residues are now spelled out explicitly.

In [7]:
pose.annotated_sequence()
Out[7]:
'N[ASN:NtermProteinFull]AVPRPKLVVGLVVDQMRWDYLYRYYSKYGEGGFKRMLNTGYSLNNVHIDYVPTVT[THR:phosphorylated]AIGHTSIFTGSVPSIHGIAGNDWYDKELGKSVYCTSDETVQPVGTTSNSVGQHSPRNLWSTTVTDQLGLATNFTSKVVGVSLKDRASILPAGHNPTGAFWFDDTTGKFITSTYYTKELPKWVNDFNNKNVPAQLVANGWNTLLPINQYTESSEDNVEWEGLLGSKKTPTFPYTDLAKDYEAKKGLIRTTPFGNTLTLQMADAAIDGNQMGVDDITDFLTVNLASTDYVGHNFGPNSIEVEDTYLRLDRDLADFFNNLDKKVGKGNYLVFLSADHGAAHSVGFMQAHKMPTGFFDMKKEMNAKLKQKFGADNIIAAAMNYQVYFDRKVLADSKLELDDVRDYVMTELKKEPSVLYVLSTDEIWESSIPEPIKSRVINGYNWKRSGDIQIISKDGYLSAYSKKGTTHSVWNSYDSHIPLLFMGWGIKQGESNQPYHMTDIAPTVSSLLKIQFPSGAVGKPITEVIG[GLY:CtermProteinFull]Z[ZN]Z[ZN]Z[ZN]Z[ZN]'
In [8]:
pose_clean.annotated_sequence()
Out[8]:
'N[ASN:NtermProteinFull]AVPRPKLVVGLVVDQMRWDYLYRYYSKYGEGGFKRMLNTGYSLNNVHIDYVPTVAIGHTSIFTGSVPSIHGIAGNDWYDKELGKSVYCTSDETVQPVGTTSNSVGQHSPRNLWSTTVTDQLGLATNFTSKVVGVSLKDRASILPAGHNPTGAFWFDDTTGKFITSTYYTKELPKWVNDFNNKNVPAQLVANGWNTLLPINQYTESSEDNVEWEGLLGSKKTPTFPYTDLAKDYEAKKGLIRTTPFGNTLTLQMADAAIDGNQMGVDDITDFLTVNLASTDYVGHNFGPNSIEVEDTYLRLDRDLADFFNNLDKKVGKGNYLVFLSADHGAAHSVGFMQAHKMPTGFFDMKKEMNAKLKQKFGADNIIAAAMNYQVYFDRKVLADSKLELDDVRDYVMTELKKEPSVLYVLSTDEIWESSIPEPIKSRVINGYNWKRSGDIQIISKDGYLSAYSKKGTTHSVWNSYDSHIPLLFMGWGIKQGESNQPYHMTDIAPTVSSLLKIQFPSGAVGKPITEVIG[GLY:CtermProteinFull]'
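The bracketed annotations are easy to miss in a string this long. As a rough sketch, a small regular expression can list only the annotated residues with their positions. The helper name annotated_residues is hypothetical, and the pattern assumes every residue is written as a single letter optionally followed by a bracketed name, as in the outputs above. The example string is a shortened, made-up sequence in the same style.

```python
import re

# One letter, optionally followed by a bracketed residue type name.
TOKEN = re.compile(r"([A-Za-z])(?:\[([^\]]+)\])?")


def annotated_residues(annotated):
    """Return (position, one_letter, annotation) for annotated residues only."""
    found = []
    for position, match in enumerate(TOKEN.finditer(annotated), start=1):
        letter, annotation = match.groups()
        if annotation is not None:
            found.append((position, letter, annotation))
    return found


# Shortened, made-up string in the style of annotated_sequence() output.
example = "N[ASN:NtermProteinFull]AVT[THR:phosphorylated]G[GLY:CtermProteinFull]Z[ZN]"
for position, letter, annotation in annotated_residues(example):
    print(position, letter, annotation)
```

With a loaded pose, the same function could be called as annotated_residues(pose.annotated_sequence()); the positions it reports count residues in the string, starting at 1.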

Exercise 1: Inspecting pose sequences

Visually inspect the sequences to find the difference(s) between pose_clean.sequence() and pose.sequence(). Were residues removed? Which ones?

In [ ]:
 

Bonus Exercise 1: Identifying differences in sequences

(Optional) Write a program that automatically finds the differences between these two sequences.
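One possible starting point, shown on toy strings rather than the real sequences, uses difflib from the Python standard library. This is only a sketch of one approach among several; the function name sequence_differences and the toy sequences are made up for illustration.

```python
from difflib import SequenceMatcher


def sequence_differences(before, after):
    """List stretches where two sequences differ (1-based positions in `before`)."""
    # autojunk=False switches off a heuristic that treats very frequent items
    # as junk once a sequence has 200 or more items, which matters for proteins.
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((tag, i1 + 1, before[i1:i2], after[j1:j2]))
    return differences


# Toy sequences: one internal residue and two trailing residues are missing.
toy_before = "MKTAYIAKZZ"
toy_after = "MKAYIAK"
for difference in sequence_differences(toy_before, toy_after):
    print(difference)
```

To apply the same idea to the lab, pass pose.sequence() and pose_clean.sequence() as the two arguments.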

In [ ]: