This notebook contains material from PyRosetta; content is available on GitHub.

RosettaCarbohydrates

Keywords: carbohydrate, glycan, sugar, glucose, mannose, GlycanTreeSet, saccharide, furanose, pyranose, aldose, ketose

Overview

In this chapter, we will focus on a special subset of non-peptide oligo- and polymers: carbohydrates.

Modeling carbohydrates — also known as saccharides, glycans, or simply sugars — comes with some special challenges. For one, most saccharide residues contain a ring as part of their backbone. This ring provides potentially new degrees of freedom when sampling. Additionally, carbohydrate structures are often branched, leading in Rosetta to more complicated FoldTrees.

This chapter includes a quick overview of carbohydrate nomenclature, structure, and basic interactions within Rosetta.

Carbohydrate Chemistry Background

Figure 1. A pyranose (left) and a furanose (right).

Sugars (saccharides) are defined as hydroxylated aldehydes and ketones. A typical monosaccharide has an equal number of carbon and oxygen atoms. For example, glucose has the molecular formula C6H12O6.

Sugars containing more than three carbons will spontaneously cyclize in aqueous environments to form five- or six-membered hemiacetals and hemiketals. Sugars with five-membered rings are called furanoses; those with six-membered rings are called pyranoses (Fig. 1).

Figure 2. An aldose (left) and a ketose (right).

A sugar is classified as an aldose or ketose, depending on whether it has an aldehyde or ketone in its linear form (Fig. 2).

The different sugars have different names, depending on the stereochemistry at each of the carbon atoms in the molecule. For example, glucose has one set of stereochemistries, while mannose has another.

In addition to their full names, many individual saccharide residues have three-letter codes, just like amino acid residues do. Glucose is "Glc" and mannose is "Man".

Backbone Torsions, Residue Connections, and Side-Chains

A glycan tree is made up of many sugar residues, each residue a ring. The 'backbone' of a glycan is the connection between one residue and another. The chemical makeup of each sugar residue in this 'linkage' affects the propensity/energy of each backbone dihedral angle. In addition, sugars can be attached via different carbons of the parent glycan. In this way, both the chemical makeup and the attachment position affect the dihedral propensities. Typically, there are two backbone dihedral angles, but there can be four or more, depending on the connection.

In IUPAC nomenclature, the dihedrals of residue N are defined as the dihedrals between N and N-1 (i.e., the parent linkage). The dihedrals of an ASN (or other glycosylated protein residue) become part of the first glycan residue connected to it. The first glycan residue connected to an ASN therefore has 4 torsions, while the ASN now has none!

If you are creating a movemap for glycan dihedrals, please use the MoveMapFactory, as it has the IUPAC nomenclature of glycan residues built in to allow proper DOF sampling of the backbone residues, especially for branching glycan trees. In general, all of our samplers should use residue selectors and use the MoveMapFactory to build movemaps internally.

A sugar's side-chains are the substituents on the glycan ring, which are typically an OH group or an acetyl group. These are sampled together at 60 degree angles by default during packing. A higher granularity of rotamers cannot currently be handled in Rosetta, but 60 degrees seems adequate for our purposes.

Within Rosetta, glycan connectivity information is stored in the GlycanTreeSet, which is continually updated to reflect any residue changes or additions to the pose. This info is always available through the function

    pose.glycan_tree_set()

Chemical information of each glycan residue can be accessed through the CarbohydrateInfo object, which is stored in each ResidueType object:

    pose.residue_type(i).carbohydrate_info()

We will cover both of these classes in the next tutorial.

Documentation

https://www.rosettacommons.org/docs/latest/application_documentation/carbohydrates/WorkingWithGlycans

References

Residue centric modeling and design of saccharide and glycoconjugate structures. Jason W. Labonte, Jared Adolf-Bryfogle, William R. Schief, Jeffrey J. Gray. Journal of Computational Chemistry, 11/30/2016 - https://doi.org/10.1002/jcc.24679

Automatically Fixing Errors in Glycoprotein Structures with Rosetta. Brandon Frenz, Sebastian Rämisch, Andrew J. Borst, Alexandra C. Walls, Jared Adolf-Bryfogle, William R. Schief, David Veesler, Frank DiMaio. Structure, 1/2/2019

Initialization

Let's use PyRosetta to compare some common monosaccharide residues and see how they differ. As usual, we start by importing the `pyrosetta` and `rosetta` namespaces.

In [2]:
import sys
if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.mount_pyrosetta_install()
    print ("Notebook is set for PyRosetta use in Colab.  Have fun!")
In [2]:
from pyrosetta import *
from pyrosetta.teaching import *
from pyrosetta.rosetta import *

First, one needs the -include_sugars option, which tells Rosetta to load sugars and adds the sugar_bb energy term to the default scorefunction. This score term is like rama, but for the dihedrals that connect each sugar residue.

In [3]:
init('-include_sugars')
PyRosetta-4 2019 [Rosetta PyRosetta4.Release.python36.mac 2019.39+release.93456a567a8125cafdf7f8cb44400bc20b570d81 2019-09-26T14:24:44] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Reading fconfig.../Users/jadolfbr/.rosetta/flags/common
core.init: 
core.init: 
core.init: Rosetta version: PyRosetta4.Release.python36.mac r233 2019.39+release.93456a567a8 93456a567a8125cafdf7f8cb44400bc20b570d81 http://www.pyrosetta.org 2019-09-26T14:24:44
core.init: command: PyRosetta -include_sugars -database /Users/jadolfbr/Library/Python/3.6/lib/python/site-packages/pyrosetta-2019.39+release.93456a567a8-py3.6-macosx-10.6-intel.egg/pyrosetta/database
basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=1177525307 seed_offset=0 real_seed=1177525307
basic.random.init_random_generator: RandomGenerator:init: Normal mode, seed=1177525307 RG_type=mt19937

When loading structures from the PDB that include glycans, we use the options below. They include an option to write out structures in PDB format instead of the (better) Rosetta format. We will be using these options in the next tutorial.

    -maintain_links
    -auto_detect_glycan_connections
    -alternate_3_letter_codes pdb_sugar
    -write_glycan_pdb_codes
    -load_PDB_components false
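
As a minimal sketch, the flags listed above can be joined into the single options string that `init()` accepts. The variable names are illustrative, and the `init()` call is left commented out so that the snippet runs without a PyRosetta installation.

```python
# Glycan-related flags from the list above, plus -include_sugars.
glycan_flags = [
    "-include_sugars",
    "-maintain_links",
    "-auto_detect_glycan_connections",
    "-alternate_3_letter_codes pdb_sugar",
    "-write_glycan_pdb_codes",
    "-load_PDB_components false",
]

# init() takes one space-separated string of options.
options = " ".join(glycan_flags)
print(options)

# from pyrosetta import init
# init(options)  # uncomment in a session where PyRosetta is installed
```
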
  • Set up the `PyMOLMover` for viewing structures.
In [4]:
pm = PyMOLMover()

Creating Saccharides from Sequence

We will use the function pose_from_saccharide_sequence(), which must be imported from the core.pose namespace. Unlike with peptide chains, one-letter codes will not suffice when specifying saccharide chains, because there is too much information to convey; we must use at least four letters. The first three letters are the sugar's three-letter code; the fourth letter designates whether the residue is a furanose (f) or pyranose (p).

In [5]:
from pyrosetta.rosetta.core.pose import pose_from_saccharide_sequence
In [6]:
glucose = pose_from_saccharide_sequence('Glcp')
galactose = pose_from_saccharide_sequence('Galp')
mannose = pose_from_saccharide_sequence('Manp')
core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 1251 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 1.25647 seconds.
core.pose:  by appending by jump...
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
core.pose:  by appending by jump...
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
core.pose:  by appending by jump...
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
  • Use the `PyMOLMover` to compare the three monosaccharides in PyMOL.
  • At which carbons do the three sugars differ?

L and D Forms

Just like with peptides, saccharides come in two enantiomeric forms, labelled l and d. (Note the small-caps, used in print.) These can be loaded into PyRosetta using the prefixes `L-` and `D-`.

In [9]:
L_glucose = pose_from_saccharide_sequence('L-Glcp')
D_glucose = pose_from_saccharide_sequence('D-Glcp')
core.pose:  by appending by jump...
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
  • Compare the two structures in PyMOL. Notice that all stereocenters are inverted between the two monosaccharides.
  • Which enantiomer is loaded by PyRosetta by default if l or d are not specified?

Anomers

The carbon that is at a higher oxidation state — that is, the carbon of the hemiacetal/-ketal in the cyclic form or the carbon that is the carbonyl carbon of the aldehyde or ketone in the linear form — is called the anomeric carbon. Because the carbonyl of an aldehyde or ketone is planar, a sugar molecule can cyclize into one of two forms, one in which the resulting hydroxyl group is pointing "up" and another in which the same hydroxyl group is pointing "down". These two anomers are labelled α and β.

  • Create a one-residue `Pose` for both α- and β-d-glucopyranose and use PyMOL to compare both.
In [11]:
alpha_D_glucose = pose_from_saccharide_sequence('a-D-Glcp')
beta_D_glucose = pose_from_saccharide_sequence('b-D-Glcp')
core.pose:  by appending by jump...
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
  • For which anomer is the C1 hydroxyl group axial to the chair conformation of the six-membered pyranose ring?
  • Which anomer of d-glucose would you predict to be the most stable? (Hint: remember what you learned in organic chemistry about axial and equatorial substituents.)

Linear Oligosaccharides & IUPAC Sequences

Oligo- and polysaccharides are composed of simple monosaccharide residues connected by acetal and ketal linkages called glycosidic bonds. Any of the monosaccharide's hydroxyl groups can be used to form a linkage to the anomeric carbon of another monosaccharide, leading to both linear and branched molecules.

Rosetta can create both linear and branched oligosaccharides from an IUPAC sequence. (IUPAC is the international organization dedicated to chemical nomenclature.)

To properly build a linear oligosaccharide, Rosetta must know the following details about each sugar residue, given in this order:

  • Main-chain connectivity: →2) (typed as `->2)`), →4) (`->4)`), →6) (`->6)`), etc.; default value is `->4)-`
  • Anomeric form: α (a or alpha) or β (b or beta); default value is alpha
  • Enantiomeric form: l (L) or d (D); default value is D
  • 3-letter code: required; uses sentence case
  • Ring form code: f (for a furanose/5-membered ring) or p (for a pyranose/6-membered ring); required

Residues must be separated by hyphens. Glycosidic linkages can be specified with full IUPAC notation, e.g., -(1->4)- for “-(1→4)-”. (This means that the residue on the left connects from its C1 (anomeric) position to the hydroxyl oxygen at C4 of the residue on the right.) Rosetta will assume -(1-> for aldoses and -(2-> for ketoses.
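
To make the field order and defaults above concrete, here is a small pure-Python sketch that splits one residue token into its five parts. It only illustrates the notation and is not Rosetta's own parser; the function name `parse_residue_token` and the dictionary keys are invented for this example, and modification suffixes are not handled.

```python
import re

# Optional "->N)-" prefix, optional anomer, optional L/D, 3-letter code, ring letter.
TOKEN = re.compile(
    r"^(?:->(?P<link>\d)\)-)?"
    r"(?:(?P<anomer>a|alpha|b|beta)-)?"
    r"(?:(?P<enantiomer>[LD])-)?"
    r"(?P<code>[A-Z][a-z]{2})"
    r"(?P<ring>[fp])$"
)

def parse_residue_token(token):
    """Split one residue token and fill in the documented defaults."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"not a recognised residue token: {token!r}")
    anomer = match.group("anomer") or "alpha"           # default: alpha
    anomer = {"a": "alpha", "b": "beta"}.get(anomer, anomer)
    return {
        "main_chain_connectivity": int(match.group("link") or 4),  # default: ->4)
        "anomer": anomer,
        "enantiomer": match.group("enantiomer") or "D",             # default: D
        "code": match.group("code"),
        "ring": "furanose" if match.group("ring") == "f" else "pyranose",
    }

print(parse_residue_token("Glcp"))
print(parse_residue_token("->6)-b-L-Galf"))
```

The first call shows every default being applied; the second shows all five fields given explicitly.
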

Note that the standard is to write the IUPAC sequence of a saccharide chain in the reverse order from how its residues are numbered. Let's create three new oligosaccharides from sequence.

In [7]:
maltotriose = pose_from_saccharide_sequence('a-D-Glcp-' * 3)
lactose = pose_from_saccharide_sequence('b-D-Galp-(1->4)-a-D-Glcp')
isomaltose = pose_from_saccharide_sequence('->6)-Glcp-' * 2)
core.pose:  by appending by jump...
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
core.pose:  by appending by jump...
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
core.pose:  by appending by jump...
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.

General Residue Information

When you print a Pose containing carbohydrate residues, the sugar residues will be listed as Z in the sequence.

In [14]:
print("maltotriose\n", maltotriose)
print("\nisomaltose\n", isomaltose)
print("\nlactose\n", lactose)
maltotriose
 PDB file name: alpha-D-Glcp-(1->4)-alpha-D-Glcp-(1->4)-alpha-D-Glcp
Total residues: 3
Sequence: ZZZ
Fold tree:
FOLD_TREE  EDGE 1 3 -1 

isomaltose
 PDB file name: alpha-D-Glcp-(1->6)-alpha-D-Glcp
Total residues: 2
Sequence: ZZ
Fold tree:
FOLD_TREE  EDGE 1 2 -1 

lactose
 PDB file name: beta-D-Galp-(1->4)-alpha-D-Glcp
Total residues: 2
Sequence: ZZ
Fold tree:
FOLD_TREE  EDGE 1 2 -1 

However, you can have Rosetta print out the sequences for individual chains, using the chain_sequence() method. If you do this, Rosetta is smart enough to give you a distinct sequence format for saccharide chains. (You may have noticed that the default file name for a .pdb file created from this Pose will be the same sequence.)

In [17]:
print(maltotriose.chain_sequence(1))
alpha-D-Glcp-(1->4)-alpha-D-Glcp-(1->4)-alpha-D-Glcp
In [18]:
print(isomaltose.chain_sequence(1))
alpha-D-Glcp-(1->6)-alpha-D-Glcp
In [19]:
print(lactose.chain_sequence(1))
beta-D-Galp-(1->4)-alpha-D-Glcp

Again, the standard is to show the sequence of a saccharide chain in the reverse order from how its residues are numbered.

This is also how phi, psi, and omega are defined: from residue i+1 to residue i.

In [22]:
for res in lactose.residues: print(res.seqpos(), res.name())
1 ->4)-alpha-D-Glcp:reducing_end
2 ->4)-beta-D-Galp:non-reducing_end

Notice that for polysaccharides, the upstream residue is called the reducing end, while the downstream residue is called the non-reducing end.

You will also see the terms parent and child being used across Rosetta. Here, for residue 2, residue 1 is the parent. For residue 1, residue 2 is the child. Due to branching, residues can have more than one child/non-reducing-end, but only a single parent residue.
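
The relationship between the written sequence and the residue numbering can be sketched in plain Python. The helper below (a hypothetical name, not part of PyRosetta) splits a linear IUPAC sequence at its `-(1->N)-` linkages and numbers the residues from the right-hand (reducing) end, matching the numbering printed for lactose above. It assumes an unbranched chain with fully written linkages.

```python
import re

def number_residues(iupac_sequence):
    """Map residue number -> residue name for a linear IUPAC sequence."""
    # Split at linkages such as "-(1->4)-" or "-(2->6)-".
    names = re.split(r"-\(\d->\d\)-", iupac_sequence)
    # The sequence is written non-reducing end first, so number from the right.
    return {i: name for i, name in enumerate(reversed(names), start=1)}

numbering = number_residues("beta-D-Galp-(1->4)-alpha-D-Glcp")
for seqpos, name in numbering.items():
    parent = seqpos - 1 if seqpos > 1 else None  # valid for linear chains only
    print(seqpos, name, "parent:", parent)
```

In a branched glycan the parent is not simply the previous residue number, so this shortcut should not be carried over to trees.
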

Rosetta stores carbohydrate-specific information within `ResidueType`. If you print a residue, this additional information will be displayed.

In [23]:
print(glucose.residue(1))
Residue 1: ->4)-alpha-D-Glcp:reducing_end:non-reducing_end (Glc, Z):
Base: ->4)-alpha-D-Glcp
 Properties: POLYMER CARBOHYDRATE LOWER_TERMINUS UPPER_TERMINUS POLAR CYCLIC HEXOSE ALDOSE D_SUGAR PYRANOSE ALPHA_SUGAR
 Variant types: UPPER_TERMINUS_VARIANT LOWER_TERMINUS_VARIANT
 Main-chain atoms:  C1   C2   C3   C4   O4 
 Backbone atoms:    C1   C2   C3   C4   O4   C5   O5   VO5  VC1  H1   H2   H3   H4   HO4  H5 
 Ring atoms:    C1   C2   C3   C4   C5   O5 
 Side-chain atoms:  O1   O2   O3   C6   O6   HO1  HO2  HO3 1H6  2H6   HO6
Carbohydrate Properties for this Residue:
 Basic Name: glucose
 IUPAC Name: alpha-D-glucopyranose
 Abbreviation: alpha-D-Glcp
 Classification: aldohexose
 Stereochemistry: D
 Ring Form: pyranose
 Anomeric Form: alpha
 Modifications: 
  none
 Polymeric Information:
  Main chain connection: N/A
  Branch connections: none
Ring Conformer: 4C1 (chair): C-P parameters (q, phi, theta): 0.55, 180, 0; nu angles (degrees): 60, -60, 60, -60, 60, -60
  O1 : axial
  O2 : equatorial
  O3 : equatorial
  O4 : equatorial
  C6 : equatorial
Atom Coordinates:
   C1 : 0, 0, 0
   C2 : 1.55, 0, 0
   C3 : 2.04812, 1.44664, 0
   C4 : 1.50806, 2.11919, -1.26369
   O4 : 1.94666, 3.46908, -1.30661
   C5 : -0.0200415, 2.06186, -1.21358
   O5 : -0.475077, 0.686176, -1.1593
   VO5: -0.492509, 0.676579, -1.17187 (virtual)
   VC1: 0.031762, 0.00822503, 0.00564973 (virtual)
   O1 : -0.494034, 0.697555, 1.2082
   O2 : 2.02401, -0.669275, 1.15922
   O3 : 3.4779, 1.4716, 1.64563e-16
   C6 : -0.614146, 2.71298, -2.43962
   O6 : -0.225074, 4.07556, -2.53127
   H1 : -0.370662, -1.03564, 0.00767336
   H2 : 1.90812, -0.520035, -0.900727
   H3 : 1.67301, 1.95456, 0.900727
   H4 : 1.88381, 1.57916, -2.14527
   HO4: 1.61609, 3.94572, -0.516717
   H5 : -0.369153, 2.59396, -0.316372
   HO1: -0.167832, 1.62167, 1.20877
   HO2: 3.00401, -0.669275, 1.15922
   HO3: 3.78886, 2.40096, 5.03844e-17
  1H6 : -1.71106, 2.65811, -2.3783
  2H6 : -0.261365, 2.17983, -3.33478
   HO6: -0.621924, 4.47587, -3.33293
Mirrored relative to coordinates in ResidueType: FALSE

  • Scanning the output from printing a glucose `Residue`, what is the general term for an aldose with six carbon atoms?

Exploring Carbohydrate Structure

Torsion Angles

Most biopolymers have predefined, named torsion angles for their main-chain and side-chain bonds, such as φ, ψ, and ω and the various χs for amino acid residues. The same is true for saccharide residues. The torsion angles of sugars are as follows:

    Figure 3. A disaccharide's main-chain torsion angles.
    Figure 4. A monosaccharide's internal ring torsion angles.
    Figure 5. A monosaccharide's side-chain torsion angles.
  • φ: The 1st glycosidic torsion back to the previous (n−1) residue. The angle is defined by the cyclic oxygen, the two atoms across the bond, and the carbon at the glycosidic linkage position. For aldopyranoses, φ(n) is thus defined as O5(n)–C1(n)–OX(n−1)–CX(n−1), where X is the position of the glycosidic linkage. For aldofuranoses, φ(n) is defined as O4(n)–C1(n)–OX(n−1)–CX(n−1). For 2-ketopyranoses, φ(n) is defined as O6(n)–C2(n)–OX(n−1)–CX(n−1). For 2-ketofuranoses, φ(n) is defined as O5(n)–C2(n)–OX(n−1)–CX(n−1), and so on.
  • ψ: The 2nd glycosidic torsion back to the previous (n−1) residue. The angle is defined by the anomeric carbon, the two atoms across the bond, and the carbon numbered one less than the glycosidic linkage position. ψ(n) is thus defined as Canomeric(n)–OX(n−1)–CX(n−1)–CX−1(n−1), where X is the position of the glycosidic linkage.
  • ω: The 3rd (and any subsequent) glycosidic torsion(s) back to the previous residue. ω1(n) is defined as OX(n−1)–CX(n−1)–CX−1(n−1)–CX−2(n−1), where X is the position of the glycosidic linkage. (This only applies to sugars with exocyclic connectivities.) The connection in Figure 3 has an exocyclic carbon, but the other potential connection points do not, so only phi and psi would be available as backbone torsion angles for those connection points.
  • ν1 – νn: The internal ring torsion angles, where n is the number of atoms in the ring. ν1 defines the torsion across bond C1–C2, etc.
  • χ1 – χn: The side-chain torsion angles, where n is the number of carbons in the sugar residue. The angle is defined by the carbon numbered one less than the position of the side-chain, the two atoms across the bond, and the polar hydrogen. The ring oxygen counts as carbon 0. For an aldopyranose, χ1 is thus defined by O5–C1–O1–HO1, and χ2 is defined by C1–C2–O2–HO2. χ5 is defined by C4–C5–C6–O6, because it rotates the exocyclic carbon rather than twisting the ring. χ6 is defined by C5–C6–O6–HO6.

Take special note of how φ, ψ, and ω are defined in the reverse order from the angles of the same names for amino acid residues!
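
Each of the angles above is an ordinary four-atom torsion, so any of them can be checked by hand from coordinates. The sketch below is plain Python and does not use PyRosetta; the `dihedral` helper is written for this example. It applies the χ1 definition for an aldopyranose (O5–C1–O1–HO1) to the coordinates printed earlier for the glucose residue.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    def sub(a, b):
        return [a[i] - b[i] for i in range(3)]

    def dot(a, b):
        return sum(a[i] * b[i] for i in range(3))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Components of b0 and b2 perpendicular to the central bond.
    v = [b0[i] - dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - dot(b2, b1) * b1[i] for i in range(3)]
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

# Coordinates copied from the printed glucose residue.
O5 = (-0.475077, 0.686176, -1.1593)
C1 = (0.0, 0.0, 0.0)
O1 = (-0.494034, 0.697555, 1.2082)
HO1 = (-0.167832, 1.62167, 1.20877)

chi1 = dihedral(O5, C1, O1, HO1)  # chi1 = O5-C1-O1-HO1
print(round(chi1, 1))
```

The result should come out at roughly −60°, a staggered arrangement of the hydroxyl hydrogen relative to the ring oxygen.
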

The chi() method of Pose works with sugar residues in the same way that it works with amino acid residues, where the first argument is the χ subscript and the second is the residue number of the Pose.

In [24]:
galactose.chi(1, 1)
Out[24]:
-60.49356158672178
In [25]:
galactose.chi(2, 1)
Out[25]:
-180.0
In [26]:
galactose.chi(3, 1)
Out[26]:
180.0
In [27]:
galactose.chi(4, 1)
Out[27]:
-59.999999999999986
In [28]:
galactose.chi(5, 1)
Out[28]:
-59.999999999999986
In [29]:
galactose.chi(6, 1)
Out[29]:
180.0

Likewise, we can use set_chi() to change these torsion angles and observe the changes in PyMOL, setting the option to keep history to true.

In [30]:
from pyrosetta.rosetta.protocols.moves import AddPyMOLObserver
In [31]:
observer = AddPyMOLObserver(galactose, True)
In [32]:
pm.apply(galactose)
  • Perform the following torsion angle changes to galactose using `set_chi()` and observe which torsions move in PyMOL.
    • Set χ1 to 120°.
    • Set χ2 to 60°.
    • Set χ3 to 60°.
    • Set χ4 to 0°.
    • Set χ5 to 60°.
    • Set χ6 to −60°.
In [37]:
galactose.set_chi(1, 1, 180)
In [38]:
## BEGIN SOLUTION

# Pair each chi index (1-6) with its target angle in degrees.
for chi_angle in zip(range(1, 7), [120, 60, 60, 0, 60, -60]):
    print(chi_angle)
    galactose.set_chi(chi_angle[0], 1, chi_angle[1])

## END SOLUTION
(1, 120)
(2, 60)
(3, 60)
(4, 0)
(5, 60)
(6, -60)

Creating Saccharides from a PDB file

The phi(), set_phi(), psi(), set_psi(), omega(), and set_omega() methods of Pose also work with sugars. However, since pose_from_saccharide_sequence() may create a Pose with angles that cause the residues to wrap around onto each other, let's instead reload some Poses from .pdb files.

In [21]:
maltotriose = pose_from_file('inputs/glycans/maltotriose.pdb')
isomaltose = pose_from_file('inputs/glycans/isomaltose.pdb')
core.import_pose.import_pose: File 'inputs/maltotriose.pdb' automatically determined to be of type PDB
core.io.pdb.pdb_reader: Parsing 0 .pdb records with unknown format to search for Rosetta-specific comments.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc1 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc2 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc3 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
core.import_pose.import_pose: File 'inputs/isomaltose.pdb' automatically determined to be of type PDB
core.io.pdb.pdb_reader: Parsing 0 .pdb records with unknown format to search for Rosetta-specific comments.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc1 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc2 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
  • Now, try out the torsion angle getters and setters for the glycosidic bonds.
In [42]:
pm.apply(maltotriose)
In [43]:
maltotriose.phi(1)
Out[43]:
0.0
In [44]:
maltotriose.psi(1)
Out[44]:
0.0
In [45]:
maltotriose.phi(2)
Out[45]:
96.93460655617179
In [46]:
maltotriose.psi(2)
Out[46]:
109.94421849476633
In [47]:
maltotriose.omega(2)
Out[47]:
0.0
In [48]:
maltotriose.phi(3)
Out[48]:
103.21420435050914
In [49]:
maltotriose.psi(3)
Out[49]:
118.64096726060517

Notice how φ1 and ψ1 are returned as 0.0: they are undefined, because the first residue has no parent residue to connect back to.

In [50]:
observer = AddPyMOLObserver(maltotriose, True)
In [51]:
for i in (2, 3):
    maltotriose.set_phi(i, 180)
    maltotriose.set_psi(i, 180)

Isomaltose is composed of (1→6) linkages, so in this case omega torsions are defined. Get and set φ2, ψ2, and ω2 for isomaltose.

In [57]:
observer = AddPyMOLObserver(isomaltose, True)
In [71]:
## BEGIN SOLUTION

print(isomaltose.phi(2))
print(isomaltose.psi(2))
print(isomaltose.omega(2))

## END SOLUTION
44.32677030464958
-170.8693381707546
49.383018645410004

Any cyclic residue also stores its ν angles.

In [58]:
pm.apply(glucose)
In [53]:
Glc1 = glucose.residue(1)
In [55]:
for i in range(1, 6): print(Glc1.nu(i))
59.99999999999999
-59.99999999999999
60.00000000000001
-59.999999999999986
59.99999999999999

However, we generally care more about the ring conformation of a cyclic residue’s rings, in this case, its only ring, which has an index of 1. (The output values here are the ideal angles, not the actual angles, which we viewed above.)

In [56]:
print(Glc1.ring_conformer(1))
4C1 (chair): C-P parameters (q, phi, theta): 0.55, 180, 0; nu angles (degrees): 60, -60, 60, -60, 60, -60

RingConformers

The output above warrants a brief explanation. First, what does `4C1` mean? Most of us likely remember learning about chair and boat conformations in Organic Chemistry. Do you recall how there are two distinct chair conformations that can interconvert between each other? The names for these specific conformations are 4C1 and 1C4. The nomenclature is as follows: Superscripts to the left of the capital letter are above the plane of the ring if it is oriented such that its carbon atoms proceed in a clockwise direction when viewed from above. Subscripts to the right of the letter are below the plane of the ring. The letter itself is an abbreviation, where, for example, C indicates a chair conformation and B a boat conformation. In all, there are 38 different ideal ring conformations that any six-membered cycle can take.
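
The naming scheme can be illustrated with a few lines of Python that split a name such as `4C1` into the atoms above the ring plane, the conformation letter, and the atoms below the plane. This is a hypothetical helper for reading the notation, not a PyRosetta function; it only covers names whose atom labels are digits, and only the two letters mentioned above are spelled out.

```python
import re

SHAPES = {"C": "chair", "B": "boat"}  # only the letters discussed in the text

def describe_conformer(name):
    """Split a conformer name into above-plane atoms, shape, below-plane atoms."""
    match = re.fullmatch(r"([\d,]*)([A-Z])([\d,]*)", name)
    if match is None:
        raise ValueError(f"unsupported conformer name: {name!r}")
    above, letter, below = match.groups()
    return {
        "above_plane": [int(n) for n in above.split(",") if n],
        "shape": SHAPES.get(letter, letter),
        "below_plane": [int(n) for n in below.split(",") if n],
    }

print(describe_conformer("4C1"))
print(describe_conformer("1C4"))
```
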

`C-P parameters` refers to the Cremer–Pople parameters for this conformation (Cremer D, Pople JA. J Am Chem Soc. 1975;97:1354–1358.). C–P parameters are an alternative coordinate system used to refer to a ring conformation.

Finally, a `RingConformer` in Rosetta includes the values of the ν angles. Each conformer has a unique set of angles. `Pose::set_nu()` does not exist, because it would rip a ring apart. Instead, to change a ring conformation, we need to use the `set_ring_conformation()` method, which takes a `RingConformer` object. Most of the time, you will not need to adjust the ring conformers, but you should be aware of them. We can ask a cyclic `ResidueType` for one of its `RingConformerSet`s to give us the `RingConformer` we want. (Each `RingConformerSet` includes the list of possible idealized ring conformers that such a ring can attain as well as information about the most energetically favorable one.) Then, we can set the conformation for our residue through `Pose`. (The arguments for `set_ring_conformation()` are the `Pose`’s sequence position, ring number, and the new conformer, respectively.)

    Figure 6. The two chair conformations of α-d-glucopyranose. In the 1C4 conformation (left), all of the substituents are axial; in the 4C1 conformation (right), they are equatorial. 4C1 is the most stable conformation for the majority of the α-d-aldohexopyranoses. In this nomenclature, a superscript means that that numbered carbon is above the ring, if the atoms are arranged in a clockwise manner from C1. A subscripted number indicates a carbon below the plane of the ring.

In [57]:
ring_set = Glc1.type().ring_conformer_set(1)
In [58]:
conformer = ring_set.get_ideal_conformer_by_name('1C4')
In [59]:
glucose.set_ring_conformation(1, 1, conformer)
In [60]:
pm.apply(glucose)

Modified sugars can also be created in Rosetta, either from sequence or from file. In the former case, simply use the proper abbreviation for the modification after the “ring form code”. For example, the abbreviation for an N-acetyl group is “NAc”. Note the N-acetyl group in the PyMOL window.

In [23]:
LacNAc = pose_from_saccharide_sequence('b-D-Galp-(1->4)-a-D-GlcpNAc')
pm.apply(LacNAc)
core.pose:  by appending by jump...
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.

Rosetta can handle branched oligosaccharides as well, but when loading from a sequence, this requires the use of brackets, which is the standard IUPAC notation. For example, here is how one would load Lewis x (Lex), a common branched glyco-epitope, into Rosetta by sequence.

In [24]:
Lex = pose_from_saccharide_sequence('b-D-Galp-(1->4)-[a-L-Fucp-(1->3)]-D-GlcpNAc')
pm.apply(Lex)
core.pose:  by appending by jump...
core.conformation.Conformation: appending residue by a chemical bond in the foldtree: 3 ->4)-alpha-L-Fucp:non-reducing_end anchor:  O3     1 root:  C1
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.

One can also load branched carbohydrates from a .pdb file. These .pdb files must include LINK records, which are a standard part of the PDB format. Open the test/data/carbohydrates/Lex.pdb file and look near the top to see an example LINK record, which looks like this:

LINK O3 Glc A 1 C1 Fuc B 1 1555 1555 1.5

It tells us that there is a covalent linkage between O3 of glucose A1 and C1 of fucose B1 with a bond length of 1.5 Å. (The 1555s indicate symmetry and are ignored by Rosetta.)
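
As a rough sketch, the fields of the simplified LINK line shown above can be pulled apart by whitespace. Note the assumption: real PDB files use fixed-width columns, so a robust parser should slice by column position rather than split on spaces. The function and key names here are illustrative only.

```python
def parse_simplified_link(line):
    """Parse a whitespace-separated LINK line as displayed in this tutorial."""
    fields = line.split()
    if len(fields) != 12 or fields[0] != "LINK":
        raise ValueError("expected 'LINK' followed by 11 fields")
    first = {"atom": fields[1], "residue": fields[2],
             "chain": fields[3], "number": int(fields[4])}
    second = {"atom": fields[5], "residue": fields[6],
              "chain": fields[7], "number": int(fields[8])}
    return {"first": first, "second": second,
            "symmetry": (fields[9], fields[10]),  # ignored by Rosetta
            "distance": float(fields[11])}        # bond length in angstroms

link = parse_simplified_link("LINK O3 Glc A 1 C1 Fuc B 1 1555 1555 1.5")
print(link["first"], "->", link["second"], link["distance"])
```
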

Note that if the LINK records are not in order, or the HETNAM records are not in a Rosetta format, the file will fail to load. In the next tutorial we will use auto-detection to handle this. For now, we know Lex.pdb will load OK.

In [26]:
Lex = pose_from_file('inputs/glycans/Lex.pdb')
pm.apply(Lex)
core.import_pose.import_pose: File 'inputs/Lex.pdb' automatically determined to be of type PDB
core.io.pdb.pdb_reader: Parsing 0 .pdb records with unknown format to search for Rosetta-specific comments.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc1 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Gal2 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Fuc3 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->4)-alpha-L-Fucp:non-reducing_end 3.  Returning BOGUS ID instead.
core.conformation.Residue: [ WARNING ] missing an atom: 3  H1  that depends on a nonexistent polymer connection!
core.conformation.Residue: [ WARNING ]  --> generating it using idealized coordinates.
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->4)-alpha-L-Fucp:non-reducing_end 3.  Returning BOGUS ID instead.
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.

You may notice when viewing the structure in PyMOL that the hybridization of the carbonyl of the amido functionality of the N-acetyl group is wrong. This is because of an error in the model deposited in the PDB from which this file was generated. This is, unfortunately, a very common problem with sugar structures found in the PDB. It is always useful to use http://www.glycosciences.de to identify any errors in the solution PDB structure before working with them in Rosetta. The referenced paper, Automatically Fixing Errors in Glycoprotein Structures with Rosetta can be used as a guide to fixing these.

You may also have noticed that the inputs/glycans/Lex.pdb file indicated in its HETNAM records that Glc1 was actually an N-acetylglucosamine (GlcNAc) with the indication 2-acetylamino-2-deoxy-. This is optional and is helpful for human-readability, but Rosetta only needs to know the base ResidueType of each sugar residue; the specific VariantTypes needed (most sugar modifications are treated as VariantTypes) are determined automatically from the atom names in the HETATM records for the residue. Anything after the comma is ignored.

  • Print out the Pose to see how the FoldTree is defined.

In [70]:
## BEGIN SOLUTION
print(Lex)
## END SOLUTION
PDB file name: inputs/Lex.pdb
Total residues: 3
Sequence: ZZZ
Fold tree:
FOLD_TREE  EDGE 1 2 -1  EDGE 1 3 -2  O3   C1  

Note the CHEMICAL Edge (-2). This is Rosetta’s way of indicating a branch backbone connection. Unlike a standard POLYMER Edge (-1), this one tells you which atoms are involved.
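To make the difference between the two edge types concrete, here is a minimal sketch that parses the printed FOLD_TREE line in plain Python. The `Edge` class and `parse_fold_tree` function are hypothetical helpers written for this illustration; they are not part of PyRosetta, and they assume that only CHEMICAL edges (label -2) are followed by two atom names, as in the output above.

```python
from typing import List, NamedTuple, Optional


class Edge(NamedTuple):
    start: int
    stop: int
    label: int
    start_atom: Optional[str] = None
    stop_atom: Optional[str] = None

    @property
    def kind(self) -> str:
        # -1 and -2 are the labels discussed above; positive labels are jumps.
        if self.label == -1:
            return "POLYMER"
        if self.label == -2:
            return "CHEMICAL"
        return "JUMP"


def parse_fold_tree(text: str) -> List[Edge]:
    """Parse a printed 'FOLD_TREE  EDGE ...' line into Edge tuples."""
    tokens = text.split()
    if not tokens or tokens[0] != "FOLD_TREE":
        raise ValueError("expected a string starting with FOLD_TREE")
    edges = []
    i = 1
    while i < len(tokens):
        if tokens[i] != "EDGE":
            raise ValueError(f"unexpected token {tokens[i]!r}")
        start, stop, label = (int(t) for t in tokens[i + 1:i + 4])
        i += 4
        if label == -2:
            # Assumption: a CHEMICAL edge carries the two connected atom names.
            edges.append(Edge(start, stop, label, tokens[i], tokens[i + 1]))
            i += 2
        else:
            edges.append(Edge(start, stop, label))
    return edges


lex_tree = "FOLD_TREE  EDGE 1 2 -1  EDGE 1 3 -2  O3   C1  "
for edge in parse_fold_tree(lex_tree):
    print(edge.kind, edge.start, edge.stop, edge.start_atom, edge.stop_atom)
```

Reading the result, the POLYMER edge only names the residues it spans, while the CHEMICAL edge also records that atom O3 of residue 1 is bonded to atom C1 of residue 3.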

  • Print out the sequence of each chain.

  • Now print out information about each residue in the Pose to see which VariantTypes and ResiduePropertys are assigned to each.
  • What are the three `VariantType`s of residue 1?
  • Output the various torsion angles and make sure that you understand to which angles they correspond.

Can you see now why φ and ψ are defined the way they are? If they were defined as in AA residues, they would not have unique definitions, since GlcNAc is a branch point. A monosaccharide can have multiple children, but it can never have more than a single parent.

Note that for this oligosaccharide χ3(1) is equivalent to ψ(3) and χ4(1) is equivalent to ψ(2). Make sure that you understand why!

In [75]:
Lex.chi(3, 1), Lex.psi(3)
Out[75]:
(-97.03727535363538, -97.03727535363538)
In [76]:
Lex.chi(4, 1), Lex.psi(2)
Out[76]:
(135.6468768989725, 135.6468768989725)
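One way to reason about these equivalences: when a child sugar is attached through the oxygen at position n of its parent, the parent's χn torsion runs through that glycosidic oxygen to the child's anomeric carbon, so it describes the same rotation as the child's ψ. The sketch below encodes that bookkeeping in plain Python. The function name and the linkage dictionary are illustrative assumptions, and the rule is only meant for linkages through ring-carbon oxygens such as those in Lex.

```python
def parent_chi_for_child_psi(linkages):
    """Map each child residue to the (chi index, parent) pair matching its psi.

    `linkages` maps child residue number -> (parent residue number,
    parent oxygen position). Hypothetical helper, not a PyRosetta function.
    """
    return {
        child: (position, parent)
        for child, (parent, position) in linkages.items()
    }


# Lex: residue 2 is attached at O4 of residue 1, residue 3 at O3 of residue 1.
lex_linkages = {2: (1, 4), 3: (1, 3)}
for child, (chi_index, parent) in sorted(parent_chi_for_child_psi(lex_linkages).items()):
    print(f"psi({child}) corresponds to chi({chi_index}, {parent})")
```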

For chemically modified sugars, χ angles are redefined at the positions where substitution has occurred. For new χs that have come into existence from the addition of new atoms and bonds, new definitions are added to new indices. For example, for GlcN2Ac residue 1, χC2–N2–C′–Cα′ is accessed through `chi(7, 1)`.

In [77]:
Lex.chi(2, 1)
Out[77]:
-230.8915297047683
In [78]:
Lex.set_chi(2, 1, 180)
In [79]:
pm.apply(Lex)
In [80]:
Lex.chi(7, 1)
Out[80]:
179.81012671885887
In [81]:
Lex.set_chi(7, 1, 0)
In [82]:
pm.apply(Lex)
  • Play around with getting and setting the various torsion angles for Lex

N- and O-Linked Glycans

Branching does not have to occur at sugars; a glycan can be attached to the nitrogen of an ASN or the oxygen of a SER or THR. N-linked glycans themselves tend to be branched structures.

We will cover more on linked glycan trees in the next tutorial through the GlycanTreeSet object, which is always present in a pose that has carbohydrates.

In [27]:
N_linked = pose_from_file('inputs/glycans/N-linked_14-mer_glycan.pdb')
pm.apply(N_linked)
print(N_linked)
core.import_pose.import_pose: File 'inputs/N-linked_14-mer_glycan.pdb' automatically determined to be of type PDB
core.io.pdb.pdb_reader: Parsing 0 .pdb records with unknown format to search for Rosetta-specific comments.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc6 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc7 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man8 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man9 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man10 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man11 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc12 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc13 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc14 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man15 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man16 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man17 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man18 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man19 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->4)-beta-D-Glcp:2-AcNH 6.  Returning BOGUS ID instead.
core.conformation.Residue: [ WARNING ] missing an atom: 6  H1  that depends on a nonexistent polymer connection!
core.conformation.Residue: [ WARNING ]  --> generating it using idealized coordinates.
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->3)-alpha-D-Manp:->6)-branch 15.  Returning BOGUS ID instead.
core.conformation.Residue: [ WARNING ] missing an atom: 15  H1  that depends on a nonexistent polymer connection!
core.conformation.Residue: [ WARNING ]  --> generating it using idealized coordinates.
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->2)-alpha-D-Manp 18.  Returning BOGUS ID instead.
core.conformation.Residue: [ WARNING ] missing an atom: 18  H1  that depends on a nonexistent polymer connection!
core.conformation.Residue: [ WARNING ]  --> generating it using idealized coordinates.
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->4)-beta-D-Glcp:2-AcNH 6.  Returning BOGUS ID instead.
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->3)-alpha-D-Manp:->6)-branch 15.  Returning BOGUS ID instead.
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->2)-alpha-D-Manp 18.  Returning BOGUS ID instead.
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
PDB file name: inputs/N-linked_14-mer_glycan.pdb
Total residues: 19
Sequence: ANASAZZZZZZZZZZZZZZ
Fold tree:
FOLD_TREE  EDGE 1 5 -1  EDGE 2 6 -2  ND2  C1   EDGE 6 14 -1  EDGE 8 15 -2  O6   C1   EDGE 15 17 -1  EDGE 15 18 -2  O6   C1   EDGE 18 19 -1 
In [29]:
for chain in range(1, 5):
    print(N_linked.chain_sequence(chain))
ANASA
alpha-D-Glcp-(1->3)-alpha-D-Glcp-(1->3)-alpha-D-Glcp-(1->3)-alpha-D-Manp-(1->2)-alpha-D-Manp-(1->2)-alpha-D-Manp-(1->3)-beta-D-Manp-(1->4)-beta-D-GlcpNAc-(1->4)-beta-D-GlcpNAc-
alpha-D-Manp-(1->2)-alpha-D-Manp-(1->3)-alpha-D-Manp-
alpha-D-Manp-(1->2)-alpha-D-Manp-
  • Which residue number is glycosylated above?
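The chain sequences printed above are IUPAC-style strings in which residues are separated by linkage markers such as `-(1->3)-`. As a reading aid, the following sketch splits such a string into residues and their linkages using only the standard library. `split_iupac_chain` is a hypothetical helper written for this illustration; it handles only linear strings like the ones printed by `chain_sequence()` and does not parse bracketed branches.

```python
import re

# Matches a linkage marker such as "-(1->3)-" and captures both positions.
LINKAGE = re.compile(r"-\((\d+)->(\d+)\)-")


def split_iupac_chain(sequence):
    """Split a linear IUPAC chain string into (residue, linkage) pairs.

    Each linkage is (anomeric carbon, position on the next residue in the
    string); the last residue in the string has no linkage and gets None.
    """
    parts = LINKAGE.split(sequence.rstrip("-"))
    residues = parts[0::3]
    links = [(int(a), int(b)) for a, b in zip(parts[1::3], parts[2::3])]
    links.append(None)
    return list(zip(residues, links))


chain = "alpha-D-Manp-(1->2)-alpha-D-Manp-(1->3)-alpha-D-Manp-"
for residue, link in split_iupac_chain(chain):
    print(residue, link)
```

Residues come out in the order in which they are printed in the string, which is not necessarily the order of the Pose residue numbers.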
In [30]:
O_linked = pose_from_file('inputs/glycans/O_glycan.pdb')
pm.apply(O_linked)
core.import_pose.import_pose: File 'inputs/O_glycan.pdb' automatically determined to be of type PDB
core.io.pdb.pdb_reader: Parsing 0 .pdb records with unknown format to search for Rosetta-specific comments.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc4 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->4)-alpha-D-Glcp:non-reducing_end 4.  Returning BOGUS ID instead.
core.conformation.Residue: [ WARNING ] missing an atom: 4  H1  that depends on a nonexistent polymer connection!
core.conformation.Residue: [ WARNING ]  --> generating it using idealized coordinates.
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->4)-alpha-D-Glcp:non-reducing_end 4.  Returning BOGUS ID instead.
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
  • Print `O_linked` and the sequence of each of its chains.

set_phi() and set_psi() still work when a glycan is linked to a peptide. (Below, we use pdb_info() to help us select the residue that we want. In this case, in the .pdb file, the glycan is chain B.)

In [31]:
N_linked.set_phi(N_linked.pdb_info().pdb2pose("B", 1), 180)
pm.apply(N_linked)
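The idea behind `pdb2pose()` is a lookup from PDB numbering (chain letter plus residue number) to Pose numbering (consecutive integers starting at 1). The sketch below reproduces that idea with a plain dictionary. The `build_pdb2pose` helper and the residue records are assumptions made for illustration: we assume a five-residue peptide on chain A followed by a 14-residue glycan on chain B numbered from 1, which may differ from the numbering in the actual file.

```python
def build_pdb2pose(pdb_residues):
    """Map (chain, residue number) pairs, in file order, to pose numbers.

    Hypothetical stand-in for the lookup that pdb_info().pdb2pose() performs;
    pose numbering is 1-based and follows the order of the records.
    """
    return {key: index for index, key in enumerate(pdb_residues, start=1)}


# Assumed numbering: peptide A1-A5, glycan B1-B14 (illustrative only).
records = [("A", n) for n in range(1, 6)] + [("B", n) for n in range(1, 15)]
pdb2pose = build_pdb2pose(records)
print(pdb2pose[("B", 1)])
```

Under that assumed numbering, the first glycan residue lands directly after the last peptide residue in Pose numbering.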
  • Set ψ(B1) to 0° and ω(B1) to 90° and view the results in PyMOL.

Notice that in this case ψ and ω affect the side-chain torsions (χs) of the asparagine residue. This is another case where there are multiple ways of both naming and accessing the same specific torsion angles.

One can also create conjugated glycans from sequences if this is done in steps: first create the peptide portion, either by loading a `.pdb` file or from sequence, and then call the `glycosylate_pose()` function (which needs to be imported first). For example, to glycosylate an ASA peptide with a single glucose at position 2 of the peptide, we perform the following:

Glycosylation by function

Here, we will glycosylate a simple peptide using the function, glycosylate_pose. In the next tutorial, we will use a Mover interface to this function.

In [36]:
peptide = pose_from_sequence('ASA')
In [37]:
pm.apply(peptide)
In [38]:
from pyrosetta.rosetta.core.pose.carbohydrates import glycosylate_pose, glycosylate_pose_by_file
In [40]:
glycosylate_pose(peptide, 2, 'Glcp')
pm.apply(peptide)
core.conformation.Conformation: appending residue by a chemical bond in the foldtree: 5 ->4)-alpha-D-Glcp:non-reducing_end anchor:  OG     2 root:  C1
core.pose.carbohydrates.util: Glycosylated pose with a(n) Glcp-OGSER2 bond.
core.pose.carbohydrates.util: Idealizing glycosidic torsions.

Here, we used the main function to glycosylate a pose. In the next tutorial, we will use a Mover interface to do so.

It is also possible to glycosylate a pose with common glycans found in the database. These files end in the `.iupac` extension and are simply IUPAC sequences just as we have been using throughout this chapter.

Here is a list of some common `.iupac` files:

bisected_fucosylated_N-glycan_core.iupac
bisected_N-glycan_core.iupac
common_names.txt
core_1_O-glycan.iupac
core_2_O-glycan.iupac
core_3_O-glycan.iupac
core_4_O-glycan.iupac
core_5_O-glycan.iupac
core_6_O-glycan.iupac
core_7_O-glycan.iupac
core_8_O-glycan.iupac
fucosylated_N-glycan_core.iupac
high-mannose_N-glycan_core.iupac
hybrid_bisected_fucosylated_N-glycan_core.iupac
hybrid_bisected_N-glycan_core.iupac
hybrid_fucosylated_N-glycan_core.iupac
hybrid_N-glycan_core.iupac
man5.iupac
man9.iupac
N-glycan_core.iupac
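The `glycosylate_pose_by_file()` call in this chapter passes a name without the `.iupac` extension (`'core_5_O-glycan'`). As a small sketch, the code below derives such names from a directory listing and skips entries that are not `.iupac` files. The `glycan_names` helper is hypothetical, and the listing is an abbreviated copy of the one above rather than something read from the Rosetta database.

```python
from pathlib import PurePosixPath

# Abbreviated copy of the listing above (illustrative only).
listing = [
    "common_names.txt",
    "core_1_O-glycan.iupac",
    "core_5_O-glycan.iupac",
    "high-mannose_N-glycan_core.iupac",
    "man5.iupac",
    "man9.iupac",
    "N-glycan_core.iupac",
]


def glycan_names(filenames):
    """Return the extension-free names of the .iupac files in a listing."""
    return sorted(
        PurePosixPath(name).stem
        for name in filenames
        if name.endswith(".iupac")
    )


names = glycan_names(listing)
print("core_5_O-glycan" in names)
```

Note that `common_names.txt` appears in the listing but is not an IUPAC sequence file, so it is filtered out here.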
In [42]:
peptide = pose_from_sequence('ASA'); pm.apply(peptide)
In [44]:
glycosylate_pose_by_file(peptide, 2, 'core_5_O-glycan')
pm.apply(peptide)
core.conformation.Conformation: appending residue by a chemical bond in the foldtree: 6 ->3)-alpha-D-Galp:2-AcNH anchor:  OG     2 root:  C1
core.pose.carbohydrates.util: Glycosylated pose with a(n) a-D-GalpNAc-(1->3)-a-D-GalpNAc--OGSER2 bond.
core.pose.carbohydrates.util: Idealizing glycosidic torsions.

Conclusion

You now have a grasp of the basics of RosettaCarbohydrates. Please continue on to the next tutorial for more on glycan residue selection and various movers that can be of use when working with glycans.


Chapter contributors:

  • Jared Adolf-Bryfogle (Scripps; Institute for Protein Innovation)
  • Jason Labonte (Johns Hopkins; Franklin and Marshall College)