Constructing, modifying, and adding to a Universe¶

Last updated: 2019

Sometimes you may want to construct a Universe from scratch, or add attributes that are not read from a file. For example, you may want to group a Universe into chains, or create custom segments for protein domains.

In this tutorial we:

create a Universe consisting of water molecules
merge this with a protein Universe loaded from a file
create custom segments labeling protein domains

In [1]:

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB_small
import nglview as nv
import numpy as np
from IPython.core.display import Image

print("Using MDAnalysis version", mda.__version__)
print("Using NGLView version", nv.__version__)

_ColormakerRegistry()

Using MDAnalysis version 0.20.1
Using NGLView version 2.7.1

Creating and populating a Universe with water¶

Creating a blank Universe¶

The Universe.empty() method creates a blank Universe. The natoms (int) argument must be included. Optional arguments are:

* n_residues (int): number of residues
* n_segments (int): number of segments
* atom_resindex (list): list of resindices for each atom
* residue_segindex (list): list of segindices for each residue
* trajectory (bool): whether to attach a MemoryReader trajectory (default False)
* velocities (bool): whether to include velocities in the trajectory (default False)
* forces (bool): whether to include forces in the trajectory (default False)

We will create a Universe with 1000 water molecules.

In [2]:

n_residues = 1000
n_atoms = n_residues * 3

# create resindex list
resindices = np.repeat(range(n_residues), 3)
assert len(resindices) == n_atoms    
print("resindices:", resindices[:10])

# all water molecules belong to 1 segment
segindices = [0] * n_residues
print("segindices:", segindices[:10])

resindices: [0 0 0 1 1 1 2 2 2 3]
segindices: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [3]:

# create the Universe
sol = mda.Universe.empty(n_atoms, 
                         n_residues=n_residues, 
                         atom_resindex=resindices, 
                         residue_segindex=segindices,
                         trajectory=True) # necessary for adding coordinates
sol

Out[3]:

<Universe with 3000 atoms>

Adding topology attributes¶

There isn't much we can do with our current Universe because MDAnalysis has no information on the particle elements, positions, etc. We can add relevant information manually using TopologyAttrs.

names¶

In [4]:

sol.add_TopologyAttr('name', ['O', 'H1', 'H2']*n_residues)
sol.atoms.names

Out[4]:

array(['O', 'H1', 'H2', ..., 'O', 'H1', 'H2'], dtype=object)

elements ("types")¶

Elements are typically contained in the type topology attribute.

In [5]:

sol.add_TopologyAttr('type', ['O', 'H', 'H']*n_residues)
sol.atoms.types

Out[5]:

array(['O', 'H', 'H', ..., 'O', 'H', 'H'], dtype=object)

residue names ("resnames")¶

In [6]:

sol.add_TopologyAttr('resname', ['SOL']*n_residues)
sol.atoms.resnames

Out[6]:

array(['SOL', 'SOL', 'SOL', ..., 'SOL', 'SOL', 'SOL'], dtype=object)

residue counter ("resids")¶

In [7]:

sol.add_TopologyAttr('resid', list(range(1, n_residues+1)))
sol.atoms.resids

Out[7]:

array([   1,    1,    1, ..., 1000, 1000, 1000])

segment/chain names ("segids")¶

In [8]:

sol.add_TopologyAttr('segid', ['SOL'])
sol.atoms.segids

Out[8]:

array(['SOL', 'SOL', 'SOL', ..., 'SOL', 'SOL', 'SOL'], dtype=object)

Adding positions¶

Positions can simply be assigned, without adding a topology attribute.

For convenience, we will pretend that water has a bond angle of 90 degrees. This lets us place water molecules on a grid. The O-H bond length is around 0.96 Angstrom.

In [9]:

grid_size = 10
spacing = 8

coordinates = []

# generating coordinates for water molecules 
for i in range(n_residues):
    x = spacing * (i % grid_size)
    y = spacing * ((i // grid_size) % grid_size)
    z = spacing * (i // (grid_size * grid_size))
    
    coordinates.extend([[x,      y,      z],   # oxygen
                        [x,      y+0.96, z],   # hydrogen
                        [x+0.96, y,      z]])  # hydrogen

print(coordinates[:10])

[[0, 0, 0], [0, 0.96, 0], [0.96, 0, 0], [8, 0, 0], [8, 0.96, 0], [8.96, 0, 0], [16, 0, 0], [16, 0.96, 0], [16.96, 0, 0], [24, 0, 0]]

In [10]:

coord_array = np.array(coordinates)
assert coord_array.shape == (n_atoms, 3)
sol.atoms.positions = coord_array

We can view the atoms with NGLView, a library for visualising molecules. It guesses bonds based on distance.

In [27]:

sol_view = nv.show_mdanalysis(sol)
sol_view.add_representation('ball+stick', selection='all')
sol_view.center()
sol_view

NGLWidget()

Adding bonds¶

Currently, the sol universe doesn't contain any bonds.

In [12]:

sol.bonds

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/mdanalysis/lib/python3.7/site-packages/MDAnalysis/core/universe.py in __getattr__(self, key)
    509         try:
--> 510             segment = self._instant_selectors[key]
    511         except KeyError:

KeyError: 'bonds'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-12-076c3c5d8d4d> in <module>
----> 1 sol.bonds

~/anaconda3/envs/mdanalysis/lib/python3.7/site-packages/MDAnalysis/core/universe.py in __getattr__(self, key)
    510             segment = self._instant_selectors[key]
    511         except KeyError:
--> 512             raise AttributeError('No attribute "{}".'.format(key))
    513         else:
    514             warnings.warn("Instant selector Universe.<segid> "

AttributeError: No attribute "bonds".

They can be important for defining 'fragments', which are groups of atoms where every atom is connected by a bond to another atom in the group (i.e. what is commonly called a molecule). You can pass a list of tuples of atom indices to add bonds as a topology attribute.

In [13]:

bonds = []
for o in range(0, n_atoms, 3):
    bonds.extend([(o, o+1), (o, o+2)])

bonds[:10]

Out[13]:

[(0, 1),
 (0, 2),
 (3, 4),
 (3, 5),
 (6, 7),
 (6, 8),
 (9, 10),
 (9, 11),
 (12, 13),
 (12, 14)]

In [14]:

sol.add_TopologyAttr('bonds', bonds)
sol.bonds

Out[14]:

<TopologyGroup containing 2000 bonds>

The bonds associated with each atom or the bonds within an AtomGroup can be accessed with the bonds attribute:

In [15]:

print(sol.atoms[0].bonds)
print(sol.atoms[-10:].bonds)

<TopologyGroup containing 2 bonds>
<TopologyGroup containing 7 bonds>

Merging with a protein¶

Now we can merge the water with a protein to create a combined system by using MDAnalysis.Merge to combine AtomGroup instances.

The protein is adenylate kinase (AdK), a phosphotransferase enzyme. [1]

In [16]:

protein = mda.Universe(PDB_small)
nv.show_mdanalysis(protein)

NGLWidget()

I will translate the centers of both systems to the origin, so they can overlap in space.

In [17]:

cog = sol.atoms.center_of_geometry()
print('Original solvent center of geometry: ', cog)
sol.atoms.positions -= cog
cog2 = sol.atoms.center_of_geometry()
print('New solvent center of geometry: ', cog2)

Original solvent center of geometry:  [36.31999976 36.31999976 36.        ]
New solvent center of geometry:  [0. 0. 0.]

In [18]:

cog = protein.atoms.center_of_geometry()
print('Original solvent center of geometry: ', cog)
protein.atoms.positions -= cog
cog2 = protein.atoms.center_of_geometry()
print('New solvent center of geometry: ', cog2)

Original solvent center of geometry:  [-3.66508082  9.60502842 14.33355791]
New solvent center of geometry:  [8.30580288e-08 3.49225059e-08 2.51332265e-08]

In [19]:

combined = mda.Merge(protein.atoms, sol.atoms)
combined_view = nv.show_mdanalysis(combined)
combined_view.add_representation("ball+stick", selection="not protein")
combined_view

NGLWidget()

Unfortunately, some water molecules overlap with the protein. We can create a new AtomGroup containing only the molecules where every atom is further away than 6 angstroms from the protein.

In [20]:

no_overlap = combined.select_atoms("same resid as (not around 6 protein)")

With this AtomGroup, we can then construct a new Universe.

In [21]:

u = mda.Merge(no_overlap)
view = nv.show_mdanalysis(u)
view.add_representation("ball+stick", selection="not protein")
view

NGLWidget()

Adding a new segment¶

Often you may want to assign atoms to a segment or chain -- for example, adding chain IDs to a PDB file. This requires adding a new Segment with Universe.add_Segment.

Adenylate kinase has three domains: CORE, NMP, and LID. As shown in the picture below,[1] these have the residues:

CORE: residues 1-29, 60-121, 160-214 (gray)
NMP: residues 30-59 (blue)
LID: residues 122-159 (yellow)

In [22]:

u.segments.segids

Out[22]:

array(['4AKE', 'SOL'], dtype=object)

On examining the Universe, we can see that the protein and solvent are already divided into two segments: protein ('4AKE') and solvent ('SOL'). We will add three more segments (CORE, NMP, and LID) and assign atoms to them.

First, add a Segment to the Universe with a segid. It will be empty:

In [23]:

core_segment = u.add_Segment(segid='CORE')
core_segment.atoms

Out[23]:

<AtomGroup with 0 atoms>

Residues can't be broken across segments. To put atoms in a segment, assign the segments attribute of their residues:

In [24]:

core_atoms = u.select_atoms('resid 1:29 or resid 60:121 or resid 160-214')
core_atoms.residues.segments = core_segment
core_segment.atoms

Out[24]:

<AtomGroup with 2744 atoms>

In [25]:

nmp_segment = u.add_Segment(segid='NMP')
lid_segment = u.add_Segment(segid='LID')

nmp_atoms = u.select_atoms('resid 30:59')
nmp_atoms.residues.segments = nmp_segment

lid_atoms = u.select_atoms('resid 122:159')
lid_atoms.residues.segments = lid_segment

We can check that we have the correct domains by visualising the protein. NGLView handles molecular structure by converting the MDAnalysis atoms into a PDB, where the chainID of each atom is the first letter of the segment that it belongs to.

In [26]:

domain_view = nv.show_mdanalysis(u)
domain_view.color_by('chainID')
domain_view

NGLWidget()

References¶

[1]: Beckstein O, Denning EJ, Perilla JR, Woolf TB. Zipping and unzipping of adenylate kinase: atomistic insights into the ensemble of open<-->closed transitions. J Mol Biol. 2009;394(1):160–176. doi:10.1016/j.jmb.2009.09.009