Last updated: 2019
Sometimes you may want to construct a Universe from scratch, or add attributes that are not read from a file. For example, you may want to group a Universe into chains, or create custom segments for protein domains.
In this tutorial we:
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB_small
import nglview as nv
import numpy as np
from IPython.core.display import Image
print("Using MDAnalysis version", mda.__version__)
print("Using NGLView version", nv.__version__)
_ColormakerRegistry()
Using MDAnalysis version 0.20.1 Using NGLView version 2.7.1
The Universe.empty()
method creates a blank Universe. The natoms
(int) argument must be included. Optional arguments are:
* n_residues (int): number of residues
* n_segments (int): number of segments
* atom_resindex (list): list of resindices for each atom
* residue_segindex (list): list of segindices for each residue
* trajectory (bool): whether to attach a MemoryReader trajectory (default False)
* velocities (bool): whether to include velocities in the trajectory (default False)
* forces (bool): whether to include forces in the trajectory (default False)
We will create a Universe with 1000 water molecules.
n_residues = 1000
n_atoms = n_residues * 3
# create resindex list
resindices = np.repeat(range(n_residues), 3)
assert len(resindices) == n_atoms
print("resindices:", resindices[:10])
# all water molecules belong to 1 segment
segindices = [0] * n_residues
print("segindices:", segindices[:10])
resindices: [0 0 0 1 1 1 2 2 2 3] segindices: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# create the Universe
sol = mda.Universe.empty(n_atoms,
n_residues=n_residues,
atom_resindex=resindices,
residue_segindex=segindices,
trajectory=True) # necessary for adding coordinates
sol
<Universe with 3000 atoms>
There isn't much we can do with our current Universe because MDAnalysis has no information on the particle elements, positions, etc. We can add relevant information manually using TopologyAttrs.
sol.add_TopologyAttr('name', ['O', 'H1', 'H2']*n_residues)
sol.atoms.names
array(['O', 'H1', 'H2', ..., 'O', 'H1', 'H2'], dtype=object)
Elements are typically contained in the type
topology attribute.
sol.add_TopologyAttr('type', ['O', 'H', 'H']*n_residues)
sol.atoms.types
array(['O', 'H', 'H', ..., 'O', 'H', 'H'], dtype=object)
sol.add_TopologyAttr('resname', ['SOL']*n_residues)
sol.atoms.resnames
array(['SOL', 'SOL', 'SOL', ..., 'SOL', 'SOL', 'SOL'], dtype=object)
sol.add_TopologyAttr('resid', list(range(1, n_residues+1)))
sol.atoms.resids
array([ 1, 1, 1, ..., 1000, 1000, 1000])
sol.add_TopologyAttr('segid', ['SOL'])
sol.atoms.segids
array(['SOL', 'SOL', 'SOL', ..., 'SOL', 'SOL', 'SOL'], dtype=object)
Positions can simply be assigned, without adding a topology attribute.
For convenience, we will pretend that water has a bond angle of 90 degrees. This lets us place water molecules on a grid. The O-H bond length is around 0.96 Angstrom.
grid_size = 10
spacing = 8
coordinates = []
# generating coordinates for water molecules
for i in range(n_residues):
x = spacing * (i % grid_size)
y = spacing * ((i // grid_size) % grid_size)
z = spacing * (i // (grid_size * grid_size))
coordinates.extend([[x, y, z], # oxygen
[x, y+0.96, z], # hydrogen
[x+0.96, y, z]]) # hydrogen
print(coordinates[:10])
[[0, 0, 0], [0, 0.96, 0], [0.96, 0, 0], [8, 0, 0], [8, 0.96, 0], [8.96, 0, 0], [16, 0, 0], [16, 0.96, 0], [16.96, 0, 0], [24, 0, 0]]
coord_array = np.array(coordinates)
assert coord_array.shape == (n_atoms, 3)
sol.atoms.positions = coord_array
We can view the atoms with NGLView, a library for visualising molecules. It guesses bonds based on distance.
sol_view = nv.show_mdanalysis(sol)
sol_view.add_representation('ball+stick', selection='all')
sol_view.center()
sol_view
NGLWidget()
Currently, the sol
universe doesn't contain any bonds.
sol.bonds
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) ~/anaconda3/envs/mdanalysis/lib/python3.7/site-packages/MDAnalysis/core/universe.py in __getattr__(self, key) 509 try: --> 510 segment = self._instant_selectors[key] 511 except KeyError: KeyError: 'bonds' During handling of the above exception, another exception occurred: AttributeError Traceback (most recent call last) <ipython-input-12-076c3c5d8d4d> in <module> ----> 1 sol.bonds ~/anaconda3/envs/mdanalysis/lib/python3.7/site-packages/MDAnalysis/core/universe.py in __getattr__(self, key) 510 segment = self._instant_selectors[key] 511 except KeyError: --> 512 raise AttributeError('No attribute "{}".'.format(key)) 513 else: 514 warnings.warn("Instant selector Universe.<segid> " AttributeError: No attribute "bonds".
They can be important for defining 'fragments', which are groups of atoms where every atom is connected by a bond to another atom in the group (i.e. what is commonly called a molecule). You can pass a list of tuples of atom indices to add bonds as a topology attribute.
bonds = []
for o in range(0, n_atoms, 3):
bonds.extend([(o, o+1), (o, o+2)])
bonds[:10]
[(0, 1), (0, 2), (3, 4), (3, 5), (6, 7), (6, 8), (9, 10), (9, 11), (12, 13), (12, 14)]
sol.add_TopologyAttr('bonds', bonds)
sol.bonds
<TopologyGroup containing 2000 bonds>
The bonds associated with each atom or the bonds within an AtomGroup can be accessed with the bonds
attribute:
print(sol.atoms[0].bonds)
print(sol.atoms[-10:].bonds)
<TopologyGroup containing 2 bonds> <TopologyGroup containing 7 bonds>
protein = mda.Universe(PDB_small)
nv.show_mdanalysis(protein)
NGLWidget()
I will translate the centers of both systems to the origin, so they can overlap in space.
cog = sol.atoms.center_of_geometry()
print('Original solvent center of geometry: ', cog)
sol.atoms.positions -= cog
cog2 = sol.atoms.center_of_geometry()
print('New solvent center of geometry: ', cog2)
Original solvent center of geometry: [36.31999976 36.31999976 36. ] New solvent center of geometry: [0. 0. 0.]
cog = protein.atoms.center_of_geometry()
print('Original solvent center of geometry: ', cog)
protein.atoms.positions -= cog
cog2 = protein.atoms.center_of_geometry()
print('New solvent center of geometry: ', cog2)
Original solvent center of geometry: [-3.66508082 9.60502842 14.33355791] New solvent center of geometry: [8.30580288e-08 3.49225059e-08 2.51332265e-08]
combined = mda.Merge(protein.atoms, sol.atoms)
combined_view = nv.show_mdanalysis(combined)
combined_view.add_representation("ball+stick", selection="not protein")
combined_view
NGLWidget()
Unfortunately, some water molecules overlap with the protein. We can create a new AtomGroup containing only the molecules where every atom is further away than 6 angstroms from the protein.
no_overlap = combined.select_atoms("same resid as (not around 6 protein)")
With this AtomGroup, we can then construct a new Universe.
u = mda.Merge(no_overlap)
view = nv.show_mdanalysis(u)
view.add_representation("ball+stick", selection="not protein")
view
NGLWidget()
Often you may want to assign atoms to a segment or chain -- for example, adding chain IDs to a PDB file. This requires adding a new Segment with Universe.add_Segment
.
Adenylate kinase has three domains: CORE, NMP, and LID. As shown in the picture below,[1] these have the residues:
u.segments.segids
array(['4AKE', 'SOL'], dtype=object)
On examining the Universe, we can see that the protein and solvent are already divided into two segments: protein ('4AKE') and solvent ('SOL'). We will add three more segments (CORE, NMP, and LID) and assign atoms to them.
First, add a Segment to the Universe with a segid
. It will be empty:
core_segment = u.add_Segment(segid='CORE')
core_segment.atoms
<AtomGroup with 0 atoms>
Residues can't be broken across segments. To put atoms in a segment, assign the segments attribute of their residues:
core_atoms = u.select_atoms('resid 1:29 or resid 60:121 or resid 160-214')
core_atoms.residues.segments = core_segment
core_segment.atoms
<AtomGroup with 2744 atoms>
nmp_segment = u.add_Segment(segid='NMP')
lid_segment = u.add_Segment(segid='LID')
nmp_atoms = u.select_atoms('resid 30:59')
nmp_atoms.residues.segments = nmp_segment
lid_atoms = u.select_atoms('resid 122:159')
lid_atoms.residues.segments = lid_segment
We can check that we have the correct domains by visualising the protein. NGLView handles molecular structure by converting the MDAnalysis atoms into a PDB, where the chainID of each atom is the first letter of the segment that it belongs to.
domain_view = nv.show_mdanalysis(u)
domain_view.color_by('chainID')
domain_view
NGLWidget()
[1]: Beckstein O, Denning EJ, Perilla JR, Woolf TB. Zipping and unzipping of adenylate kinase: atomistic insights into the ensemble of open<-->closed transitions. J Mol Biol. 2009;394(1):160–176. doi:10.1016/j.jmb.2009.09.009