#!/usr/bin/env python # coding: utf-8 # # Hole module: Basic trajectory analysis # # [MDAnalysis](https://www.mdanalysis.org) contains the [MDAnalysis.analysis.hole](https://www.mdanalysis.org/docs/documentation_pages/analysis/hole.html) module that uses Oliver Smart's [HOLE](http://www.holeprogram.org/) program. This notebook shows how to make use of the basic functionality in `MDAnalysis.analysis.hole`. We will analyze the fluctuations in the pore radius profile of an ion channel (gramicidin A) from a simulation trajectory. # In[1]: import numpy as np import matplotlib.pyplot as plt get_ipython().run_line_magic('matplotlib', 'inline') plt.style.use('ggplot') import MDAnalysis as mda import MDAnalysis.analysis.hole # In[2]: mda.start_logging() # I am testing HOLE on a simple "trajectory": I submitted gramicidin A (1grm, test file `1gmr_single.pdb`) to the [ElNemo](http://www.sciences.univ-nantes.fr/elnemo/) elastic network server. From an ENM it computed a number of modes and provided multi-frame PDB files showing these modes. I just picked the first mode (which is labelled mode 7), which shows a twist motion along the pore axis. Note that this "trajectory" is exaggerated and does not correspond to real protein motion. # # The trajectory is included with the MDAnalysis test files. # In[3]: from MDAnalysis.tests.datafiles import MULTIPDB_HOLE u = mda.Universe(MULTIPDB_HOLE) # ## Running HOLE # You need to install `HOLE` first, which is available from http://www.holeprogram.org/ under the Apache open source license. # # Set up the `HOLEtraj` class; provide a path to the `hole` executable. # In[4]: H = MDAnalysis.analysis.hole.HOLEtraj(u, executable="~/hole2/exe/hole") # Then run `hole` on the individual frames (this takes a while...) # In[5]: H.run() # The trajectory contains 11 frames. We are typically interested in the *HOLE profile*, i.e., a plot of the pore radius as determined by HOLE against the reaction coordinate (often close to $z$ along the pore axis for simple channels). # # All profiles are stored as # ```python # H.profiles # ``` # which is a `dict`. The keys are the frame numbers (or a user supplied reaction coordinate for the conformation). The values are record arrays with columns `frame` (the reaction coordinate for the conformation), `rxncoord` (the HOLE pore reaction coordinate), and `radius` (radius in Angstrom). # # # The `H.plot()` method plots all the profiles together, the `H.plot3D()` stacks them: # In[6]: ax = H.plot() ax.set_ylim(1, 4) ax.set_xlim(-15, 25) plt.tight_layout() # In[9]: H.plot3D(rmax=4) # ## Working with HOLE profiles # The output from HOLE is a pore profile $R(\zeta)$ (radius $R$ vs pore coordinate $\zeta$). # # ### Pore profile data structure # All pore profiles are stored in a `dict` named `H.profiles` that is indexed by the frame number. Each pore profile itself is a array containig rows with *frame number* (all identical), *pore coordinate*, and *pore radius*. For more details, see the docs on [HOLE data structures](http://pythonhosted.org/MDAnalysis/documentation_pages/analysis/hole.html#data-structures). # H.profiles # ### Accessing profiles on a per-frame basis # To get at individual profiles in we can iterate over `H.profiles`. However, as a convenience, just iterating over the `HOLEtraj` instance itself will return the profiles in *frame-sorted order* (thanks to [HOLEtraj.sorted_profiles_iter()](http://pythonhosted.org/MDAnalysis/documentation_pages/analysis/hole.html#MDAnalysis.analysis.hole.HOLEtraj.sorted_profiles_iter)). # # For example, getting the minimum radius # In[7]: for frame, profile in H: print("frame {}: min(R) = {}".format(frame, profile.radius.min())) # (For a slightly more advanced [analysis of minimum pore radius as function of a order parameter](http://nbviewer.jupyter.org/gist/orbeckst/a0c856f58059112538591af108df6d59#Analysis-of-the-minimum-pore-radius) see the [Advanced HOLE Hacking Notebook](http://nbviewer.jupyter.org/gist/orbeckst/a0c856f58059112538591af108df6d59).) # # TODO: Change these links to Binder notebooks. # ### Extracting data to a file # Sometimes you want to have the pore profiles in a simple data file. Here we are using the popular [XMGRACE](http://plasma-gate.weizmann.ac.il/Grace/) XY format where each row will contain # ``` # pore_coordinate pore_radius # ``` # and datasets for frames are separated by # ``` # & # ``` # In[8]: with open("profiles.xvg", "w") as xvg: for frame, profile in H: for _, zeta, radius in profile: xvg.write("{0} {1}\n".format(zeta, radius)) xvg.write("&\n") # The corresponding output file looks like this: # In[9]: print("".join(open("profiles.xvg").readlines()[465:485])) # One can use the above code to write the profiles in any format; for instance, it would be easy to write each frame's profile to a separate file by moving the `with` statement *inside* the outer `for` loop. # ## Creating an "average" HOLE profile # At the moment (MDAnalysis 0.17.0) does not have a method to compute an average HOLE profile. However, in the following I demonstrate how this can be accomplished using the [numkit.timeseries.regularized_function](http://gromacswrapper.readthedocs.io/en/latest/numkit/timeseries.html#numkit.timeseries.regularized_function) (a function found in [GromacsWrapper](http://gromacswrapper.readthedocs.io/)) but we can also use it independently (I copied it from [numkit/timeseries.py](https://github.com/Becksteinlab/GromacsWrapper/blob/develop/numkit/timeseries.py#L519)): # In[10]: import numpy def regularized_function(x, y, func, bins=100, range=None): """Compute *func()* over data aggregated in bins. ``(x,y) --> (x', func(Y'))`` with ``Y' = {y: y(x) where x in x' bin}`` First the data is collected in bins x' along x and then *func* is applied to all data points Y' that have been collected in the bin. .. function:: func(y) -> float *func* takes exactly one argument, a numpy 1D array *y* (the values in a single bin of the histogram), and reduces it to one scalar float. .. Note:: *x* and *y* must be 1D arrays. :Arguments: x abscissa values (for binning) y ordinate values (func is applied) func a numpy ufunc that takes one argument, func(Y') bins number or array range limits (used with number of bins) :Returns: F,edges function and edges (``midpoints = 0.5*(edges[:-1]+edges[1:])``) (This function originated as :func:`recsql.sqlfunctions.regularized_function`.) """ _x = numpy.asarray(x) _y = numpy.asarray(y) if len(_x.shape) != 1 or len(_y.shape) != 1: raise TypeError("Can only deal with 1D arrays.") # setup of bins (taken from numpy.histogram) if (range is not None): mn, mx = range if (mn > mx): raise AttributeError('max must be larger than min in range parameter.') if not numpy.iterable(bins): if range is None: range = (_x.min(), _x.max()) mn, mx = [float(mi) for mi in range] if mn == mx: mn -= 0.5 mx += 0.5 bins = numpy.linspace(mn, mx, bins+1, endpoint=True) else: bins = numpy.asarray(bins) if (numpy.diff(bins) < 0).any(): raise ValueError('bins must increase monotonically.') sorting_index = numpy.argsort(_x) sx = _x[sorting_index] sy = _y[sorting_index] # boundaries in SORTED data that demarcate bins; position in bin_index is the bin number bin_index = numpy.r_[sx.searchsorted(bins[:-1], 'left'), sx.searchsorted(bins[-1], 'right')] # naive implementation: apply operator to each chunk = sy[start:stop] separately # # It's not clear to me how one could effectively block this procedure (cf # block = 65536 in numpy.histogram) because there does not seem to be a # general way to combine the chunks for different blocks, just think of # func=median F = numpy.zeros(len(bins)-1) # final function F[:] = [func(sy[start:stop]) for start,stop in zip(bin_index[:-1],bin_index[1:])] return F,bins # The strategy is to collect all data and the histogram the radii over the HOLE reactioncoordinate: # In[11]: rxncoord = np.concatenate([profile.rxncoord for frame, profile in H.sorted_profiles_iter()]) radii = np.concatenate([profile.radius for frame, profile in H.sorted_profiles_iter()]) # Now use `regularized_function`. # # Normally I would import it as # ```python # from numkit.timeseries import regularized_function # ``` # but for this notebook it is not necessary to install [numkit](https://github.com/Becksteinlab/numkit) (`pip install numkit`...). # The function to reduce the data in each bin is either just the mean (`np.mean`) or the standard deviation (`np.std`). (For more ideas what else to use see the numkit notes on [coarse graining timeseries](http://gromacswrapper.readthedocs.io/en/latest/numkit/timeseries.html#coarse-graining-time-series).) # In[12]: mean_r, q = regularized_function(rxncoord, radii, np.mean, bins=100) std_r, q = regularized_function(rxncoord, radii, np.std, bins=100) zeta = 0.5*(q[1:] + q[:-1]) # Note that `regularized_function()` returns the reduced data in each bin $i$ and the bin edges (the bin $i$ is defined as $\{q: q_i \leq q < q_{i+1}\}$). To plot over the bin midpoints, we compute them as $\zeta_i = \frac{1}{2}(q_i + q_{i+1})$. # In[13]: ax = plt.subplot(111) ax.fill_between(zeta, mean_r - std_r, mean_r + std_r, color="blue", alpha=0.3) ax.plot(zeta, mean_r, color="blue", lw=2) ax.set_xlabel(r"pore coordinate $\zeta$ ($\AA$)") ax.set_ylabel(r"HOLE radius $R$ ($\AA$)"); # This is a plot of the mean pore radius (solid line) with the bands showing the standard deviation of the radius over the trajectory. # ## Further reading # * [MDAnalysis.analysis.hole](https://www.mdanalysis.org/docs/documentation_pages/analysis/hole.html) documentation # * For more advanced uses see the notebook [Using MDAnalysis and HOLE to create pore profiles along structural order parameters](https://nbviewer.jupyter.org/github/MDAnalysis/binder-notebook/blob/master/analysis/hole-orderparameter.ipynb) # # In[ ]: