#!/usr/bin/env python
# coding: utf-8

# # Hole module: Basic trajectory analysis
# 
# [MDAnalysis](https://www.mdanalysis.org) contains the [MDAnalysis.analysis.hole](https://www.mdanalysis.org/docs/documentation_pages/analysis/hole.html) module that uses Oliver Smart's [HOLE](http://www.holeprogram.org/) program. This notebook shows how to make use of the basic functionality in `MDAnalysis.analysis.hole`. We will analyze the fluctuations in the pore radius profile of an ion channel (gramicidin A) from a simulation trajectory.

# In[1]:


import numpy as np
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')
plt.style.use('ggplot')

import MDAnalysis as mda
import MDAnalysis.analysis.hole


# In[2]:


mda.start_logging()


# I am testing HOLE on a simple "trajectory": I submitted gramicidin A (1grm, test file `1gmr_single.pdb`) to the [ElNemo](http://www.sciences.univ-nantes.fr/elnemo/) elastic network server. From an ENM it computed a number of modes and provided multi-frame PDB files showing these modes. I just picked the first mode (which is labelled mode 7), which shows a twist motion along the pore axis. Note that this "trajectory" is exaggerated and does not correspond to real protein motion.
# 
# The trajectory is included with the MDAnalysis test files.

# In[3]:


from MDAnalysis.tests.datafiles import MULTIPDB_HOLE
u = mda.Universe(MULTIPDB_HOLE)


# ## Running HOLE
# You need to install `HOLE` first, which is available from http://www.holeprogram.org/ under the Apache open source license.
# 
# Set up the `HOLEtraj` class; provide a path to the `hole` executable.

# In[4]:


H = MDAnalysis.analysis.hole.HOLEtraj(u, executable="~/hole2/exe/hole")


# Then run `hole` on the individual frames (this takes a while...)

# In[5]:


H.run()


# The trajectory contains 11 frames. We are typically interested in the *HOLE profile*, i.e., a plot of the pore radius as determined by HOLE against the reaction coordinate (often close to $z$ along the pore axis for simple channels).
# 
# All profiles are stored as
# ```python
# H.profiles
# ```
# which is a `dict`. The keys are the frame numbers (or a user supplied reaction coordinate for the conformation). The values are record arrays with columns `frame` (the reaction coordinate for the conformation), `rxncoord` (the HOLE pore reaction coordinate), and `radius` (radius in Angstrom).
# 
# 
# The `H.plot()` method plots all the profiles together, the `H.plot3D()` stacks them:

# In[6]:


ax = H.plot()
ax.set_ylim(1, 4)
ax.set_xlim(-15, 25)
plt.tight_layout()


# In[9]:


H.plot3D(rmax=4)


# ## Working with HOLE profiles
# The output from HOLE is a pore profile $R(\zeta)$ (radius $R$ vs pore coordinate $\zeta$).
# 

# ### Pore profile data structure
# All pore profiles are stored in a `dict` named `H.profiles` that is indexed by the frame number. Each pore profile itself is a array containig rows with *frame number* (all identical), *pore coordinate*, and *pore radius*. For more details, see the docs on  [HOLE data structures](http://pythonhosted.org/MDAnalysis/documentation_pages/analysis/hole.html#data-structures).

# H.profiles

# ### Accessing profiles on a per-frame basis
# To get at individual profiles in we can iterate over `H.profiles`. However, as a convenience, just iterating over the `HOLEtraj` instance itself will return the profiles in *frame-sorted order* (thanks to [HOLEtraj.sorted_profiles_iter()](http://pythonhosted.org/MDAnalysis/documentation_pages/analysis/hole.html#MDAnalysis.analysis.hole.HOLEtraj.sorted_profiles_iter)).
# 
# For example, getting the minimum radius 

# In[7]:


for frame, profile in H:
    print("frame {}: min(R) = {}".format(frame, profile.radius.min()))


# (For a slightly more advanced [analysis of minimum pore radius as function of a order parameter](http://nbviewer.jupyter.org/gist/orbeckst/a0c856f58059112538591af108df6d59#Analysis-of-the-minimum-pore-radius) see the [Advanced HOLE Hacking Notebook](http://nbviewer.jupyter.org/gist/orbeckst/a0c856f58059112538591af108df6d59).)
# 
# TODO: Change these links to Binder notebooks.

# ###  Extracting data to a file
# Sometimes you want to have the pore profiles in a simple data file. Here we are using the popular [XMGRACE](http://plasma-gate.weizmann.ac.il/Grace/) XY format where each row will contain
# ```
#    pore_coordinate   pore_radius
# ```
# and datasets for frames are separated by
# ```
# &
# ```

# In[8]:


with open("profiles.xvg", "w") as xvg:
    for frame, profile in H:
        for _, zeta, radius in profile:
            xvg.write("{0} {1}\n".format(zeta, radius))
        xvg.write("&\n")


# The corresponding output file looks like this:

# In[9]:


print("".join(open("profiles.xvg").readlines()[465:485]))


# One can use the above code to write the profiles in any format; for instance, it would be easy to write each frame's profile to a separate file by moving the `with` statement *inside* the outer `for` loop.

# ## Creating an "average" HOLE profile 

# At the moment (MDAnalysis 0.17.0) does not have a method to compute an average HOLE profile. However, in the following I demonstrate how this can be accomplished using the [numkit.timeseries.regularized_function](http://gromacswrapper.readthedocs.io/en/latest/numkit/timeseries.html#numkit.timeseries.regularized_function) (a function found in [GromacsWrapper](http://gromacswrapper.readthedocs.io/)) but we can also use it independently (I copied it from [numkit/timeseries.py](https://github.com/Becksteinlab/GromacsWrapper/blob/develop/numkit/timeseries.py#L519)):

# In[10]:


import numpy

def regularized_function(x, y, func, bins=100, range=None):
    """Compute *func()* over data aggregated in bins.
    ``(x,y) --> (x', func(Y'))``  with ``Y' = {y: y(x) where x in x' bin}``
    First the data is collected in bins x' along x and then *func* is
    applied to all data points Y' that have been collected in the bin.
    .. function:: func(y) -> float
       *func* takes exactly one argument, a numpy 1D array *y* (the
       values in a single bin of the histogram), and reduces it to one
       scalar float.
    .. Note:: *x* and *y* must be 1D arrays.
    :Arguments:
       x
          abscissa values (for binning)
       y
          ordinate values (func is applied)
       func
          a numpy ufunc that takes one argument, func(Y')
       bins
          number or array
       range
          limits (used with number of bins)
    :Returns:
       F,edges
          function and edges (``midpoints = 0.5*(edges[:-1]+edges[1:])``)
    (This function originated as
    :func:`recsql.sqlfunctions.regularized_function`.)
    """
    _x = numpy.asarray(x)
    _y = numpy.asarray(y)

    if len(_x.shape) != 1 or len(_y.shape) != 1:
        raise TypeError("Can only deal with 1D arrays.")

    # setup of bins (taken from numpy.histogram)
    if (range is not None):
        mn, mx = range
        if (mn > mx):
            raise AttributeError('max must be larger than min in range parameter.')

    if not numpy.iterable(bins):
        if range is None:
            range = (_x.min(), _x.max())
        mn, mx = [float(mi) for mi in range]
        if mn == mx:
            mn -= 0.5
            mx += 0.5
        bins = numpy.linspace(mn, mx, bins+1, endpoint=True)
    else:
        bins = numpy.asarray(bins)
        if (numpy.diff(bins) < 0).any():
            raise ValueError('bins must increase monotonically.')

    sorting_index = numpy.argsort(_x)
    sx = _x[sorting_index]
    sy = _y[sorting_index]

    # boundaries in SORTED data that demarcate bins; position in bin_index is the bin number
    bin_index = numpy.r_[sx.searchsorted(bins[:-1], 'left'),
                         sx.searchsorted(bins[-1], 'right')]

    # naive implementation: apply operator to each chunk = sy[start:stop] separately
    #
    # It's not clear to me how one could effectively block this procedure (cf
    # block = 65536 in numpy.histogram) because there does not seem to be a
    # general way to combine the chunks for different blocks, just think of
    # func=median
    F = numpy.zeros(len(bins)-1)  # final function
    F[:] = [func(sy[start:stop]) for start,stop in zip(bin_index[:-1],bin_index[1:])]
    return F,bins


# The strategy is to collect all data and the histogram the radii over the HOLE reactioncoordinate:

# In[11]:


rxncoord = np.concatenate([profile.rxncoord for frame, profile in H.sorted_profiles_iter()])
radii = np.concatenate([profile.radius for frame, profile in H.sorted_profiles_iter()])


# Now use `regularized_function`.
# 
# Normally I would import it as
# ```python
# from numkit.timeseries import regularized_function
# ```
# but for this notebook it is not necessary to install [numkit](https://github.com/Becksteinlab/numkit) (`pip install numkit`...).

# The function to reduce the data in each bin is either just the mean (`np.mean`) or the standard deviation (`np.std`). (For more ideas what else to use see the numkit notes on [coarse graining timeseries](http://gromacswrapper.readthedocs.io/en/latest/numkit/timeseries.html#coarse-graining-time-series).)

# In[12]:


mean_r, q = regularized_function(rxncoord, radii, np.mean, bins=100)
std_r, q = regularized_function(rxncoord, radii, np.std, bins=100)
zeta = 0.5*(q[1:] + q[:-1])


# Note that `regularized_function()` returns the reduced data in each bin $i$ and the bin edges (the bin $i$ is defined as $\{q: q_i \leq q < q_{i+1}\}$). To plot over the bin midpoints, we compute them as $\zeta_i = \frac{1}{2}(q_i + q_{i+1})$.

# In[13]:


ax = plt.subplot(111)
ax.fill_between(zeta, mean_r - std_r, mean_r + std_r, color="blue", alpha=0.3)
ax.plot(zeta, mean_r, color="blue", lw=2)
ax.set_xlabel(r"pore coordinate $\zeta$ ($\AA$)")
ax.set_ylabel(r"HOLE radius $R$ ($\AA$)");


# This is a plot of the mean pore radius (solid line) with the bands showing the standard deviation of the radius over the trajectory.

# ## Further reading
# * [MDAnalysis.analysis.hole](https://www.mdanalysis.org/docs/documentation_pages/analysis/hole.html) documentation
# * For more advanced uses see the notebook [Using MDAnalysis and HOLE to create pore profiles along structural order parameters](https://nbviewer.jupyter.org/github/MDAnalysis/binder-notebook/blob/master/analysis/hole-orderparameter.ipynb)
# 

# In[ ]: