#!/usr/bin/env python
# coding: utf-8

# In[1]:


from IPython.display import display, HTML


# # Calling C Libraries from Numba Using CFFI
# 
# Joshua L. Adelman
# 11 February 2016
# 
# **TL;DR** - The python CFFI library provides an easy and efficient way to call C code from within a function jitted (just-in-time compiled) by Numba. This makes it simple to produce fast code with functionality that is not yet available directly in Numba. As a simple demonstration, I wrap several statistical functions from the Rmath library. 

# ## Background and Motivation
# 
# A large fraction of the code that I write has a performance requirement attached to it. Either I'm churning through a large amount of data in an analysis pipeline, or it is part of a real-time system and needs to complete a specific calculation in a constrained amount of time. Sometimes I can rely on numpy, pandas or existing Python packages that wrap C or Fortran code under the covers to get sufficient performance. Often times though, I'm dealing with algorithms that are difficult to implement efficiently using these tools.
# 
# Since I started coding primarily in Python ~6 years ago, in those instances I'd typically reach for [Cython](http://cython.org/) to either wrap something I or others wrote in C/C++/Fortran or to provide sufficient type information to my code so that Cython could generate a performant C-extension that I could call from Python. Although Cython has been a pretty rock solid solution for me, the amount of boilerplate often required and some of the strange semantic of mixing python and low-level C code often feels less than ideal. I also collaborate with people who know Python, but don't have backgrounds in C and/or haven't had enough experience with Cython to understand how it all fits together. 
# 
# More and more frequently, I find myself using [Numba](http://numba.pydata.org/) in instances that I had traditionally used Cython. In short, through a simple decorator mechanism, Numba converts a subset of Python code into efficient machine code using LLVM. It uses type inference so you don't have to specify the type of every variable in a function like you do in Cython to generate fast code. This subset primarily deals with numerical code operating on scalars or Numpy arrays, but that covers 95% of the cases where I need efficient code so it does not feel that limiting. That said, the most common mistake I see people making with Numba is trying to use it as a general Python compiler and then being confused/disappointed when it doesn't speed up their code. The library has matured incredibly over the last 6-12 months to the point where at work we have it deployed in a couple of critical pieces of production code. When I first seriously prototyped it maybe a year and a half ago, it was super buggy and missing a number of key features (e.g. caching of jitted functions, memory management of numpy arrays, etc). But now it feels stable and I rarely run into problems, although I've written a very extensive unit test suite for every bit of code that it touches.
# 
# One of the limitations that I do encounter semi-regularly though is when I need some specialized function that is available in Numpy or Scipy, but that function has not been re-implemented in the Numba core library so it can be called in the so-called ["nopython" mode](http://numba.pydata.org/numba-doc/0.23.1/glossary.html#term-nopython-mode). Basically this means that if you want to call one of these functions, you have to go through Numba's [object mode](http://numba.pydata.org/numba-doc/0.23.1/glossary.html#term-object-mode), which typically cannot generate nearly as efficient code.
# 
# While there is a [proposal](http://numba.pydata.org/numba-doc/dev/proposals/extension-points.html) [under development](http://numba.pydata.org/numba-doc/dev/extending/index.html) that should allow external libraries to define an interface to make usable in nopython mode, it is not complete and will them require adoption within the larger Scipy/PyData communities. I'm looking forward to that day, but currently you have to choose a different option. The first is to re-implement a function yourself using Numba. This is often possible for functionality that is small and limited in scope, but for anything non-trivial this approach can rapidly become untenable.  
# 
# In the remainder of this notebook, I'm going to describe a second technique that involves using [CFFI](https://cffi.readthedocs.org) to call external C code directly from within Numba jitted code. This turns out to be a really great solution if the functionality you want has already been written either in C or a language with a C interface. It is mentioned [in the Numba docs](http://numba.pydata.org/numba-doc/0.23.1/reference/pysupported.html#cffi), but there aren't any examples that I have seen, and looking at the tests only helped a little. 
# 
# I had not used CFFI before integrating it with Numba for a recent project. I had largely overlooked it for two reasons: (1) Cython covered the basic usecase of exposing external C code to python and I was already very comfortable with Cython, and (2) I had the (incorrect) impression that CFFI was mostly useful in the PyPy ecosystem. Since PyPy is a non-starter for all of my projects, I largely just ignored its existence. I'm thankfully correcting that mistake now. 

# ## Rmath. It's not just for R
# 
# Every once in a while I fire up R, usually through rpy2, to do something that I can't do using Statsmodel or Scikit-Learn. But for the most part I live squarely in the Python world, and my experience with R is rudimentary. So it wasn't totally surprising that I only recently discovered that the math library that underpins R, Rmath, can be built in a standalone mode without invoking R at all. In fact, the [Julia](http://julialang.org/) programming language uses Rmath for its probability distributions library and maintains a fork of the package called [Rmath-julia](https://github.com/JuliaLang/Rmath-julia).
# 
# Discovering Rmath over the summer, led to the following tweet (apologies for the Jupyter Notebook input cell) and a horrific amalgamation of code that worked, but was pretty difficult to maintain and extend:

# In[2]:


display(HTML('''<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Today&#39;s 
the sort of day that I wrote C code that used the Rmath-julia library and then called that from Python via Cython. 
Don&#39;t ask</p>&mdash; Joshua Adelman (@synapticarbors) 
<a href="https://twitter.com/synapticarbors/status/619684147544621056">July 11, 2015</a>
</blockquote><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>'''))


# As I began to introduce more and more Numba into various code bases at work, I recently decided to revisit this particular bit and see if I could re-implement the whole thing using Numba + CFFI + Rmath. This would cut out the C code that I wrote, the Cython wrapper that involved a bunch of boilerplate strewn across multiple .pyx and .pxd files, and hopefully would make the code easier to extend in the future by people who didn't know C or Cython, but could write some Python and apply the appropriate Numba jit decorator.
# 
# So to begin with, I vendorized the whole Rmath-julia library into our project under `externals/Rmath-julia`. I'll do the same here in this example. Now the fun begins...

# ## Building the Rmath library using CFFI
# 
# Since we are going to use what cffi calls the "API-level, out-of-line", we need to define a build script (`build_rmath.py`) that we will use to compile the Rmath source and produce an importable extension module. The notebook "cell magic", `%%file` will write the contents of the below cell to a file.

# In[3]:


get_ipython().run_cell_magic('file', 'build_rmath.py', "\nimport glob\nimport os\nimport platform\n\nfrom cffi import FFI\n\n\ninclude_dirs = [os.path.join('externals', 'Rmath-julia', 'src'),\n                os.path.join('externals', 'Rmath-julia', 'include')]\n\nrmath_src = glob.glob(os.path.join('externals', 'Rmath-julia', 'src', '*.c'))\n\n# Take out dSFMT dependant files; Just use the basic rng\nrmath_src = [f for f in rmath_src if ('librandom.c' not in f) and ('randmtzig.c' not in f)]\n\nextra_compile_args = ['-DMATHLIB_STANDALONE']\nif platform.system() == 'Windows':\n    extra_compile_args.append('-std=c99')\n\nffi = FFI()\nffi.set_source('_rmath_ffi', '#include <Rmath.h>',\n        include_dirs=include_dirs,\n        sources=rmath_src,\n        libraries=[],\n        extra_compile_args=extra_compile_args)\n\n# This is an incomplete list of the available functions in Rmath\n# but these are sufficient for our example purposes and gives a sense of\n# the types of functions we can get\nffi.cdef('''\\\n// Normal Distribution\ndouble dnorm(double, double, double, int);\ndouble pnorm(double, double, double, int, int);\n\n// Uniform Distribution\ndouble dunif(double, double, double, int);\ndouble punif(double, double, double, int, int);\n\n// Gamma Distribution\ndouble dgamma(double, double, double, int);\ndouble pgamma(double, double, double, int, int);\n''')\n\nif __name__ == '__main__':\n    # Normally set verbose to `True`, but silence output\n    # for reduced notebook noise\n    ffi.compile(verbose=False)\n")


# Then we simply run the script as below assuming we have a properly configured C compiler on our system. For larger projects [integration with setuptools is supported](http://cffi.readthedocs.org/en/release-1.5/cdef.html#preparing-and-distributing-modules). The exclamation point tells the notebook to execute the following command in a system shell.

# In[4]:


get_ipython().system('python build_rmath.py')


# We should now have an extension module named `_rmath_ffi` that gives us access to the functions whose prototypes we enumerated in the `ffi.cdef(...)`.

# ## An example of replicating scipy.stats with Numba
# 
# Now that we have built our module wrapping Rmath using cffi, we can write Numba jit-able versions of scipy stats functions that we can call without additional overhead

# In[5]:


import numpy as np
import numba as nb

import scipy.stats

# Import our Rmath module
import _rmath_ffi


# Now we can define a number of shorter aliases to the Rmath functions for use in the python namespace

# In[6]:


dnorm = _rmath_ffi.lib.dnorm
pnorm = _rmath_ffi.lib.pnorm

dunif = _rmath_ffi.lib.dunif
punif = _rmath_ffi.lib.punif

dgamma = _rmath_ffi.lib.dgamma
pgamma = _rmath_ffi.lib.pgamma


# In order for us to use these methods from within a function that we've jit-ed with Numba, we need to import `cffi_support` and register the module: 

# In[7]:


from numba import cffi_support

cffi_support.register_module(_rmath_ffi)


# I'll start off by writing a function that is equivalent to `scipy.stats.norm.cdf` using two different styles. In the first, `pnorm_nb`, I'll make the assumption that I'm going to be working on a 1d array, and in the second, I'll use `numba.vectorize` to lift that constraint and create a universal function that can operate on arbitrary dimensional numpy arrays:

# In[8]:


@nb.jit(nopython=True)
def pnorm_nb(x):
    y = np.empty_like(x)
    for k in xrange(x.shape[0]):
        y[k] = pnorm(x[k], 0.0, 1.0, 1, 0)
        
    return y

@nb.vectorize(nopython=True)
def pnorm_nb_vec(x):
    return pnorm(x, 0.0, 1.0, 1, 0)


# In[9]:


x = np.random.normal(size=(100,))

y1 = scipy.stats.norm.cdf(x)
y2 = pnorm_nb(x)
y3 = pnorm_nb_vec(x)

# Check that they all give the same results
print np.allclose(y1, y2)
print np.allclose(y1, y3)


# And now let's do the same calculation for 2D data, demonstrating that the vectorized form of the Numba function automatically create the appropriate universal function for the given dimensionality of the inputs:

# In[10]:


x = np.random.normal(size=(100,100))

y1 = scipy.stats.norm.cdf(x)
y2 = pnorm_nb_vec(x)

# Check that they all give the same results
print np.allclose(y1, y2)


# Timing the scipy and numba versions:

# In[11]:


get_ipython().run_line_magic('timeit', 'scipy.stats.norm.cdf(x)')
get_ipython().run_line_magic('timeit', 'pnorm_nb_vec(x)')


# We can see that our Numba version is almost 2x faster than the scipy version, with the added bonus that it can be called from within other Numba-ized methods without going through the python object layer, which can be quite slow.
# 
# Just for kicks, lets also try to take advantage of multiple cores using the `target` argument:

# In[12]:


@nb.vectorize([nb.float64(nb.float64),], nopython=True, target='parallel')
def pnorm_nb_vec_parallel(x):
    return pnorm(x, 0.0, 1.0, 1, 0)


# In[13]:


y3 = pnorm_nb_vec_parallel(x)
print np.allclose(y1, y3)
print

get_ipython().run_line_magic('timeit', 'pnorm_nb_vec_parallel(x)')


# So on my laptop with 4 physical cores, we get a nice additional speed-up over the serial numba version and the `scipy.stats` function.
# 
# Finally, I'm going to programmatically wrap the all of the Rmath functions I exposed and compare them to the equivalent scipy functions.

# In[14]:


from collections import OrderedDict

func_map = OrderedDict([
    ('norm_pdf', (scipy.stats.norm.pdf, dnorm)),
    ('norm_cdf', (scipy.stats.norm.cdf, pnorm)),
    ('unif_pdf', (scipy.stats.uniform.pdf, dunif)),
    ('unif_cdf', (scipy.stats.uniform.cdf, punif)),
    ('gamma_pdf', (scipy.stats.gamma.pdf, dgamma)),
    ('gamma_cdf', (scipy.stats.gamma.cdf, pgamma)),
    ])

def generate_function(name, rmath_func):
    if 'norm' in name or 'unif' in name:
        def cdf_func(x):
            return rmath_func(x, 0.0, 1.0, 1, 0)

        def pdf_func(x):
            return rmath_func(x, 0.0, 1.0, 0)
    
    elif 'gamma' in name:
        def cdf_func(x, shape):
            return rmath_func(x, shape, 1.0, 1, 0)

        def pdf_func(x, shape):
            return rmath_func(x, shape, 1.0, 0)
        
    
    if 'cdf' in name:
        return cdf_func
    elif 'pdf' in name:
        return pdf_func

x = np.random.normal(size=(100,100))
    
for name, (scipy_func, rmath_func) in func_map.iteritems():
    nb_func = nb.vectorize(nopython=True)(generate_function(name, rmath_func))
    
    
    print name
    if 'norm' in name or 'unif' in name:
        y1 = scipy_func(x)
        y2 = nb_func(x)
        print 'allclose: ', np.allclose(y1, y2)
        
        print 'scipy timing:'
        get_ipython().run_line_magic('timeit', 'scipy_func(x)')
        print 'numba timing:'
        get_ipython().run_line_magic('timeit', 'nb_func(x)')
    elif 'gamma' in name:
        shape = 1.0
        y1 = scipy_func(x, shape)
        y2 = nb_func(x, shape)
        print 'allclose: ', np.allclose(y1, y2)
        
        print 'scipy timing:'
        get_ipython().run_line_magic('timeit', 'scipy_func(x, shape)')
        print 'numba timing:'
        get_ipython().run_line_magic('timeit', 'nb_func(x, shape)')
    print


# ## Conclusion
# 
# To wrap up, CFFI + Numba provides a powerful and surprisingly simple way to generate fast python code and extend the currently limited repertoire of functionality that is baked into Numba. Pairing this approach with Rmath specifically has been particularly useful in my own work.

# ## Appendix
# 
# For completeness, I'll use Sebastian Raschka's watermark package to specify the libraries and hardware used to run these examples:

# In[ ]:


get_ipython().run_line_magic('install_ext', 'https://raw.githubusercontent.com/rasbt/watermark/master/watermark.py')


# In[16]:


get_ipython().run_line_magic('load_ext', 'watermark')
get_ipython().run_line_magic('watermark', '-vm -p numpy,scipy,numba,cffi')


# In[ ]: