Faster Computations with Numba

Some notes mostly for myself

Altough Python is fast compared to other high-level languages, it still is noat as fast as C, C++ or Fortran. Luckily, two open source projects Numba and Cython can be used to speed-up computations. Numba is sponsored by the producer of Anaconda, Continuum Analytics. Both projects allow you to convert your code to interpreted language so that it runs faster. Here I will use Numba, given its ease and Just-In-Time nature, although I still have to figure out how to save and use the compiled functions (pycc and static libraries it seems?). Cython on the other hand needs you to understand much more of C and Ctypes. But once you figure out a way to run your code and compile it, you have a library ready to use and import. See the code for the repository for my paper Optimal consumption under uncertainty, liquidity constraints, and bounded rationality where I used Cython for the dynsysf.pyx function.

Numba

Numba is quite easy to use. Start by importing it

import numba as nb

or some of its functions

from numba import jit, autojit, njit

Then define your function and notate it using the Numba commands jit, njit, autojit or their decorators @jit, @njit.

Examples:

In [455]:
from numba import jit, njit, autojit
import numba as nb
import math

# A simple funtion that computes the maximum distance between two vectors
@nb.jit('f8(f8[:],f8[:])')
def errnb(V0,V1):
    maxi=0.0
    for i in range(V0.shape[0]):
        m=abs(V1[i]-V0[i])
        if m>=maxi:
            maxi=m
    return maxi

x=np.random.random((1000))
y=np.random.random((1000))

%timeit errnb(x,y)
%timeit np.max(np.abs(x-y))
100000 loops, best of 3: 4.13 µs per loop
10000 loops, best of 3: 21.7 µs per loop
In [ ]:
# Let's use Numba to compute a Cobb-Douglas function
@jit
def CobbDouglas(K,L,A,alpha,beta):
    return A * K**alpha * L**beta

K,L,A,alpha,beta = 2,2,1,.3,.7

CobbDouglas(K,L,A,alpha,beta)

Let's time it and compare with Numpy or simple Python

In [ ]:
%timeit A * K**alpha * L**beta
In [ ]:
%timeit CobbDouglas(K,L,A,alpha,beta)

Well not fast enough...why?

In [ ]:
CobbDouglas.inspect_types()

OK...it is correctly compiled and it is using Numba types and not PyObjects. So perhaps we cannot gain much in this simple computation...but that is ok. Let's try something a little more complex, computing a CobbDouglas for vectors K and L

In [ ]:
# Python function
def CobbDouglas(K,L,A,alpha,beta):
    return A * K**alpha * L**beta

CobbDouglas_nb=autojit(CobbDouglas)
CobbDouglas_nb2=njit(CobbDouglas)
In [ ]:
K=np.random.random((100,1))
L=np.random.random((100,1))
In [ ]:
%timeit CobbDouglas(K,L,A,alpha,beta)
In [ ]:
%timeit CobbDouglas_nb(K,L,A,alpha,beta)
In [ ]:
%timeit CobbDouglas_nb2(K,L,A,alpha,beta)

Ooops what went wrong? Up to V.0.12.2 Numba cannot create Arrays in njit mode, i.e. without using Python Objects. But do not despair, we can create a fast CobbDouglas function by iterating over our vectors. Here I first create the CobbDouglasNB function which takes floating point numbers and tells Numba not to use PyObjects, and then I create the vectorized version CobbDouglasVecNB, which iterates over the elements of K and L (here we assume both are vectors) and returns our output Y. I know it is strange that we have to giv ethe output as part of the function, but it allows us to speed up the computations.

In [ ]:
@nb.njit('f8(f8,f8,f8,f8,f8)')
def CobbDouglasNB(K,L,A,alpha,beta):
    return A * K**alpha * L**beta

@nb.jit('f8[:](f8[:],f8[:],f8[:],f8,f8,f8)')
def CobbDouglasVecNB(Y,K,L,A,alpha,beta):
    for i in range(K.shape[0]):
        Y[i]=CobbDouglasNB(K[i],L[i],A,alpha,beta)
    return Y
In [ ]:
K.mean()
In [ ]:
%timeit CobbDouglas(K[:,0],L[:,0],A,alpha,beta)
In [ ]:
Y=np.zeros_like(K)
%timeit CobbDouglasVecNB(Y[:,0],K[:,0],L[:,0],A,alpha,beta)

Twice as fast as Numpy!

Let's generalize the function to arrays of 2 dimensions

In [ ]:
@nb.jit('f8[:,:](f8[:,:],f8[:,:],f8[:,:],f8,f8,f8)')
def CobbDouglasVecNB(Y,K,L,A,alpha,beta):
    for i in range(K.shape[0]):
        for j in range(K.shape[1]):
            Y[i,j]=CobbDouglasNB(K[i,j],L[i,j],A,alpha,beta)
    return Y
In [ ]:
K=np.random.random((1000,1000))
L=np.random.random((1000,1000))
%timeit CobbDouglas(K,L,A,alpha,beta)
In [ ]:
Y=np.zeros_like(K)
%timeit CobbDouglasVecNB(Y,K,L,A,alpha,beta)

20% faster than Numpy...so we get the idea.

While this is great to speed up, we see that there are some issues one has to think about in order to get those gains.

Now let's use the vectorize option, which creates Numpy Ufunctions, that is functions that operate on vectors (or at leats that s what I gather).

In [ ]:
@nb.vectorize('f8(f8,f8,f8,f8,f8,)')
def CobbDouglasNB2(K,L,A,alpha,beta):
    return A * K**alpha * L**beta
In [ ]:
%timeit CobbDouglasNB2(K,L,A,alpha,beta)

Let's compare these functions on different sizes of matrices $K$ and $L$

In [ ]:
K=np.random.random((1e4,1e4))
L=np.random.random((1e4,1e4))
Y=np.zeros_like(K)
In [ ]:
%timeit CobbDouglas(K[1:100,1:100],L[1:100,1:100],A,alpha,beta)
In [ ]:
%timeit CobbDouglasVecNB(Y[1:100,1:100],K[1:100,1:100],L[1:100,1:100],A,alpha,beta)
In [ ]:
%timeit CobbDouglasNB2(K[1:100,1:100],L[1:100,1:100],A,alpha,beta)
In [ ]:
%timeit CobbDouglas(K[1:1000,1:1000],L[1:1000,1:1000],A,alpha,beta)
In [ ]:
%timeit CobbDouglasVecNB(Y[1:1000,1:1000],K[1:1000,1:1000],L[1:1000,1:1000],A,alpha,beta)
In [ ]:
%timeit CobbDouglasNB2(K[1:1000,1:1000],L[1:1000,1:1000],A,alpha,beta)

Now let's try the CRRA utility function

In [ ]:
@nb.jit('f8(f8,f8)')
def U_numba(c,sigma):
    '''This function returns the value of utility when the CRRA
    coefficient is sigma. I.e. 
    u(c,sigma)=(c**(1-sigma)-1)/(1-sigma) if sigma!=1 
    and 
    u(c,sigma)=ln(c) if sigma==1
    Usage: u(c,sigma)
    '''
    if sigma!=1:
        u=(c**(1-sigma)-1)/(1-sigma)
    else:
        u=0
    return u

@nb.jit('f8[:,:](f8[:,:],f8)')
def Uvec(u,c,sigma):
    if sigma!=1:
        for i in range(c.shape[0]):
            for j in range(c.shape[1]):
                u[i,j]=U_numba(c[i,j],sigma)
    else:
        u=np.log(c)
    return u

@nb.jit('f8[:,:](f8[:,:],f8)')
def U(c,sigma):
    if sigma!=1:
        u=Unb(c,sigma)#(c**(1-sigma)-1)/(1-sigma)
    else:
        u=np.log(c)
    return u

def Unp(c,sigma):
    if sigma!=1:
        u=(c**(1-sigma)-1)/(1-sigma)
    else:
        u=np.log(c)
    return u

@nb.vectorize('f8(f8,f8)')
def Unb(c,sigma):
    if sigma!=1:
        u=(c**(1-sigma)-1)/(1-sigma)
    else:
        u=math.log(c)
    return u
In [ ]:
%timeit U(grid,1)
In [ ]:
%timeit Unp(grid,1)
In [ ]:
%timeit Unb(grid,1)
In [ ]:
u=np.zeros_like(grid)
%timeit Uvec(u,grid,1)
In [ ]:
%timeit U(grid,3)
In [ ]:
%timeit Unb(grid,3)
In [ ]:
%timeit Unp(grid,3)
In [ ]:
u=np.zeros_like(grid)
%timeit Uvec(u,grid,3)

Optimizing the code for Dynamic Programming and GIS

Optimal Growth

In [292]:
# Parameters Optimal Growth
alpha=.3
beta=.9
sigma=1
delta=1
A=1

# Grid of values for state variable over which function will be approximated
gridmin, gridmax, gridsize = 0.1, 5, 300
grid = np.linspace(gridmin, gridmax**1e-1, gridsize)**10

# Parameters for the optimization procedures
count=0
maxiter=1000
tol=1e-6
print('tol=%f' % tol)
print(grid.shape)
tol=0.000001
(300,)
In [355]:
import numba as nb
from scipy.optimize import fminbound
from scipy import interp

# Auxiliary functions 
# Maximize function V on interval [a,b]
def maximum(V, a, b, args=[]):
    return float(V(fminbound(lambda x: -V(x), a, b,args=args)))

# Return Maximizer of function V on interval [a,b]
def maximizer(V, a, b, args=[]):
    return float(fminbound(lambda x: -V(x), a, b, args=args))

# Interpolation functions Class
class LinInterp:
    "Provides linear interpolation in one dimension."

    def __init__(self, X, Y):
        """Parameters: X and Y are sequences or arrays
        containing the (x,y) interpolation points.
        """
        self.X, self.Y = X, Y

    def __call__(self, z):
        """Parameters: z is a number, sequence or array.
        This method makes an instance f of LinInterp callable,
        so f(z) returns the interpolation value(s) at z.
        """
        if isinstance(z, int) or isinstance(z, float):
            return interp ([z], self.X, self.Y)[0]
        else:
            return interp(z, self.X, self.Y)

@nb.jit('f8[:](f8[:],f8)')
def U(c,sigma):
    if sigma!=1:
        u=Unb(c,sigma)#(c**(1-sigma)-1)/(1-sigma)
    else:
        u=np.log(c)
    return u

@nb.vectorize('f8(f8,f8)')
def Unb(c,sigma):
    if sigma!=1:
        u=(c**(1-sigma)-1)/(1-sigma)
    else:
        u=math.log(c)
    return u

@nb.vectorize('f8(f8,f8,f8,f8,f8,)')
def CobbDouglasNB2(K,L,A,alpha,beta):
    '''CobbDouglasNB2(K,L,A,alpha,beta)'''
    return A * K**alpha * L**beta

@nb.vectorize('f8(f8,f8,f8)')
def F_nb(K,alpha,A):
    '''
    Cobb-Douglas production function
    F(K)=A* K^alpha
    '''
    return A * K**alpha

def F_np(K,alpha,A):
    '''
    Cobb-Douglas production function
    F(K)=A* K^alpha
    '''
    return A * K**alpha
In [294]:
%timeit CobbDouglasNB2(grid,1,A,alpha,0)
10000 loops, best of 3: 34.2 µs per loop
In [295]:
%timeit F_nb(grid,alpha,A)
10000 loops, best of 3: 27.7 µs per loop
In [296]:
%timeit F_np(grid,alpha,A)
10000 loops, best of 3: 29.7 µs per loop
In [412]:
V0=LinInterp(grid,U(grid,sigma))
def bellman(x,w):
    """The approximate Bellman operator.
    Parameters: w is a LinInterp object (i.e., a 
    callable object which acts pointwise on arrays).
    Returns: An instance of LinInterp that represents the optimal operator.
    w is a function defined on the state space.
    """
    vals = []
    for k in x:
        kmax=F_nb(k,alpha,A)
        h = lambda kp: U(kmax + (1-delta) * k - kp,sigma) + beta * w(kp)
        vals.append(maximum(h, 0, kmax))
    return LinInterp(grid, vals)

def policy(x,w):
    """
    For each function w, policy(w) returns the function that maximizes the 
    RHS of the Bellman operator.
    Replace w for the Value function to get optimal policy.
    The approximate optimal policy operator w-greedy (See Stachurski (2009)). 
    Parameters: w is a LinInterp object (i.e., a 
    callable object which acts pointwise on arrays).
    Returns: An instance of LinInterp that captures the optimal policy.
    """
    vals = []
    for k in x:
        kmax=F_nb(k,alpha,A)
        h = lambda kp: U(kmax + (1-delta) * k - kp,sigma) + beta * w(kp)
        vals.append(maximizer(h, 0, kmax))
    return LinInterp(grid, vals)

def solve():
    count=0
    V0=LinInterp(grid,U(grid,sigma))
    while count<maxiter:
        V1=bellman(grid,V0)
        err=np.max(np.abs(np.array(V1(grid))-np.array(V0(grid))))
        V0=V1
        count+=1
        if err<tol:
            print(count)
            break
    return V0


V0=bellman(grid,V0)
C0=policy(grid,V0)
%timeit bellman(grid,V0)
%timeit policy(grid,V0)
%timeit solve()
1 loops, best of 3: 681 ms per loop
1 loops, best of 3: 669 ms per loop
138
138
138
138
1 loops, best of 3: 1min 33s per loop
In [407]:
def Utrf(kp,kmax,k,sigma,w):
    return -U(kmax + (1-delta) * k - kp,sigma)-beta*w(kp)

@nb.jit('f8[:](f8[:],f8[:])')
def bellmannb(x,w0):
    """The approximate Bellman operator.
    Parameters: w is a LinInterp object (i.e., a 
    callable object which acts pointwise on arrays).
    Returns: An instance of LinInterp that represents the optimal operator.
    w is a function defined on the state space.
    """
    w=LinInterp(x,w0)
    vals = np.array([])
    for k in x:
        kmax=F_nb(k,alpha,A)
        vals=np.append(vals,-Utrf(fminbound(Utrf, 0, kmax,args=[kmax,k,sigma,w]),kmax,k,sigma,w))
    return vals

@nb.jit('f8[:](f8[:],f8[:])')
def policynb(x,w0):
    """The approximate Bellman operator.
    Parameters: w is a LinInterp object (i.e., a 
    callable object which acts pointwise on arrays).
    Returns: An instance of LinInterp that represents the optimal operator.
    w is a function defined on the state space.
    """
    w=LinInterp(x,w0)
    vals = np.array([])
    for k in x:
        kmax=F_nb(k,alpha,A)
        vals=np.append(vals,fminbound(Utrf, 0, kmax,args=[kmax,k,sigma,w]))
    return vals

@nb.jit('f8(f8[:],f8[:])')
def errnb(V0,V1):
    maxi=0.0
    for i in range(V0.shape[0]):
        m=abs(V1[i]-V0[i])
        if m>=maxi:
            maxi=m
    return maxi

@nb.jit('f8[:]()')
def solvenb():
    count=0.0
    V0=U(grid,sigma)
    while count<maxiter:
        V1=bellmannb(grid,V0)
        err=errnb(V1,V0)
        V0=V1
        count+=1
        if err<tol:
            print(count)
            break
    return V0

U0=U(grid,sigma)
U0=bellmannb(grid,U0)
c0=policynb(grid,U0)
%timeit bellmannb(grid,U0)
%timeit policynb(grid,U0)
%timeit solvenb()
1 loops, best of 3: 645 ms per loop
1 loops, best of 3: 513 ms per loop
In [404]:
plt.plot(grid,V0(grid),label='V0')
plt.plot(grid,U0)
plt.legend()
Out[404]:
<matplotlib.legend.Legend at 0x113768f90>
In [405]:
plt.plot(grid,C0(grid),label='C0')
plt.plot(grid,c0)
plt.legend()
Out[405]:
<matplotlib.legend.Legend at 0x10c821a10>
In [446]:
@nb.jit('f8(f8[:],f8[:])')
def errnb(V0,V1):
    maxi=0.0
    for i in range(V0.shape[0]):
        m=abs(V1[i]-V0[i])
        if m>=maxi:
            maxi=m
    return maxi

@nb.jit('f8[:]()')
def solvenb():
    count=0.0
    V0=U(grid,sigma)
    while count<maxiter:
        V1=bellmannb(grid,V0)
        err=errnb(V1,V0)
        V0=V1
        count+=1
        if err<tol:
            print(count)
            break
    return V0
%timeit solvenb()
138.0
138.0
138.0
138.0
1 loops, best of 3: 1min 18s per loop