Numba 0.52.0 Release Demo

This notebook contains a demonstration of new features present in the 0.52.0 release of Numba. Whilst release notes are produced as part of the CHANGE_LOG, there's nothing like seeing code in action!

This release contains a few new features, but most of the changes are to internals, with a particular focus on improving runtime performance! This notebook demonstrates the new CPU target features. The CUDA target also gained a lot of new features in 0.52.0, and @gmarkall has created a separate demo notebook for them.

Key internal changes:

  • Intel kindly sponsored the development of an LLVM-level reference count pruning compiler pass. This reduces pressure on the atomic locks used for reference counting in Numba and exposes many more inlining/optimisation opportunities (@sklam). This change has a large impact on performance, so it has its own notebook to help users understand what it does!
  • Intel also sponsored work to improve the performance of the numba.typed.List container (@stuartarchibald).
  • The optimisers in Numba have been lightly tuned and can now do more (@stuartarchibald).

Highlights of core feature changes:

  • The inspect_cfg method on the JIT dispatcher object has been significantly enhanced (@stuartarchibald).
  • NumPy 1.19 support is added (@stuartarchibald).
  • A few new NumPy features have been added along with some extensions to existing support.

Demonstrations of new features/changes:

First, import the necessary names from Numba and NumPy...

In [ ]:
from numba import jit, njit, config, __version__, errors
from numba.extending import overload
import numba
import numpy as np
assert numba.version_info.short >= (0, 52)

Performance improvement demonstration

The performance of Numba JIT-compiled functions is improved in several important cases in 0.52. First, as mentioned above, the reference count pruning compiler pass has a large impact, which is demonstrated in its own notebook; alternatively, just try 0.52.0 with your existing code and see if it makes a difference! Second, there have been some specific improvements, a couple of which are demonstrated here:

Calling str(<int>)

In [ ]:
@njit
def str_on_int(n):
    c = 0
    for i in range(n):
        c += len(str(n))
    return c


sz = 100000
str_on_int(sz)
%timeit str_on_int.py_func(sz) # python function
%timeit str_on_int(sz) # jit function

Reductions/__getitem__ on typed.List

In [ ]:
# Reductions on typed.List
from numba.typed import List

n = 1000
py_list = [float(x) for x in range(n)]
nb_list = List(py_list)

def sum_list(lst):
    acc = 0.0
    for item in lst:
        acc += item
    return acc

jit_sum_list = njit(sum_list)
fastmath_jit_sum_list = njit(fastmath=True)(sum_list)

%timeit sum_list(py_list) # python function on a python list
%timeit jit_sum_list(nb_list) # JIT function on typed list
%timeit fastmath_jit_sum_list(nb_list) # "fastmath" JIT function on typed list
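
`fastmath=True` relaxes strict IEEE 754 semantics, which among other things allows the compiler to reorder floating point additions. Reordering can change the last bits of a result because floating point addition is not associative, which is why the option is opt-in. A tiny pure-Python illustration of the underlying effect (it shows the arithmetic only, not what Numba generates for `sum_list`):

```python
# Floating point addition is not associative: grouping changes the rounding.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

print(left_to_right)  # 0.6000000000000001
print(regrouped)      # 0.6

# The two results differ, but only in the last bit or so.
difference = abs(left_to_right - regrouped)
print(difference)
```

For a reduction such as the one above, a difference of this size is usually acceptable, but it is worth checking when results must be bit-for-bit reproducible.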

CFG inspection enhancements

The Numba dispatcher's inspect_cfg() method has been enhanced with colorized output and support for Python code interleaving to provide a more visual way to debug/tune code. For a more advanced demonstration, this feature is used in the notebook explaining the new reference count pruning pass. A quick demonstration of this feature:

In [ ]:
@njit(debug=True) # Switch on debug to make python source available.
def foo(n):
    acc = 0.
    for i in range(n):
        acc += np.sqrt(i)
    if acc > 1000:
        raise ValueError("Error!")
    else:
        return acc

foo(10)

# Take a look at the docstring for all the options; the ones used here are:
# strip_ir = remove LLVM IR apart from calls
# interleave = add Python source into the LLVM CFG!
foo.inspect_cfg(foo.signatures[0], strip_ir=True, interleave=True)

Newly supported NumPy functions/features

This release contains some updates to Numba's NumPy support, mostly contributed by the Numba community (with thanks!):

In [ ]:
@njit
def demo_numpy():
    # np.asfarray
    farray = np.asfarray(np.zeros(4,), dtype=np.int8)
    
    # np.split/np.array_split
    split = np.split(np.arange(10), 5)
    arr_split = np.array_split(np.arange(10), 3)

    # Membership tests (`in`) on arrays
    arr_contains = 4 in np.arange(10), 11 in np.arange(10)

    # np.asarray_chkfinite
    caught = False
    try:
        np.asarray_chkfinite((0., np.inf, 1., np.nan,))
    except Exception: # inf and nan not accepted
        caught = True

    # String literal dtypes
    ones, zeros, empty = (np.ones((5,), 'int8'), np.zeros((3,), 'complex128'),
                          np.empty((0,), 'float32'))

    return farray, split, arr_split, arr_contains, caught, ones, zeros, empty
    
farray, split, arr_split, arr_contains, caught, ones, zeros, empty = demo_numpy()

print((f"farray: {farray}\n"
       f"split: {split}\n"
       f"array_split: {arr_split}\n"
       f"array contains: {arr_contains}\n"
       f"caught: {caught}\n"
       f"ones: {ones}\n"
       f"zeros: {zeros}\n"
       f"empty: {empty}\n"))
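
The difference between the two splitting functions is easy to miss in the output above: `np.split` requires the array to divide evenly, while `np.array_split` allows uneven sections. A short sketch in plain NumPy (run outside of `@njit`, so it shows NumPy's own behaviour, which the Numba implementations are intended to mirror):

```python
import numpy as np

a = np.arange(10)

even = np.split(a, 5)          # five sections of length 2
uneven = np.array_split(a, 3)  # sections of length 4, 3 and 3

try:
    np.split(a, 3)             # 10 elements cannot be divided into 3 equal parts
    split_failed = False
except ValueError:
    split_failed = True

print([len(s) for s in even])    # [2, 2, 2, 2, 2]
print([len(s) for s in uneven])  # [4, 3, 3]
print(split_failed)              # True
```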