Even if you've used Numba before, let's start with the basics.
In fact, let's start with NumPy itself.
import numpy as np
NumPy accelerates Python code by replacing loops in Python's virtual machine (with type-checks at runtime) with precompiled loops that transform arrays into arrays.
data = np.arange(1000000)
data
%%timeit
output = np.empty(len(data))
for i, x in enumerate(data):
output[i] = x**2
%%timeit
output = data**2
But if you have to compute a complex formula, NumPy would have to make an array for each intermediate step.
(There are tricks for circumventing this, but we won't get into that.)
energy = np.random.normal(100, 10, 1000000)
px = np.random.normal(0, 10, 1000000)
py = np.random.normal(0, 10, 1000000)
pz = np.random.normal(0, 10, 1000000)
%%timeit
mass = np.sqrt(energy**2 - px**2 - py**2 - pz**2)
The above is equivalent to
%%timeit
tmp1 = energy**2
tmp2 = px**2
tmp3 = py**2
tmp4 = pz**2
tmp5 = tmp1 - tmp2
tmp6 = tmp5 - tmp3
tmp7 = tmp6 - tmp4
mass = np.sqrt(tmp7)
(I always mistype "numba" as "numpy"...)
import numba as nb
Numba lets us compile a function to compute a whole formula in one step.
@nb.jit
means "JIT-compile" and
@nb.njit
means "really JIT-compile," because plain @nb.jit has a fallback ("object") mode that is being deprecated. If we're using Numba at all, we don't want it to fall back to ordinary Python.
@nb.njit
def compute_mass(energy, px, py, pz):
mass = np.empty(len(energy))
for i in range(len(energy)):
mass[i] = np.sqrt(energy[i]**2 - px[i]**2 - py[i]**2 - pz[i]**2)
return mass
The compute_mass function is now "ready to be compiled." It will be compiled when we first give it arguments, so that it can propagate types from those arguments.
compute_mass(energy, px, py, pz)
%%timeit
mass = compute_mass(energy, px, py, pz)
(Note to self: show fastmath and parallel.)
What's wrong with the following?
@nb.njit
def compute_mass_i(energy_i, px_i, py_i, pz_i):
return np.sqrt(energy_i**2 - px_i**2 - py_i**2 - pz_i**2)
compute_mass_i(energy[0], px[0], py[0], pz[0])
%%timeit
mass = np.empty(len(energy))
for i in range(len(energy)):
mass[i] = compute_mass_i(energy[i], px[i], py[i], pz[i])
What do you think about this one?
@nb.njit
def compute_mass_arrays(energy, px, py, pz):
return np.sqrt(energy**2 - px**2 - py**2 - pz**2)
compute_mass_arrays(energy, px, py, pz)
%%timeit
mass = compute_mass_arrays(energy, px, py, pz)
(Note to self: show nb.vectorize.)
Much of Python's flexibility comes from the fact that it does not need to know the types of all variables before it starts to run.
That dynamism makes it easier to express complex logic (what I call "bookkeeping"), but it's a hurdle for speed. Dynamic typing was the first thing to go in Didier Verna's "How to make Lisp go faster than C."
def perfectly_valid_script(data):
output = np.empty(len(data))
for i, group in enumerate(data):
minimum = "nothing yet"
for x in group:
if minimum == "nothing yet":
minimum = x
elif x < minimum:
minimum = x
output[i] = minimum
return output
data = np.random.normal(2, 1, (100000, 3))
data
perfectly_valid_script(data)
invalid_for_numba = nb.njit(perfectly_valid_script)
invalid_for_numba(data)
How can we fix it up?
Consider the following:
def another_valid_script(data):
output = np.empty(len(data))
for i, group in enumerate(data):
total = np.sum(group)
if total < 0:
return "we don't like negative sums"
else:
output[i] = total
return output
another_valid_script(data[:10])
invalid_for_numba = nb.njit(another_valid_script)
invalid_for_numba(data)
Does it matter whether we limit data to the first 10 elements?
How can we fix it up?
Although the Numba developers are doing a lot of work on supporting Python list
and dict
(with identically typed contents), I find it too easy to run into unsupported cases. The main problem is that their contents must be statically typed, and the Python language historically had no way to express those types.
(Yes, Python has type annotations now, but they're not fully integrated into Numba yet.)
Let's start with something that works and make small changes to it.
@nb.njit
def output_a_list(data):
output = []
for group in data:
total = 0.0
for x in group:
total += x
output.append(total)
return output
data = np.random.normal(2, 1, (10, 3))
data
output_a_list(data)
In Python, you can create functions at runtime, and these functions can reference data defined outside of the function.
accumulate = nb.typed.List([])
def yet_another_valid_script(data):
for group in data:
total = 0.0
for x in group:
total += x
accumulate.append(total)
accumulate
yet_another_valid_script(data)
accumulate
try_it_in_numba = nb.njit(yet_another_valid_script)
try_it_in_numba(data)
accumulate
The @nb.njit
decorator brackets a region of code to make fast. But that doesn't mean you need to write monolithic functions: a JIT'ed function can call other JIT'ed functions without crossing back into ordinary Python.
@nb.njit
def find_minimum(group):
out = None
for x in group:
if out is None:
out = x
elif x < out:
out = x
return out
@nb.njit
def run_over_all(data):
out = np.empty(len(data))
for i, group in enumerate(data):
out[i] = find_minimum(group)
return out
data = np.random.normal(2, 1, (1000000, 3))
data
run_over_all(data)
%%timeit
run_over_all(data)
A JIT'ed function can also be passed into another JIT'ed function as an argument; the same constraints apply.
@nb.njit
def run_over_all(do_what, data):
out = np.empty(len(data))
for i, group in enumerate(data):
out[i] = do_what(group)
return out
run_over_all(find_minimum, data)
%%timeit
run_over_all(find_minimum, data)
What about functions returning functions?
@nb.njit
def function_returning_a_function():
def what_the_heck(x):
return x + 1
return what_the_heck
f = function_returning_a_function()
f
f(10)
But the reason you might want to do this (to make a closure) doesn't work, so I consider this a curiosity.
@nb.njit
def function_returning_a_function(y):
def what_the_heck(x):
return x + y
return what_the_heck
function_returning_a_function(1)
Numba has a whole system for annotating classes to be used in JIT'ed functions: @nb.experimental.jitclass
(see its documentation).
However, I rarely use it—you find yourself having to annotate more and more until it doesn't look like Python. The value of Numba is that you can develop quickly without leaving the Python environment. When you're doing something so complex as to need classes, you might want to write a C++ program bound to Python with pybind11 or ROOT.
But here's a simple one:
@nb.experimental.jitclass([
("E", nb.float64),
("px", nb.float64),
("py", nb.float64),
("pz", nb.float64),
])
class Lorentz:
def __init__(self, E, px, py, pz):
self.E = E
self.px = px
self.py = py
self.pz = pz
@property
def mass(self):
return np.sqrt(self.E**2 - self.px**2 - self.py**2 - self.pz**2)
@nb.njit
def add_Lorentz(one, two):
return Lorentz(one.E + two.E, one.px + two.px, one.py + two.py, one.pz + two.pz)
objects = nb.typed.List([Lorentz(E, px, py, pz) for E, px, py, pz in zip(
np.random.normal(100, 10, 10000),
np.random.normal(0, 10, 10000),
np.random.normal(0, 10, 10000),
np.random.normal(0, 10, 10000),
)])
@nb.njit
def calculate_masses(one, two):
out = np.empty(len(one))
for i in range(len(one)):
out[i] = add_Lorentz(one[i], two[i]).mass
return out
calculate_masses(objects[:5000], objects[5000:])
It works, but small deviations from this example hit pickling errors (because Numba passes class objects into low-level ↔ Python conversions by pickling them) and other errors deep in the Numba internals.
Iteration over objects is easier (and faster) with Awkward Array (see below), which will have Lorentz vector objects defined by the Vector library (in development).
The biggest strength of JIT-compilers is that you can generate the code to compile after you know a lot about the problem.
In C++, this is the realm of templates and compile-time metaprogramming. In Numba, it can be an f-string:
args = [
np.random.normal(0, 1, 1000000),
np.random.normal(0, 1, 1000000),
np.random.normal(0, 1, 1000000),
np.random.normal(0, 1, 1000000),
np.random.normal(0, 1, 1000000),
]
code = f"""
@nb.vectorize
def sum_in_quadrature({", ".join("p%d" % i for i in range(len(args)))}):
# compiled for exactly {len(args)} arguments
return np.sqrt({" + ".join("p%d**2" % i for i in range(len(args)))})
"""
print(code)
exec(code)
sum_in_quadrature(*args)
The point is that you have the full power of Python at your disposal to examine your data and generate code before compiling it, and you don't have to hack together scripts to make source code files before you can use them.
Numba has a @nb.generated_jit decorator for specializing a function for specific types.
@nb.generated_jit
def add_unmasked(data, mask):
# At this level, data and mask are Numba *types* in plain Python.
if isinstance(mask, nb.types.NoneType):
def implementation(data, mask):
# At this level, data and mask are *values* in a JIT'ed context.
return np.sum(data)
return implementation
else:
def implementation(data, mask):
# The names "data" and "mask" have to match the outer function.
result = 0
for d, m in zip(data, mask):
if not m:
result += d
return result
return implementation
data = np.random.normal(0, 1, 1000000)
mask = np.random.randint(0, 2, 1000000).astype(np.bool_)
data, mask
add_unmasked(data, mask)
add_unmasked(data, None)
You might argue that my examples are too simple, but that's the best way to use Numba: by merging complex logic into a procedure bit by bit. You only benefit from the "JIT" aspect if you're building it up interactively: in a Python terminal, IPython, or a Jupyter notebook.
If your workflow is "edit script, save, run from the beginning, edit script again," then you might as well be using a non-JIT compiler, like C++.
You can figure out how to write a function in one environment and then copy it into the script you'll be submitting to the GRID, you know.
Recommendations:
- Develop incrementally: once small pieces of the calculation work in plain Python, put @nb.njit at the tops of just those functions.
- Don't write one monolithic function; break the work into small functions and @nb.njit each one individually.
- print works:
@nb.njit
def do_stuff(data):
for group in data:
print(group)
data = np.random.normal(0, 1, (1000000, 3))
data
do_stuff(data[:10]) # not too many!
Personally, I wouldn't bother with gdb. I'd just use print.
The Awkward Array library is for doing array-at-a-time, precompiled calculations like NumPy, but on complex data structures. However, it also works well with Numba's element-at-a-time programming model, because you can iterate over its records and variable-length lists in a JIT'ed function without converting them and without writing any @nb.jitclass annotations.
Also, Uproot makes Awkward Arrays, so it's an easy data source to access in HEP.
import uproot
import skhep_testdata
events = uproot.open(skhep_testdata.data_path("uproot-HZZ-objects.root"))["events"]
events.show()
muons = events["muonp4"].array()
muons
import particle
from hepunits import MeV, GeV
# Get the Z mass Pythonically!
z_mass = particle.Particle.from_string("Z").mass * MeV / GeV
@nb.njit
def single_mass(p1, p2):
mass2 = ((p1.fE + p2.fE)**2
- (p1.fP.fX + p2.fP.fX)**2 - (p1.fP.fY + p2.fP.fY)**2 - (p1.fP.fZ + p2.fP.fZ)**2)
if mass2 > 0:
return np.sqrt(mass2)
else:
return 0
@nb.njit
def calculate_masses(groups_of_particles):
masses = np.empty(len(groups_of_particles))
# Loop over groups of variable numbers of particles in each event.
for num, group in enumerate(groups_of_particles):
best = None
# Consider all unique pairs of particles in each group.
for i in range(len(group)):
for j in range(i + 1, len(group)):
# Compute the mass of each pair of particles.
mass = single_mass(group[i], group[j])
# If it's closer to the Z mass, keep it.
if best is None:
best = mass
elif abs(mass - z_mass) < abs(best - z_mass):
best = mass
if best is None:
masses[num] = 0
else:
masses[num] = best
return masses
calculate_masses(muons)
calculate_masses(events["jetp4"].array())
Incidentally, this is how you would do the same thing with array-at-a-time functions in Awkward Array.
import awkward as ak
# Use mostly the same mass function, with "np.where" replacing "if".
def vectorized_mass(p1, p2):
mass2 = ((p1.fE + p2.fE)**2
- (p1.fP.fX + p2.fP.fX)**2 - (p1.fP.fY + p2.fP.fY)**2 - (p1.fP.fZ + p2.fP.fZ)**2)
return np.where(mass2 > 0, np.sqrt(mass2), 0)
def calculate_masses_awkward(groups_of_particles):
# Get the first and second in each unique pair of particles.
first, second = ak.unzip(ak.combinations(groups_of_particles, 2))
# Apply the function to each pair.
nested_masses = vectorized_mass(first, second)
# Determine how we're going to pick our favorite pair.
objective_metric = abs(nested_masses - z_mass)
selector = ak.argmin(objective_metric, axis=1, keepdims=True)
# Apply it to the nested_masses, replace "None" with "0", and flatten the output.
return ak.fill_none(nested_masses[selector], 0)[:, 0]
calculate_masses_awkward(muons)
The method you choose depends on what seems more natural to you, what's quicker to type, and perhaps what has better performance overall.
Generally speaking, element-at-a-time calculations on Awkward Arrays with Numba perform at least as well as the array-at-a-time equivalents, so the choice is mostly a matter of style.
Another good technique is to use Numba to make arrays of booleans or arrays of integers with which to slice an Awkward Array.
@nb.njit
def has_Z_boson(group):
for i in range(len(group)):
for j in range(i + 1, len(group)):
mass = single_mass(group[i], group[j])
if 40 < mass < 120:
return True
return False
@nb.njit
def mask_for_Z_bosons(groups_of_particles):
mask = np.empty(len(groups_of_particles), np.bool_)
for num, group in enumerate(groups_of_particles):
mask[num] = has_Z_boson(group)
return mask
mask = mask_for_Z_bosons(muons)
mask
muons
muons[mask]
Using complex data structures as input is a very different thing from creating them as output.
When using them as input, you have an object that you can extract elements from with square brackets, pick out fields with a dot, or iterate over with a for loop.
If you want to output a part of that input, it's still not a problem.
@nb.njit
def first_with_Z_boson(groups_of_particles):
for group in groups_of_particles:
if has_Z_boson(group):
return group
return None
single_event = first_with_Z_boson(muons)
single_event
single_event.tolist()
But if you want to make a whole new structure that didn't exist before, that's more difficult because you need a syntax for constructing it.
We've seen the difficulties of making lists (nb.typed.List), dicts (nb.typed.Dict), and class instances (nb.jitclass), because Numba's world must be statically typed while Python has traditionally been a dynamically typed language.
Awkward Array has a facility for building structures: ak.ArrayBuilder.
builder = ak.ArrayBuilder()
builder
with builder.list():
builder.integer(1)
builder.real(2.2)
builder.string("three")
builder
builder = ak.ArrayBuilder()
with builder.record():
builder.field("x").integer(1)
builder.field("y").real(2.2)
builder.field("z").string("three")
builder
An ArrayBuilder is not an Awkward Array, but snapshot() will turn it into one.
builder.snapshot()
Here's how they differ:
builder = ak.ArrayBuilder()
ak.type(builder)
builder.integer(1)
ak.type(builder)
builder.integer(2)
ak.type(builder)
builder.real(3.3)
ak.type(builder)
with builder.list():
pass
ak.type(builder)
with builder.list():
builder.integer(4)
ak.type(builder)
ArrayBuilders can be used in Numba JIT-compiled functions, which makes them the easiest way to make data structures in those functions:
@nb.njit
def is_part_of_Z_boson(groups_of_particles, builder):
for group in groups_of_particles:
builder.begin_list()
for i in range(len(group)):
good_Z = False
for j in range(i + 1, len(group)):
mass = single_mass(group[i], group[j])
if 40 < mass < 120:
good_Z = True
builder.boolean(good_Z)
builder.end_list()
return builder
nested_booleans = is_part_of_Z_boson(muons, ak.ArrayBuilder())
nested_booleans
nested_booleans.snapshot()
muons
muons[nested_booleans.snapshot()]
There are some limitations:
- Numba doesn't support with statements in @nb.njit mode (check their supported Python features page to see if this has changed). Replace with builder.list() by builder.begin_list() and builder.end_list(), and with builder.record() by builder.begin_record() and builder.end_record().
- You can't snapshot() the ArrayBuilder inside the JIT-compiled function; do that outside.
@nb.njit
def mask_for_Z_bosons_NumPy(groups_of_particles):
mask = np.empty(len(groups_of_particles), np.bool_)
for num, group in enumerate(groups_of_particles):
mask[num] = has_Z_boson(group)
return mask
@nb.njit
def mask_for_Z_bosons_ArrayBuilder(groups_of_particles, builder):
for group in groups_of_particles:
builder.append(has_Z_boson(group))
return builder
%%timeit
mask_for_Z_bosons_NumPy(muons)
%%timeit
mask_for_Z_bosons_ArrayBuilder(muons, ak.ArrayBuilder())
(We're working on a TypedArrayBuilder to take advantage of static typing.)
Also useful: you can append parts of an Awkward Array into an ArrayBuilder (which references them like a pointer, so it doesn't matter how complex they are).
@nb.njit
def muons_that_are_part_of_Z_boson(groups_of_particles, builder):
for group in groups_of_particles:
builder.begin_list()
for i in range(len(group)):
good_Z = False
for j in range(i + 1, len(group)):
mass = single_mass(group[i], group[j])
if 40 < mass < 120:
good_Z = True
# Instead of inserting a boolean, conditionally insert the original particle record.
builder.append(group[i])
builder.end_list()
return builder
selected_muons = muons_that_are_part_of_Z_boson(muons, ak.ArrayBuilder())
selected_muons.snapshot()
That's all I've prepared. We can use the space below for free-form questions.