Even if you've used Numba before, let's start with the basics.
In fact, let's start with NumPy itself.
import numpy as np
NumPy accelerates Python code by replacing loops in Python's virtual machine (with type-checks at runtime) with precompiled loops that transform arrays into arrays.
data = np.arange(1000000)
data
%%timeit
output = np.empty(len(data))
for i, x in enumerate(data):
output[i] = x**2
%%timeit
output = data**2
But if you have to compute a complex formula, NumPy would have to make an array for each intermediate step.
(There are tricks for circumventing this, but we won't get into that.)
energy = np.random.normal(100, 10, 1000000)
px = np.random.normal(0, 10, 1000000)
py = np.random.normal(0, 10, 1000000)
pz = np.random.normal(0, 10, 1000000)
%%timeit
mass = np.sqrt(energy**2 - px**2 - py**2 - pz**2)
The above is equivalent to
%%timeit
tmp1 = energy**2
tmp2 = px**2
tmp3 = py**2
tmp4 = pz**2
tmp5 = tmp1 - tmp2
tmp6 = tmp5 - tmp3
tmp7 = tmp6 - tmp4
mass = np.sqrt(tmp7)
(I always mistype "numba" as "numpy"...)
import numba as nb
Numba lets us compile a function to compute a whole formula in one step.
@nb.jit
means "JIT-compile" and
@nb.njit
means "really JIT-compile," because plain @nb.jit has a fallback ("object") mode that is being deprecated. If we're using Numba at all, we don't want it to fall back to ordinary Python.
@nb.njit
def compute_mass(energy, px, py, pz):
mass = np.empty(len(energy))
for i in range(len(energy)):
mass[i] = np.sqrt(energy[i]**2 - px[i]**2 - py[i]**2 - pz[i]**2)
return mass
The compute_mass function is now "ready to be compiled." It will be compiled when we first give it arguments, so that it can propagate types from those arguments.
compute_mass(energy, px, py, pz)
%%timeit
mass = compute_mass(energy, px, py, pz)
(Note to self: show fastmath and parallel.)
What's wrong with the following?
@nb.njit
def compute_mass_i(energy_i, px_i, py_i, pz_i):
return np.sqrt(energy_i**2 - px_i**2 - py_i**2 - pz_i**2)
compute_mass_i(energy[0], px[0], py[0], pz[0])
%%timeit
mass = np.empty(len(energy))
for i in range(len(energy)):
mass[i] = compute_mass_i(energy[i], px[i], py[i], pz[i])
What do you think about this one?
@nb.njit
def compute_mass_arrays(energy, px, py, pz):
return np.sqrt(energy**2 - px**2 - py**2 - pz**2)
compute_mass_arrays(energy, px, py, pz)
%%timeit
mass = compute_mass_arrays(energy, px, py, pz)
(Note to self: show nb.vectorize.)
Much of Python's flexibility comes from the fact that it does not need to know the types of all variables before it starts to run.
That dynamism makes it easier to express complex logic (what I call "bookkeeping"), but it's a hurdle for speed. Dynamic typing was the first thing to go in Didier Verna's "How to make Lisp go faster than C."
def perfectly_valid_script(data):
output = np.empty(len(data))
for i, group in enumerate(data):
minimum = "nothing yet"
for x in group:
if minimum == "nothing yet":
minimum = x
elif x < minimum:
minimum = x
output[i] = minimum
return output
data = np.random.normal(2, 1, (100000, 3))
data
perfectly_valid_script(data)
invalid_for_numba = nb.njit(perfectly_valid_script)
invalid_for_numba(data)
How can we fix it up?
Consider the following:
def another_valid_script(data):
output = np.empty(len(data))
for i, group in enumerate(data):
total = np.sum(group)
if total < 0:
return "we don't like negative sums"
else:
output[i] = total
return output
another_valid_script(data[:10])
invalid_for_numba = nb.njit(another_valid_script)
invalid_for_numba(data)
Does it matter whether we limit data to the first 10 elements?
How can we fix it up?
Although the Numba developers are doing a lot of work on supporting Python list
and dict
(with identically typed contents), I find it too easy to run into unsupported cases. The main problem is that their contents must be statically typed, and the Python language historically had no way to express those types.
(Yes, Python has type annotations now, but they're not fully integrated into Numba yet.)
Let's start with something that works and make small changes to it.
@nb.njit
def output_a_list(data):
output = []
for group in data:
total = 0.0
for x in group:
total += x
output.append(total)
return output
data = np.random.normal(2, 1, (10, 3))
data
output_a_list(data)
In Python, you can create functions at runtime, and these functions can reference data defined outside of the function.
accumulate = nb.typed.List([])
def yet_another_valid_script(data):
for group in data:
total = 0.0
for x in group:
total += x
accumulate.append(total)
accumulate
yet_another_valid_script(data)
accumulate
try_it_in_numba = nb.njit(yet_another_valid_script)
try_it_in_numba(data)
accumulate
The @nb.njit
decorator brackets a region of code to make fast. But that doesn't mean you need to write monolithic functions: a JIT'ed function can call other JIT'ed functions without crossing back into ordinary Python.
@nb.njit
def find_minimum(group):
out = None
for x in group:
if out is None:
out = x
elif x < out:
out = x
return out
@nb.njit
def run_over_all(data):
out = np.empty(len(data))
for i, group in enumerate(data):
out[i] = find_minimum(group)
return out
data = np.random.normal(2, 1, (1000000, 3))
data
run_over_all(data)
%%timeit
run_over_all(data)
A JIT'ed function can also be passed into another JIT'ed function as an argument; the same constraints apply.
@nb.njit
def run_over_all(do_what, data):
out = np.empty(len(data))
for i, group in enumerate(data):
out[i] = do_what(group)
return out
run_over_all(find_minimum, data)
%%timeit
run_over_all(find_minimum, data)
What about functions returning functions?
@nb.njit
def function_returning_a_function():
def what_the_heck(x):
return x + 1
return what_the_heck
f = function_returning_a_function()
f
f(10)
But the reason you might want to do this (to make a closure) doesn't work, so I consider this a curiosity.
@nb.njit
def function_returning_a_function(y):
def what_the_heck(x):
return x + y
return what_the_heck
function_returning_a_function(1)
Numba has a whole system for annotating classes to be used in JIT'ed functions: @nb.experimental.jitclass
(see its documentation).
However, I rarely use it—you find yourself having to annotate more and more until it doesn't look like Python. The value of Numba is that you can develop quickly without leaving the Python environment. When you're doing something so complex as to need classes, you might want to write a C++ program bound to Python with pybind11 or ROOT.
But here's a simple one:
@nb.experimental.jitclass([
("E", nb.float64),
("px", nb.float64),
("py", nb.float64),
("pz", nb.float64),
])
class Lorentz:
def __init__(self, E, px, py, pz):
self.E = E
self.px = px
self.py = py
self.pz = pz
@property
def mass(self):
return np.sqrt(self.E**2 - self.px**2 - self.py**2 - self.pz**2)
@nb.njit
def add_Lorentz(one, two):
return Lorentz(one.E + two.E, one.px + two.px, one.py + two.py, one.pz + two.pz)
objects = nb.typed.List([Lorentz(E, px, py, pz) for E, px, py, pz in zip(
np.random.normal(100, 10, 10000),
np.random.normal(0, 10, 10000),
np.random.normal(0, 10, 10000),
np.random.normal(0, 10, 10000),
)])
@nb.njit
def calculate_masses(one, two):
out = np.empty(len(one))
for i in range(len(one)):
out[i] = add_Lorentz(one[i], two[i]).mass
return out
calculate_masses(objects[:5000], objects[5000:])
It works, but small deviations from this example hit pickling errors (because Numba passes class objects into low-level ↔ Python conversions by pickling them) and other errors deep in the Numba internals.
Iteration over objects is easier (and faster) with Awkward Array (see below), which will have Lorentz vector objects defined by the Vector library (in development).
The biggest strength of JIT-compilers is that you can generate the code to compile after you know a lot about the problem.
In C++, this is the realm of templates and compile-time metaprogramming. In Numba, it can be an f-string:
args = [
np.random.normal(0, 1, 1000000),
np.random.normal(0, 1, 1000000),
np.random.normal(0, 1, 1000000),
np.random.normal(0, 1, 1000000),
np.random.normal(0, 1, 1000000),
]
code = f"""
@nb.vectorize
def sum_in_quadrature({", ".join("p%d" % i for i in range(len(args)))}):
# compiled for exactly {len(args)} arguments
return np.sqrt({" + ".join("p%d**2" % i for i in range(len(args)))})
"""
print(code)
exec(code)
sum_in_quadrature(*args)
The point is that you have the full power of Python at your disposal to examine your data and generate code before compiling it, and you don't have to hack together scripts to make source code files before you can use them.
Numba has a @nb.generated_jit decorator for specializing a function for specific types.
@nb.generated_jit
def add_unmasked(data, mask):
# At this level, data and mask are Numba *types* in plain Python.
if isinstance(mask, nb.types.NoneType):
def implementation(data, mask):
# At this level, data and mask are *values* in a JIT'ed context.
return np.sum(data)
return implementation
else:
def implementation(data, mask):
# The names "data" and "mask" have to match the outer function.
result = 0
for d, m in zip(data, mask):
if not m:
result += d
return result
return implementation
data = np.random.normal(0, 1, 1000000)
mask = np.random.randint(0, 2, 1000000).astype(np.bool_)
data, mask
add_unmasked(data, mask)
add_unmasked(data, None)
You might argue that my examples are too simple, but that's the best way to use Numba: by merging complex logic into a procedure bit by bit. You only benefit from the "JIT" aspect if you're building it up interactively: in a Python terminal, IPython, or a Jupyter notebook.
If your workflow is "edit script, save, run from the beginning, edit script again," then you might as well be using a non-JIT compiler, like C++.
You can figure out how to write a function in one environment and then copy it into the script you'll be submitting to the GRID, you know.
Recommendations:
- Develop incrementally: once small pieces of the calculation work in plain Python, put @nb.njit at the tops of just those functions.
- Don't write one monolithic function; break the work into small functions and @nb.njit each one individually.
- print works:
@nb.njit
def do_stuff(data):
for group in data:
print(group)
data = np.random.normal(0, 1, (1000000, 3))
data
do_stuff(data[:10]) # not too many!
Personally, I wouldn't bother with gdb. I'd just use print.
The Awkward Array library is for doing array-at-a-time, precompiled calculations like NumPy, but on complex data structures. However, it also works well with Numba's element-at-a-time programming model, because you can iterate over its records and variable-length lists in a JIT'ed function without converting them and without writing any @nb.jitclass annotations.
Also, Uproot makes Awkward Arrays, so it's an easy data source to access in HEP.
import uproot
import skhep_testdata
events = uproot.open(skhep_testdata.data_path("uproot-HZZ-objects.root"))["events"]
events.show()
muons = events["muonp4"].array()
muons
import particle
from hepunits import MeV, GeV
# Get the Z mass Pythonically!
z_mass = particle.Particle.from_string("Z").mass * MeV / GeV
@nb.njit
def single_mass(p1, p2):
mass2 = ((p1.fE + p2.fE)**2
- (p1.fP.fX + p2.fP.fX)**2 - (p1.fP.fY + p2.fP.fY)**2 - (p1.fP.fZ + p2.fP.fZ)**2)
if mass2 > 0:
return np.sqrt(mass2)
else:
return 0
@nb.njit
def calculate_masses(groups_of_particles):
masses = np.empty(len(groups_of_particles))
# Loop over groups of variable numbers of particles in each event.
for num, group in enumerate(groups_of_particles):
best = None
# Consider all unique pairs of particles in each group.
for i in range(len(group)):
for j in range(i + 1, len(group)):
# Compute the mass of each pair of particles.
mass = single_mass(group[i], group[j])
# If it's closer to the Z mass, keep it.
if best is None:
best = mass
elif abs(mass - z_mass) < abs(best - z_mass):
best = mass
if best is None:
masses[num] = 0
else:
masses[num] = best
return masses
calculate_masses(muons)
calculate_masses(events["jetp4"].array())
Incidentally, this is how you would do the same thing with array-at-a-time functions in Awkward Array.
import awkward as ak
# Use mostly the same mass function, with "np.where" replacing "if".
def vectorized_mass(p1, p2):
mass2 = ((p1.fE + p2.fE)**2
- (p1.fP.fX + p2.fP.fX)**2 - (p1.fP.fY + p2.fP.fY)**2 - (p1.fP.fZ + p2.fP.fZ)**2)
return np.where(mass2 > 0, np.sqrt(mass2), 0)
def calculate_masses_awkward(groups_of_particles):
# Get the first and second in each unique pair of particles.
first, second = ak.unzip(ak.combinations(groups_of_particles, 2))
# Apply the function to each pair.
nested_masses = vectorized_mass(first, second)
# Determine how we're going to pick our favorite pair.
objective_metric = abs(nested_masses - z_mass)
selector = ak.argmin(objective_metric, axis=1, keepdims=True)
# Apply it to the nested_masses, replace "None" with "0", and flatten the output.
return ak.fill_none(nested_masses[selector], 0)[:, 0]
calculate_masses_awkward(muons)
The method you choose depends on what seems more natural to you, what's quicker to type, and perhaps what has better performance overall.
Generally speaking, element-at-a-time calculations on Awkward Arrays with Numba perform at least as well as the array-at-a-time equivalents, so the choice is mostly a matter of style.
Another good technique is to use Numba to make arrays of booleans or arrays of integers with which to slice an Awkward Array.
@nb.njit
def has_Z_boson(group):
for i in range(len(group)):
for j in range(i + 1, len(group)):
mass = single_mass(group[i], group[j])
if 40 < mass < 120:
return True
return False
@nb.njit
def mask_for_Z_bosons(groups_of_particles):
mask = np.empty(len(groups_of_particles), np.bool_)
for num, group in enumerate(groups_of_particles):
mask[num] = has_Z_boson(group)
return mask
mask = mask_for_Z_bosons(muons)
mask
muons
muons[mask]
Using complex data structures as input is a very different thing from creating them as output.
When using them as input, you have an object that you can extract elements from with square brackets, pick out fields with a dot, or iterate over with a for loop.
If you want to output a part of that input, it's still not a problem.
@nb.njit
def first_with_Z_boson(groups_of_particles):
for group in groups_of_particles:
if has_Z_boson(group):
return group
return None
single_event = first_with_Z_boson(muons)
single_event
single_event.tolist()
But if you want to make a whole new structure that didn't exist before, that's more difficult because you need a syntax for constructing it.
We've seen the difficulties of making lists (nb.typed.List), dicts (nb.typed.Dict), and class instances (nb.jitclass), because Numba's world must be statically typed while Python has traditionally been a dynamically typed language.
Awkward Array has a facility for building structures: ak.ArrayBuilder.
builder = ak.ArrayBuilder()
builder
with builder.list():
builder.integer(1)
builder.real(2.2)
builder.string("three")
builder
builder = ak.ArrayBuilder()
with builder.record():
builder.field("x").integer(1)
builder.field("y").real(2.2)
builder.field("z").string("three")
builder
An ArrayBuilder is not an Awkward Array, but snapshot() will turn it into one.
builder.snapshot()
Here's how they differ:
builder = ak.ArrayBuilder()
ak.type(builder)
builder.integer(1)
ak.type(builder)
builder.integer(2)
ak.type(builder)
builder.real(3.3)
ak.type(builder)
with builder.list():
pass
ak.type(builder)
with builder.list():
builder.integer(4)
ak.type(builder)
ArrayBuilders can be used in Numba JIT-compiled functions, which makes them the easiest way to make data structures in those functions:
@nb.njit
def is_part_of_Z_boson(groups_of_particles, builder):
for group in groups_of_particles:
builder.begin_list()
for i in range(len(group)):
good_Z = False
for j in range(i + 1, len(group)):
mass = single_mass(group[i], group[j])
if 40 < mass < 120:
good_Z = True
builder.boolean(good_Z)
builder.end_list()
return builder
nested_booleans = is_part_of_Z_boson(muons, ak.ArrayBuilder())
nested_booleans
nested_booleans.snapshot()
muons
muons[nested_booleans.snapshot()]
There are some limitations:
- Numba doesn't support with statements in @nb.njit mode (check their supported Python features page to see if this has changed). Replace with builder.list() by builder.begin_list() and builder.end_list(), and with builder.record() by builder.begin_record() and builder.end_record().
- You can't snapshot() the ArrayBuilder inside the JIT-compiled function; do that outside.
@nb.njit
def mask_for_Z_bosons_NumPy(groups_of_particles):
mask = np.empty(len(groups_of_particles), np.bool_)
for num, group in enumerate(groups_of_particles):
mask[num] = has_Z_boson(group)
return mask
@nb.njit
def mask_for_Z_bosons_ArrayBuilder(groups_of_particles, builder):
for group in groups_of_particles:
builder.append(has_Z_boson(group))
return builder
%%timeit
mask_for_Z_bosons_NumPy(muons)
%%timeit
mask_for_Z_bosons_ArrayBuilder(muons, ak.ArrayBuilder())
(We're working on a TypedArrayBuilder to take advantage of static typing.)
Also useful: you can append parts of an Awkward Array into an ArrayBuilder (which references them like a pointer, so it doesn't matter how complex they are).
@nb.njit
def muons_that_are_part_of_Z_boson(groups_of_particles, builder):
for group in groups_of_particles:
builder.begin_list()
for i in range(len(group)):
good_Z = False
for j in range(i + 1, len(group)):
mass = single_mass(group[i], group[j])
if 40 < mass < 120:
good_Z = True
# Instead of inserting a boolean, conditionally insert the original particle record.
builder.append(group[i])
builder.end_list()
return builder
selected_muons = muons_that_are_part_of_Z_boson(muons, ak.ArrayBuilder())
selected_muons.snapshot()
That's all I've prepared. We can use the space below for free-form questions.