Tutorial on JIT versus Scipy execution within Parcels

This tutorial is meant to highlight the potentially very big difference between the computational time required to run Parcels in JIT (Just-In-Time compilation) versus in Scipy mode. It also discusses how to more efficiently sample in Scipy mode.

Short summary: JIT is faster than scipy

In the code snippet below, we use AdvectionRK4 to advect 100 particles in the peninsula FieldSet. We first do it in JIT mode (by setting ptype=JITParticle in the declaration of pset) and then we also do it in Scipy mode (by setting ptype=ScipyParticle in the declaration of pset).

In both cases, we advect the particles for 1 hour, with a timestep of 30 seconds.

To measure the computational time, we use the timer module.

In [1]:
from parcels import FieldSet, ParticleSet, JITParticle, ScipyParticle
from parcels import AdvectionRK4
from parcels import timer
from datetime import timedelta as delta

timer.root = timer.Timer('root')

timer.fieldset = timer.Timer('fieldset creation', parent=timer.root)
fieldset = FieldSet.from_parcels('Peninsula_data/peninsula', allow_time_extrapolation=True)
timer.fieldset.stop()

ptype = {'scipy': ScipyParticle, 'jit': JITParticle}
ptimer = {'scipy': timer.Timer('scipy', parent=timer.root, start=False),
          'jit': timer.Timer('jit', parent=timer.root, start=False)}

for p in ['scipy', 'jit']:
    pset = ParticleSet.from_line(fieldset=fieldset, pclass=ptype[p], size=100, start=(3e3, 3e3), finish=(3e3, 45e3))

    ptimer[p].start()
    pset.execute(AdvectionRK4, runtime=delta(hours=1), dt=delta(seconds=30))
    ptimer[p].stop()

timer.root.stop()
timer.root.print_tree()
INFO: Compiled JITParticleAdvectionRK4 ==> /var/folders/r2/8593q8z93kd7t4j9kbb_f7p00000gr/T/parcels-504/fa40f24fcc10fceffaa228a76b22051d_0.so
(100%)  Timer root                       : 6.823e+00 s
(  1%)    (  1%) Timer fieldset creation : 8.873e-02 s
( 87%)    ( 87%) Timer scipy             : 5.910e+00 s
( 12%)    ( 12%) Timer jit               : 8.232e-01 s

As you can see above, even in this very small example Scipy mode took almost 8 times as long (6.0 seconds versus 0.7 seconds) as the JIT mode. For larger examples, this can grow to hundreds of times slower.

This is just an illustrative example, depending on the number of calls to AdvectionRK4, the size of the FieldSet, the size of the pset, the ratio between dt and outputdt in the .execute etc, the difference between JIT and Scipy can vary significantly. However, JIT will almost always be faster!

So why does Parcels support both JIT and Scipy mode then? Because Scipy is easier to debug when writing custom kernels, so can provide faster development of new features.

As an aside, you may wonder why we use the time.time module, and not the timeit module, to time the runs above. That's because it affects the AST of the kernels, causing errors in JIT mode.

Further digging into Scipy mode: adding particle keyword to Field-sampling

Sometimes, you'd want to run Parcels in Scipy mode anyways. In that case, there are ways to make Parcels a bit faster.

As background, one of the most computationally expensive operations in Parcels is the Field Sampling. In the default sampling in Scipy mode, we don't keep track of where in the grid a particle is; which means that for every sampling call, we need to again search for which grid cell a particle is in.

Let's see how this works in the simple Peninsula FieldSet used above. We use a simple Euler-Forward Advection now to make the point. In particular, we use two types of Advection Kernels

In [2]:
def AdvectionEE_depth_lat_lon_time(particle, fieldset, time):
    (u1, v1) = fieldset.UV[time, particle.depth, particle.lat, particle.lon]
    particle.lon += u1 * particle.dt
    particle.lat += v1 * particle.dt

def AdvectionEE_depth_lat_lon_time_particle(particle, fieldset, time):
    (u1, v1) = fieldset.UV[time, particle.depth, particle.lat, particle.lon, particle]  # note the extra particle argument here
    particle.lon += u1 * particle.dt
    particle.lat += v1 * particle.dt

kernels = {'dllt': AdvectionEE_depth_lat_lon_time, 'dllt_p': AdvectionEE_depth_lat_lon_time_particle}
In [3]:
timer.root = timer.Timer('root')
ptimer = {'dllt': timer.Timer('dllt', parent=timer.root, start=False),
          'dllt_p': timer.Timer('dllt_p', parent=timer.root, start=False)}

for k in ['dllt', 'dllt_p']:
    pset = ParticleSet.from_line(fieldset=fieldset, pclass=ScipyParticle, size=100, start=(3e3, 3e3), finish=(3e3, 45e3))

    ptimer[k].start()
    pset.execute(kernels[k], runtime=delta(hours=1), dt=delta(seconds=30))
    ptimer[k].stop()

timer.root.stop()
timer.root.print_tree()
(100%)  Timer root                       : 5.533e+00 s
( 50%)    ( 50%) Timer dllt              : 2.750e+00 s
( 50%)    ( 50%) Timer dllt_p            : 2.781e+00 s

You will see that the two kernels don't really differ in speed. That is because the Peninsula FieldSet is a simple Rectilinear Grid, where indexing a particle location to the grid is very fast.

However, the difference is much more pronounced if we use a Curvilinear Grid like in the NEMO dataset.

In [4]:
data_path = 'NemoCurvilinear_data/'
filenames = {'U': {'lon': data_path + 'mesh_mask.nc4',
                   'lat': data_path + 'mesh_mask.nc4',
                   'data': data_path + 'U_purely_zonal-ORCA025_grid_U.nc4'},
             'V': {'lon': data_path + 'mesh_mask.nc4',
                   'lat': data_path + 'mesh_mask.nc4',
                   'data': data_path + 'V_purely_zonal-ORCA025_grid_V.nc4'}}
variables = {'U': 'U', 'V': 'V'}
dimensions = {'lon': 'glamf', 'lat': 'gphif', 'time': 'time_counter'}
fieldset = FieldSet.from_nemo(filenames, variables, dimensions, allow_time_extrapolation=True)
WARNING: Casting lon data to np.float32
WARNING: Casting lat data to np.float32
WARNING: Casting depth data to np.float32
WARNING: Trying to initialize a shared grid with different chunking sizes - action prohibited. Replacing requested field_chunksize with grid's master chunksize.
In [5]:
timer.root = timer.Timer('root')
ptimer = {'dllt': timer.Timer('dllt', parent=timer.root, start=False),
          'dllt_p': timer.Timer('dllt_p', parent=timer.root, start=False)}

for k in ['dllt', 'dllt_p']:
    pset = ParticleSet.from_line(fieldset=fieldset, pclass=ScipyParticle, size=10, start=(45, 40), finish=(60, 40))

    ptimer[k].start()
    pset.execute(kernels[k], runtime=delta(days=10), dt=delta(hours=6))
    ptimer[k].stop()

timer.root.stop()
timer.root.print_tree()
(100%)  Timer root                       : 9.748e+00 s
( 96%)    ( 96%) Timer dllt              : 9.341e+00 s
(  4%)    (  4%) Timer dllt_p            : 4.055e-01 s

Now, the difference is massive, with the AdvectionEE_depth_lat_lon_time_particle kernel more than 20 times faster than the kernel without the particle argument at the end of the Field sampling operation.

So, if you want to run in Scipy mode, make sure to add particle at the end of your Field sampling!