This is the first part of a tutorial describing the Iteration/Expression Tree (IET), an intermediate representation based on Abstract Syntax Trees (AST) used in the Devito compiler. In this part we will show how to access and navigate the IET rooted in an Operator.

The reader of this tutorial is expected to be familiar with the basic Devito API (i.e., Grid, Function/TimeFunction, Operator, ...). Otherwise, the CFD tutorials in examples/cfd are a better place to start.

# Part I - Top Down¶

In [1]:
from devito import Eq, Grid, TimeFunction, Operator

grid = Grid(shape=(3, 3))
u = TimeFunction(name='u', grid=grid)
u.data
Out[1]:
Data([[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],

[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]], dtype=float32)

Here, we just defined a time-varying grid with 3x3 points in each of the space Dimensions x and y. u.data gives us access to the actual data values. In particular, u.data[0, :, :] holds the data values at timestep t=0, whereas u.data[1, :, :] holds the data values at timestep t=1.

We now create an Operator that increments by 1 all points in the computational domain. Note: User should not change to configuration['openmp'] = 1 as the IET structure changes with OpenMP enabled.

In [2]:
from devito import configuration
configuration['openmp'] = 0  # We don't care about OpenMP in this example

eq = Eq(u.forward, u+1)
op = Operator(eq)

An Operator is an IET node that can generate, JIT-compile, and run low-level code (e.g., C). Just like all other types of IET nodes, it's got a number of metadata attached. For example, we can query an Operator to retrieve the user-provided SymPy expressions.

In [3]:
op.args['expressions']
Out[3]:
Eq(u(t + dt, x, y), u(t, x, y) + 1)

If we print op, we can see how the generated code looks like.

In [4]:
print(op)
#define _POSIX_C_SOURCE 200809L
#include "stdlib.h"
#include "math.h"
#include "sys/time.h"
#include "xmmintrin.h"
#include "pmmintrin.h"

struct dataobj
{
void *restrict data;
int * size;
int * npsize;
int * dsize;
int * hsize;
int * hofs;
int * oofs;
} ;

struct profiler
{
double section0;
} ;

int Kernel(struct dataobj *restrict u_vec, const int time_M, const int time_m, struct profiler * timers, const int x_M, const int x_m, const int y_M, const int y_m)
{
float (*restrict u)[u_vec->size[1]][u_vec->size[2]] __attribute__ ((aligned (64))) = (float (*)[u_vec->size[1]][u_vec->size[2]]) u_vec->data;
/* Flush denormal numbers to zero in hardware */
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2))
{
struct timeval start_section0, end_section0;
gettimeofday(&start_section0, NULL);
for (int x = x_m; x <= x_M; x += 1)
{
#pragma omp simd
for (int y = y_m; y <= y_M; y += 1)
{
u[t1][x + 1][y + 1] = u[t0][x + 1][y + 1] + 1;
}
}
gettimeofday(&end_section0, NULL);
timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;
}
return 0;
}

As shown in other tutorials, we can JIT-compile and run op by executing:

In [5]:
op.apply(time=2)
u.data
Operator Kernel run in 0.00 s
Out[5]:
Data([[[2., 2., 2.],
[2., 2., 2.],
[2., 2., 2.]],

[[3., 3., 3.],
[3., 3., 3.],
[3., 3., 3.]]], dtype=float32)

This is the first time we are calling apply on op, so the Operator code gets written in a .c file and compiled into a library (a .so shared object if on Linux).

op runs with the user-provided time_M=2 and with the default arguments u=u, x_m=0, x_M=2, y_m=0, y_M=2, and time_m=0. The indices *_m and *_M represent the minimum and maximum iteration points along Dimension *.

op can be used for a completely different run, for example providing a new TimeFunction.

In [6]:
u2 = TimeFunction(name='u', grid=grid)
u2.data
Out[6]:
Data([[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],

[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]], dtype=float32)

Any of the Operator default arguments may be replaced by passing suitable key-value parameters.

In [7]:
op.apply(u=u2, x_m=1, x_M=2, y_m=0, y_M=1, time_M=3)
u2.data
Operator Kernel run in 0.00 s
Out[7]:
Data([[[0., 0., 0.],
[4., 4., 0.],
[4., 4., 0.]],

[[0., 0., 0.],
[3., 3., 0.],
[3., 3., 0.]]], dtype=float32)

Note, however, that there is no need for recompilation. JIT compilation occurs only once, triggered by the first call to op.apply().

An Operator is the root of an IET that typically consists of several nested Iterations and Expressions – two other fundamental IET node types. The user-provided SymPy equations are wrapped within Expressions. Loop nest embedding such expressions are constructed by suitably nesting Iterations.

The Devito compiler constructs the IET from a collection of Clusters, which represent a higher-level intermediate representation (not covered in this tutorial).

The Devito compiler also analyses the IET to determine key computational properties, such as sequential, parallel, and vectorizable. These properties are attached directly to the relevant IET nodes.

We can print the IET structure of an Operator, as well as the attached computational properties, using the utility function pprint.

In [8]:
from devito import pprint
pprint(op)
<Callable Kernel>
<List (0, 2, 0)>

<ArrayCast>
<List (0, 2, 0)>

<DenormalsMacro>

<Element /* Flush denormal numbers to zero in hardware */>
<Element _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);>
<Element _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);>

<List (0, 1, 0)>

<[affine,sequential,wrappable] Iteration time::time::(time_m, time_M, 1)::(0, 0)>
<TimedList (2, 1, 2)>
<C.Statement struct timeval start_section0, end_section0;>
<C.Statement gettimeofday(&start_section0, NULL);>
<Section (1)>

<[affine,parallel] Iteration x::x::(x_m, x_M, 1)::(0, 0)>
<[affine,parallel,vector-dim] Iteration y::y::(y_m, y_M, 1)::(0, 0)>
<ExpressionBundle (1)>

<Expression u[t1, x + 1, y + 1] = u[t0, x + 1, y + 1] + 1>

<C.Statement gettimeofday(&end_section0, NULL);>
<C.Statement timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;>

In this example, op is represented as a <Callable Kernel>. Attached to it are metadata, such as _headers and _includes, as well as a body, which includes the children IET nodes. Here, the body is a List object.

In [9]:
Out[9]:
['#define _POSIX_C_SOURCE 200809L']
In [10]:
op._includes
Out[10]:
['stdlib.h', 'math.h', 'sys/time.h', 'xmmintrin.h', 'pmmintrin.h']
In [11]:
op.body
Out[11]:
(<List (0, 2, 0)>,)

We can inspect the List to observe that its immediate children are an <ArrayCast> and another <List>. We can then proceed with the IET traversal until we locate the user-provided SymPy equations.

In [12]:
print(op.body[0].body[0])  # Printing the ArrayCast
float (*restrict u)[u_vec->size[1]][u_vec->size[2]] __attribute__ ((aligned (64))) = (float (*)[u_vec->size[1]][u_vec->size[2]]) u_vec->data;
In [13]:
print(op.body[0].body[1])  # Printing the List
/* Flush denormal numbers to zero in hardware */
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2))
{
struct timeval start_section0, end_section0;
gettimeofday(&start_section0, NULL);
for (int x = x_m; x <= x_M; x += 1)
{
#pragma omp simd
for (int y = y_m; y <= y_M; y += 1)
{
u[t1][x + 1][y + 1] = u[t0][x + 1][y + 1] + 1;
}
}
gettimeofday(&end_section0, NULL);
timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;
}

Below we access the Iteration representing the time loop.

In [14]:
t_iter = op.body[0].body[1].body[1].body[0]
t_iter
Out[14]:
<WithProperties[affine,sequential,wrappable]::Iteration time[t0,t1]; (time_m, time_M, 1)>
In [15]:
print(t_iter)
for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2))
{
struct timeval start_section0, end_section0;
gettimeofday(&start_section0, NULL);
for (int x = x_m; x <= x_M; x += 1)
{
#pragma omp simd
for (int y = y_m; y <= y_M; y += 1)
{
u[t1][x + 1][y + 1] = u[t0][x + 1][y + 1] + 1;
}
}
gettimeofday(&end_section0, NULL);
timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;
}

We can inspect the Iteration to discover what its iteration bounds are.

In [16]:
t_iter.limits
Out[16]:
(time_m, time_M, 1)

And as we keep going down through the IET, we can eventually reach the Expression wrapping the user-provided SymPy equation.

In [17]:
expr = t_iter.nodes[0].body[0].body[0].nodes[0].nodes[0].body[0]
expr.view
Out[17]:
'<Expression u[t1, x + 1, y + 1] = u[t0, x + 1, y + 1] + 1>'

Of course, there are mechanisms in place to, for example, find all Expressions in a given IET. The Devito compiler has a number of IET visitors, among which FindNodes, usable to retrieve all nodes of a particular type. So we easily can get all Expressions within op as follows

In [18]:
from devito.ir.iet import Expression, FindNodes
exprs = FindNodes(Expression).visit(op)
exprs[0].view
Out[18]:
'<Expression u[t1, x + 1, y + 1] = u[t0, x + 1, y + 1] + 1>'