In this tutorial we will learn about data regions and how these impact the Operator construction. We will use the simple time marching example shown in the 01_iet tutorial.
from devito import Eq, Grid, TimeFunction, Operator
grid = Grid(shape=(3, 3))
u = TimeFunction(name='u', grid=grid)
u.data[:] = 1
At this point, we have a time-varying 3x3 grid filled with 1's
. Below, we can see the domain
data values:
print(u.data)
[[[1. 1. 1.] [1. 1. 1.] [1. 1. 1.]] [[1. 1. 1.] [1. 1. 1.] [1. 1. 1.]]]
We now create an Operator
that, at each timestep, increments by 2
all points in the computational domain.
from devito import configuration
configuration['openmp'] = 0
eq = Eq(u.forward, u+2)
op = Operator(eq)
We can print op
to get the generated code.
print(op)
#define _POSIX_C_SOURCE 200809L #include "stdlib.h" #include "math.h" #include "sys/time.h" #include "xmmintrin.h" #include "pmmintrin.h" struct profiler { double section0; } ; int Kernel(float *restrict u_vec, const int time_M, const int time_m, struct profiler* timers, const int x_M, const int x_m, const int x_size, const int y_M, const int y_m, const int y_size) { float (*restrict u)[x_size + 1 + 1][y_size + 1 + 1] __attribute__((aligned(64))) = (float (*)[x_size + 1 + 1][y_size + 1 + 1]) u_vec; /* Flush denormal numbers to zero in hardware */ _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2)) { struct timeval start_section0, end_section0; gettimeofday(&start_section0, NULL); for (int x = x_m; x <= x_M; x += 1) { #pragma omp simd for (int y = y_m; y <= y_M; y += 1) { u[t1][x + 1][y + 1] = u[t0][x + 1][y + 1] + 2; } } gettimeofday(&end_section0, NULL); timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000; } return 0; }
When we take a look at the constructed expression, u[t1][x + 1][y + 1] = u[t0][x + 1][y + 1] + 2
, we see several +1
were added to the u
's spatial indexes.
This is because the domain region is actually surrounded by 'ghost' points, which can be accessed via a stencil when iterating in proximity of the domain boundary. The ghost points define the halo
region. The halo region can be accessed through the data_with_halo
data accessor; as we see below, the halo points correspond to the zeros surrounding the domain region.
print(u.data_with_halo)
[[[0. 0. 0. 0. 0.] [0. 1. 1. 1. 0.] [0. 1. 1. 1. 0.] [0. 1. 1. 1. 0.] [0. 0. 0. 0. 0.]] [[0. 0. 0. 0. 0.] [0. 1. 1. 1. 0.] [0. 1. 1. 1. 0.] [0. 1. 1. 1. 0.] [0. 0. 0. 0. 0.]]]
By adding the '+1' offsets, the Devito compiler ensures the array accesses are logically aligned to the equation’s physical domain. For instance, the TimeFunction
u(t, x, y)
used in the example above has one point on each side of the x
and y
halo regions; if the user writes an expression including u(t, x, y)
and u(t, x + 2, y + 2)
, the compiler will ultimately generate u[t, x + 1, y + 1]
and u[t, x + 3, y + 3]
. When x = y = 0
, therefore, the values u[t, 1, 1]
and u[t, 3, 3]
are fetched, representing the first and third points in the physical domain.
By default, the halo region has space_order
points on each side of the space dimensions. Sometimes, these points may be unnecessary, or, depending on the partial differential equation being approximated, extra points may be needed.
u0 = TimeFunction(name='u0', grid=grid, space_order=0)
u0.data[:] = 1
print(u0.data_with_halo)
[[[1. 1. 1.] [1. 1. 1.] [1. 1. 1.]] [[1. 1. 1.] [1. 1. 1.] [1. 1. 1.]]]
u2 = TimeFunction(name='u2', grid=grid, space_order=2)
u2.data[:] = 1
print(u2.data_with_halo)
[[[0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0.] [0. 0. 1. 1. 1. 0. 0.] [0. 0. 1. 1. 1. 0. 0.] [0. 0. 1. 1. 1. 0. 0.] [0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0.]] [[0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0.] [0. 0. 1. 1. 1. 0. 0.] [0. 0. 1. 1. 1. 0. 0.] [0. 0. 1. 1. 1. 0. 0.] [0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0.]]]
One can also pass a 3-tuple (o, lp, rp)
instead of a single integer representing the discretization order. Here, o
is the discretization order, while lp
and rp
indicate how many points are expected on left (lp) and right (rp) sides of a point of interest.
u_new = TimeFunction(name='u_new', grid=grid, space_order=(4, 3, 1))
u_new.data[:] = 1
print(u_new.data_with_halo)
[[[0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 1. 1. 1. 0.] [0. 0. 0. 1. 1. 1. 0.] [0. 0. 0. 1. 1. 1. 0.] [0. 0. 0. 0. 0. 0. 0.]] [[0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 1. 1. 1. 0.] [0. 0. 0. 1. 1. 1. 0.] [0. 0. 0. 1. 1. 1. 0.] [0. 0. 0. 0. 0. 0. 0.]]]
Let's have a look at the Operator generated code when using u_new
.
equation = Eq(u_new.forward, u_new + 2)
op = Operator(equation)
print(op)
#define _POSIX_C_SOURCE 200809L #include "stdlib.h" #include "math.h" #include "sys/time.h" #include "xmmintrin.h" #include "pmmintrin.h" struct profiler { double section0; } ; int Kernel(float *restrict u_new_vec, const int time_M, const int time_m, struct profiler* timers, const int x_M, const int x_m, const int x_size, const int y_M, const int y_m, const int y_size) { float (*restrict u_new)[x_size + 1 + 3][y_size + 1 + 3] __attribute__((aligned(64))) = (float (*)[x_size + 1 + 3][y_size + 1 + 3]) u_new_vec; /* Flush denormal numbers to zero in hardware */ _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2)) { struct timeval start_section0, end_section0; gettimeofday(&start_section0, NULL); for (int x = x_m; x <= x_M; x += 1) { #pragma omp simd for (int y = y_m; y <= y_M; y += 1) { u_new[t1][x + 3][y + 3] = u_new[t0][x + 3][y + 3] + 2; } } gettimeofday(&end_section0, NULL); timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000; } return 0; }
And finally, let's run it, to convince ourselves that only the domain region values will be incremented at each timestep.
op.apply(time_M=2)
print(u_new.data_with_halo)
Operator `Kernel` run in 0.00 s
[[[0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 5. 5. 5. 0.] [0. 0. 0. 5. 5. 5. 0.] [0. 0. 0. 5. 5. 5. 0.] [0. 0. 0. 0. 0. 0. 0.]] [[0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 7. 7. 7. 0.] [0. 0. 0. 7. 7. 7. 0.] [0. 0. 0. 7. 7. 7. 0.] [0. 0. 0. 0. 0. 0. 0.]]]
The halo region, in turn, is surrounded by the padding
region, which can be used for data alignment. By default, there is no padding. This can be changed by passing a suitable value to 'padding', as shown below:
u_pad = TimeFunction(name='u_pad', grid=grid, space_order=2, padding=(0,2,2))
u_pad.data_with_halo[:] = 1
u_pad.data[:] = 2
equation = Eq(u_pad.forward, u_pad + 2)
op = Operator(equation)
print(op)
#define _POSIX_C_SOURCE 200809L #include "stdlib.h" #include "math.h" #include "sys/time.h" #include "xmmintrin.h" #include "pmmintrin.h" struct profiler { double section0; } ; int Kernel(float *restrict u_pad_vec, const int time_M, const int time_m, struct profiler* timers, const int x_M, const int x_m, const int x_size, const int y_M, const int y_m, const int y_size) { float (*restrict u_pad)[x_size + 2 + 2 + 2 + 2][y_size + 2 + 2 + 2 + 2] __attribute__((aligned(64))) = (float (*)[x_size + 2 + 2 + 2 + 2][y_size + 2 + 2 + 2 + 2]) u_pad_vec; /* Flush denormal numbers to zero in hardware */ _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2)) { struct timeval start_section0, end_section0; gettimeofday(&start_section0, NULL); for (int x = x_m; x <= x_M; x += 1) { #pragma omp simd for (int y = y_m; y <= y_M; y += 1) { u_pad[t1][x + 4][y + 4] = u_pad[t0][x + 4][y + 4] + 2; } } gettimeofday(&end_section0, NULL); timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000; } return 0; }
We see that +4
was added to each space dimension index. Of this '+4', '+2' is due to the halo (space_order=2
), and another +2 to the padding.
Although in practice not very useful, with the (private) _data_allocated
accessor one can see the entire domain+halo+padding region.
print(u_pad._data_allocated)
[[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 1. 1. 1. 1. 1. 1. 1. 0. 0.] [0. 0. 1. 1. 1. 1. 1. 1. 1. 0. 0.] [0. 0. 1. 1. 2. 2. 2. 1. 1. 0. 0.] [0. 0. 1. 1. 2. 2. 2. 1. 1. 0. 0.] [0. 0. 1. 1. 2. 2. 2. 1. 1. 0. 0.] [0. 0. 1. 1. 1. 1. 1. 1. 1. 0. 0.] [0. 0. 1. 1. 1. 1. 1. 1. 1. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]] [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 1. 1. 1. 1. 1. 1. 1. 0. 0.] [0. 0. 1. 1. 1. 1. 1. 1. 1. 0. 0.] [0. 0. 1. 1. 2. 2. 2. 1. 1. 0. 0.] [0. 0. 1. 1. 2. 2. 2. 1. 1. 0. 0.] [0. 0. 1. 1. 2. 2. 2. 1. 1. 0. 0.] [0. 0. 1. 1. 1. 1. 1. 1. 1. 0. 0.] [0. 0. 1. 1. 1. 1. 1. 1. 1. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]]
Above, the domain is filled with 2's, the halo is filled with 1's, and the padding is filled with 0's.