Domain, Halo and Padding regions

In this tutorial we will learn about data regions and how these impact the Operator construction. We will use the simple time marching example shown in the 01_iet tutorial.

In [1]:
from devito import Eq, Grid, TimeFunction, Operator

grid = Grid(shape=(3, 3))
u = TimeFunction(name='u', grid=grid)
u.data[:] = 1

At this point, we have a time-varying 3x3 grid filled with 1's. Below, we can see the domain data values:

In [2]:
print(u.data)
[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]

We now create an Operator that, at each timestep, increments by 2 all points in the computational domain.

In [3]:
from devito import configuration
configuration['openmp'] = 0

# For illustration purposes, we ask Devito to generate explicit array casts.
configuration['codegen'] = 'explicit'

eq = Eq(u.forward, u+2)
op = Operator(eq)

We can print op to get the generated code.

In [4]:
print(op)
#define _POSIX_C_SOURCE 200809L
#include "stdlib.h"
#include "math.h"
#include "sys/time.h"
#include "xmmintrin.h"
#include "pmmintrin.h"

struct dataobj
{
  void *restrict data;
  int * size;
  int * npsize;
  int * dsize;
  int * hsize;
  int * hofs;
  int * oofs;
} ;

struct profiler
{
  double section0;
} ;


int Kernel(struct dataobj *restrict u_vec, const int time_M, const int time_m, struct profiler * timers, const int x_M, const int x_m, const int x_size, const int y_M, const int y_m, const int y_size)
{
  float (*restrict u)[x_size + 1 + 1][y_size + 1 + 1] __attribute__ ((aligned (64))) = (float (*)[x_size + 1 + 1][y_size + 1 + 1]) u_vec->data;
  /* Flush denormal numbers to zero in hardware */
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
  for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2))
  {
    struct timeval start_section0, end_section0;
    gettimeofday(&start_section0, NULL);
    for (int x = x_m; x <= x_M; x += 1)
    {
      #pragma omp simd
      for (int y = y_m; y <= y_M; y += 1)
      {
        u[t1][x + 1][y + 1] = u[t0][x + 1][y + 1] + 2;
      }
    }
    gettimeofday(&end_section0, NULL);
    timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;
  }
  return 0;
}

When we take a look at the constructed expression, u[t1][x + 1][y + 1] = u[t0][x + 1][y + 1] + 2, we see several +1 were added to the u's spatial indexes.

This is because the domain region is actually surrounded by 'ghost' points, which can be accessed via a stencil when iterating in proximity of the domain boundary. The ghost points define the halo region. The halo region can be accessed through the data_with_halo data accessor; as we see below, the halo points correspond to the zeros surrounding the domain region.

In [5]:
print(u.data_with_halo)
[[[0. 0. 0. 0. 0.]
  [0. 1. 1. 1. 0.]
  [0. 1. 1. 1. 0.]
  [0. 1. 1. 1. 0.]
  [0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0.]
  [0. 1. 1. 1. 0.]
  [0. 1. 1. 1. 0.]
  [0. 1. 1. 1. 0.]
  [0. 0. 0. 0. 0.]]]

By adding the '+1' offsets, the Devito compiler ensures the array accesses are logically aligned to the equation’s physical domain. For instance, the TimeFunction u(t, x, y) used in the example above has one point on each side of the x and y halo regions; if the user writes an expression including u(t, x, y) and u(t, x + 2, y + 2), the compiler will ultimately generate u[t, x + 1, y + 1] and u[t, x + 3, y + 3]. When x = y = 0, therefore, the values u[t, 1, 1] and u[t, 3, 3] are fetched, representing the first and third points in the physical domain.

By default, the halo region has space_order points on each side of the space dimensions. Sometimes, these points may be unnecessary, or, depending on the partial differential equation being approximated, extra points may be needed.

In [6]:
u0 = TimeFunction(name='u0', grid=grid, space_order=0)
u0.data[:] = 1
print(u0.data_with_halo)
[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]
In [7]:
u2 = TimeFunction(name='u2', grid=grid, space_order=2)
u2.data[:] = 1
print(u2.data_with_halo)
[[[0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 1. 1. 1. 0. 0.]
  [0. 0. 1. 1. 1. 0. 0.]
  [0. 0. 1. 1. 1. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 1. 1. 1. 0. 0.]
  [0. 0. 1. 1. 1. 0. 0.]
  [0. 0. 1. 1. 1. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]]]

One can also pass a 3-tuple (o, lp, rp) instead of a single integer representing the discretization order. Here, o is the discretization order, while lp and rp indicate how many points are expected on left (lp) and right (rp) sides of a point of interest.

In [8]:
u_new = TimeFunction(name='u_new', grid=grid, space_order=(4, 3, 1))
In [9]:
u_new.data[:] = 1
print(u_new.data_with_halo)
[[[0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 1. 1. 1. 0.]
  [0. 0. 0. 1. 1. 1. 0.]
  [0. 0. 0. 1. 1. 1. 0.]
  [0. 0. 0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 1. 1. 1. 0.]
  [0. 0. 0. 1. 1. 1. 0.]
  [0. 0. 0. 1. 1. 1. 0.]
  [0. 0. 0. 0. 0. 0. 0.]]]

Let's have a look at the Operator generated code when using u_new.

In [10]:
equation = Eq(u_new.forward, u_new + 2)
op = Operator(equation)
print(op)
#define _POSIX_C_SOURCE 200809L
#include "stdlib.h"
#include "math.h"
#include "sys/time.h"
#include "xmmintrin.h"
#include "pmmintrin.h"

struct dataobj
{
  void *restrict data;
  int * size;
  int * npsize;
  int * dsize;
  int * hsize;
  int * hofs;
  int * oofs;
} ;

struct profiler
{
  double section0;
} ;


int Kernel(struct dataobj *restrict u_new_vec, const int time_M, const int time_m, struct profiler * timers, const int x_M, const int x_m, const int x_size, const int y_M, const int y_m, const int y_size)
{
  float (*restrict u_new)[x_size + 1 + 3][y_size + 1 + 3] __attribute__ ((aligned (64))) = (float (*)[x_size + 1 + 3][y_size + 1 + 3]) u_new_vec->data;
  /* Flush denormal numbers to zero in hardware */
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
  for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2))
  {
    struct timeval start_section0, end_section0;
    gettimeofday(&start_section0, NULL);
    for (int x = x_m; x <= x_M; x += 1)
    {
      #pragma omp simd
      for (int y = y_m; y <= y_M; y += 1)
      {
        u_new[t1][x + 3][y + 3] = u_new[t0][x + 3][y + 3] + 2;
      }
    }
    gettimeofday(&end_section0, NULL);
    timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;
  }
  return 0;
}

And finally, let's run it, to convince ourselves that only the domain region values will be incremented at each timestep.

In [11]:
op.apply(time_M=2)
print(u_new.data_with_halo)
Operator `Kernel` run in 0.00 s
[[[0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 5. 5. 5. 0.]
  [0. 0. 0. 5. 5. 5. 0.]
  [0. 0. 0. 5. 5. 5. 0.]
  [0. 0. 0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 7. 7. 7. 0.]
  [0. 0. 0. 7. 7. 7. 0.]
  [0. 0. 0. 7. 7. 7. 0.]
  [0. 0. 0. 0. 0. 0. 0.]]]

The halo region, in turn, is surrounded by the padding region, which can be used for data alignment. By default, there is no padding. This can be changed by passing a suitable value to 'padding', as shown below:

In [12]:
u_pad = TimeFunction(name='u_pad', grid=grid, space_order=2, padding=(0,2,2))
u_pad.data_with_halo[:] = 1
u_pad.data[:] = 2
equation = Eq(u_pad.forward, u_pad + 2)
op = Operator(equation)
print(op)
#define _POSIX_C_SOURCE 200809L
#include "stdlib.h"
#include "math.h"
#include "sys/time.h"
#include "xmmintrin.h"
#include "pmmintrin.h"

struct dataobj
{
  void *restrict data;
  int * size;
  int * npsize;
  int * dsize;
  int * hsize;
  int * hofs;
  int * oofs;
} ;

struct profiler
{
  double section0;
} ;


int Kernel(struct dataobj *restrict u_pad_vec, const int time_M, const int time_m, struct profiler * timers, const int x_M, const int x_m, const int x_size, const int y_M, const int y_m, const int y_size)
{
  float (*restrict u_pad)[x_size + 2 + 2 + 2][y_size + 2 + 2 + 2] __attribute__ ((aligned (64))) = (float (*)[x_size + 2 + 2 + 2][y_size + 2 + 2 + 2]) u_pad_vec->data;
  /* Flush denormal numbers to zero in hardware */
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
  for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2))
  {
    struct timeval start_section0, end_section0;
    gettimeofday(&start_section0, NULL);
    for (int x = x_m; x <= x_M; x += 1)
    {
      #pragma omp simd
      for (int y = y_m; y <= y_M; y += 1)
      {
        u_pad[t1][x + 2][y + 2] = u_pad[t0][x + 2][y + 2] + 2;
      }
    }
    gettimeofday(&end_section0, NULL);
    timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;
  }
  return 0;
}

Although in practice not very useful, with the (private) _data_allocated accessor one can see the entire domain+halo+padding region.

In [13]:
print(u_pad._data_allocated)
[[[1. 1. 1. 1. 1. 1. 1. 0. 0.]
  [1. 1. 1. 1. 1. 1. 1. 0. 0.]
  [1. 1. 2. 2. 2. 1. 1. 0. 0.]
  [1. 1. 2. 2. 2. 1. 1. 0. 0.]
  [1. 1. 2. 2. 2. 1. 1. 0. 0.]
  [1. 1. 1. 1. 1. 1. 1. 0. 0.]
  [1. 1. 1. 1. 1. 1. 1. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0.]]

 [[1. 1. 1. 1. 1. 1. 1. 0. 0.]
  [1. 1. 1. 1. 1. 1. 1. 0. 0.]
  [1. 1. 2. 2. 2. 1. 1. 0. 0.]
  [1. 1. 2. 2. 2. 1. 1. 0. 0.]
  [1. 1. 2. 2. 2. 1. 1. 0. 0.]
  [1. 1. 1. 1. 1. 1. 1. 0. 0.]
  [1. 1. 1. 1. 1. 1. 1. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0.]]]

Above, the domain is filled with 2's, the halo is filled with 1's, and the padding is filled with 0's.