Writing custom backends is a powerful feature of DaCe that allows users to specify their own code generation targets. Use cases range from customizing support libraries, through targeting new architectures and specialized hardware, to emitting a completely different output language.
This is made possible due to the modular enumerations and code generation architecture in DaCe. Any enumeration (e.g., for storage types and schedulers) can be extended through the API, in order to enable custom, project-specific behavior, and code generators (similarly to transformations) have a registry that can also be extended at runtime.
Code generation generally follows a recursive procedure, starting from the top-level SDFG: each state is generated in turn, and every scope (e.g., a map) is dispatched to the code generator registered for its schedule, which in turn recursively dispatches the scope's contents.
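To make the dispatch pattern concrete, here is a minimal, hypothetical sketch (for illustration only; the class and method names are simplified stand-ins, not DaCe's actual implementation): a dispatcher maps schedule types to generator callbacks, and each generator may call back into the dispatcher for inner scopes.

```python
# Hypothetical, simplified model of recursive code generation dispatch.
class SimpleDispatcher:
    def __init__(self):
        self._scope_generators = {}  # schedule name -> generator callback

    def register_map_dispatcher(self, schedule, generator):
        self._scope_generators[schedule] = generator

    def dispatch_scope(self, schedule, scope, stream):
        # Route the scope to whichever generator registered its schedule;
        # that generator may recursively dispatch nested scopes.
        self._scope_generators[schedule](scope, stream)

# Usage: register a generator that emits a comment for a given scope
dispatcher = SimpleDispatcher()
dispatcher.register_map_dispatcher(
    'LoopyLoop', lambda scope, stream: stream.append(f'// scope {scope}'))
stream = []
dispatcher.dispatch_scope('LoopyLoop', 'outer', stream)
```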
In this tutorial, we will show how to customize the code generation procedure by creating a new map scheduler and generating custom code for it. First, we will import dace and some classes to make the rest of the code cleaner:
import dace
from dace import registry
from dace.sdfg.scope import ScopeSubgraphView
from dace.codegen.prettycode import CodeIOStream
from dace.codegen.targets.target import TargetCodeGenerator
from dace.codegen.targets.framecode import DaCeCodeGenerator
from dace.codegen.targets.cpp import sym2cpp
Next, we can define some simple program to work with:
@dace.program
def simple(A: dace.float64[20, 30]):
    for i, j in dace.map[0:20:2, 0:30]:
        A[i, j] += A[i, j]
# Preview SDFG
sdfg = simple.to_sdfg()
sdfg
If we inspect the generated code, we see that by default our map is scheduled as an OpenMP multi-core loop:
from IPython.display import Code
Code(sdfg.generate_code()[0].clean_code, language='cpp')
/* DaCe AUTO-GENERATED FILE. DO NOT MODIFY */
#include <dace/dace.h>

void __program_simple_internal(double * __restrict__ A)
{
    {
        {
            #pragma omp parallel for
            for (auto i = 0; i < 20; i += 2) {
                for (auto j = 0; j < 30; j += 1) {
                    {
                        double __in1 = A[((30 * i) + j)];
                        double __in2 = A[((30 * i) + j)];
                        double __out;

                        ///////////////////
                        // Tasklet code (augassign_4_8)
                        __out = (__in1 + __in2);
                        ///////////////////

                        A[((30 * i) + j)] = __out;
                    }
                }
            }
        }
    }
}
DACE_EXPORTED void __program_simple(double * __restrict__ A)
{
    __program_simple_internal(A);
}
DACE_EXPORTED int __dace_init_simple(double * __restrict__ A)
{
    int __result = 0;
    return __result;
}
DACE_EXPORTED void __dace_exit_simple(double * __restrict__ A)
{
}
To begin, we need to add our own enum entries for our new map schedule. Here are the current schedule types in our extensible enumeration:
list(dace.ScheduleType)
[<ScheduleType.Default: 1>, <ScheduleType.Sequential: 2>, <ScheduleType.MPI: 3>, <ScheduleType.CPU_Multicore: 4>, <ScheduleType.GPU_Device: 5>, <ScheduleType.GPU_ThreadBlock: 6>, <ScheduleType.GPU_ThreadBlock_Dynamic: 7>, <ScheduleType.GPU_Persistent: 8>, <ScheduleType.FPGA_Device: 9>]
Registering a new value is just a matter of calling `register`:
dace.ScheduleType.register('LoopyLoop')
list(dace.ScheduleType)
[<ScheduleType.Default: 1>, <ScheduleType.Sequential: 2>, <ScheduleType.MPI: 3>, <ScheduleType.CPU_Multicore: 4>, <ScheduleType.GPU_Device: 5>, <ScheduleType.GPU_ThreadBlock: 6>, <ScheduleType.GPU_ThreadBlock_Dynamic: 7>, <ScheduleType.GPU_Persistent: 8>, <ScheduleType.FPGA_Device: 9>, <ScheduleType.LoopyLoop: 10>]
It can also be used directly from now on:
dace.ScheduleType.LoopyLoop
<ScheduleType.LoopyLoop: 10>
One additional step for code generation is to tell the code generator which schedule and storage types lie inside the new scope by default:
dace.SCOPEDEFAULT_SCHEDULE[dace.ScheduleType.LoopyLoop] = dace.ScheduleType.Sequential
dace.SCOPEDEFAULT_STORAGE[dace.ScheduleType.LoopyLoop] = dace.StorageType.CPU_Heap
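These mappings are consulted whenever a nested scope or data container inside our map uses the `Default` schedule or storage type. As a plain-Python sketch of the idea (hypothetical, not DaCe's actual resolution code), the lookup works roughly like this:

```python
# Hypothetical sketch of default resolution: a Default schedule inside a
# LoopyLoop scope resolves through the SCOPEDEFAULT_SCHEDULE mapping.
SCOPEDEFAULT_SCHEDULE = {'LoopyLoop': 'Sequential'}
SCOPEDEFAULT_STORAGE = {'LoopyLoop': 'CPU_Heap'}

def resolve_schedule(child_schedule, parent_schedule):
    # Explicit schedules win; Default falls back to the parent scope's entry
    if child_schedule == 'Default':
        return SCOPEDEFAULT_SCHEDULE[parent_schedule]
    return child_schedule

print(resolve_schedule('Default', 'LoopyLoop'))  # prints Sequential
```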
Now we can register and create a matching code generator:
@registry.autoregister_params(name='loopy')
class MyCustomLoop(TargetCodeGenerator):
    def __init__(self, frame_codegen: DaCeCodeGenerator, sdfg: dace.SDFG):
        ################################################################
        # Define some locals:
        # Can be used to call back to the frame-code generator
        self.frame = frame_codegen
        # Can be used to dispatch other code generators for allocation/nodes
        self.dispatcher = frame_codegen.dispatcher

        ################################################################
        # Register handlers/hooks through dispatcher: Can be used for
        # nodes, memory copy/allocation, scopes, states, and more.

        # In this case, register scopes
        self.dispatcher.register_map_dispatcher(dace.ScheduleType.LoopyLoop, self)

        # You can similarly use register_{array,copy,node,state}_dispatcher

    # A scope dispatcher will trigger a method called generate_scope whenever
    # an SDFG has a scope with that schedule
    def generate_scope(self, sdfg: dace.SDFG, scope: ScopeSubgraphView,
                       state_id: int, function_stream: CodeIOStream,
                       callsite_stream: CodeIOStream):
        # The parameters here are:
        # sdfg: The SDFG we are currently generating.
        # scope: The subgraph of the state containing only the scope (map contents)
        #        we want to generate the code for.
        # state_id: The state in the SDFG the subgraph is taken from (i.e.,
        #           `sdfg.node(state_id)` is the same as `scope.graph`)
        # function_stream: A cursor to the global code (which can be used to define
        #                  functions, hence the name).
        # callsite_stream: A cursor to the current location in the code, most of
        #                  the code is generated here.

        # We can get the map entry node from the scope graph
        entry_node = scope.source_nodes()[0]

        # First, generate an opening brace (for instrumentation and dynamic map ranges)
        callsite_stream.write('{', sdfg, state_id, entry_node)

        ################################################################
        # Generate specific code: We will generate a reversed loop with a
        # comment for each dimension of the map. For the sake of simplicity,
        # dynamic map ranges are not supported.

        for param, rng in zip(entry_node.map.params, entry_node.map.range):
            # We use the sym2cpp function from the cpp support functions
            # to convert symbolic expressions to proper C++
            begin, end, stride = (sym2cpp(r) for r in rng)

            # Every write is optionally (but recommended to be) tagged with
            # 1-3 extra arguments, serving as line information to match
            # SDFG, state, and graph nodes/edges to written code.
            callsite_stream.write(f'''// Loopy-loop {param}
            for (auto {param} = {end}; {param} >= {begin}; {param} -= {stride}) {{''',
                                  sdfg, state_id, entry_node)

        # NOTE: CodeIOStream will automatically take care of indentation for us.

        # Now that the loops have been defined, use the dispatcher to invoke any
        # code generator (including this one) that is registered to deal with
        # the internal nodes in the subgraph. We skip the MapEntry node.
        self.dispatcher.dispatch_subgraph(sdfg, scope, state_id,
                                          function_stream, callsite_stream,
                                          skip_entry_node=True)

        # NOTE: Since skip_exit_node above is set to False, closing braces will
        #       be automatically generated
After the code generator has been registered, all that's left is to change the map schedule and generate new code:
# Change schedule
for node, _ in sdfg.all_nodes_recursive():
    if isinstance(node, dace.nodes.MapEntry):
        node.schedule = dace.ScheduleType.LoopyLoop
Code(sdfg.generate_code()[0].clean_code, language='cpp')
/* DaCe AUTO-GENERATED FILE. DO NOT MODIFY */
#include <dace/dace.h>

void __program_simple_internal(double * __restrict__ A)
{
    {
        {
            // Loopy-loop i
            for (auto i = 19; i >= 0; i -= 2) {
                // Loopy-loop j
                for (auto j = 29; j >= 0; j -= 1) {
                    {
                        double __in1 = A[((30 * i) + j)];
                        double __in2 = A[((30 * i) + j)];
                        double __out;

                        ///////////////////
                        // Tasklet code (augassign_4_8)
                        __out = (__in1 + __in2);
                        ///////////////////

                        A[((30 * i) + j)] = __out;
                    }
                }
            }
        }
    }
}
DACE_EXPORTED void __program_simple(double * __restrict__ A)
{
    __program_simple_internal(A);
}
DACE_EXPORTED int __dace_init_simple(double * __restrict__ A)
{
    int __result = 0;
    return __result;
}
DACE_EXPORTED void __dace_exit_simple(double * __restrict__ A)
{
}
and the code is generated appropriately.