In this tutorial, we show how to write code with the python-numpy frontend. The frontend supports a subset of the Python language, along with array/matrix operations inspired by the numpy module.
Let's start with a first example that showcases the basic elements of a DaCe program. First, we import the dace and numpy modules:
import dace
import numpy as np
Then, we declare the program parameters, which can be either symbols or constants:
M, N, K = 24, 24, 24
We proceed by writing the DaCe program as a regular Python function annotated with the dace.program decorator. The parameters of the function must have type annotations. For example, below we define a gemm function that implements the general matrix-matrix multiplication operation. The first three parameters are the 32-bit floating-point matrices A, B and C. The last two parameters are the 32-bit floating-point scalars alpha and beta. The body of the function is written the same way as in Python, using numpy syntax.
@dace.program
def gemm(A: dace.float32[M, K], B: dace.float32[K, N], C: dace.float32[M, N],
         alpha: dace.float32, beta: dace.float32):
    C[:] = alpha * A @ B + beta * C
The [:] slice expression represents the whole range of the array/matrix. Note that DaCe does not allow you to redefine data within the same DaCe program. Therefore, once an array C is defined, you may not rebind the name C to a different value. For example, if we changed the implementation of gemm to C = alpha * A @ B + beta * C, we would get an error.
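The difference between writing through the [:] slice and rebinding the name can be illustrated in plain NumPy, outside of any DaCe program. The sketch below is only an analogy: the helper names gemm_in_place and gemm_rebind are hypothetical, and in NumPy the rebinding variant silently leaves the caller's array untouched, whereas the DaCe frontend rejects it with an error.

```python
import numpy as np

def gemm_in_place(A, B, C, alpha, beta):
    # Slice assignment writes the result into the existing buffer of C.
    C[:] = alpha * A @ B + beta * C

def gemm_rebind(A, B, C, alpha, beta):
    # Plain assignment only rebinds the local name C to a new array;
    # the array passed in by the caller is not modified.
    C = alpha * A @ B + beta * C
    return C

rng = np.random.default_rng(0)
A = rng.random((4, 3), dtype=np.float32)
B = rng.random((3, 5), dtype=np.float32)
C_initial = rng.random((4, 5), dtype=np.float32)
alpha, beta = np.float32(2.0), np.float32(0.5)

expected = alpha * A @ B + beta * C_initial

C_written = C_initial.copy()
gemm_in_place(A, B, C_written, alpha, beta)

C_untouched = C_initial.copy()
C_new = gemm_rebind(A, B, C_untouched, alpha, beta)
```

After running this, C_written holds the GEMM result, while C_untouched still holds its initial values and the result lives only in the returned array C_new.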
The DaCe program may be parsed to an SDFG and/or compiled using the same methods as in the SDFG API:
sdfg = gemm.to_sdfg()
sdfg
The frontend supports the unary operators {+, -, not, ~}. Note that the not operator works the same way as the numpy.logical_not function.
@dace.program
def uadd(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B[:] = +A

@dace.program
def usub(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B[:] = -A

@dace.program
def logicalnot(A: dace.bool[5, 5], B: dace.bool[5, 5]):
    B[:] = not A

@dace.program
def invert(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B[:] = ~A
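For reference, the NumPy behaviour that these operators mirror can be checked directly. This sketch uses plain NumPy rather than the frontend. Note that outside a DaCe program, applying Python's not to a multi-element array raises a ValueError, so numpy.logical_not is used for the element-wise version.

```python
import numpy as np

A_bool = np.array([[True, False], [False, True]])
A_int = np.array([[0, 1], [-3, 7]], dtype=np.int64)

# Element-wise logical negation, the behaviour the text attributes to `not`.
negated = np.logical_not(A_bool)

# Bitwise inversion of signed integers: ~x equals -x - 1.
inverted = ~A_int

# Unary plus returns the values unchanged; unary minus negates them.
plus = +A_int
minus = -A_int

# In plain Python, `not` asks for a single truth value and fails on arrays.
try:
    not A_bool
    not_raises = False
except ValueError:
    not_raises = True
```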
The frontend supports the binary operators {+, -, *, /, //, %, **, @, <<, >>, |, ^, &, and, or, ==, !=, <, <=, >, >=}. Note that the boolean operators {and, or} work the same way as the functions numpy.logical_and and numpy.logical_or. Apart from the matrix-multiplication operator @, all operators are point-wise. The result type of an operator is the one returned by numpy.result_type. The exceptions are the boolean operators, which return boolean values.
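The promotion rules of numpy.result_type can be inspected in plain NumPy. The sketch below only demonstrates NumPy's own behaviour for a few operand combinations; the array names are chosen for illustration.

```python
import numpy as np

a_i32 = np.ones((2, 2), dtype=np.int32)
b_f32 = np.ones((2, 2), dtype=np.float32)
c_i64 = np.ones((2, 2), dtype=np.int64)

# Mixed integer widths promote to the wider integer type.
int_mix = np.result_type(a_i32, c_i64)

# int32 with float32 promotes to float64, since float32 cannot
# represent every int32 value exactly.
int_float_mix = np.result_type(a_i32, b_f32)

# The array arithmetic agrees with result_type.
sum_dtype = (a_i32 + b_f32).dtype

# Boolean operators yield boolean arrays regardless of the operand types.
both = np.logical_and(a_i32, b_f32)
```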
Python augmented assignments are supported for all operators. Some examples follow:
@dace.program
def augfloordiv(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B //= A

@dace.program
def augmod(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B %= A

@dace.program
def augpow(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B **= A

@dace.program
def auglshift(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B <<= A
Operations between arrays/matrices and scalars or arrays of size 1 behave the same way as with numpy:
@dace.program
def addscalar(A: dace.int64[5, 5], B: dace.int64, C: dace.int64[5, 5]):
    C[:] = A + B

@dace.program
def floordivnumber(A: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A // 5
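The NumPy semantics referred to here are those of broadcasting. A plain NumPy sketch, with values chosen for illustration, shows that a scalar and an array of size 1 are treated identically:

```python
import numpy as np

A = np.arange(25, dtype=np.int64).reshape(5, 5)

# A scalar operand is applied to every element.
plus_scalar = A + np.int64(3)

# An array of size 1 broadcasts the same way as a scalar.
plus_size1 = A + np.array([3], dtype=np.int64)

# A Python literal works as the scalar operand too.
floordiv = A // 5
```

Since row i of A holds the values 5*i to 5*i + 4, every entry in row i of floordiv equals i.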
Transient arrays can be defined with dace.define_local or simply numpy.ndarray. Furthermore, transient scalars can be defined with dace.define_local_scalar.
@dace.program
def transient(A: dace.float32[M, N, K]):
    s = np.ndarray(shape=(M, N, K), dtype=np.int32)
    t = dace.define_local(A.shape, A.dtype)
    s[:] = A
    t[:] = A
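Note that s is declared as int32 while A holds float32 values. In plain NumPy, such a slice assignment converts the values to the destination type, dropping the fractional part. Whether the frontend performs exactly the same conversion is an assumption worth checking against the DaCe documentation. A NumPy-only sketch of the analogous code:

```python
import numpy as np

A = np.array([1.9, -1.9, 2.5, 0.1], dtype=np.float32)

# np.ndarray allocates an uninitialized array of the given shape and dtype.
s = np.ndarray(shape=A.shape, dtype=np.int32)
t = np.ndarray(shape=A.shape, dtype=A.dtype)

# Slice assignment copies the values and casts them to the destination dtype.
s[:] = A   # float32 -> int32, the fractional part is dropped
t[:] = A   # same dtype, exact copy
```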
Python for-loops are automatically converted to control-flow:
N, BS = (dace.symbol(name) for name in ['N', 'BS'])
@dace.program
def forloop(HD: dace.complex128[N, BS, BS], HE: dace.complex128[N, BS, BS],
            HF: dace.complex128[N, BS, BS],
            sigmaRSD: dace.complex128[N, BS, BS],
            sigmaRSE: dace.complex128[N, BS, BS],
            sigmaRSF: dace.complex128[N, BS, BS]):
    for n in range(N):
        if n < N - 1:
            HE[n] -= sigmaRSE[n]
        else:
            HE[n] = -sigmaRSE[n]
        if n > 0:
            HF[n] -= sigmaRSF[n]
        else:
            HF[n] = -sigmaRSF[n]
        HD[n] = HD[n] - sigmaRSD[n]
forloop.to_sdfg()
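To see what the loop computes, it can help to compare it against a slice-based formulation in plain NumPy. The sketch below is not part of the frontend: forloop_reference and forloop_vectorized are hypothetical helper names, and the sizes are chosen for illustration.

```python
import numpy as np

def forloop_reference(HD, HE, HF, sigmaRSD, sigmaRSE, sigmaRSF):
    # The same statements as the DaCe program, run by the Python interpreter.
    N = HD.shape[0]
    for n in range(N):
        if n < N - 1:
            HE[n] -= sigmaRSE[n]
        else:
            HE[n] = -sigmaRSE[n]
        if n > 0:
            HF[n] -= sigmaRSF[n]
        else:
            HF[n] = -sigmaRSF[n]
        HD[n] = HD[n] - sigmaRSD[n]

def forloop_vectorized(HD, HE, HF, sigmaRSD, sigmaRSE, sigmaRSF):
    # The last block of HE and the first block of HF are overwritten;
    # all other blocks are updated by subtraction.
    HE[:-1] -= sigmaRSE[:-1]
    HE[-1] = -sigmaRSE[-1]
    HF[1:] -= sigmaRSF[1:]
    HF[0] = -sigmaRSF[0]
    HD[:] = HD - sigmaRSD

rng = np.random.default_rng(0)

def random_complex(shape):
    return rng.random(shape) + 1j * rng.random(shape)

shape = (4, 3, 3)  # (N, BS, BS) with illustrative sizes
inputs = [random_complex(shape) for _ in range(6)]
loop_args = [x.copy() for x in inputs]
vec_args = [x.copy() for x in inputs]

forloop_reference(*loop_args)
forloop_vectorized(*vec_args)
```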
Maps (parallel for-loops) can be created with dace.map:
Nkz, NE, Nqz, Nw, N3D, NA, NB, Norb = (
    dace.symbol(name)
    for name in ['Nkz', 'NE', 'Nqz', 'Nw',
                 'N3D', 'NA', 'NB', 'Norb'])

@dace.program
def maptest(neigh_idx: dace.int32[NA, NB],
            dH: dace.complex128[NA, NB, N3D, Norb, Norb],
            G: dace.complex128[Nkz, NE, NA, Norb, Norb],
            D: dace.complex128[Nqz, Nw, NA, NB, N3D, N3D],
            Sigma: dace.complex128[Nkz, NE, NA, Norb, Norb]):
    for k, E, q, w, i, j, a, b in dace.map[0:Nkz, 0:NE,
                                           0:Nqz, 0:Nw,
                                           0:N3D, 0:N3D,
                                           0:NA, 0:NB]:
        dHG = G[k-q, E-w, neigh_idx[a, b]] @ dH[a, b, i]
        dHD = dH[a, b, j] * D[q, w, a, b, i, j]
        Sigma[k, E, a] += dHG @ dHD
maptest.to_sdfg()
The python-numpy syntax can be used in combination with the explicit dataflow syntax:
N = dace.symbol('N')
@dace.program
def slicetest(A: dace.float64[N, N - 1], B: dace.float64[N - 1, N],
              C: dace.float64[N - 1, N - 1]):
    tmp = A[1:N] * B[:, 0:N - 1]
    for i, j in dace.map[0:4, 0:4]:
        with dace.tasklet:
            t << tmp[i, j]
            c >> C[i, j]
            c = t

@dace.program
def saoptest(A: dace.float64[5, 5], alpha: dace.float64,
             B: dace.float64[5, 5]):
    tmp = alpha * A * 5
    for i, j in dace.map[0:5, 0:5]:
        with dace.tasklet:
            t << tmp[i, j]
            c >> B[i, j]
            c = t
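The slice shapes in slicetest are easy to misread, so a plain NumPy sketch of the same computation may help. It assumes a concrete N = 5, which makes the fixed 4 x 4 map range cover all of C; in the DaCe program N stays symbolic.

```python
import numpy as np

N = 5  # concrete value chosen for illustration
rng = np.random.default_rng(0)
A = rng.random((N, N - 1))
B = rng.random((N - 1, N))
C = np.zeros((N - 1, N - 1))

# A[1:N] drops the first row; B[:, 0:N - 1] drops the last column.
# Both operands are (N - 1) x (N - 1), so the product is point-wise.
tmp = A[1:N] * B[:, 0:N - 1]

# The map with its tasklet copies one element per (i, j) pair.
for i in range(4):
    for j in range(4):
        C[i, j] = tmp[i, j]
```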
Reductions can be defined with dace.reduce:
W = dace.symbol('W')
H = dace.symbol('H')
@dace.program(dace.float32[H, W], dace.float32[H, W], dace.float32[1])
def mapreduce_test(A, B, sum):
    tmp = dace.define_local([H, W], dace.float32)

    @dace.map(_[0:H, 0:W])
    def compute_tile(i, j):
        a << A[i, j]
        b >> B[i, j]
        t >> tmp[i, j]
        b = a * 5
        t = a * 5

    sum[:] = dace.reduce(lambda a, b: a + b, tmp)
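A plain NumPy sketch of what mapreduce_test computes is given below. It assumes that dace.reduce without an axis argument folds the binary function over all elements of tmp, which the one-element output array suggests. The sizes and the name total (used instead of sum to avoid shadowing the Python builtin) are chosen for illustration.

```python
from functools import reduce

import numpy as np

H, W = 3, 4  # concrete sizes chosen for illustration
A = np.arange(H * W, dtype=np.float32).reshape(H, W)
B = np.zeros((H, W), dtype=np.float32)
total = np.zeros(1, dtype=np.float32)

# The map body: every element of A is scaled by 5 into both B and tmp.
tmp = A * 5
B[:] = tmp

# Fold the binary function over all elements of tmp.
total[:] = reduce(lambda a, b: a + b, tmp.ravel())
```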
The frontend also supports basic math functions such as {exp, sin, cos, sqrt, log, conj, real, imag}. See the documentation for details.