To work with DyND at a low level, particularly to be able to develop in the library, one needs a good understanding of how DyND views memory. We're going to work our way from Python objects to DyND arrays, motivating its design as we go.
At the end of these slides, you will hopefully grasp why DyND is structured as it is, and be prepared to dig deeper into its code.
We begin with a look at a Python object. In Python, objects are stored in memory blocks independently allocated on the heap. A single floating point value looks something like this:
This doesn’t look too bad: in a dynamically typed system, we need to store some metadata about the type, along with the data containing the value.
Often, however, we don’t want just one floating point value, we want a whole array of them. In Python, this is done using a list object, which looks like this:
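The per-object overhead is easy to observe from Python itself. This small sketch uses only `sys.getsizeof`; the exact byte counts vary by CPython version and platform, so treat the numbers as illustrative:

```python
import sys

# Each Python float is a separately heap-allocated object carrying type
# metadata (a type pointer and a reference count) alongside its 8-byte value.
x = 3.14
print(sys.getsizeof(x))  # typically 24 bytes on 64-bit CPython

# A Python list of floats stores pointers to those boxed objects, not the
# values themselves, so the data ends up scattered across the heap.
values = [0.1 * i for i in range(7)]
overhead = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(overhead)  # far more than the 7 * 8 = 56 bytes of raw float data
```

Seven floats of raw data need 56 bytes, yet the list plus its boxed elements take several times that, before even counting allocator bookkeeping.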
We can start to see some problems from this picture.
We will call this the “Smalltalk road.” Program data is made of many little objects, connected by pointers. Another option is the “Fortran road,” where program data is made of arrays of data, stored contiguously.
There's a long but worthwhile set of lectures by Alexander Stepanov on Amazon A9's YouTube channel that goes into some depth on many programming topics.
from IPython.display import YouTubeVideo
YouTubeVideo("Y4qdNddSWAg", start=190)
Jake VanderPlas has an excellent blog post that dives into more detail on Python's object structure, that of the CPython interpreter in particular.
Another great place to look for performance inspiration is the game programming industry, as they work under constraints far more harsh than most developers. Bob Nystrom's book "Game Programming Patterns" has a chapter about data locality which goes into some depth on the topic and has some nice diagrams to illustrate things. Here's a quote:
"When I got some stuff working, I was surprised. I knew it was a big deal, but there’s nothing quite like seeing it with your own eyes. I wrote two programs that did the exact same computation. The only difference was how many cache misses they caused. The slow one was fifty times slower than the other."
Let's take the Python list we looked at, factor out the redundant repetition of the float type metadata, and split all the type information away from the value storage. This gives us an abstract idea of how a dynamically typed array should look:
The dynamically typed array we made is just a small step away from the NumPy array. In NumPy, the “Array of 7” part is represented as (ndim, shape, strides), and the “Float type” part is represented as the NumPy dtype.
The strided array approach NumPy takes turns out to be a very good data structure for many kinds of numeric data. Allowing the strides to be set arbitrarily means both C-order (row-major) and Fortran-order (column-major) data can be represented naturally. There is good material out there going into more depth showing the consequences and benefits of it.
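To see concretely why arbitrary strides cover both layouts, here is a minimal pure-Python sketch; the flat buffer, `get` helper, and names are illustrative, not NumPy or DyND API:

```python
# A strided array locates element (i, j) at offset
# i*strides[0] + j*strides[1] from the data pointer. This sketch uses a
# flat Python list as the "memory" and element counts as the strides.
buf = list(range(12))  # flat storage for a 3 x 4 array

def get(strides, offset, i, j):
    return buf[offset + i * strides[0] + j * strides[1]]

# C order (row-major): rows are contiguous, strides = (4, 1).
c_val = get((4, 1), 0, 1, 2)
# Fortran order (column-major) over the same buffer: strides = (1, 3).
f_val = get((1, 3), 0, 1, 2)
print(c_val, f_val)  # -> 6 7
```

The same buffer serves both orders; only the stride values change, which is exactly what lets NumPy view externally produced C or Fortran data without copying.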
One good resource is this section of open SciPy lecture notes.
NumPy, as fantastic as it is, has proved to be lacking in many areas. It provides something good enough and stable enough that a huge ecosystem has grown on top of it, but there are many directions people try to stretch it that require incredible contortions or performance sacrifices to achieve.
A few of these weaknesses include
DyND begins one step back from NumPy in the data representation narrative we have followed, generalizing the “Dynamic Typed Array” we saw in a different direction.
Instead of focusing on the “Array of 7” part and turning it into a “general strided multi-dimensional array,” DyND treats it as one dimension of a hierarchically represented type. DyND takes the regularities in the types of stored data, bundles them together into a type, and takes the data as a separate bundle. This gets us a picture like this:
The types DyND is using are from datashape, a type system created for the Blaze project. What datashape does is combine the shape and dtype notions we saw in the NumPy array.
The shape is generalized slightly, allowing both NumPy-style strided dimensions, called “fixed” as seen in the DyND Array diagram, as well as variable-sized dimensions, called “var.” This variable-sized dimension allows for storage of ragged data.
The dtype is also slightly generalized over NumPy, but more importantly a language is defined for writing out types in the datashape system. This is spelled out in the datashape documentation.
Slicing is a pretty common operation to perform on numerical arrays, so we're going to use it to examine how the structure of the DyND array we defined fares.
In NumPy, slicing is enabled by the strided nature of the array object and the ability to point at the data of another array object. DyND works much the same way. Each “fixed” dimension has an associated stride, and the DyND array contains a data reference in addition to the raw data pointer.
Our diagram didn't include the stride, though, so we have to determine where it will go.
Based on what was described so far, there are two places: in the type or in the value. Let’s first imagine we put the stride in the type, and see what happens when we apply two different slices to an array.
To take a look at what will happen, let's consider an array A with three elements, [1, 2, 3], and type “fixed[size=3, stride=8] * float64”.
Our first slice grabs the first two elements; in Python it is A[:2]. This produces an array with a new type “fixed[size=2, stride=8] * float64”, and points to the same data as A.
Our second slice grabs every second element; in Python it is A[::2]. This produces an array with a new type “fixed[size=2, stride=16] * float64”, and points to the same data as A.
By slicing different ways, we’ve produced two arrays which are both “arrays with two floats,” but have different types. In NumPy, this distinction is hidden away in the strides value attached to the object, and the transparent handling of this detail is considered a feature, not a problem. It would be nicer to just be able to say we want a “fixed dimension of size 2” or just a “fixed dimension.”
DyND's usual strided arrays do not include the stride in the type. The types are spelled "2 * float64" and "fixed * float64".
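Slicing a strided dimension is pure arithmetic on the (size, stride, offset) triple, with the element type untouched. The following sketch uses a hypothetical helper, not DyND's API, to reproduce the two slices of A from above:

```python
# Compute the (size, stride, offset) of A[start:stop:step] for a single
# strided dimension. Positive step only; indices assumed in [0, size].
def slice_dim(size, stride, offset, start, stop, step):
    new_size = max(0, (stop - start + step - 1) // step)
    return new_size, stride * step, offset + start * stride

itemsize = 8  # float64
print(slice_dim(3, itemsize, 0, 0, 2, 1))  # A[:2]  -> (2, 8, 0)
print(slice_dim(3, itemsize, 0, 0, 3, 2))  # A[::2] -> (2, 16, 0)
```

Both results have size 2, so both views share the type "2 * float64"; only the stride stored in the arrmeta differs.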
The other place we had available for the stride information is with the data. There are multiple ways we could do this, but let’s just consider the most obvious one, storing the stride immediately ahead of the data. This means our original array looks like this:
Our first slice, A[:2], still looks good. The only things that change in the diagram are the “fixed[3]” dimension becomes “fixed[2]”, and the third value in the array is ignored.
Our second slice, however, leads to trouble. We want to view the same data, but now with a different stride. The stride must remain 8 for the original A, but needs to be 16 for the sliced A[::2], leading to the conclusion that the set of arrays viewed by this representation is not closed under slicing.
Another trivial yet problematic case is slicing the last two elements, where the place for the stride would overlap with a value.
Since neither the type nor the data of the array seem like appropriate places to put the stride, we need a new location for it. Recall that in NumPy, the array object contains a dtype, a data pointer, and a few additional values including ndim, shape, and strides. We’ve put ndim and shape into the type, but we’re going to leave strides in the array object, creating some new space called “array metadata” or “arrmeta” for short.
While the exercise we just went through may feel like a triviality, it’s partly an attempt to get you thinking about how various data types get laid out in memory. In this example, just placing the stride ahead of the data leads to something absurd, but there are other cases like the “var” dimension type, which are structured pretty close to this.
Every dynd type has the opportunity to store an arbitrary but fixed amount of arrmeta in the array object. This is a place to store information that doesn't really belong in the type, but is also necessary to fully describe the data the array object is pointing at.
In the case of the fixed dimension type, we have chosen to store both the size and the stride of the dimension. (The size is redundant, but included for simple low-level access.) For other types, other arrmeta is needed. We’re going to examine what arrmeta we should choose for a struct/record type, by thinking about some operations we want to perform on data of that type.
Let’s start with a simple 2D point type, with an X and a Y coordinate. In datashape, we can spell it as “{X: float64, Y: float64}”. If we treat it as a C/C++ struct, here’s how an array A of this type looks:
Just like with the slicing operation we had before, we have simple operations we want to support, such as field selection. If we access field A.Y, we get
A struct with only two fields isn't enough to illustrate our next operation, of selecting multiple fields. In addition to the coordinates, our data might include differentials as well, so let's add them.
Suppose we have a function which takes values of type “{X: float64, Y: float64}” as a parameter, and we want to pass the X and Y coordinates from our complicated struct to that function. A natural thing to try is to select these two fields to get a new struct.
Something went wrong here, just like when we tried to put the stride in the fixed dimension type. The values no longer match a C/C++ struct: the second offset can't be 8. And if the offset is 16 instead, the type is different from the one our function takes.
Just like when we first looked at where to put the stride for the fixed dimension, we have made a choice of where to put the offsets for the struct dtype. In this case, it was quite intuitive to simply do what C/C++ does, and put those offsets in the type. This is in fact what NumPy does.
This led to trouble, however, which was the exact same trouble that made us put the stride into the arrmeta. Therefore, we put these offsets in the arrmeta.
Here's how it looks after we move the struct’s field offsets into the arrmeta. Our data A of type “{X: float64, DX: float64, Y: float64, DY: float64}” looks like:
Now if we do a field selection, extracting the X and Y fields, we’ll get a DyND struct whose type is “{X: float64, Y: float64}”, exactly as we wanted. The layout is not the same as a C/C++ struct, but views of the data are closed under field selection, just as views of strided arrays are closed under slicing. Here’s the result of that field selection:
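This field selection can be sketched in pure Python using the `struct` module; the `select_fields` helper and the dict-of-offsets representation are illustrative stand-ins for DyND's arrmeta, not its actual API:

```python
import struct

# Pack {X, DX, Y, DY} the way a C struct would lay it out.
data = struct.pack("4d", 1.0, 0.1, 2.0, 0.2)
offsets = {"X": 0, "DX": 8, "Y": 16, "DY": 24}

def select_fields(data, offsets, fields):
    # Return a "view": the same bytes, plus arrmeta-style offsets for
    # only the requested fields. No data is copied or moved.
    return data, {name: offsets[name] for name in fields}

view, view_offsets = select_fields(data, offsets, ["X", "Y"])
x = struct.unpack_from("d", view, view_offsets["X"])[0]
y = struct.unpack_from("d", view, view_offsets["Y"])[0]
print(x, y)  # -> 1.0 2.0
```

Because the offsets travel beside the data reference instead of inside the type, the selected view has the plain type “{X: float64, Y: float64}” even though its fields are 16 bytes apart in memory.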
So far we’ve seen two forms of arrmeta, for fixed dimensions and for structs. A natural question arises, what happens when we have arrays of structs or structs of arrays? Let's start with a simple case, an array of XY coordinates with type "10 * {X: float64, Y: float64}". The dimension type, "10" (short for "fixed[10]"), needs two arrmeta values, size and stride. The struct dtype also needs two arrmeta values, offsetX and offsetY. These need to be systematically organized, so that the arrmeta can scale up to arbitrary complexity.
If we think of the type as a tree, we can use its hierarchy to organize the arrmeta structure. Here's how that looks:
So far, we've only made fields of type float64 in structs, so there has been no arrmeta to track. Struct fields are not limited this way, however, so let's do something a bit more involved. Let's make a struct of an array and a struct, with type "{X: 10 * float64, Y: {Z: float64, W: float64}}".
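A tree-structured arrmeta for that type can be sketched as nested Python dicts; this layout is hypothetical, chosen only to mirror the type tree, and is not DyND's binary arrmeta format:

```python
# Arrmeta sketch for "{X: 10 * float64, Y: {Z: float64, W: float64}}".
arrmeta = {
    "struct": {
        "field offsets": [0, 80],  # X's 10 * 8 bytes, then Y's sub-struct
        "X": {"fixed_dim": {"size": 10, "stride": 8}},
        "Y": {"struct": {"field offsets": [0, 8]}},  # Z, W need no more
    }
}

# Walking the arrmeta follows exactly the same path as walking the type.
stride_of_X = arrmeta["struct"]["X"]["fixed_dim"]["stride"]
print(stride_of_X)  # -> 8
```

Each node of the type tree contributes its own chunk of arrmeta, and float64 leaves contribute nothing, which is why only X and Y appear.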
This nesting can get arbitrarily deep, and walking through the arrmeta always matches the tree structure of its corresponding type. DyND provides a way to print a representation of the arrmeta, which in Python is provided by the nd.debug_repr function. Here's how that looks:
>>> a = nd.empty("10 * {A: 3 * float64, B: 3 * {X: float64, Y: 4 * float64}, C: 2 * float64}")
>>> nd.debug_repr(a)
------ array
 address: 00000000003A6060
 refcount: 1
 type:
  pointer: 00000000003974C0
  type: 10 * {A : 3 * float64, B : 3 * {X : float64, Y : 4 * float64}, C : 2 * float64}
  type refcount: 1
 arrmeta:
  flags: 3 (read_access write_access )
  type-specific arrmeta:
   fixed_dim arrmeta
    size: 10
    stride: 160
    struct arrmeta
     field offsets: 0, 24, 144
     field 0 (name A) arrmeta:
      fixed_dim arrmeta
       size: 3
       stride: 8
     field 1 (name B) arrmeta:
      fixed_dim arrmeta
       size: 3
       stride: 40
       struct arrmeta
        field offsets: 0, 8
        field 1 (name Y) arrmeta:
         fixed_dim arrmeta
          size: 4
          stride: 8
     field 2 (name C) arrmeta:
      fixed_dim arrmeta
       size: 2
       stride: 8
 data:
  pointer: 00000000003A6108
  reference: 0000000000000000 (embedded in array memory)
------
Memory leaks can be a big problem, leading to the enormous popularity of systems using garbage collection as the main strategy to deal with it. Garbage collection requires a tightly controlled environment, however, and a library designed to interoperate with the memory of multiple systems cannot make strong assumptions about them.
In modern C++, the standard approach is to reason about memory ownership, using deterministic scoping to manage lifetimes. An important acronym to know is RAII (Resource Acquisition Is Initialization), which summarizes the basic idea of using memory ownership to systematically avoid memory leaks. See the Wikipedia article for a decent summary.
Being able to point at memory owned by other objects is necessary for the "view" model DyND is using. To manage this, DyND uses data references independent from the pointers, based on a notion of memory block objects.
To track ownership and lifetime of DyND arrays and the data they point to, DyND uses reference-counted objects called memory blocks. These have semantics similar to the standard C++ shared_ptr, utilizing an atomic reference count to keep track of how many references to the object are being held.
This is similar to Python objects in the CPython interpreter, which use reference counting as well. An important difference is that Python does not use atomic reference counting; as a consequence it is faster, but requires that the GIL (Global Interpreter Lock) be held any time objects are manipulated.
DyND uses the same ideas as C++ shared_ptr to make the reference count mostly handled automatically, but it's useful to understand reference counting, especially when working with DyND at a lower level. The CPython Reference Counting documentation is a good read for this. Because the CPython API is C, there is no way to handle reference counts automatically, and one must manage them explicitly when using that API.
Putting the memory block data reference together with the other parts of the DyND array, this leads us to the following picture:
Memory allocations are fairly expensive operations, so creating a single DyND array can be costly. Additionally, when the memory block for the data is allocated separately from the main array object, it might be arbitrarily far away from that data in memory, reducing data locality.
DyND supports an optimization for this case, where the memory for the data and the array object are allocated in one chunk. This is signaled by a NULL data reference, because holding a reference to itself would cause a reference cycle.
An important use case for DyND is the ability to point into memory owned by another system. For example, Python's PEP 3118, known simply as the buffer protocol in Python 3, provides an interface through which to publish strided array memory. Another example might be the result of an algorithm in a library being wrapped, returned as an object containing a large heap-allocated array.
DyND has something called the external memory block for this case, a memory block which does not itself contain the memory, but owns a reference to the memory that it knows how to free. This is similar to the Capsule object in Python.
Variable-sized data is a commonly desired feature missing from NumPy, outside of its support for PyObject arrays. Proper support for this requires the ability to allocate and point to memory outside of the fixed-size data for a single array element. This is one of the purposes of DyND's support for pointers.
DyND data elements can contain pointers into separately allocated memory. We're going to introduce the topic via the "pointer" type, which represents a single pointer into some other memory. Some more types, like the "var" dimension, "string", and "bytes" also use pointers in their representation.
The pointers in an array's data will be pointing at some other data. It wouldn't be ok for this other data to disappear, so we need some way to hang on to a reference to it. Just like the stride and offsets before, this gets stored in the array's arrmeta.
One major difference with previously seen arrmeta is that the reference to that data will need to be freed when the array object is freed. DyND provides a mechanism for this via an arrmeta destructor function.
Let's start with something really simple, a single pointer to an array of floats. The type is spelled "pointer[7 * float64]", and the array object looks like:
Observe that the DataRef in the array object manages ownership of all the data the array object's data pointer refers to. Similarly, the DataRef in the pointer type's arrmeta manages ownership of all the memory that the pointers stored in the corresponding data point to.
Recall the technique we used before to evaluate our type and data storage choices, by checking that the representation is closed over slicing. Let's try that here by slicing the last two elements of the array. If the array is A, we're producing A[-2:].
We can't modify the pointer, as the original A is still using it. Recall that when we sliced before, we got a new array object whose data pointer we could adjust. Here the pointer lives in the shared data rather than in the new array object, so we must do something different.
Because of the uniform way slicing applies, every pointer would be offset by the same amount. Thus, we can store a single offset value in the pointer's arrmeta, and then slicing works.
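The effect of that single arrmeta offset can be sketched with some made-up addresses; the `element_address` helper is illustrative, not DyND API:

```python
# With one offset stored in the pointer type's arrmeta, slicing never
# rewrites the pointer stored in the shared data.
itemsize = 8       # float64
target = 1000      # hypothetical address the stored pointer refers to

def element_address(arrmeta_offset, i):
    # element i of the viewed array, relative to the unchanged pointer
    return target + arrmeta_offset + i * itemsize

# Original A of 7 elements: offset 0. A[-2:]: offset 5 * 8 = 40.
print(element_address(0, 0))    # -> 1000, A[0]
print(element_address(40, 0))   # -> 1040, A[-2:][0]
print(element_address(0, 5))    # -> 1040, the same memory as A[5]
```

The pointer in the data stays 1000 for both views; only the per-view arrmeta differs, so the representation stays closed under slicing.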
The "var" dimension allows for the storage of ragged arrays, where the size of the dimension marked "var" does not remain constant. A single element of this type includes a pointer and a dimension size. The arrmeta contains a stride, just like the fixed dimension, along with a data reference and an offset, just like the pointer type.
We'll make a simple ragged array with type "2 * var * float64" to illustrate this structure.
An array with the structure we just saw won't normally "poof" into existence fully formed. It would instead start in some initial empty state, and be filled in element by element. DyND handles this by initializing all the var dim elements to NULL, then having the memblock for the var dim expose a pooled memory allocator interface. When we call nd::empty("2 * var * float64"), we get back an array object looking like this:
After assigning to the first element, we get this:
After assigning to the second element, we get the fully populated array:
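The three states above (empty, partially assigned, fully populated) can be sketched together in pure Python; the `pool` list and `assign_var_element` helper stand in for the pooled memblock and are not DyND's actual API:

```python
# Sketch of "2 * var * float64": each outer element is a (pointer, size)
# pair, filled in one at a time from a pooled allocator.
pool = []                # stand-in for the pooled memblock's allocations

def assign_var_element(slots, i, values):
    pool.append(list(values))              # allocate from the pool
    slots[i] = (pool[-1], len(values))     # store (pointer, size)

slots = [None, None]     # nd.empty state: all var elements start NULL
assign_var_element(slots, 0, [1.0, 2.0, 3.0])
assign_var_element(slots, 1, [4.0, 5.0])
data, size = slots[1]
print(size, data)        # -> 2 [4.0, 5.0]
```

Note that nothing in the pool is ever freed individually; allocations live until the whole memblock goes away, which is the behavior behind the resize restriction discussed next.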
This pooled memory approach has some advantages and some disadvantages. A big advantage is that the data allocated in the pooled memblock can be viewed in the same way as other data in dynd memblocks. For example, if we access via a slice A[1, :], DyND will collapse the singular var dimension into a fixed dimension.
A big disadvantage of the pooled approach is that resizing an element of a var dimension after it has been created isn't allowed. If it were allowed, very simple code that repeatedly resized an element would cause many allocations without any frees, a highly unexpected behavior.
DyND needs support for non-pooled allocations as well, for the string, bytes, and var dimension types that currently support only pooled allocations. The details of this haven't been fully worked out, but it may be that this gets done by having the memblock of the dataref expose different kinds of memory allocators for the pooled and non-pooled cases.