This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

5.8. Writing massively parallel code for heterogeneous platforms with OpenCL

Let's import PyOpenCL.

In [ ]:
import pyopencl as cl
import numpy as np

This object defines some flags related to memory management on the device.

In [ ]:
mf = cl.mem_flags

We create an OpenCL context and a command queue.

In [ ]:
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

Now, we initialize the NumPy array that will contain the fractal.

In [ ]:
size = 200
iterations = 100
col = np.empty((size, size), dtype=np.int32)

We allocate memory for this array on the GPU.

In [ ]:
col_buf = cl.Buffer(ctx, 
                    mf.WRITE_ONLY,
                    col.nbytes)

We write the OpenCL kernel in a string. The mandelbrot function accepts pointers to the buffers as arguments, as well as the figure size. It updates the col buffer with the escape value in the fractal for each pixel.

In [ ]:
code = """
__kernel void mandelbrot(int size,
                         int iterations,
                         global int *col)
{
    // Get the row and column index of the current thread.
    int i = get_global_id(1);
    int j = get_global_id(0);
    int index = i * size + j;
    
    // Declare and initialize the variables.
    double cx, cy;
    double z0, z1, z0_tmp, z0_2, z1_2;
    cx = -2.0 + (double)j / size * 3;
    cy = -1.5 + (double)i / size * 3;

    // Main loop.
    z0 = z1 = 0.0;
    for (int n = 0; n < iterations; n++)
    {
        z0_2 = z0 * z0;
        z1_2 = z1 * z1;
        if (z0_2 + z1_2 <= 100)
        {
            // Need to update z0 and z1 in parallel.
            z0_tmp = z0_2 - z1_2 + cx;
            z1 = 2 * z0 * z1 + cy;
            z0 = z0_tmp;
            col[index] = n;
        }
        else break;
    }
}
"""

Now, we compile the OpenCL program.

In [ ]:
prg = cl.Program(ctx, code).build()

We call the compiled function, passing the command queue, the grid size, the number of iterations, and the buffer as arguments.

In [ ]:
prg.mandelbrot(queue, col.shape, None, np.int32(size), np.int32(iterations), col_buf).wait()

Once the function has completed, we copy the contents of the OpenCL buffer back to the NumPy array col.

In [ ]:
cl.enqueue_copy(queue, col, col_buf);

Let's display the fractal.

In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(np.log(col), cmap=plt.cm.hot,);
plt.xticks([]);
plt.yticks([]);

Let's evaluate the time taken by this function.

In [ ]:
%%timeit
prg.mandelbrot(queue, col.shape, None, np.int32(size), np.int32(iterations), col_buf).wait()
cl.enqueue_copy(queue, col, col_buf);

You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).