MPI is the Message Passing Interface - a standard for parallel programming in which separate parallel processes, each with its own separate memory allocation, communicate by passing messages. MPI processes pass messages between themselves to coordinate execution and share data with each other.
MPI is commonly used on distributed memory systems - computer systems composed of compute nodes, each with its own separate physical memory, such as high-performance computers or cluster machines. Equally though, MPI will run just as well on your laptop and is not tied to any particular architecture.
In this mini tutorial we introduce the mpi4py Python package, which provides Python bindings to the MPI libraries, similar to the corresponding C, C++, and Fortran MPI APIs.
If you are familiar with MPI from C/C++, you will find the syntax of the Python bindings very similar. Don't worry if you are completely new to MPI, though - we will introduce some of the simpler concepts from a Python perspective to get you started. Note that, unlike the previous parts of the tutorial, mpi4py does little to hide away the underlying parallel library. (In contrast, numba and Cython use OpenMP or OpenMP-like libraries but hide more of the details from the user.)
If you'd prefer to learn more about the basics of MPI before starting this tutorial, Lawrence Livermore National Lab has a good introductory tutorial that covers the concepts well: https://computing.llnl.gov/tutorials/mpi/
Jupyter notebooks do not work particularly well with mpi4py. (It is possible to set them up to work together, but that is beyond the scope of this tutorial.) We will demonstrate a few simple examples in this notebook, but to actually run the code it is easier to copy or type out the examples into a Python script and save it as a .py file. You would then execute the script using the MPI launcher, usually called mpirun. For example, to run a sample mpi4py script over 4 processes, we would launch it like so:
mpirun -np 4 python mpi_python_script.py
To get started with any mpi4py script, we add the following import statement to our Python code:
from mpi4py import MPI
Then a simple test of mpi4py would print "Hello world!" or similar from each MPI process:
from mpi4py import MPI
comm = MPI.COMM_WORLD
print("Hello! I'm rank {} from {} running in total...".format(comm.rank, comm.size))
# Wait for everyone to synchronize here:
comm.Barrier()
Hello! I'm rank 0 from 1 running in total...
So what's going on here?
Firstly we set up a comm object, which is a communicator that lets us get information about and talk to the other processes. (You will see it in most MPI programs.)
In the print function, we make a call to comm.rank, which gets the ID of the current process in the communicator (its 'rank'), and then comm.size, which gives us the total number of running processes in the communicator object. (The equivalent method forms comm.Get_rank() and comm.Get_size(), which you will see in later examples, do the same thing.)
At the end we call comm.Barrier() to synchronise the processes in the communicator.
Unfortunately, if we run this in a Jupyter notebook, we will likely only see rank 0 and a total of 1 process, because the notebook runs the Python code inside a single process. To see the effects of multiple MPI processes, we would run the above code as a script from the terminal as outlined above, i.e. mpirun -np 4 python mpi_hello.py
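Run that way, the output should look something like the following (the ordering of the lines may vary between runs, since each process prints independently):
Hello! I'm rank 0 from 4 running in total...
Hello! I'm rank 2 from 4 running in total...
Hello! I'm rank 1 from 4 running in total...
Hello! I'm rank 3 from 4 running in total...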
This example looks at how we would send a Python object from one process to another.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Initialise some data on rank 0 and send it to rank 1
    data = {'a': 7, 'b': 3.14}
    comm.send(data, dest=1)
elif rank == 1:
    # Receive the data sent from rank 0
    data = comm.recv(source=0)
    print('On process 1, data is', data)
As usual, we set up the comm object to begin with. Then we fetch the rank (ID) of each process. If we are on rank 0, we use this process as the master process and initialise some data (a Python dictionary object) here. Then we use comm.send() to send the Python data object to a particular destination process.
If we are executing the other MPI process (rank 1), we set it up as our receiving rank. We have to explicitly say that we want to receive the data from rank 0. We also tell this rank 1 process to print out the dictionary it has received.
To run this, create a separate python script with the above code and run it with:
mpirun -np 2 python point_mpi.py
The output should tell us that we have received the dictionary data on process 1:
On process 1, data is {'a': 7, 'b': 3.14}
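mpi4py also provides uppercase comm.Send() and comm.Recv() methods for NumPy arrays, which transfer the raw array buffer directly instead of pickling a Python object (we return to this distinction at the end of the tutorial). Here is a minimal sketch, assuming it is saved as point_numpy_mpi.py (a hypothetical name) and run with mpirun -np 2:
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Uppercase Send transfers the array buffer directly - no pickling
    data = np.arange(10, dtype=np.float64)
    comm.Send([data, MPI.DOUBLE], dest=1)
elif rank == 1:
    # Unlike recv(), Recv() fills a buffer that must already be allocated
    # with the right size and dtype
    data = np.empty(10, dtype=np.float64)
    comm.Recv([data, MPI.DOUBLE], source=0)
    print('On process 1, data is', data)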
Broadcasting is another feature of MPI, in which a copy of some data is sent from one process to every other MPI process (for example, so that each process can then perform some further calculation on it).
Another example that demonstrates the interface with MPI is broadcasting a vector to n MPI processes.
In this example, we are going to take a 1D numpy array and broadcast it from rank 0 to the other ranks (processes).
Note: perhaps confusingly, MPI and NumPy both use the term broadcasting, but to mean different things.
import numpy as np
from mpi4py import MPI
comm = MPI.COMM_WORLD
print("-"*78)
print(" Running on %d cores" % comm.size)
print("-"*78)
comm.Barrier()
# Prepare a vector of N=5 elements to be broadcast...
N = 5
if comm.rank == 0:
    A = np.arange(N, dtype=np.float64)   # rank 0 has the proper data
else:
    A = np.empty(N, dtype=np.float64)    # all other ranks just an empty array

# Broadcast array A from rank 0 to everybody
comm.Bcast([A, MPI.DOUBLE], root=0)
# The list argument contains the array to be broadcast and the corresponding
# MPI data type: MPI.DOUBLE

# Everybody should now have the same...
print("[%02d] %s" % (comm.rank, A))
------------------------------------------------------------------------------
 Running on 1 cores
------------------------------------------------------------------------------
[00] [0. 1. 2. 3. 4.]
Note that if you are running this in a Jupyter notebook, you are only going to see 1 core reported. This is because we are running the Python code through a standard Python interpreter, rather than invoking it with a command like mpirun. To see the effect of multiple MPI processes running at once, save the code above in a script called broadcast_mpi.py and run it from the command line with:
mpirun -np 4 python broadcast_mpi.py
This command invokes the python interpreter through the mpirun executable, allowing the mpi4py library to interact with the MPI API and run on multiple cores of a multicore CPU.
The output (from the terminal) should now look something like:
------------------------------------------------------------------------------
Running on 4 cores
------------------------------------------------------------------------------
[00] [0. 1. 2. 3. 4.]
------------------------------------------------------------------------------
Running on 4 cores
------------------------------------------------------------------------------
[01] [0. 1. 2. 3. 4.]
------------------------------------------------------------------------------
Running on 4 cores
------------------------------------------------------------------------------
[02] [0. 1. 2. 3. 4.]
------------------------------------------------------------------------------
Running on 4 cores
------------------------------------------------------------------------------
[03] [0. 1. 2. 3. 4.]
Note how each process has now received a copy of the same array from the broadcast operation. In a real program, we might then proceed to manipulate this array or perform calculations on it.
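As a minimal sketch of that idea (a hypothetical continuation of broadcast_mpi.py, reusing its comm, np and A), each rank could sum a strided slice of its copy, with the partial sums then combined on rank 0 by comm.Reduce(), a reduction operation we will meet again in the final example:
# Hypothetical continuation: each rank sums every size-th element of A,
# starting at its own rank, so the slices cover A exactly once
local_sum = np.array([A[comm.rank::comm.size].sum()], dtype=np.float64)
total = np.zeros(1, dtype=np.float64)
comm.Reduce([local_sum, MPI.DOUBLE], [total, MPI.DOUBLE],
            op=MPI.SUM, root=0)
if comm.rank == 0:
    print("Total sum of A:", total[0])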
In the final example, we set up a parent process which spawns its own child MPI processes; each child runs a script that calculates a partial value of pi. (The calculation uses the midpoint rule to approximate the integral of 4/(1+x^2) between 0 and 1, which equals pi.) The parent process then calls a reduction (comm.Reduce), which sums up all the partial values of pi from the child processes, giving the total estimate for pi.
# parent_pi.py
from mpi4py import MPI
import numpy
import sys

# Spawn 5 child processes, each running child_pi.py
comm = MPI.COMM_SELF.Spawn(sys.executable,
                           args=['child_pi.py'],
                           maxprocs=5)

# Broadcast the number of intervals N to the children
N = numpy.array(100, 'i')
comm.Bcast([N, MPI.INT], root=MPI.ROOT)

# Sum up the children's partial values of pi with a reduction
PI = numpy.array(0.0, 'd')
comm.Reduce(None, [PI, MPI.DOUBLE],
            op=MPI.SUM, root=MPI.ROOT)
print(PI)

comm.Disconnect()
# child_pi.py
from mpi4py import MPI
import numpy

# Connect back to the parent process that spawned us
comm = MPI.Comm.Get_parent()
size = comm.Get_size()
rank = comm.Get_rank()

# Receive the number of intervals N from the parent
N = numpy.array(0, dtype='i')
comm.Bcast([N, MPI.INT], root=0)

# Midpoint rule: each child sums every size-th interval, offset by its rank
h = 1.0 / N
s = 0.0
for i in range(rank, N, size):
    x = h * (i + 0.5)
    s += 4.0 / (1.0 + x**2)
PI = numpy.array(s * h, dtype='d')

# Send this child's partial value back; the parent sums them with MPI.SUM
comm.Reduce([PI, MPI.DOUBLE], None,
            op=MPI.SUM, root=0)
comm.Disconnect()
To run this code, make sure you have created the two scripts with the names parent_pi.py and child_pi.py and placed them in the same folder. Then start mpirun with only 1 process - the Spawn method will create the new MPI processes that run the child_pi.py script:
mpirun -np 1 python parent_pi.py
The output should be a reasonable approximation of pi (with N = 100 it should print something close to 3.14160). Hint: you can get a more (or less) accurate value for pi by changing the value of N in parent_pi.py.
Note how the examples use slightly different methods for sending generic Python objects compared to sending NumPy arrays. When using NumPy arrays with MPI functions, we have to specify the data type in the list argument that is passed to the MPI function, e.g. comm.Bcast([N, MPI.INT], ..., whereas for a Python dictionary object this would be comm.bcast(my_dict, ... (note also the lowercase bcast).
mpi4py is a Python interface to the MPI library for message-passing parallel programming. It provides bindings to the powerful Message Passing Interface standard, which is commonly used in distributed memory parallel programming.
In Part 1: Multiprocessing, you may recall we discussed the multiprocessing module, another Python module that creates separate processes and can be used to distribute parallel tasks. Multiprocessing is limited to creating OS-level processes in a shared-memory computing environment, and (to my knowledge) is not easy to apply across multi-node cluster computers - something which MPI (and by extension mpi4py) was designed specifically to do.
So if your problem size requires the resources of larger compute clusters, mpi4py may be the more appropriate choice, though the learning curve is inevitably steeper: getting the most out of the Python implementation requires some knowledge of the MPI standard itself.
Introductory MPI texts and tutorials are just as useful for covering MPI in greater depth. The current documentation for mpi4py, although good, does tend to assume some knowledge of the concepts behind message passing and distributed memory programming, so starting with a basic MPI tutorial may be a good first move, e.g. http://mpitutorial.com/tutorials/ is a nice set of introductory tutorials.
MPI for Python documentation: https://mpi4py.readthedocs.io/en/stable/tutorial.html
Excellent Python MPI tutorial (goes into more depth) https://nyu-cds.github.io/python-mpi/
Another good tutorial based on the mpi4py documentation: https://rabernat.github.io/research_computing/parallel-programming-with-mpi-for-python.html