Notebook

Content under Creative Commons Attribution license CC-BY 4.0, code under MIT license (c)2014 L.A. Barba, G.F. Forsyth.¶

Relax and hold steady¶

Ready for more relaxing? This is the third lesson of Module 5 of the course, exploring solutions to elliptic PDEs. In Lesson 1 and Lesson 2 of this module we used the Jacobi method (a relaxation scheme) to iteratively find solutions to Laplace and Poisson equations.

And it worked, so why are we still talking about it? Because the Jacobi method is slow, very slow to converge. It might not have seemed that way in the first two notebooks because we were using small grids, but we did need more than 3,000 iterations to reach the exit criterion while solving the Poisson equation on a $41\times 41$ grid.

You can confirm this below: using nx,ny= $128$ on the Laplace problem of Lesson 1, the Jacobi method requires nearly 20,000 iterations before we reach $10^{-8}$ for the L2-norm of the difference between two iterates. That's a lot of iterations!

Now, consider this application: an incompressible Navier-Stokes solver has to ensure that the velocity field is divergence-free at every time step. One of the most common ways to ensure this is to solve a Poisson equation for the pressure field. In fact, the pressure Poisson equation is responsible for the majority of the computational expense of an incompressible Navier-Stokes solver. Imagine having to do 20,000 Jacobi iterations for every time step in a fluid-flow problem with many thousands or perhaps millions of grid points!

The Jacobi method is the slowest of all relaxation schemes, so let's learn how to improve on it. In this lesson, we'll study the Gauss-Seidel method—twice as fast as Jacobi, in theory—and the successive over-relaxation (SOR) method. We also have some neat Python tricks lined up for you to get to the solution even faster. Let's go!

Test problem¶

Let's use the same example problem as in Lesson 1: Laplace's equation with boundary conditions

$\begin{equation} \begin{gathered} p=0 \text{ at } x=0\\ \frac{\partial p}{\partial x} = 0 \text{ at } x = L_x\\ p = 0 \text{ at }y = 0 \\ p = \sin \left( \frac{\frac{3}{2}\pi x}{L_x} \right) \text{ at } y = L_y \end{gathered} \end{equation}$

We import our favorite Python libraries, and also some custom functions that we wrote in Lesson 1, which we have saved in a 'helper' Python file for re-use.

In [1]:

import numpy
from matplotlib import pyplot
%matplotlib inline

In [2]:

# Set the font family and size to use for Matplotlib figures.
pyplot.rcParams['font.family'] = 'serif'
pyplot.rcParams['font.size'] = 16

In [3]:

from helper import laplace_solution, l2_norm, plot_3d

We now have the functions laplace_solution, l2_norm, and plot_3d in our namespace. If you can't remember how they work, just use the Python built-in function help() and take advantage of the docstrings. It's a good habit to always write docstrings in your functions, and now you see why!

In this notebook, we are going to use larger grids than before, to better illustrate the speed increases we achieve with different iterative methods. Let's create a $128\times128$ grid and initialize.

In [4]:

# Set parameters.
nx = 128  # number of points in the x direction
ny = 128  # number of points in the y direction
Lx = 5.0  # domain length in the x direction
Ly = 5.0  # domain length in the y direction
dx = Lx / (nx - 1)  # grid spacing in x direction
dy = Ly / (ny - 1)  # grid spacing in y direction

# Create the gridline locations.
x = numpy.linspace(0.0, Lx, num=nx)
y = numpy.linspace(0.0, Ly, num=ny)

# Set the initial conditions.
p0 = numpy.zeros((ny, nx))
p0[-1, :] = numpy.sin(1.5 * numpy.pi * x / Lx)

We said above that the Jacobi method takes nearly $20,000$ iterations before it satisfies our exit criterion of $10^{-8}$ (L2-norm difference between two consecutive iterations). You'll just have to confirm that now. Have a seat!

In [5]:

def laplace_2d_jacobi(p0, maxiter=20000, rtol=1e-6):
    """
    Solves the 2D Laplace equation on a uniform grid
    with equal grid spacing in both directions
    using Jacobi relaxation method.
    
    The exit criterion of the solver is based on the relative L2-norm
    of the solution difference between two consecutive iterations.
    
    Parameters
    ----------
    p0 : numpy.ndarray
        The initial solution as a 2D array of floats.
    maxiter : integer, optional
        Maximum number of iterations to perform;
        default: 20000.
    rtol : float, optional
        Relative tolerance for convergence;
        default: 1e-6.
    
    Returns
    -------
    p : numpy.ndarray
        The solution after relaxation as a 2D array of floats.
    ite : integer
        The number of iterations performed.
    diff : float
        The final relative L2-norm of the difference.
    """
    p = p0.copy()
    diff = rtol + 1.0  # initial difference
    ite = 0  # iteration index
    while diff > rtol and ite < maxiter:
        pn = p.copy()
        # Update the solution at interior points.
        p[1:-1, 1:-1] = 0.25 * (pn[1:-1, :-2] + pn[1:-1, 2:] +
                                pn[:-2, 1:-1] + pn[2:, 1:-1])
        # Apply 2nd-order Neumann condition (zero-gradient)
        # at the right boundary.
        p[1:-1, -1] = 0.25 * (2.0 * pn[1:-1, -2] +
                              pn[2:, -1] + pn[:-2, -1])
        # Compute the relative L2-norm of the difference.
        diff = l2_norm(p, pn)
        ite += 1
    return p, ite, diff

In [6]:

# Compute the solution using Jacobi relaxation method.
p, ites, diff = laplace_2d_jacobi(p0, maxiter=20000, rtol=1e-8)
print('Jacobi relaxation: {} iterations '.format(ites) +
      'to reach a relative difference of {}'.format(diff))

Jacobi relaxation: 19993 iterations to reach a relative difference of 9.998616841362057e-09

Would we lie to you? $19,993$ iterations before we reach the exit criterion of $10^{-8}$ . Yikes!

We can also time how long the Jacobi method takes using the %%timeit cell-magic. Go make some tea, because this can take a while—the %%timeit magic runs the function a few times and then averages their runtimes to give a more accurate result.

Notes¶

When using %%timeit, the return values of a function (p and iterations in this case) won't be saved.
We document our timings below, but your timings can vary quite a lot, depending on your hardware. In fact, you may not even see the same trends (some recent hardware can play some fancy tricks with optimizations that you have no control over).

With those caveats, let's give it a shot:

In [7]:

%%timeit
laplace_2d_jacobi(p0, maxiter=20000, rtol=1e-8)

7.15 s ± 659 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The printed result above (and others to come later) is from a 2012 MacBook Pro, powered by a 2.9 GHz Intel Core i7. We tried also on more modern machines, and got conflicting results—like the Gauss-Seidel method being slightly slower than Jacobi, even though it required fewer iterations. Don't get too hung up on this: the hardware optimizations applied by more modern CPUs are varied and make a big difference sometimes.

Meanwhile, let's check the overall accuracy of the numerical calculation by comparing it to the analytical solution.

In [8]:

# Compute the analytical solution.
p_exact = laplace_solution(x, y, Lx, Ly)

# Compute the relative L2-norm of the error.
l2_norm(p, p_exact)

Out[8]:

6.173551335288024e-05

That's a pretty small error. Let's assume it is good enough and focus on speeding up the process.

Gauss-Seidel¶

You will recall from Lesson 1 that a single Jacobi iteration is written as:

$\begin{equation} p^{k+1}_{i,j} = \frac{1}{4} \left(p^{k}_{i,j-1} + p^k_{i,j+1} + p^{k}_{i-1,j} + p^k_{i+1,j} \right) \end{equation}$

The Gauss-Seidel method is a simple tweak to this idea: use updated values of the solution as soon as they are available, instead of waiting for the values in the whole grid to be updated.

If you imagine that we progress through the grid points in the order shown by the arrow in Figure 1, then you can see that the updated values $p^{k+1}_{i-1,j}$ and $p^{k+1}_{i,j-1}$ can be used to calculate $p^{k+1}_{i,j}$ .

Figure 1. Assumed order of updates on a grid.¶

The iteration formula for Gauss-Seidel is thus:

$\begin{equation} p^{k+1}_{i,j} = \frac{1}{4} \left(p^{k+1}_{i,j-1} + p^k_{i,j+1} + p^{k+1}_{i-1,j} + p^k_{i+1,j} \right) \end{equation}$

There's now a problem for the Python implementation. You can no longer use NumPy's array operations to evaluate the solution updates. Since Gauss-Seidel requires using values immediately after they're updated, we have to abandon our beloved array operations and return to nested for loops. Ugh.

We don't like it, but if it saves us a bunch of time, then we can manage. But does it?

Here's a function to compute the Gauss-Seidel updates using a double loop.

In [9]:

def laplace_2d_gauss_seidel(p0, maxiter=20000, rtol=1e-6):
    """
    Solves the 2D Laplace equation on a uniform grid
    with equal grid spacing in both directions
    using Gauss-Seidel relaxation method.
    
    The exit criterion of the solver is based on the relative L2-norm
    of the solution difference between two consecutive iterations.
    
    Parameters
    ----------
    p0 : numpy.ndarray
        The initial solution as a 2D array of floats.
    maxiter : integer, optional
        Maximum number of iterations to perform;
        default: 20000.
    rtol : float, optional
        Relative tolerance for convergence;
        default: 1e-6.
    
    Returns
    -------
    p : numpy.ndarray
        The solution after relaxation as a 2D array of floats.
    ite : integer
        The number of iterations performed.
    diff : float
        The final relative L2-norm of the difference.
    """
    ny, nx = p0.shape
    p = p0.copy()
    diff = rtol + 1.0  # initial difference
    ite = 0  # iteration index
    while diff > rtol and ite < maxiter:
        pn = p.copy()
        # Update the solution at interior points.
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                p[j, i] = 0.25 * (p[j, i - 1] + p[j, i + 1] +
                                  p[j - 1, i] + p[j + 1, i])
        # Apply 2nd-order Neumann condition (zero-gradient)
        # at the right boundary.
        for j in range(1, ny - 1):
            p[j, -1] = 0.25 * (2.0 * p[j, -2] +
                               p[j - 1, -1] + p[j + 1, -1])
        # Compute the relative L2-norm of the difference.
        diff = l2_norm(p, pn)
        ite += 1
    return p, ite, diff

We would then run this with the following function call:

p, ites, diff = laplace_2d_gauss_seidel(
                  p0, maxiter=20000, rtol=1e-8)

But don't do it. We did it so that you don't have to!

The solution of our test problem with the Gauss-Seidel method required several thousand fewer iterations than the Jacobi method, but it took nearly 10 minutes to run on our machine.

What happened?¶

If you think back to the far off days when you first learned about array operations, you might recall that we discovered that NumPy array operations could drastically improve code performance compared with nested for loops. NumPy operations are written in C and pre-compiled, so they are much faster than vanilla Python.

But the Jacobi method is not algorithmically optimal, giving slow convergence. We want to take advantage of the faster-converging iterative methods, yet unpacking the array operations into nested loops destroys performance. What can we do?

Use Numba!¶

Numba is an open-source optimizing compiler for Python. It works by reading Python functions that you give it, and generating a compiled version for you—also called Just-In-Time (JIT) compilation. You can then use the function at performance levels that are close to what you can get with compiled languages (like C, C++ and fortran).

It can massively speed up performance, especially when dealing with loops. Plus, it's pretty easy to use. Like we overheard at a conference: Numba is a Big Deal..

Caveat¶

We encourage everyone following the course to use the Anaconda Python distribution because it's well put-together and simple to use. If you haven't been using Anaconda, that's fine, but let us strongly suggest that you take the plunge now. Numba is great and easy to use, but it is not easy to install without help. Those of you using Anaconda can install it by running:

conda install numba

If you really don't want to use Anaconda, you will have to compile all of Numba's dependencies.

Intro to Numba¶

Let's dive in! Numba is great and easy to use. We're going to first walk you through a simple example to give you a taste of Numba's abilities.

After installing Numba (see above), we can use it by adding a line to import numba and another to import autojit (more on this in a bit).

In [10]:

import numba
from numba import jit

You tell Numba which functions you want to accelerate by using a Python decorator, a special type of command that tells the Python interpreter to modify a callable object (like a function). For example, let's write a quick function to calculate the $n^{\text{th}}$ number in the Fibonacci sequence:

In [11]:

def fib_it(n):
    a, b = 1, 1
    for i in range(n - 2):
        a, b = b, a + b
    return b

There are several faster ways to program the Fibonacci sequence, but that's not a concern right now (but if you're curious, check them out). Let's use %%timeit and see how long this simple function takes to find the 500,000-th Fibonacci number.

In [12]:

%%timeit
fib_it(500000)

3.19 s ± 172 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Now let's try Numba! Just add the @jit decorator above the function name and let's see what happens!

In [13]:

@jit
def fib_it(n):
    a, b = 1, 1
    for i in range(n - 2):
        a, b = b, a + b
    return b

In [14]:

%%timeit
fib_it(500000)

318 µs ± 7.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Holy cow! In our machine, that's more than 8,000 times faster!

That warning from %%timeit is due to the compilation overhead for Numba. The very first time that it executes the function, it has to compile it, then it caches that code for reuse without extra compiling. That's the 'Just-In-Time' bit. You'll see it disappear if we run %%timeit again.

In [15]:

%%timeit
fib_it(500000)

315 µs ± 4.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

We would agree if you think that this is a rather artificial example, but the speed-up is very impressive indeed. Just adding the one-word decorator!

Running in `nopython` mode¶

Numba is very clever, but it can't optimize everything. When it can't, rather than failing to run, it will fall back to the regular Python, resulting in poor performance again. This can be confusing and frustrating, since you might not know ahead of time which bits of code will speed up and which bits won't.

To avoid this particular annoyance, you can tell Numba to use nopython mode. In this case, your code will simply fail if the "jitted" function can't be optimized. It's simply an option to give you "fast or nothing."

Use nopython mode by adding the following line above the function that you want to JIT-compile:

@jit(nopython=True)

Also, you can check here what NumPy features are supported by Numba.

Numba version check¶

In these examples, we are using the latest (as of publication) version of Numba: 0.39.0. Make sure to upgrade or some of the code examples below may not run.

In [16]:

print(numba.__version__)

0.39.0

Back to Jacobi¶

We want to compare the performance of different iterative methods under the same conditions. Because the Gauss-Seidel method forces us to unpack the array operations into nested loops (which are very slow in Python), we use Numba to get the code to perform well. Thus, we need to write a new Jacobi method using for-loops and Numba (instead of NumPy), so we can make meaningful comparisons.

Let's write a "jitted" Jacobi with loops.

In [17]:

@jit(nopython=True)
def laplace_2d_jacobi(p0, maxiter=20000, rtol=1e-6):
    """
    Solves the 2D Laplace equation on a uniform grid
    with equal grid spacing in both directions
    using Jacobi relaxation method.
    
    The exit criterion of the solver is based on the relative L2-norm
    of the solution difference between two consecutive iterations.
    
    Parameters
    ----------
    p0 : numpy.ndarray
        The initial solution as a 2D array of floats.
    maxiter : integer, optional
        Maximum number of iterations to perform;
        default: 20000.
    rtol : float, optional
        Relative tolerance for convergence;
        default: 1e-6.
    
    Returns
    -------
    p : numpy.ndarray
        The solution after relaxation as a 2D array of floats.
    ite : integer
        The number of iterations performed.
    conv : list
        The convergence history as a list of floats.
    """
    ny, nx = p0.shape
    p = p0.copy()
    conv = []  # convergence history
    diff = rtol + 1.0  # initial difference
    ite = 0  # iteration index
    while diff > rtol and ite < maxiter:
        pn = p.copy()
        # Update the solution at interior points.
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                p[j, i] = 0.25 * (pn[j, i - 1] + pn[j, i + 1] +
                                  pn[j - 1, i] + pn[j + 1, i])
        # Apply 2nd-order Neumann condition (zero-gradient)
        # at the right boundary.
        for j in range(1, ny - 1):
            p[j, -1] = 0.25 * (2.0 * pn[j, -2] +
                               pn[j - 1, -1] + pn[j + 1, -1])
        # Compute the relative L2-norm of the difference.
        diff = numpy.sqrt(numpy.sum((p - pn)**2) / numpy.sum(pn**2))
        conv.append(diff)
        ite += 1
    return p, ite, conv

In [18]:

# Compute the solution using Jacobi relaxation method.
p, ites, conv_jacobi = laplace_2d_jacobi(p0,
                                         maxiter=20000, rtol=1e-8)
print('Jacobi relaxation: {} iterations '.format(ites) +
      'to reach a relative difference of {}'.format(conv_jacobi[-1]))

Jacobi relaxation: 19993 iterations to reach a relative difference of 9.998616841562463e-09

In [19]:

%%timeit
laplace_2d_jacobi(p0, maxiter=20000, rtol=1e-8)

1.57 s ± 19.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In our old machine, that's faster than the NumPy version of Jacobi, but on some newer machines it might not be. Don't obsess over this: there is much hardware black magic that we cannot control.

Remember that NumPy is a highly optimized library. The fact that we can get competitive execution times with this JIT-compiled code is kind of amazing. Plus(!) now we get to try out those techniques that aren't possible with NumPy array operations.

Note¶

We're also saving the history of the L2-norm of the difference between consecutive iterations. We'll take a look at that once we have a few more methods to compare.

Challenge task¶

It is possible to get a good estimate of the number of iterations needed by the Jacobi method to reduce the initial error by a factor $10^{-m}$ , for given $m$ . The formula depends on the largest eigenvalue of the coefficient matrix, which is known for the discrete Poisson problem on a square domain. See Parviz Moin, "Fundamentals of Engineering Numerical Analysis" (2nd ed., pp.141–143).

Find the estimated number of iterations to reduce the initial error by $10^{-8}$ when using the grids listed below, in the section on grid convergence, with $11$ , $21$ , $41$ and $81$ grid points on each coordinate axis.

Back to Gauss-Seidel¶

If you recall, the reason we got into this Numba sidetrack was to try out Gauss-Seidel and compare the performance with Jacobi. Recall from above that the formula for Gauss-Seidel is as follows:

$\begin{equation} p^{k+1}_{i,j} = \frac{1}{4} \left(p^{k+1}_{i,j-1} + p^k_{i,j+1} + p^{k+1}_{i-1,j} + p^k_{i+1,j} \right) \end{equation}$

We only need to slightly tweak the Jacobi function to get one for Gauss-Seidel. Instead of updating p in terms of pn, we just update p using p!

In [20]:

@jit(nopython=True)
def laplace_2d_gauss_seidel(p0, maxiter=20000, rtol=1e-6):
    """
    Solves the 2D Laplace equation on a uniform grid
    with equal grid spacing in both directions
    using Gauss-Seidel relaxation method.
    
    The exit criterion of the solver is based on the relative L2-norm
    of the solution difference between two consecutive iterations.
    
    Parameters
    ----------
    p0 : numpy.ndarray
        The initial solution as a 2D array of floats.
    maxiter : integer, optional
        Maximum number of iterations to perform;
        default: 20000.
    rtol : float, optional
        Relative tolerance for convergence;
        default: 1e-6.
    
    Returns
    -------
    p : numpy.ndarray
        The solution after relaxation as a 2D array of floats.
    ite : integer
        The number of iterations performed.
    conv : list
        The convergence history as a list of floats.
    """
    ny, nx = p0.shape
    p = p0.copy()
    conv = []  # convergence history
    diff = rtol + 1.0  # initial difference
    ite = 0  # iteration index
    while diff > rtol and ite < maxiter:
        pn = p.copy()
        # Update the solution at interior points.
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                p[j, i] = 0.25 * (p[j, i - 1] + p[j, i + 1] +
                                  p[j - 1, i] + p[j + 1, i])
        # Apply 2nd-order Neumann condition (zero-gradient)
        # at the right boundary.
        for j in range(1, ny - 1):
            p[j, -1] = 0.25 * (2.0 * p[j, -2] +
                               p[j - 1, -1] + p[j + 1, -1])
        # Compute the relative L2-norm of the difference.
        diff = numpy.sqrt(numpy.sum((p - pn)**2) / numpy.sum(pn**2))
        conv.append(diff)
        ite += 1
    return p, ite, conv

In [21]:

# Compute the solution using Gauss-Seidel relaxation method.
p, ites, conv_gs = laplace_2d_gauss_seidel(p0,
                                           maxiter=20000, rtol=1e-8)
print('Gauss-Seidel relaxation: {} iterations '.format(ites) +
      'to reach a relative difference of {}'.format(conv_gs[-1]))

Gauss-Seidel relaxation: 13939 iterations to reach a relative difference of 9.99763565214552e-09

Cool! Using the most recently updated values of the solution in the Gauss-Seidel method saved 6,000 iterations! Now we can see how much faster than Jacobi this is, because both methods are implemented the same way:

In [22]:

%%timeit
laplace_2d_gauss_seidel(p0, maxiter=20000, rtol=1e-8)

2.41 s ± 113 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

We get no speed-up over the Numba version of Jacobi, and you may see different results. On some of the machines we tried, we could not beat the NumPy version of Jacobi. This can be confusing, and hard to explain without getting into the nitty grity of hardware optimizations.

Don't lose hope! We have another trick up our sleeve!

Successive Over-Relaxation (SOR)¶

Successive over-relaxation is able to improve on the Gauss-Seidel method by using in the update a linear combination of the previous and the current solution, as follows:

$\begin{equation} p^{k+1}_{i,j} = (1 - \omega)p^k_{i,j} + \frac{\omega}{4} \left(p^{k+1}_{i,j-1} + p^k_{i,j+1} + p^{k+1}_{i-1,j} + p^k_{i+1,j} \right) \end{equation}$

The relaxation parameter $\omega$ will determine how much faster SOR will be than Gauss-Seidel. SOR iterations are only stable for $0 < \omega < 2$ . Note that for $\omega = 1$ , SOR reduces to the Gauss-Seidel method.

If $\omega < 1$ , that is technically an "under-relaxation" and it will be slower than Gauss-Seidel.

If $\omega > 1$ , that's the over-relaxation and it should converge faster than Gauss-Seidel.

Let's write a function for SOR iterations of the Laplace equation, using Numba to get high performance.

In [23]:

@jit(nopython=True)
def laplace_2d_sor(p0, omega, maxiter=20000, rtol=1e-6):
    """
    Solves the 2D Laplace equation on a uniform grid
    with equal grid spacing in both directions
    using successive over-relaxation (SOR) method.
    
    The exit criterion of the solver is based on the relative L2-norm
    of the solution difference between two consecutive iterations.
    
    Parameters
    ----------
    p0 : numpy.ndarray
        The initial solution as a 2D array of floats.
    omega : float
        Relaxation parameter.
    maxiter : integer, optional
        Maximum number of iterations to perform;
        default: 20000.
    rtol : float, optional
        Relative tolerance for convergence;
        default: 1e-6.
    
    Returns
    -------
    p : numpy.ndarray
        The solution after relaxation as a 2D array of floats.
    ite : integer
        The number of iterations performed.
    conv : list
        The convergence history as a list of floats.
    """
    ny, nx = p0.shape
    p = p0.copy()
    conv = []  # convergence history
    diff = rtol + 1.0  # initial difference
    ite = 0  # iteration index
    while diff > rtol and ite < maxiter:
        pn = p.copy()
        # Update the solution at interior points.
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                p[j, i] = ((1.0 - omega) * p[j, i] +
                           omega * 0.25 *(p[j, i - 1] + p[j, i + 1] +
                                          p[j - 1, i] + p[j + 1, i]))
        # Apply 2nd-order Neumann condition (zero-gradient)
        # at the right boundary.
        for j in range(1, ny - 1):
            p[j, -1] = 0.25 * (2.0 * p[j, -2] +
                               p[j - 1, -1] + p[j + 1, -1])
        # Compute the relative L2-norm of the difference.
        diff = numpy.sqrt(numpy.sum((p - pn)**2) / numpy.sum(pn**2))
        conv.append(diff)
        ite += 1
    return p, ite, conv

That wasn't too bad at all. Let's try this out first with $\omega = 1$ and check that it matches the Gauss-Seidel results from above.

In [24]:

# Compute the solution using SOR method.
omega = 1.0
p, ites, conv_sor = laplace_2d_sor(p0, omega,
                                   maxiter=20000, rtol=1e-8)
print('SOR (omega={}): {} iterations '.format(omega, ites) +
      'to reach a relative difference of {}'.format(conv_sor[-1]))

SOR (omega=1.0): 13939 iterations to reach a relative difference of 9.99763565214552e-09

We have the exact same number of iterations as Gauss-Seidel. That's a good sign that things are working as expected.

Now let's try to over-relax the solution and see what happens. To start, let's try $\omega = 1.5$ .

In [25]:

# Compute the solution using SOR method.
omega = 1.5
p, ites, conv_sor = laplace_2d_sor(p0, omega,
                                   maxiter=20000, rtol=1e-8)
print('SOR (omega={}): {} iterations '.format(omega, ites) +
      'to reach a relative difference of {}'.format(conv_sor[-1]))

SOR (omega=1.5): 7108 iterations to reach a relative difference of 9.991011445834247e-09

Wow! That really did the trick! We dropped from $13,939$ iterations down to $7,108$ . Now we're really cooking! Let's try %%timeit on SOR.

In [26]:

%%timeit
laplace_2d_sor(p0, omega, maxiter=20000, rtol=1e-8)

1.24 s ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Things continue to speed up. But we can do even better!

Tuned SOR¶

Above, we picked $\omega=1.5$ arbitrarily, but we would like to over-relax the solution as much as possible without introducing instability, as that will result in the fewest number of iterations.

For square domains, it turns out that the ideal factor $\omega$ can be computed as a function of the number of nodes in one direction, e.g., nx.

$\begin{equation} \omega \approx \frac{2}{1+\frac{\pi}{nx}} \end{equation}$

This is not some arbitrary formula, but its derivation lies outside the scope of this course. (If you're curious and have some serious math chops, you can check out Reference 3 for more information). For now, let's try it out and see how it works.

In [27]:

# Compute the solution using tuned SOR method.
omega = 2.0 / (1.0 + numpy.pi / nx)
p, ites, conv_opt_sor = laplace_2d_sor(p0, omega,
                                       maxiter=20000, rtol=1e-8)
print('SOR (omega={:.4f}): {} iterations '.format(omega, ites) +
      'to reach a relative difference of {}'.format(conv_opt_sor[-1]))

SOR (omega=1.9521): 1110 iterations to reach a relative difference of 9.964283931955807e-09

Wow! That's very fast. Also, $\omega$ is very close to the upper limit of 2. SOR tends to work fastest when $\omega$ approaches 2, but don't be tempted to push it. Set $\omega = 2$ and the walls will come crumbling down.

Let's see what %%timeit has for us now.

In [28]:

%%timeit
laplace_2d_sor(p0, omega, maxiter=20000, rtol=1e-8)

196 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Regardless of the hardware in which we tried this, the tuned SOR gave big speed-ups, compared to the Jacobi method (whether implemented with NumPy or Numba). Now you know why we told you at the end of Lesson 1 that the Jacobi method is the worst iterative solver and almost never used.

Just to convince ourselves that everything is OK, let's check the error after the $1,110$ iterations of tuned SOR:

In [29]:

# Compute the relative L2-norm of the error.
l2_norm(p, p_exact)

Out[29]:

7.792743355069158e-05

Looking very good, indeed.

We didn't explain it in any detail, but notice the very interesting implication of Equation $(6)$ : the ideal relaxation factor is a function of the grid size.
Also keep in mind that the formula only works for square domains with uniform grids. If your problem has an irregular geometry, you will need to find a good value of $\omega$ by numerical experiments.

Decay of the difference between iterates¶

In the Poisson Equation notebook, we noticed how the norm of the difference between consecutive iterations first dropped quite fast, then settled for a more moderate decay rate. With Gauss-Seidel, SOR and tuned SOR, we reduced the number of iterations required to reach the stopping criterion. Let's see how that reflects on the time history of the difference between consecutive solutions.

In [30]:

# Plot the convergence history for different methods.
pyplot.figure(figsize=(9.0, 4.0))
pyplot.xlabel('Iterations')
pyplot.ylabel('Relative $L_2$-norm\nof the difference')
pyplot.grid()
pyplot.semilogy(conv_jacobi, label='Jacobi')
pyplot.semilogy(conv_gs, label='Gauss-Seidel')
pyplot.semilogy(conv_sor, label='SOR')
pyplot.semilogy(conv_opt_sor, label='Optimized SOR')
pyplot.legend()
pyplot.xlim(0, 20000);

The Jacobi method starts out with very fast convergence, but then it settles into a slower rate. Gauss-Seidel shows a faster rate in the first few thousand iterations, but it seems to be slowing down towards the end. SOR is a lot faster to converge, though, and optimized SOR just plunges down!

References¶

Gonsalves, Richard J. Computational Physics I. State University of New York, Buffalo: (2011): Section 3.1
Moin, Parviz, "Fundamentals of Engineering Numerical Analysis," Cambridge University Press, 2nd edition (2010).
Young, David M. "A bound for the optimum relaxation factor for the successive overrelaxation method." Numerische Mathematik 16.5 (1971): 408-413.

The cell below loads the style of this notebook.¶

In [31]:

from IPython.core.display import HTML
css_file = '../../styles/numericalmoocstyle.css'
HTML(open(css_file, 'r').read())

Out[31]: