This is the fourth and last notebook of Module 5 ("Relax and hold steady"), dedicated to elliptic PDEs. In the previous notebook, we examined how different algebraic formulations can speed up the iterative solution of the Laplace equation, compared to the simplest (but slowest) Jacobi method. The Gauss-Seidel and successive over-relaxation methods both provide faster algebraic convergence than Jacobi. But there is still room for improvement.
In this lesson, we'll take a look at the very popular conjugate gradient (CG) method.
The CG method solves linear systems with coefficient matrices that are symmetric and positive-definite. It is used either on its own, or in conjunction with multigrid, a technique that we'll explore later in its own (optional) course module.
For a real understanding of the CG method, there is no better option than studying the now-classic monograph by Jonathan Shewchuk: "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain" (1994). Here, we give you a brief summary to explain the implementation in Python.
Let's return to the Poisson equation example from Lesson 2.
$$\nabla^2 p = -2\left(\frac{\pi}{L_x}\right)\left(\frac{\pi}{L_y}\right)\sin\left(\frac{\pi x}{L_x}\right)\cos\left(\frac{\pi y}{L_y}\right)$$

in the domain

$$\left\lbrace \begin{aligned} 0 &\leq x \leq 1 \\ -0.5 &\leq y \leq 0.5 \end{aligned} \right.$$

where $L_x = L_y = 1$, and with boundary conditions

$$p = 0 \text{ at } \left\lbrace \begin{aligned} x &= 0 \\ x &= 1 \\ y &= -0.5 \\ y &= 0.5 \end{aligned} \right.$$

We will solve this equation by assuming an initial state of $p=0$ everywhere, and applying boundary conditions to relax via the Laplacian operator.
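As a quick sanity check (a small aside: we assume here that the exact solution used later for comparison is the usual manufactured solution $p = \sin\left(\frac{\pi x}{L_x}\right)\cos\left(\frac{\pi y}{L_y}\right)$, which satisfies the boundary conditions above), differentiating that candidate twice gives

$$\nabla^2 p = -\left[\left(\frac{\pi}{L_x}\right)^2 + \left(\frac{\pi}{L_y}\right)^2\right]\sin\left(\frac{\pi x}{L_x}\right)\cos\left(\frac{\pi y}{L_y}\right),$$

which, for $L_x = L_y$, matches the right-hand side above.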
Recall that in its discretized form, the Poisson equation reads,
$$\frac{p^k_{i+1,j} - 2p^k_{i,j} + p^k_{i-1,j}}{\Delta x^2} + \frac{p^k_{i,j+1} - 2p^k_{i,j} + p^k_{i,j-1}}{\Delta y^2} = b^k_{i,j}$$

The left-hand side represents a linear combination of the values of $p$ at several grid points, and this linear combination has to be equal to the value of the source term, $b$, on the right-hand side.
Now imagine you gather the values $p_{i,j}$ of $p$ at all grid points into a big vector $\mathbf{p}$, and you do the same for $b$ using the same ordering. Both vectors $\mathbf{p}$ and $\mathbf{b}$ contain $N = nx \times ny$ values and thus belong to $\mathbb{R}^N$. The discretized Poisson equation corresponds to the following linear system:
$$A\mathbf{p} = \mathbf{b},$$

where $A$ is an $N \times N$ matrix. Although we will not directly use the matrix form of the system in the CG algorithm, it is useful to examine the problem this way to understand how the method works.
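If you are curious what $A$ actually looks like, here is a minimal, purely illustrative sketch (not used anywhere else in this notebook, and assuming SciPy is available) that assembles the matrix for the interior points of a small grid with zero Dirichlet boundaries; the tiny grid size and the helper name are made up for the example.

import numpy
from scipy import sparse

# Illustrative sketch: build the matrix form of the discrete Laplacian
# on the interior points of a small nx-by-ny grid (zero Dirichlet
# boundaries), using Kronecker products of 1D second-difference matrices.
nx, ny = 5, 5  # small grid so the matrix is easy to inspect
dx = dy = 1.0 / (nx - 1)

def second_difference_1d(n, h):
    # 1D second-difference operator on n interior points.
    main = -2.0 / h**2 * numpy.ones(n)
    off = 1.0 / h**2 * numpy.ones(n - 1)
    return sparse.diags([off, main, off], offsets=[-1, 0, 1])

Dxx = second_difference_1d(nx - 2, dx)
Dyy = second_difference_1d(ny - 2, dy)
Ix = sparse.identity(nx - 2)
Iy = sparse.identity(ny - 2)

# The 2D operator acts on the interior values flattened into one vector.
A = sparse.kron(Iy, Dxx) + sparse.kron(Dyy, Ix)
print(A.shape)  # (N, N) with N = (nx - 2) * (ny - 2)

The resulting matrix is sparse and symmetric, which is exactly the structure that makes iterative methods like CG attractive.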
All iterative methods start with an initial guess, $\mathbf{p}^0$, and modify it in a way such that we approach the solution. This can be viewed as modifying the vector of discrete $p$ values on the grid by adding another vector, i.e., taking a step of magnitude $\alpha$ in a direction $\mathbf{d}$, as follows:

$$\mathbf{p}^{k+1} = \mathbf{p}^k + \alpha \mathbf{d}^k$$

The iterations march towards the solution by taking steps along the direction vectors $\mathbf{d}^k$, with the scalar $\alpha$ dictating how big a step to take at each iteration. We could converge faster to the solution if we just knew how to carefully choose the direction vectors and the size of the steps. But how to do that?
One of the tools we use to find the right direction to step in is called the residual. What is the residual? We're glad you asked!
We know that, as the iterations proceed, there will be some error between the calculated value, $p^k_i$, and the exact solution, $p^{\text{exact}}_i$. We may not know what the exact solution is, but we know it's out there. The error is:

$$e^k_i = p^k_i - p^{\text{exact}}_i$$

Note: We are talking about the error at a specific point $i$, not a measure of error across the entire domain.
What if we recast the Poisson equation in terms of a not-perfectly-relaxed $\mathbf{p}^k$?

$$A\mathbf{p}^k \approx \mathbf{b}$$

We write this as an approximation because $\mathbf{p}^k \neq \mathbf{p}$. To "fix" the equation, we need to add an extra term to account for the difference in the Poisson equation. That extra term is called the residual. We can write out the modified Poisson equation like this:
$$\mathbf{r}^k + A\mathbf{p}^k = \mathbf{b}$$

Before considering the more complex CG algorithm, it is helpful to introduce a simpler approach called the method of steepest descent. At iteration $0$, we choose an initial guess. Unless we are immensely lucky, it will not satisfy the Poisson equation, and we will have
$$\mathbf{b} - A\mathbf{p}^0 = \mathbf{r}^0 \neq \mathbf{0}$$

The vector $\mathbf{r}^0$ is the initial residual and measures how far we are from satisfying the linear system. We can monitor the residual vector at each iteration, as it (hopefully) gets smaller and smaller:

$$\mathbf{r}^k = \mathbf{b} - A\mathbf{p}^k$$

We make two choices in the method of steepest descent:

1. the direction vectors are the residuals, i.e., $\mathbf{d}^k = \mathbf{r}^k$;
2. the step size is chosen so that consecutive residuals are orthogonal, i.e., $\mathbf{r}^{k+1} \cdot \mathbf{r}^k = 0$.

There are good (not very complicated) reasons to justify these choices and you should read one of the references to understand them. But since we want you to converge to the end of the notebook in a shorter time, please accept them for now.
Choice 2 requires that

$$\begin{aligned}
\mathbf{r}^{k+1} \cdot \mathbf{r}^k &= 0 \\
\Leftrightarrow \quad \left(\mathbf{b} - A\mathbf{p}^{k+1}\right) \cdot \mathbf{r}^k &= 0 \\
\Leftrightarrow \quad \left(\mathbf{b} - A\left(\mathbf{p}^k + \alpha \mathbf{r}^k\right)\right) \cdot \mathbf{r}^k &= 0 \\
\Leftrightarrow \quad \left(\mathbf{r}^k - \alpha A\mathbf{r}^k\right) \cdot \mathbf{r}^k &= 0 \\
\Rightarrow \quad \alpha &= \frac{\mathbf{r}^k \cdot \mathbf{r}^k}{A\mathbf{r}^k \cdot \mathbf{r}^k}.
\end{aligned}$$

We are now ready to test this algorithm.
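Before wiring these choices into the Poisson solver, here is a tiny sanity-check sketch on a made-up symmetric positive-definite $2\times2$ system (the matrix, right-hand side, and number of iterations are arbitrary choices for illustration): each step moves along the residual with the $\alpha$ derived above, and consecutive residuals indeed come out orthogonal.

import numpy

# Made-up symmetric positive-definite system, small enough to inspect.
A = numpy.array([[4.0, 1.0],
                 [1.0, 3.0]])
b = numpy.array([1.0, 2.0])

p = numpy.zeros(2)  # initial guess
for k in range(5):
    r = b - A.dot(p)  # residual (also the step direction)
    alpha = r.dot(r) / r.dot(A.dot(r))  # step size from Choice 2
    p = p + alpha * r  # take the step
    # Consecutive residuals should be orthogonal (zero up to round-off).
    print(k, r.dot(b - A.dot(p)))
print(p, numpy.linalg.solve(A, b))  # compare with the direct solution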
To begin, let's import libraries and some helper functions and set up our mesh.
import numpy
from helper import l2_norm, poisson_2d_jacobi, poisson_solution
# Set parameters.
nx = 101 # number of points in the x direction
ny = 101 # number of points in the y direction
xmin, xmax = 0.0, 1.0 # limits in the x direction
ymin, ymax = -0.5, 0.5 # limits in the y direction
Lx = xmax - xmin # domain length in the x direction
Ly = ymax - ymin # domain length in the y direction
dx = Lx / (nx - 1) # grid spacing in the x direction
dy = Ly / (ny - 1) # grid spacing in the y direction
# Create the gridline locations and the mesh grid.
x = numpy.linspace(xmin, xmax, num=nx)
y = numpy.linspace(ymin, ymax, num=ny)
X, Y = numpy.meshgrid(x, y)
# Create the source term.
b = (-2.0 * (numpy.pi / Lx) * (numpy.pi / Ly) *
numpy.sin(numpy.pi * X / Lx) *
numpy.cos(numpy.pi * Y / Ly))
# Set the initial conditions.
p0 = numpy.zeros((ny, nx))
# Compute the analytical solution.
p_exact = poisson_solution(x, y, Lx, Ly)
Let's quickly review the solution process:

1. Calculate the residual, $\mathbf{r}^k$, which also serves as the direction vector, $\mathbf{d}^k$;
2. Calculate the step size $\alpha$;
3. Update the solution, $\mathbf{p}^{k+1} = \mathbf{p}^k + \alpha \mathbf{r}^k$.
We have an equation for the residual above:

$$\mathbf{r}^k = \mathbf{b} - A\mathbf{p}^k$$

Remember that $A$ is just a stand-in for the discrete Laplacian, which, taking $\Delta x = \Delta y$, is:

$$\nabla^2 p^k = \frac{-4 p^k_{i,j} + \left(p^k_{i,j-1} + p^k_{i,j+1} + p^k_{i-1,j} + p^k_{i+1,j}\right)}{\Delta x^2}$$

The calculation of $\alpha$ is relatively straightforward, but it does require evaluating the term $A\mathbf{r}^k$. Since we just wrote out the discrete operator $A$ above, you only need to apply that same formula to $\mathbf{r}^k$.
def poisson_2d_steepest_descent(p0, b, dx, dy,
                                maxiter=20000, rtol=1e-6):
    """
    Solves the 2D Poisson equation on a uniform grid,
    with the same grid spacing in both directions,
    for a given forcing term
    using the method of steepest descent.

    The function assumes Dirichlet boundary conditions with value zero.
    The exit criterion of the solver is based on the relative L2-norm
    of the solution difference between two consecutive iterations.

    Parameters
    ----------
    p0 : numpy.ndarray
        The initial solution as a 2D array of floats.
    b : numpy.ndarray
        The forcing term as a 2D array of floats.
    dx : float
        Grid spacing in the x direction.
    dy : float
        Grid spacing in the y direction.
    maxiter : integer, optional
        Maximum number of iterations to perform;
        default: 20000.
    rtol : float, optional
        Relative tolerance for convergence;
        default: 1e-6.

    Returns
    -------
    p : numpy.ndarray
        The solution after relaxation as a 2D array of floats.
    ite : integer
        The number of iterations performed.
    conv : list
        The convergence history as a list of floats.
    """
    def A(p):
        # Apply the Laplacian operator to p.
        return (-4.0 * p[1:-1, 1:-1] +
                p[1:-1, :-2] + p[1:-1, 2:] +
                p[:-2, 1:-1] + p[2:, 1:-1]) / dx**2
    p = p0.copy()
    r = numpy.zeros_like(p)  # initial residual
    Ar = numpy.zeros_like(p)  # to store the mat-vec multiplication
    conv = []  # convergence history
    diff = rtol + 1  # initial difference
    ite = 0  # iteration index
    while diff > rtol and ite < maxiter:
        pk = p.copy()
        # Compute the residual.
        r[1:-1, 1:-1] = b[1:-1, 1:-1] - A(p)
        # Compute the Laplacian of the residual.
        Ar[1:-1, 1:-1] = A(r)
        # Compute the step size.
        alpha = numpy.sum(r * r) / numpy.sum(r * Ar)
        # Update the solution.
        p = pk + alpha * r
        # Dirichlet boundary conditions are automatically enforced.
        # Compute the relative L2-norm of the difference.
        diff = l2_norm(p, pk)
        conv.append(diff)
        ite += 1
    return p, ite, conv
Let's see how it performs on our example problem.
# Compute the solution using the method of steepest descent.
p, ites, conv_sd = poisson_2d_steepest_descent(p0, b, dx, dy,
maxiter=20000,
rtol=1e-10)
print('Method of steepest descent: {} iterations '.format(ites) +
'to reach a relative difference of {}'.format(conv_sd[-1]))
Method of steepest descent: 2 iterations to reach a relative difference of 1.3307695446303778e-16
# Compute the relative L2-norm of the error.
l2_norm(p, p_exact)
8.225076220929745e-05
Not bad! It took only two iterations to reach a solution that meets our exit criterion. Although this seems great, the steepest descent algorithm is not too good when used with large systems or more complicated right-hand sides in the Poisson equation (we'll examine this below!). We can get better performance if we take a little more care in selecting the direction vectors, $\mathbf{d}^k$.
With steepest descent, we know that two successive jumps are orthogonal, but that's about it. There is nothing to prevent the algorithm from making several jumps in the same (or a similar) direction. Imagine you wanted to go from the intersection of 5th Avenue and 23rd Street to the intersection of 9th Avenue and 30th Street. Knowing that each segment has the same computational cost (one iteration), would you follow the red path or the green path?
The method of conjugate gradients reduces the number of jumps by making sure the algorithm never selects the same direction twice. The size of the jumps is now given by:

$$\alpha = \frac{\mathbf{r}^k \cdot \mathbf{r}^k}{A\mathbf{d}^k \cdot \mathbf{d}^k}$$

and the direction vectors by:

$$\mathbf{d}^{k+1} = \mathbf{r}^{k+1} + \beta \mathbf{d}^k,$$

where

$$\beta = \frac{\mathbf{r}^{k+1} \cdot \mathbf{r}^{k+1}}{\mathbf{r}^k \cdot \mathbf{r}^k}.$$
The search directions are no longer equal to the residuals but are instead a linear combination of the residual and the previous search direction. It turns out that CG converges to the exact solution (up to machine accuracy) in a maximum of $N$ iterations! When one is satisfied with an approximate solution, far fewer iterations are typically needed than with Jacobi or steepest descent. Again, the derivation of the algorithm is not immensely difficult and can be found in Shewchuk (1994).
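As a quick illustration of these formulas (again on a made-up symmetric positive-definite $2\times2$ system, chosen only so everything is easy to check by hand), the sketch below shows that successive search directions are conjugate with respect to $A$, i.e., $\mathbf{d}^{k+1} \cdot A\mathbf{d}^k \approx 0$, and that the answer is reached in $N=2$ iterations.

import numpy

# Made-up symmetric positive-definite system with N = 2 unknowns.
A = numpy.array([[4.0, 1.0],
                 [1.0, 3.0]])
b = numpy.array([1.0, 2.0])

p = numpy.zeros(2)  # initial guess
r = b - A.dot(p)  # initial residual
d = r.copy()  # first search direction is the residual
for k in range(2):  # N = 2 iterations suffice
    Ad = A.dot(d)
    alpha = r.dot(r) / d.dot(Ad)  # step size
    p = p + alpha * d  # update the solution
    r_new = r - alpha * Ad  # update the residual
    beta = r_new.dot(r_new) / r.dot(r)
    d_new = r_new + beta * d  # new search direction
    print(k, d_new.dot(Ad))  # A-conjugacy of successive directions: ~0
    r, d = r_new, d_new
print(p, numpy.linalg.solve(A, b))  # exact to machine precision after N steps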
We will again update $\mathbf{p}$ according to

$$\mathbf{p}^{k+1} = \mathbf{p}^k + \alpha \mathbf{d}^k,$$

but use the modified equations above to calculate $\alpha$ and $\mathbf{d}^k$.
You may have noticed that $\beta$ depends on both $\mathbf{r}^{k+1}$ and $\mathbf{r}^k$, which makes the calculation of $\mathbf{d}^0$ a little bit tricky. Or impossible (using the formula above). Instead, we set $\mathbf{d}^0 = \mathbf{r}^0$ for the first step and then switch to the formulas above for all subsequent iterations.
Thus, the full set of steps for the method of conjugate gradients is:

Calculate $\mathbf{d}^0 = \mathbf{r}^0$ (just once), then

1. Calculate $\alpha = \dfrac{\mathbf{r}^k \cdot \mathbf{r}^k}{A\mathbf{d}^k \cdot \mathbf{d}^k}$;
2. Update $\mathbf{p}^{k+1} = \mathbf{p}^k + \alpha \mathbf{d}^k$;
3. Update the residual via $\mathbf{r}^{k+1} = \mathbf{r}^k - \alpha A\mathbf{d}^k$ (equivalent to $\mathbf{b} - A\mathbf{p}^{k+1}$, but it reuses $A\mathbf{d}^k$, which is already computed);
4. Calculate $\beta = \dfrac{\mathbf{r}^{k+1} \cdot \mathbf{r}^{k+1}}{\mathbf{r}^k \cdot \mathbf{r}^k}$;
5. Calculate the new search direction, $\mathbf{d}^{k+1} = \mathbf{r}^{k+1} + \beta \mathbf{d}^k$;
6. Repeat.
def poisson_2d_conjugate_gradient(p0, b, dx, dy,
                                  maxiter=20000, rtol=1e-6):
    """
    Solves the 2D Poisson equation on a uniform grid,
    with the same grid spacing in both directions,
    for a given forcing term
    using the method of conjugate gradients.

    The function assumes Dirichlet boundary conditions with value zero.
    The exit criterion of the solver is based on the relative L2-norm
    of the solution difference between two consecutive iterations.

    Parameters
    ----------
    p0 : numpy.ndarray
        The initial solution as a 2D array of floats.
    b : numpy.ndarray
        The forcing term as a 2D array of floats.
    dx : float
        Grid spacing in the x direction.
    dy : float
        Grid spacing in the y direction.
    maxiter : integer, optional
        Maximum number of iterations to perform;
        default: 20000.
    rtol : float, optional
        Relative tolerance for convergence;
        default: 1e-6.

    Returns
    -------
    p : numpy.ndarray
        The solution after relaxation as a 2D array of floats.
    ite : integer
        The number of iterations performed.
    conv : list
        The convergence history as a list of floats.
    """
    def A(p):
        # Apply the Laplacian operator to p.
        return (-4.0 * p[1:-1, 1:-1] +
                p[1:-1, :-2] + p[1:-1, 2:] +
                p[:-2, 1:-1] + p[2:, 1:-1]) / dx**2
    p = p0.copy()
    r = numpy.zeros_like(p)  # initial residual
    Ad = numpy.zeros_like(p)  # to store the mat-vec multiplication
    conv = []  # convergence history
    diff = rtol + 1  # initial difference
    ite = 0  # iteration index
    # Compute the initial residual.
    r[1:-1, 1:-1] = b[1:-1, 1:-1] - A(p)
    # Set the initial search direction to be the residual.
    d = r.copy()
    while diff > rtol and ite < maxiter:
        pk = p.copy()
        rk = r.copy()
        # Compute the Laplacian of the search direction.
        Ad[1:-1, 1:-1] = A(d)
        # Compute the step size.
        alpha = numpy.sum(r * r) / numpy.sum(d * Ad)
        # Update the solution.
        p = pk + alpha * d
        # Update the residual.
        r = rk - alpha * Ad
        # Update the search direction.
        beta = numpy.sum(r * r) / numpy.sum(rk * rk)
        d = r + beta * d
        # Dirichlet boundary conditions are automatically enforced.
        # Compute the relative L2-norm of the difference.
        diff = l2_norm(p, pk)
        conv.append(diff)
        ite += 1
    return p, ite, conv
# Compute the solution using the method of conjugate gradients.
p, ites, conv_cg = poisson_2d_conjugate_gradient(p0, b, dx, dy,
maxiter=20000,
rtol=1e-10)
print('Method of conjugate gradients: {} iterations '.format(ites) +
'to reach a relative difference of {}'.format(conv_cg[-1]))
Method of conjugate gradients: 2 iterations to reach a relative difference of 1.2982770796281907e-16
# Compute the relative L2-norm of the error.
l2_norm(p, p_exact)
8.225076220929585e-05
The method of conjugate gradients also took two iterations to reach a solution that meets our exit criterion. But let's compare this to the number of iterations needed for the Jacobi iteration:
# Compute the solution using Jacobi relaxation.
p, ites, conv_jacobi = poisson_2d_jacobi(p0, b, dx, dy,
maxiter=40000,
rtol=1e-10)
print('Jacobi relaxation: {} iterations '.format(ites) +
'to reach a relative difference of {}'.format(conv_jacobi[-1]))
Jacobi relaxation: 31227 iterations to reach a relative difference of 9.997923503623598e-11
For our test problem, we get substantial gains in terms of computational cost using the method of steepest descent or the conjugate gradient method.
The conjugate gradient method really shines when one needs to solve more difficult Poisson problems. To get an insight into this, let's solve the Poisson problem using the same boundary conditions as the previous problem but with the following right-hand side,
$$b = \sin\left(\frac{\pi x}{L_x}\right)\cos\left(\frac{\pi y}{L_y}\right) + \sin\left(\frac{6\pi x}{L_x}\right)\cos\left(\frac{6\pi y}{L_y}\right)$$

# Modify the source term of the Poisson system.
b = (numpy.sin(numpy.pi * X / Lx) *
numpy.cos(numpy.pi * Y / Ly) +
numpy.sin(6.0 * numpy.pi * X / Lx) *
numpy.cos(6.0 * numpy.pi * Y / Ly))
maxiter, rtol = 40000, 1e-10
p, ites, conv = poisson_2d_jacobi(p0, b, dx, dy,
maxiter=maxiter, rtol=rtol)
print('Jacobi relaxation: {} iterations'.format(ites))
p, ites, conv = poisson_2d_steepest_descent(p0, b, dx, dy,
maxiter=maxiter,
rtol=rtol)
print('Method of steepest descent: {} iterations'.format(ites))
p, ites, conv = poisson_2d_conjugate_gradient(p0, b, dx, dy,
maxiter=maxiter,
rtol=rtol)
print('Method of conjugate gradients: {} iterations'.format(ites))
Jacobi relaxation: 31226 iterations
Method of steepest descent: 31591 iterations
Method of conjugate gradients: 72 iterations
Now we can really appreciate the marvel of the CG method!
Shewchuk, J. R. (1994). "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain."
Kuzovkin, I. "The Concept of Conjugate Gradient Descent in Python."