This is the fourth and last notebook of Module 5 ("Relax and hold steady"), dedicated to elliptic PDEs. In the previous notebook, we examined how different algebraic formulations can speed up the iterative solution of the Laplace equation, compared to the simplest (but slowest) Jacobi method. The Gauss-Seidel and successive over-relaxation methods both provide faster algebraic convergence than Jacobi. But there is still room for improvement.
In this lesson, we'll take a look at the very popular conjugate gradient (CG) method.
The CG method solves linear systems with coefficient matrices that are symmetric and positive-definite. It is used either on its own, or in conjunction with multigrid, a technique that we'll explore later in its own (optional) course module.
For a real understanding of the CG method, there is no better option than studying the now-classic monograph by Jonathan Shewchuk: "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain" (1994). Here, we give you a brief summary to explain the implementation in Python.
Let's return to the Poisson equation example from Lesson 2.
$$\nabla^2 p = -2\left(\frac{\pi}{L_x}\right)\left(\frac{\pi}{L_y}\right)\sin\left(\frac{\pi x}{L_x}\right)\cos\left(\frac{\pi y}{L_y}\right)$$

in the domain

$$\left\lbrace \begin{aligned} 0 &\leq x \leq 1 \\ -0.5 &\leq y \leq 0.5 \end{aligned} \right.$$

where $L_x = L_y = 1$, and with boundary conditions

$$p = 0 \text{ at } \left\lbrace \begin{aligned} x &= 0 \\ x &= 1 \\ y &= -0.5 \\ y &= 0.5 \end{aligned} \right.$$

We will solve this equation by assuming an initial state of $p=0$ everywhere, and applying boundary conditions to relax via the Laplacian operator.
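As a quick sanity check (a small aside: we assume here that the exact solution used later for comparison is the usual manufactured solution $p = \sin\left(\frac{\pi x}{L_x}\right)\cos\left(\frac{\pi y}{L_y}\right)$, which satisfies the boundary conditions above), differentiating that candidate twice gives

$$\nabla^2 p = -\left[\left(\frac{\pi}{L_x}\right)^2 + \left(\frac{\pi}{L_y}\right)^2\right]\sin\left(\frac{\pi x}{L_x}\right)\cos\left(\frac{\pi y}{L_y}\right),$$

which, for $L_x = L_y$, matches the right-hand side above.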
Recall that in its discretized form, the Poisson equation reads,
$$\frac{p^k_{i+1,j} - 2p^k_{i,j} + p^k_{i-1,j}}{\Delta x^2} + \frac{p^k_{i,j+1} - 2p^k_{i,j} + p^k_{i,j-1}}{\Delta y^2} = b^k_{i,j}$$

The left-hand side represents a linear combination of the values of $p$ at several grid points, and this linear combination has to be equal to the value of the source term, $b$, on the right-hand side.
Now imagine you gather the values $p_{i,j}$ of $p$ at all grid points into a big vector $\mathbf{p}$, and you do the same for $b$ using the same ordering. Both vectors $\mathbf{p}$ and $\mathbf{b}$ contain $N = nx \times ny$ values and thus belong to $\mathbb{R}^N$. The discretized Poisson equation corresponds to the following linear system:
$$A\mathbf{p} = \mathbf{b},$$

where $A$ is an $N \times N$ matrix. Although we will not directly use the matrix form of the system in the CG algorithm, it is useful to examine the problem this way to understand how the method works.
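If you are curious what $A$ actually looks like, here is a minimal, purely illustrative sketch (not used anywhere else in this notebook, and assuming SciPy is available) that assembles the matrix for the interior points of a small grid with zero Dirichlet boundaries; the tiny grid size and the helper name are made up for the example.

import numpy
from scipy import sparse

# Illustrative sketch: build the matrix form of the discrete Laplacian
# on the interior points of a small nx-by-ny grid (zero Dirichlet
# boundaries), using Kronecker products of 1D second-difference matrices.
nx, ny = 5, 5  # small grid so the matrix is easy to inspect
dx = dy = 1.0 / (nx - 1)

def second_difference_1d(n, h):
    # 1D second-difference operator on n interior points.
    main = -2.0 / h**2 * numpy.ones(n)
    off = 1.0 / h**2 * numpy.ones(n - 1)
    return sparse.diags([off, main, off], offsets=[-1, 0, 1])

Dxx = second_difference_1d(nx - 2, dx)
Dyy = second_difference_1d(ny - 2, dy)
Ix = sparse.identity(nx - 2)
Iy = sparse.identity(ny - 2)

# The 2D operator acts on the interior values flattened into one vector.
A = sparse.kron(Iy, Dxx) + sparse.kron(Dyy, Ix)
print(A.shape)  # (N, N) with N = (nx - 2) * (ny - 2)

The resulting matrix is sparse and symmetric, which is exactly the structure that makes iterative methods like CG attractive.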
All iterative methods start with an initial guess, $\mathbf{p}^0$, and modify it in a way such that we approach the solution. This can be viewed as modifying the vector of discrete $p$ values on the grid by adding another vector, i.e., taking a step of magnitude $\alpha$ in a direction $\mathbf{d}$, as follows:

$$\mathbf{p}^{k+1} = \mathbf{p}^k + \alpha \mathbf{d}^k$$

The iterations march towards the solution by taking steps along the direction vectors $\mathbf{d}^k$, with the scalar $\alpha$ dictating how big a step to take at each iteration. We could converge faster to the solution if we just knew how to carefully choose the direction vectors and the size of the steps. But how to do that?
One of the tools we use to find the right direction to step in is called the residual. What is the residual? We're glad you asked!
We know that, as the iterations proceed, there will be some error between the calculated value, $p^k_i$, and the exact solution, $p^{\text{exact}}_i$. We may not know what the exact solution is, but we know it's out there. The error is:

$$e^k_i = p^k_i - p^{\text{exact}}_i$$

Note: We are talking about the error at a specific point $i$, not a measure of error across the entire domain.
What if we recast the Poisson equation in terms of a not-perfectly-relaxed $\mathbf{p}^k$?

$$A\mathbf{p}^k \approx \mathbf{b}$$

We write this as an approximation because $\mathbf{p}^k \neq \mathbf{p}$. To "fix" the equation, we need to add an extra term to account for the difference in the Poisson equation. That extra term is called the residual. We can write out the modified Poisson equation like this:
$$\mathbf{r}^k + A\mathbf{p}^k = \mathbf{b}$$

Before considering the more complex CG algorithm, it is helpful to introduce a simpler approach called the method of steepest descent. At iteration $0$, we choose an initial guess. Unless we are immensely lucky, it will not satisfy the Poisson equation, and we will have
$$\mathbf{b} - A\mathbf{p}^0 = \mathbf{r}^0 \neq \mathbf{0}$$

The vector $\mathbf{r}^0$ is the initial residual and measures how far we are from satisfying the linear system. We can monitor the residual vector at each iteration, as it (hopefully) gets smaller and smaller:

$$\mathbf{r}^k = \mathbf{b} - A\mathbf{p}^k$$

We make two choices in the method of steepest descent:

1. the direction vectors are the residuals, i.e., $\mathbf{d}^k = \mathbf{r}^k$;
2. the step size is chosen so that consecutive residuals are orthogonal, i.e., $\mathbf{r}^{k+1} \cdot \mathbf{r}^k = 0$.

There are good (not very complicated) reasons to justify these choices and you should read one of the references to understand them. But since we want you to converge to the end of the notebook in a shorter time, please accept them for now.
Choice 2 requires that

$$\begin{aligned}
\mathbf{r}^{k+1} \cdot \mathbf{r}^k &= 0 \\
\Leftrightarrow \quad \left(\mathbf{b} - A\mathbf{p}^{k+1}\right) \cdot \mathbf{r}^k &= 0 \\
\Leftrightarrow \quad \left(\mathbf{b} - A\left(\mathbf{p}^k + \alpha \mathbf{r}^k\right)\right) \cdot \mathbf{r}^k &= 0 \\
\Leftrightarrow \quad \left(\mathbf{r}^k - \alpha A\mathbf{r}^k\right) \cdot \mathbf{r}^k &= 0 \\
\Rightarrow \quad \alpha &= \frac{\mathbf{r}^k \cdot \mathbf{r}^k}{A\mathbf{r}^k \cdot \mathbf{r}^k}.
\end{aligned}$$

We are now ready to test this algorithm.
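Before wiring these choices into the Poisson solver, here is a tiny sanity-check sketch on a made-up symmetric positive-definite $2\times2$ system (the matrix, right-hand side, and number of iterations are arbitrary choices for illustration): each step moves along the residual with the $\alpha$ derived above, and consecutive residuals indeed come out orthogonal.

import numpy

# Made-up symmetric positive-definite system, small enough to inspect.
A = numpy.array([[4.0, 1.0],
                 [1.0, 3.0]])
b = numpy.array([1.0, 2.0])

p = numpy.zeros(2)  # initial guess
for k in range(5):
    r = b - A.dot(p)  # residual (also the step direction)
    alpha = r.dot(r) / r.dot(A.dot(r))  # step size from Choice 2
    p = p + alpha * r  # take the step
    # Consecutive residuals should be orthogonal (zero up to round-off).
    print(k, r.dot(b - A.dot(p)))
print(p, numpy.linalg.solve(A, b))  # compare with the direct solution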
To begin, let's import libraries and some helper functions and set up our mesh.
import numpy
from helper import l2_norm, poisson_2d_jacobi, poisson_solution
# Set parameters.
nx = 101 # number of points in the x direction
ny = 101 # number of points in the y direction
xmin, xmax = 0.0, 1.0 # limits in the x direction
ymin, ymax = -0.5, 0.5 # limits in the y direction
Lx = xmax - xmin # domain length in the x direction
Ly = ymax - ymin # domain length in the y direction
dx = Lx / (nx - 1) # grid spacing in the x direction
dy = Ly / (ny - 1) # grid spacing in the y direction
# Create the gridline locations and the mesh grid.
x = numpy.linspace(xmin, xmax, num=nx)
y = numpy.linspace(ymin, ymax, num=ny)
X, Y = numpy.meshgrid(x, y)
# Create the source term.
b = (-2.0 * (numpy.pi / Lx) * (numpy.pi / Ly) *
numpy.sin(numpy.pi * X / Lx) *
numpy.cos(numpy.pi * Y / Ly))
# Set the initial conditions.
p0 = numpy.zeros((ny, nx))
# Compute the analytical solution.
p_exact = poisson_solution(x, y, Lx, Ly)
Let's quickly review the solution process:

1. Calculate the residual, $\mathbf{r}^k$, which also serves as the direction vector, $\mathbf{d}^k$;
2. Calculate the step size $\alpha$;
3. Update the solution, $\mathbf{p}^{k+1} = \mathbf{p}^k + \alpha \mathbf{r}^k$.
We have an equation for the residual above:

$$\mathbf{r}^k = \mathbf{b} - A\mathbf{p}^k$$

Remember that $A$ is just a stand-in for the discrete Laplacian, which, taking $\Delta x = \Delta y$, is:

$$\nabla^2 p^k = \frac{-4 p^k_{i,j} + \left(p^k_{i,j-1} + p^k_{i,j+1} + p^k_{i-1,j} + p^k_{i+1,j}\right)}{\Delta x^2}$$

The calculation of $\alpha$ is relatively straightforward, but it does require evaluating the term $A\mathbf{r}^k$. Since we just wrote out the discrete operator $A$ above, you only need to apply that same formula to $\mathbf{r}^k$.
def poisson_2d_steepest_descent(p0, b, dx, dy,
                                maxiter=20000, rtol=1e-6):
    """
    Solves the 2D Poisson equation on a uniform grid,
    with the same grid spacing in both directions,
    for a given forcing term
    using the method of steepest descent.

    The function assumes Dirichlet boundary conditions with value zero.
    The exit criterion of the solver is based on the relative L2-norm
    of the solution difference between two consecutive iterations.

    Parameters
    ----------
    p0 : numpy.ndarray
        The initial solution as a 2D array of floats.
    b : numpy.ndarray
        The forcing term as a 2D array of floats.
    dx : float
        Grid spacing in the x direction.
    dy : float
        Grid spacing in the y direction.
    maxiter : integer, optional
        Maximum number of iterations to perform;
        default: 20000.
    rtol : float, optional
        Relative tolerance for convergence;
        default: 1e-6.

    Returns
    -------
    p : numpy.ndarray
        The solution after relaxation as a 2D array of floats.
    ite : integer
        The number of iterations performed.
    conv : list
        The convergence history as a list of floats.
    """
    def A(p):
        # Apply the Laplacian operator to p.
        return (-4.0 * p[1:-1, 1:-1] +
                p[1:-1, :-2] + p[1:-1, 2:] +
                p[:-2, 1:-1] + p[2:, 1:-1]) / dx**2
    p = p0.copy()
    r = numpy.zeros_like(p)  # initial residual
    Ar = numpy.zeros_like(p)  # to store the mat-vec multiplication
    conv = []  # convergence history
    diff = rtol + 1  # initial difference
    ite = 0  # iteration index
    while diff > rtol and ite < maxiter:
        pk = p.copy()
        # Compute the residual.
        r[1:-1, 1:-1] = b[1:-1, 1:-1] - A(p)
        # Compute the Laplacian of the residual.
        Ar[1:-1, 1:-1] = A(r)
        # Compute the step size.
        alpha = numpy.sum(r * r) / numpy.sum(r * Ar)
        # Update the solution.
        p = pk + alpha * r
        # Dirichlet boundary conditions are automatically enforced.
        # Compute the relative L2-norm of the difference.
        diff = l2_norm(p, pk)
        conv.append(diff)
        ite += 1
    return p, ite, conv
Let's see how it performs on our example problem.
# Compute the solution using the method of steepest descent.
p, ites, conv_sd = poisson_2d_steepest_descent(p0, b, dx, dy,
maxiter=20000,
rtol=1e-10)
print('Method of steepest descent: {} iterations '.format(ites) +
'to reach a relative difference of {}'.format(conv_sd[-1]))
Method of steepest descent: 2 iterations to reach a relative difference of 1.3307695446303778e-16
# Compute the relative L2-norm of the error.
l2_norm(p, p_exact)
8.225076220929745e-05
Not bad! It took only two iterations to reach a solution that meets our exit criterion. Although this seems great, the steepest descent algorithm is not too good when used with large systems or more complicated right-hand sides in the Poisson equation (we'll examine this below!). We can get better performance if we take a little more care in selecting the direction vectors, $\mathbf{d}^k$.
With steepest descent, we know that two successive jumps are orthogonal, but that's about it. There is nothing to prevent the algorithm from making several jumps in the same (or a similar) direction. Imagine you wanted to go from the intersection of 5th Avenue and 23rd Street to the intersection of 9th Avenue and 30th Street. Knowing that each segment has the same computational cost (one iteration), would you follow the red path or the green path?
The method of conjugate gradients reduces the number of jumps by making sure the algorithm never selects the same direction twice. The size of the jumps is now given by:

$$\alpha = \frac{\mathbf{r}^k \cdot \mathbf{r}^k}{A\mathbf{d}^k \cdot \mathbf{d}^k}$$

and the direction vectors by:

$$\mathbf{d}^{k+1} = \mathbf{r}^{k+1} + \beta \mathbf{d}^k,$$

where

$$\beta = \frac{\mathbf{r}^{k+1} \cdot \mathbf{r}^{k+1}}{\mathbf{r}^k \cdot \mathbf{r}^k}.$$
The search directions are no longer equal to the residuals but are instead a linear combination of the residual and the previous search direction. It turns out that CG converges to the exact solution (up to machine accuracy) in a maximum of $N$ iterations! When one is satisfied with an approximate solution, far fewer iterations are typically needed than with Jacobi or steepest descent. Again, the derivation of the algorithm is not immensely difficult and can be found in Shewchuk (1994).
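As a quick illustration of these formulas (again on a made-up symmetric positive-definite $2\times2$ system, chosen only so everything is easy to check by hand), the sketch below shows that successive search directions are conjugate with respect to $A$, i.e., $\mathbf{d}^{k+1} \cdot A\mathbf{d}^k \approx 0$, and that the answer is reached in $N=2$ iterations.

import numpy

# Made-up symmetric positive-definite system with N = 2 unknowns.
A = numpy.array([[4.0, 1.0],
                 [1.0, 3.0]])
b = numpy.array([1.0, 2.0])

p = numpy.zeros(2)  # initial guess
r = b - A.dot(p)  # initial residual
d = r.copy()  # first search direction is the residual
for k in range(2):  # N = 2 iterations suffice
    Ad = A.dot(d)
    alpha = r.dot(r) / d.dot(Ad)  # step size
    p = p + alpha * d  # update the solution
    r_new = r - alpha * Ad  # update the residual
    beta = r_new.dot(r_new) / r.dot(r)
    d_new = r_new + beta * d  # new search direction
    print(k, d_new.dot(Ad))  # A-conjugacy of successive directions: ~0
    r, d = r_new, d_new
print(p, numpy.linalg.solve(A, b))  # exact to machine precision after N steps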
We will again update $\mathbf{p}$ according to

$$\mathbf{p}^{k+1} = \mathbf{p}^k + \alpha \mathbf{d}^k,$$

but use the modified equations above to calculate $\alpha$ and $\mathbf{d}^k$.
You may have noticed that $\beta$ depends on both $\mathbf{r}^{k+1}$ and $\mathbf{r}^k$, which makes the calculation of $\mathbf{d}^0$ a little bit tricky. Or impossible (using the formula above). Instead, we set $\mathbf{d}^0 = \mathbf{r}^0$ for the first step and then switch to the formulas above for all subsequent iterations.
Thus, the full set of steps for the method of conjugate gradients is:

Calculate $\mathbf{d}^0 = \mathbf{r}^0$ (just once), then

1. Calculate $\alpha = \dfrac{\mathbf{r}^k \cdot \mathbf{r}^k}{A\mathbf{d}^k \cdot \mathbf{d}^k}$;
2. Update $\mathbf{p}^{k+1} = \mathbf{p}^k + \alpha \mathbf{d}^k$;
3. Update the residual via $\mathbf{r}^{k+1} = \mathbf{r}^k - \alpha A\mathbf{d}^k$ (equivalent to $\mathbf{b} - A\mathbf{p}^{k+1}$, but it reuses $A\mathbf{d}^k$, which is already computed);
4. Calculate $\beta = \dfrac{\mathbf{r}^{k+1} \cdot \mathbf{r}^{k+1}}{\mathbf{r}^k \cdot \mathbf{r}^k}$;
5. Calculate the new search direction, $\mathbf{d}^{k+1} = \mathbf{r}^{k+1} + \beta \mathbf{d}^k$;
6. Repeat.
def poisson_2d_conjugate_gradient(p0, b, dx, dy,
                                  maxiter=20000, rtol=1e-6):
    """
    Solves the 2D Poisson equation on a uniform grid,
    with the same grid spacing in both directions,
    for a given forcing term
    using the method of conjugate gradients.

    The function assumes Dirichlet boundary conditions with value zero.
    The exit criterion of the solver is based on the relative L2-norm
    of the solution difference between two consecutive iterations.

    Parameters
    ----------
    p0 : numpy.ndarray
        The initial solution as a 2D array of floats.
    b : numpy.ndarray
        The forcing term as a 2D array of floats.
    dx : float
        Grid spacing in the x direction.
    dy : float
        Grid spacing in the y direction.
    maxiter : integer, optional
        Maximum number of iterations to perform;
        default: 20000.
    rtol : float, optional
        Relative tolerance for convergence;
        default: 1e-6.

    Returns
    -------
    p : numpy.ndarray
        The solution after relaxation as a 2D array of floats.
    ite : integer
        The number of iterations performed.
    conv : list
        The convergence history as a list of floats.
    """
    def A(p):
        # Apply the Laplacian operator to p.
        return (-4.0 * p[1:-1, 1:-1] +
                p[1:-1, :-2] + p[1:-1, 2:] +
                p[:-2, 1:-1] + p[2:, 1:-1]) / dx**2
    p = p0.copy()
    r = numpy.zeros_like(p)  # initial residual
    Ad = numpy.zeros_like(p)  # to store the mat-vec multiplication
    conv = []  # convergence history
    diff = rtol + 1  # initial difference
    ite = 0  # iteration index
    # Compute the initial residual.
    r[1:-1, 1:-1] = b[1:-1, 1:-1] - A(p)
    # Set the initial search direction to be the residual.
    d = r.copy()
    while diff > rtol and ite < maxiter:
        pk = p.copy()
        rk = r.copy()
        # Compute the Laplacian of the search direction.
        Ad[1:-1, 1:-1] = A(d)
        # Compute the step size.
        alpha = numpy.sum(r * r) / numpy.sum(d * Ad)
        # Update the solution.
        p = pk + alpha * d
        # Update the residual.
        r = rk - alpha * Ad
        # Update the search direction.
        beta = numpy.sum(r * r) / numpy.sum(rk * rk)
        d = r + beta * d
        # Dirichlet boundary conditions are automatically enforced.
        # Compute the relative L2-norm of the difference.
        diff = l2_norm(p, pk)
        conv.append(diff)
        ite += 1
    return p, ite, conv
# Compute the solution using the method of conjugate gradients.
p, ites, conv_cg = poisson_2d_conjugate_gradient(p0, b, dx, dy,
maxiter=20000,
rtol=1e-10)
print('Method of conjugate gradients: {} iterations '.format(ites) +
'to reach a relative difference of {}'.format(conv_cg[-1]))
Method of conjugate gradients: 2 iterations to reach a relative difference of 1.2982770796281907e-16
# Compute the relative L2-norm of the error.
l2_norm(p, p_exact)
8.225076220929585e-05
The method of conjugate gradients also took two iterations to reach a solution that meets our exit criterion. But let's compare this to the number of iterations needed for the Jacobi iteration:
# Compute the solution using Jacobi relaxation.
p, ites, conv_jacobi = poisson_2d_jacobi(p0, b, dx, dy,
maxiter=40000,
rtol=1e-10)
print('Jacobi relaxation: {} iterations '.format(ites) +
'to reach a relative difference of {}'.format(conv_jacobi[-1]))
Jacobi relaxation: 31227 iterations to reach a relative difference of 9.997923503623598e-11
For our test problem, we get substantial gains in terms of computational cost using the method of steepest descent or the conjugate gradient method.
The conjugate gradient method really shines when one needs to solve more difficult Poisson problems. To get an insight into this, let's solve the Poisson problem using the same boundary conditions as the previous problem but with the following right-hand side,
$$b = \sin\left(\frac{\pi x}{L_x}\right)\cos\left(\frac{\pi y}{L_y}\right) + \sin\left(\frac{6\pi x}{L_x}\right)\cos\left(\frac{6\pi y}{L_y}\right)$$

# Modify the source term of the Poisson system.
b = (numpy.sin(numpy.pi * X / Lx) *
numpy.cos(numpy.pi * Y / Ly) +
numpy.sin(6.0 * numpy.pi * X / Lx) *
numpy.cos(6.0 * numpy.pi * Y / Ly))
maxiter, rtol = 40000, 1e-10
p, ites, conv = poisson_2d_jacobi(p0, b, dx, dy,
maxiter=maxiter, rtol=rtol)
print('Jacobi relaxation: {} iterations'.format(ites))
p, ites, conv = poisson_2d_steepest_descent(p0, b, dx, dy,
maxiter=maxiter,
rtol=rtol)
print('Method of steepest descent: {} iterations'.format(ites))
p, ites, conv = poisson_2d_conjugate_gradient(p0, b, dx, dy,
maxiter=maxiter,
rtol=rtol)
print('Method of conjugate gradients: {} iterations'.format(ites))
Jacobi relaxation: 31226 iterations
Method of steepest descent: 31591 iterations
Method of conjugate gradients: 72 iterations
Now we can really appreciate the marvel of the CG method!
Shewchuk, J. R. (1994). "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain."
Kuzovkin, I. "The Concept of Conjugate Gradient Descent in Python."