In the Introduction to density-functional theory we concluded that the non-linear eigenvalue problem underlying DFT can be written as the fixed-point problem ρ=D(V(ρ)),
In this chapter we will investigate the convergence properties of density-mixing SCF algorithms, that is damped, preconditioned fixed-point iterations ρn+1=ρn+αP−1(D(V(ρn))−ρn),
Our presentation follows [^HL2021], where more details can be found.
We investigate the convergence properties of damped, preconditioned iterations, in order to understand the choices for the preconditioning stratege P−1 as well as the damping parameter α to be made.
Near the fixed point ρ∗=D(V(ρ∗)) the error en=ρn−ρ∗ is small and we can expand to first order: D(V(ρ∗+en))≃D[V(ρ∗)+V′|ρ∗en]≃D(V(ρ∗))+D′|V(ρ∗)V′|ρ∗en=ρ∗+D′|V(ρ∗)V′|ρ∗en
The derivatives D′ and V′ are again important quantities and are given special symbols:
where for simplicity it has been dropped that these quantities are evaluated at the fixed-point, i.e. at ρ∗ and V(ρ∗), respectively.
The above expansion allows to relate the error between SCF iterations (near the fixed point): en+1=ρn+1−ρ∗=ρn−ρ∗+αP−1[D(V(ρ∗+en))−ρn]≃ρn−ρ∗+αP−1[ρ∗+χ0Ken−ρn]=en−αP−1[1−χ0K]en.
Introducing the dielectric matrix adjoint ε†=[1−χ0K]
In other words: SCF converges⇔eigenvalues of 1−αP−1ε† are between −1 and 1
This implies that the convergence properties of an SCF are related to ε, the dielectric operator, which depends on the dielectric properties of the system under study.
In other words it depends on the conduction and screening properties, i.e. whether the material is an insulator, a semiconductor, a metal etc.
It turns out that for the largest chunk of cases the eigenvalues of ε† are positive. Moreover near a local minimiser ε† always has non-degative spectrum.
To make the SCF converge one can therefore:
Note: If the preconditioner is very bad, the eigenvalues of (P−1ε†) might even be worse than ε†, such that convergence is actually hampered by the preconditioner.
We start our practical investigation of typical SCF algorithms using the most simple case of P=I and α=1, i.e. the fixed-point iterations ρn+1=F(ρn),
DFTK automatically provides a reasonable guess density as well as function to evaluate F(ρ), such that we only need to take care of the iterations themselves. In the language of DFTK this algorithm is written as:
using DFTK
using LinearAlgebra
function fixed_point_iteration(F, ρ0, info0; maxiter, tol=1e-10)
# F: The SCF step function
# ρ0: The initial guess density
# info0: The initial metadata
# maxiter: The maximal number of iterations to be performed
ρ = ρ0
info = info0
for n = 1:maxiter
Fρ, info = F(ρ, info)
# If the change is less than the tolerance, break iteration.
if norm(Fρ - ρ) < tol
break
end
ρ = Fρ
end
# Return some stuff DFTK needs ...
(; fixpoint=ρ, info)
end;
Convergence checks in DFTK
The ad-hoc convergence criterion in the example above is included only for pedagogical purposes. It does not yet include the correct scaling, which depends on the discretization. It is preferred to use the provided DFTK utilities for specifying convergence, that can be shared across different solvers. For the more advanced version, see the tutorial on custom SCF solvers.
To test this algorithm we use the following simple setting, which builds and discretizes a PBE model for an aluminium supercell.
using AtomsBuilder
using PseudoPotentialData
function aluminium_setup(repeat=1; Ecut=13.0, kgrid=[2, 2, 2])
al_supercell = bulk(:Al; cubic=true) * (repeat, 1, 1)
pd_pbe_family = PseudoFamily("dojo.nc.sr.pbe.v0_4_1.standard.upf")
model = model_DFT(al_supercell;
functionals=PBE(), temperature=1e-3, symmetries=false,
pseudopotentials=pd_pbe_family)
PlaneWaveBasis(model; Ecut, kgrid)
end;
Now use this setup together with the fixed_point_iteration
solver above
within an SCF. Note that the damping
and mixing
parameters are needed
here to disable some DFTK defaults we don't want to use just yet.
self_consistent_field(aluminium_setup(1); solver=fixed_point_iteration, damping=1.0,
maxiter=30, mixing=SimpleMixing());
n Energy log10(ΔE) log10(Δρ) Diag Δtime --- --------------- --------- --------- ---- ------ 1 -9.186851634535 -1.09 6.8 7.40s 2 -9.187803424194 -3.02 -2.40 1.0 241ms 3 -9.187984828647 -3.74 -2.68 8.5 474ms 4 -9.187869555984 + -3.94 -2.20 2.0 210ms 5 -9.186929201981 + -3.03 -1.74 3.2 303ms 6 -9.183449872345 + -2.46 -1.41 5.1 415ms 7 -9.176395067738 + -2.15 -1.21 4.9 440ms 8 -9.162810743391 + -1.87 -1.06 5.0 457ms 9 -9.140007516547 + -1.64 -0.93 5.4 803ms 10 -9.103021128748 + -1.43 -0.82 7.1 569ms 11 -9.018617559277 + -1.07 -0.69 8.0 585ms 12 -8.764436074438 + -0.59 -0.51 7.4 598ms 13 -8.415516763926 + -0.46 -0.35 9.4 673ms 14 -7.800420117148 + -0.21 -0.22 10.8 751ms 15 -7.551794032574 + -0.60 -0.15 9.0 776ms 16 -7.001873507578 + -0.26 -0.10 10.9 820ms 17 -7.025585446598 -1.63 -0.07 11.2 857ms 18 -6.773461294544 + -0.60 -0.06 10.4 838ms 19 -6.900061424449 -0.90 -0.06 10.9 1.15s 20 -6.679746406121 + -0.66 -0.05 10.5 582ms 21 -6.763168034150 -1.08 -0.05 11.5 853ms 22 -6.620839154162 + -0.85 -0.04 10.5 809ms 23 -6.651888301532 -1.51 -0.04 11.4 893ms 24 -6.596149165262 + -1.25 -0.04 11.0 858ms 25 -6.625083710254 -1.54 -0.04 11.4 855ms 26 -6.601592239557 + -1.63 -0.04 12.2 886ms 27 -6.617981866340 -1.79 -0.04 11.1 871ms 28 -6.606876320377 + -1.95 -0.04 11.9 872ms 29 -6.615320237043 -2.07 -0.04 11.2 1.15s 30 -6.609815084069 + -2.26 -0.04 12.1 664ms ┌ Warning: SCF not converged. └ @ DFTK ~/work/DFTK.jl/DFTK.jl/src/scf/scf_callbacks.jl:60
As can be observed this algorithm is not very good and in fact even fails to converge albeit we are only looking at a very simple system.
This is a known limitation of this algorithm, which is indeed why in practice one at least includes the damping parameter α (with values usually between 0 and 1). One now iterates as follows: ρn+1=ρn+α(F(ρn)−ρn)
Exercise 1
Modify
fixed_point_iteration
such that it supports this damped fixed-point iteration. In other words implement damping inside your algorithm and not by changing thedamping
parameter of theself_consistent_field
function driving the SCF.
Using your algorithm try different values for $\alpha$ between $0$ and
$1$ and estimate roughly the $\alpha$ which gives fastest convergence.
For which $\alpha$ do you observe no convergence at all?
Before moving on to cases with P≠I we will briefly consider an orthogonal strategy leading to faster SCF convergence, namely acceleration techniques. Our discussion will focus on Anderson acceleration. In this approach one exploits that the update R(ρ)=F(ρ)−ρ is also the residual of the fixed-point problem F(ρ)=ρ, i.e. how far away we are from the fixed-point density. A good next density ρn+1 therefore should be found by minimising an approximation for R(ρn+1). Assuming the SCF was linear in the density (which it is not), a good idea is to find a linear combination of residuals minβi‖∑iβiR(ρi)‖2
This simple idea has been rediscovered a few times in different communities with only minor variations to the theme. This makes Anderson acceleration variably known as e.g. Anderson mixing, Pulay mixing or direct inversion of the iterative subspace (DIIS). The mathematical analysis of such methods is not yet exhaustive, see M. Chupin, M. Dupuy, G. Legendre, É. Séré. Math. Model. Num. Anal. 55 2785 (2021) DOI: 10.1051/m2an/2021069 for a recent paper providing a good review of the state of the art.
A key result has been obtained by Walker and Ni, namely the equivalence of Anderson to the GMRES algorithm for linear problems. Based on this analysis Anderson-accelerated SCF procedures Anderson can be expected to inherit the GMRES convergence properties near a fixed point, resulting in a rate of convergence of r≃1−2√κ.
function anderson_iteration(F, ρ0, info0; maxiter)
# F: The SCF step function
# ρ0: The initial guess density
# info0: The initial metadata
# maxiter: The maximal number of iterations to be performed
info = info0
ρ = ρ0
ρs = []
Rs = []
for n = 1:maxiter
Fρ, info = F(ρ, info)
if info.converged
break
end
Rρ = Fρ - ρ
ρnext = vec(ρ) .+ vec(Rρ)
if !isempty(Rs)
M = hcat(Rs...) .- vec(Rρ)
βs = -(M \ vec(Rρ))
for (iβ, β) in enumerate(βs)
ρnext .+= β .* (ρs[iβ] .- vec(ρ) .+ Rs[iβ] .- vec(Rρ))
end
end
push!(ρs, vec(ρ))
push!(Rs, vec(Rρ))
ρ = reshape(ρnext, size(ρ0)...)
end
# Return some stuff DFTK needs ...
(; fixpoint=ρ, info)
end;
To work with this algorithm we will use DFTK's intrinsic mechanism to choose a damping. The syntax for this is
repeat = 1
self_consistent_field(aluminium_setup(repeat);
solver=anderson_iteration,
damping=0.8, maxiter=40,
mixing=SimpleMixing());
to choose a damping of α=0.8 and run for at most maxiter
iterations.
Exercise 2
Based on this Anderson implementation verify (by making a few experiments) that the algorithm converges for
repeat=1
for any 0<α≤2. You may now use thedamping
parameter for changing the value α used by the SCF. State the number of iterations and runtimes you observe.
Exercise 3
Pick α=0.8 and make the problem harder by increasing
repeat
(e.g.2
,4
,6
,8
). Can you make Anderson fail to converge? What do you notice in terms of the number of iterations and runtimes?
DFTK actually employs a numerically more stable Anderson acceleration implementation by default
if the solver
keyword argument is not provided to self_consistent_field
.
For practical calculations this should be used instead of a custom version.
Anderson allows us to push the boundary for SCF methods, but for larger or more challenging systems it is not fully sufficient. The next ingredient for a stable SCF procedure is based on the insight that the convergence properties of an SCF provably depend on the dielectric properties of materials, which is simulated. Amongst others this is to say that insulators (like glass), semiconductors (like silicon) or metals (like aluminium) have rather differing SCF behaviours. As a result the ideal SCF procedure should be slightly different for each material.
To investigate this further theoretically Recall the definition ε†=1−χ0K. The Hartree-exchange-correlation kernel can be further decomposed into K=vc+Kxc,
For well-behaved systems the smallest eigenvalue of ε† is around 1 while the largest eigenvalue is of order 10 (or less). Due to a number of instabilities in the modelled systems either the smallest eigenvalue can decrease or the largest eigenvalue can increase, thus giving a larker condition number κ and worse convergence. For a detailed discussion, see Section 2 of [^HL2021].
In this discussion we will restrict ourselves to a single source of instabilities, namely the one due to the long range divergence of the Coulomb kernel vc. Indeed, if ˆρ(q) are the Fourier coefficients of the density, then ^(vcρ)(q)=4πˆρ(q)|q|2,
For metals it turns out that χ0 is approximately constant in the long-wavelength limit (q→0), namely the negative density of states D (per volume) at the Fermi level: limq→0χ0(q)≃−D.
In contrast for insulators and semiconductors a good approximation of χ0(q) for small q is −qTσ0q, where σ0 is a material-dependent symmetric positive matrix. Therefore the 1/q2 instability is compensated and treating larger cells is less difficult.
The natural approach to deal with the large-wavelength instabilities mentioned in the previous section, is to develop an approximate dielectric model P≃ε†, which can be used to compensate the large eigenvalues of ε†, i.e. such that P−1ε† keeps a manageable condition number κ and thus a fast SCF convergence.
To simplify the discussion we will employ the so-called random-phase approximation, where we set K≃vc, i.e. ε†=1−χ0vc. This approximatiion is justified for analysing the large-wavelength limit, where vc dominates over K.
The most practical strategy is to directly propose analytical expressions directly for P−1. The rationale is that these are fast to evaluate and their cost can thus be neglected in an SCF. For bulk materials this is feasible building on the justified approximation to ignore lattice-scale details. Mathematically one may (a) take the q→0 limit and (b) model χ0 as a translation-independent operator. Such operators (compare to the Kinetic energy) are diagonal in Fourier space and are fully characterised by their Fourier multiplier χ0(q). Since vc is also translation-independent, this implies that the resulting model for the dielectric operator P≃ε† is translation-independent and its inverse can be directly computed as P−1(q)=1P(q) enabling an analytical computation of the inverse of the preconditioner from a dielectric model.
For metals the observation limq→0χ0(q)≃−D directly leads to the Kerker preconditioner P−1Kerker(q)=(1−4π−D|q|2)−1=|q|2|q|2+4πD.
KerkerMixing
and KerkerDosMixing
(which automatically determines the density of states from the current orbitals and occupations).
For semiconductors and insulators one can identify εr=ε(q=0)=1+4πσ0 (in the case of isotropic σ0), where εr can be interpreted as the macroscopic electronic dielectric constant. From this limit for long wavelengths a number of empirical models for χ0(q) and ε†(q) have been proposed in the literature. A simple, two-parameter model is P−1Dielectric(q)=εr+(εr−1)|q|2k2TF1+(εr−1)|q|2k2TF,
DielectricMixing
.
Below is a sketch of the three models using the appropriate setups for aluminium (a metal), gallium arsenide (a semiconductor, εr=14.9) and silica (an insulator, εr=1.5) for comparison:
using Plots
χ0_metal(q, kTF=1) = -kTF^2 / 4π
χ0_dielectric(q, εr, C₀=1-εr, kTF=1) = C₀*q^2 / (4π * (1 - C₀*q^2/kTF^2))
χ0_GaAs(q) = χ0_dielectric(q, 14.9)
χ0_SiO2(q) = χ0_dielectric(q, 1.5)
ε(χ0, q) = (1 - 4π/q^2 * χ0(q))
p = plot(xlims=(1e-2, 1.5), ylims=(0, 16), xlabel="q", ylabel="ε(q)", lw=4)
plot!(p, x -> ε(χ0_metal, x), label="aluminium (Al)")
plot!(p, x -> ε(χ0_GaAs, x), label="gallium arsenide (GaAs)", ls=:dash)
plot!(p, x -> ε(χ0_SiO2, x), label="silica (SiO₂)", ls=:dashdot)
As expected from this discussion, for insulators P=I (no preconditioner) is sufficient, while for the two other cases, an appropriate preconditioner is needed to ensure good convergence properties for large systems.
Here we show the results for three large bulk systems (40 unit cells) of the three materials:
Aluminium (a metal) | Gallium arsenide (a semiconductor) | Silica (an insulator) |
---|---|---|
![]() |
![]() |
![]() |
While the bulk preconditioning models mentioned above are a good step forward, they have two key disadvantages:
In order to overcome the second point it is important to realise that we need to give up on the translational independence of χ0, i.e. we no longer are able to compute P−1(q) by 1/P(q). Our strategy will therefore be to construct more sophisticated approximations to χ0, denoted by ~χ0. The preconditioned density xn=P−1ρn=(1−~χ0vc)−1 is then obtained by solving (1−~χ0vc)xn=ρn
Note that χ0(r,r′) has unit-cell internal fluctuations, but is overall diagonal dominant
Starting from the Adler-Wiser formula χ0(→r,→r′)=∑n,mfFD(εn)−fFD(εm)εn−εmψn(→r)ψ∗m(→r)ψm(→r′)ψ∗n(→r′)
Note that this approach yields a generic framework, where further terms (capturing other physics of χ0) could be easily employed on top of the −LDOS term.
To conclude this chapter we show some results for mixed systems featuring various combinations of metals, insulators and semiconductors. In the table both the number of iterations as well as the condition number is shown. Cases where the condition number does not more than double as the system size is doubled are coloured.
For the two metal-insulator systems, exemplary convergence curves are shown below:
Aluminium + Vacuum | Aluminium + Silica |
---|---|
![]() |
![]() |
![]() |
![]() |
The LDOS preconditioning strategy is available in DFTK as LdosMixing
. Since it is parameter-free and applicable to a wide range of systems (any mixture of insulator and metals), it is used by default in DFTK.
We return to our aluminium setting produced by aluminium_setup
.
In this case we are dealing with a prototypical
metal, such that KerkerMixing
is indeed appropriate. We will thus employ it as
the preconditioner P in the setting
ρn+1=ρn+αP−1(D(V(ρn))−ρn),
self_consistent_field(basis; damping=0.8, mixing=KerkerMixing());
If you wonder about the use of Anderson acceleration in this context:
It can simply be re-introduced by replacing the previous definition of R by
R(ρ)=P−1(F(ρn)−ρn). Again DFTK does exactly this by default
if no other solver
is passed to self_consistent_field
.
Exercise 4
Try the Anderson-accelerated and
KerkerMixing
-preconditioned setup for different values ofrepeat
inaluminium_setup
and check the number of iterations needed. Other mixings DFTK has to offer areDielectricMixing
(best for semiconductors),SimpleMixing
(which is P=I, i.e. no preconditioner at all, best for insulators) orLdosMixing
(self-adapting, suitable for both metals or insulators or inhomogeneous mixtures). Note thatLdosMixing
is the default in DFTK (i.e. used if themixing
parameter is not supplied toself_consistent_field
. Try these mixings (SimpleMixing
,DielectricMixing
,LdosMixing
andKerkerMixing
) and summarise your findings.
You should notice that choosing a preconditioner matching the material under study aids a fast SCF convergence, but that sometimes being off does not seem to do much harm for our case. For larger values of repeat
(beyond what you can probably effort on your laptop) this is no longer true and one needs to be very careful in selecting the right preconditioner. See for example the investigation in [^HL2021].
[^HL2021]: M. Herbst, A. Levitt. J. Phys.: Condens. Matter 33 085503 (2021) DOI: 10.1088/1361-648x/abcbdb [^HL2022]: M. Herbst, A. Levitt J. Comp. Phys. 459 111127 (2022). DOI 10.1016/j.jcp.2022.111127