Saving SCF results on disk and SCF checkpoints

For longer DFT calculations it is pretty standard to run them on a cluster in advance and to perform postprocessing (band structure calculation, plotting of density, etc.) at a later point and potentially on a different machine.

To support such workflows DFTK offers the two functions save_scfres and load_scfres, which save the data structure returned by self_consistent_field to disk and load it back into memory, respectively. For this purpose DFTK uses the JLD2.jl file format and Julia package. At the moment this is considered an experimental feature with a number of caveats; see the warning below.

!!! warning "Saving scfres is experimental" The pair of functions load_scfres and save_scfres is an experimental feature. This means:

- The interface of these functions
  as well as the format in which the data is stored on disk can
  change incompatibly in the future. At this point we make no promises ...
- JLD2 is not yet fully mature,
  so it is recommended to use it only for short-term storage
  and **not** to archive scientific results.
- If you are using the functions to transfer data between different
  machines ensure that you use the **same version of Julia, JLD2 and DFTK**
  for saving and loading data.
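
The last caveat can be made checkable by printing the relevant versions on both machines. The following is a minimal sketch and not part of DFTK: the helper name `version_stamp` is hypothetical, and the use of `pkgversion` assumes Julia 1.9 or newer.

```julia
using DFTK
using JLD2

# Hypothetical helper (not part of DFTK): collect the versions that should
# match between the machine that saves and the machine that loads.
# Assumption: `pkgversion` is available, which requires Julia 1.9 or newer.
function version_stamp()
    (julia = string(VERSION),
     dftk  = string(pkgversion(DFTK)),
     jld2  = string(pkgversion(JLD2)))
end

# Print one line per component, e.g. to paste into a job log.
for (name, version) in pairs(version_stamp())
    println(name, " = ", version)
end
```

Comparing this output on both machines before calling load_scfres gives a quick sanity check, although it cannot guarantee that a file written by another version remains readable.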

To illustrate the use of the functions in practice we will compute the total energy of the O₂ molecule at the PBE level. To get the triplet ground state we use a collinear spin polarisation (see Collinear spin and magnetic systems for details) and a bit of temperature to ease convergence:

In [1]:
using DFTK
using LinearAlgebra
using JLD2

d = 2.079  # oxygen-oxygen bondlength
a = 9.0    # size of the simulation box
lattice = diagm(a * ones(3))
O = ElementPsp(:O, psp=load_psp("hgh/pbe/O-q6.hgh"))
atoms = [O => d / 2a * [[0, 0, 1], [0, 0, -1]]]
magnetic_moments = [O => [1., 1.]]

Ecut  = 10  # Far too small to be converged
model = model_PBE(lattice, atoms, temperature=0.02, smearing=Smearing.Gaussian(),
                  magnetic_moments=magnetic_moments)
basis = PlaneWaveBasis(model, Ecut; kgrid=[1, 1, 1])

ρspin  = guess_spin_density(basis, magnetic_moments)
scfres = self_consistent_field(basis, tol=1e-2, ρspin=ρspin)
save_scfres("scfres.jld2", scfres);
n     Free energy       Eₙ-Eₙ₋₁     ρout-ρin   Magnet   Diag
---   ---------------   ---------   --------   ------   ----
  1   -27.63449715101         NaN   9.78e-01    0.001    5.0 
  2   -28.49881469316   -8.64e-01   6.72e-01    0.900    6.0 
  3   -28.90872433789   -4.10e-01   1.52e-01    1.271    4.0 
  4   -28.93718136953   -2.85e-02   3.93e-02    1.731    3.0 
  5   -28.93852185210   -1.34e-03   2.85e-02    1.930    2.0 
In [2]:
scfres.energies
Out[2]:
Energy breakdown:
    Kinetic             16.9095608
    AtomicLocal         -58.8024616
    AtomicNonlocal      4.7451786 
    Ewald               -4.8994689
    PspCorrection       0.0044178 
    Hartree             19.5262202
    Xc                  -6.4195199
    Entropy             -0.0024488

    total               -28.938521852096

The scfres.jld2 file could now be transferred to a different computer, where one could fire up a REPL to inspect the results of the above calculation:

In [3]:
using DFTK
using JLD2
loaded = load_scfres("scfres.jld2")
propertynames(loaded)
Out[3]:
(:ham, :basis, :energies, :converged, :ρ, :ρspin, :eigenvalues, :occupation, :εF, :n_iter, :n_ep_extra, :ψ, :diagonalization, :stage)
In [4]:
loaded.energies
Out[4]:
Energy breakdown:
    Kinetic             16.9095608
    AtomicLocal         -58.8024616
    AtomicNonlocal      4.7451786 
    Ewald               -4.8994689
    PspCorrection       0.0044178 
    Hartree             19.5262202
    Xc                  -6.4195199
    Entropy             -0.0024488

    total               -28.938521852096

Since the loaded object contains exactly the same data as the scfres returned by the SCF calculation, one could use it to plot a band structure directly from the stored data, e.g. plot_bandstructure(load_scfres("scfres.jld2")).
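
As a minimal sketch of such a postprocessing session, the helper below checks that the file exists before loading it and prints two quantities from the loaded result. The function name `summarize_scfres` is hypothetical. The sketch also assumes that the total energy is accessible as `energies.total`, which matches the "total" row printed above but is not shown explicitly in this section.

```julia
using DFTK
using JLD2

# Hypothetical helper: load a stored scfres and print a short summary.
function summarize_scfres(file::AbstractString)
    isfile(file) || error("No stored SCF result found at $file")
    loaded = load_scfres(file)
    println("SCF converged: ", loaded.converged)
    # Assumption: the energies object exposes its sum as `energies.total`.
    println("Total energy:  ", loaded.energies.total)
    loaded
end
```

Calling `summarize_scfres("scfres.jld2")` returns the loaded object, so it can be passed on to further postprocessing such as plot_bandstructure.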

Checkpointing of SCF calculations

A related feature, which is especially useful for longer calculations with DFTK, is automatic checkpointing, where the state of the SCF is periodically written to disk. The advantage is that if the calculation fails with an error or is aborted for exceeding the walltime limit, one does not need to start from scratch but can continue the calculation from the last checkpoint.

To enable automatic checkpointing in DFTK one needs to pass the ScfSaveCheckpoints callback to self_consistent_field, for example:

In [5]:
callback = DFTK.ScfSaveCheckpoints()
scfres = self_consistent_field(basis, tol=1e-2, ρspin=ρspin, callback=callback);

Notice that using this callback makes the SCF go silent, since the passed callback parameter overwrites the default value (namely ScfDefaultCallback()), which is what produces the familiar printing of the SCF convergence. If you want both printing and checkpointing, you need to chain both callbacks:

In [6]:
callback = DFTK.ScfDefaultCallback() ∘ DFTK.ScfSaveCheckpoints(keep=true)
scfres = self_consistent_field(basis, tol=1e-2, ρspin=ρspin, callback=callback);
n     Free energy       Eₙ-Eₙ₋₁     ρout-ρin   Magnet   Diag
---   ---------------   ---------   --------   ------   ----
  1   -27.63919580617         NaN   9.77e-01    0.001    5.0 
  2   -28.50042954168   -8.61e-01   6.71e-01    0.907    6.0 
  3   -28.90896333350   -4.09e-01   1.52e-01    1.278    4.0 
  4   -28.93721641608   -2.83e-02   3.95e-02    1.736    3.0 
  5   -28.93847821620   -1.26e-03   2.89e-02    1.932    2.0 

For more details on using callbacks with DFTK's self_consistent_field function see Monitoring self-consistent field calculations.

By default the checkpoint is saved in the file dftk_scf_checkpoint.jld2, which is deleted automatically once the SCF completes successfully. To keep the file, specify keep=true, as was done in the last SCF for demonstration purposes. We can now continue the previous calculation from the last checkpoint as if the SCF had been aborted. For this one just loads the checkpoint with load_scfres:

In [7]:
oldstate = load_scfres("dftk_scf_checkpoint.jld2")
scfres   = self_consistent_field(oldstate.basis, ρ=oldstate.ρ, ρspin=oldstate.ρspin,
                                 ψ=oldstate.ψ, tol=1e-3);
n     Free energy       Eₙ-Eₙ₋₁     ρout-ρin   Magnet   Diag
---   ---------------   ---------   --------   ------   ----
  1   -28.93877493955         NaN   2.74e-02    1.985    1.0 
  2   -28.93921061100   -4.36e-04   1.78e-02    1.982    2.0 
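
The restart logic can be wrapped so that the same script serves both a fresh start and a continuation. This is a sketch under the assumption that the checkpoint sits at the default path named above. The function name `run_or_resume` is hypothetical, and it uses only calls that already appear in this section.

```julia
using DFTK
using JLD2

# Hypothetical helper: continue from a checkpoint if one exists, otherwise
# start the SCF from the given basis and initial spin density.
# The default callback only prints the convergence table; pass a different
# `callback` if checkpoints should be written during this run as well.
function run_or_resume(basis, ρspin; tol=1e-3,
                       checkpoint="dftk_scf_checkpoint.jld2",
                       callback=DFTK.ScfDefaultCallback())
    if isfile(checkpoint)
        # Resume: reuse basis, densities and orbitals stored in the checkpoint.
        oldstate = load_scfres(checkpoint)
        self_consistent_field(oldstate.basis; ρ=oldstate.ρ, ρspin=oldstate.ρspin,
                              ψ=oldstate.ψ, tol=tol, callback=callback)
    else
        # Fresh start from the initial guess.
        self_consistent_field(basis; tol=tol, ρspin=ρspin, callback=callback)
    end
end
```

With the variables defined earlier, `scfres = run_or_resume(basis, ρspin)` would pick up the checkpoint when it is present and fall back to a new calculation otherwise.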

!!! note "Availability of load_scfres, save_scfres and ScfSaveCheckpoints" As JLD2 is an optional dependency of DFTK these three functions are only available once one has both imported DFTK and JLD2 (using DFTK and using JLD2).

(Clean up files generated by this notebook)

In [8]:
rm("dftk_scf_checkpoint.jld2")
rm("scfres.jld2")