# NBVAL_IGNORE_OUTPUT
%load_ext watermark
import os
import qutip
import numpy as np
from numpy import pi
import matplotlib
import matplotlib.pylab as plt
import newtonprop
import snakeviz
import numba
%watermark -v --iversions
Although it is only available interactively, snakeviz is a very good way to look at profiling data:
%load_ext snakeviz
The benchmarks are based on the Examples, so we rerun the entire Example notebook in this context:
%%capture
%run example.ipynb
tlist = np.linspace(0, 10, 100)
First, we measure how long the propagation with QuTiP takes. This happens mostly in compiled code, so it is pretty fast.
%%timeit
qutip.mesolve(L, rho0, tlist)
%%timeit
propagate_expm(L, rho0, tlist)
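For reference, the `%%timeit` magic can be reproduced outside IPython with the standard-library timeit module. The workload below is only a placeholder (a dense matrix-vector product), since `L` and `rho0` are defined in the example notebook:

```python
import timeit

import numpy as np

# Placeholder workload standing in for one propagation step; the actual
# L and rho0 from the example notebook are not available outside of it.
A = np.random.rand(200, 200)
v = np.random.rand(200)

# Equivalent of %%timeit: take the best of 5 repetitions of 1000 loops
times = timeit.repeat(lambda: A @ v, repeat=5, number=1000)
print(f"{min(times) / 1000 * 1e6:.1f} µs per loop (best of 5)")
```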
The Newton propagator, being a reference implementation, is implemented in pure Python, and is several orders of magnitude slower. To make the comparison fair, we limit the precision to $10^{-8}$, which is roughly the precision of mesolve.
%%timeit
propagate(L, rho0, tlist, zero_qutip, norm_qutip, inner_qutip, tol=1e-8)
When using lower-level data types, things get considerably faster:
%%timeit
propagate(apply_cythonized_L, rho0_data, tlist, zero_vectorized, norm_vectorized, inner_vectorized, tol=1e-8)
%%timeit
propagate(L_vectorized, rho0_vectorized, tlist, zero_vectorized, norm_vectorized, inner_vectorized, tol=1e-8)
We can profile how much time is spent in the various routines, comparing mesolve and different variations of the Newton propagator. See https://docs.python.org/3/library/profile.html#instant-user-s-manual for the meaning of the columns.
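The two most important columns can be illustrated with a self-contained toy example using the standard-library cProfile and pstats modules (the functions here are made up for illustration): `cumtime` charges time spent in sub-calls to the caller, while `tottime` counts only time spent in the function body itself.

```python
import cProfile
import io
import pstats


def inner():
    # does the actual numerical work
    return sum(i * i for i in range(20000))


def outer():
    # spends almost all of its time inside inner()
    return [inner() for _ in range(50)]


profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
# Under 'cumtime', outer is charged with inner's runtime as well;
# under 'tottime', outer itself does almost no work.
stats.sort_stats('cumtime').print_stats(5)
print(stream.getvalue())
```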
First, we look at QuTiP's mesolve:
stats = %prun -q -r qutip.mesolve(L, rho0, tlist);
We can look at which top-level routines we spend the most time in cumulatively, that is, including sub-calls:
stats.sort_stats('cumtime').print_stats(10);
Or, the bottom-level routines where we actually spent time:
stats.sort_stats('tottime').print_stats(10);
This is dominated by the ODE solver and sparse matrix operations.
If we're working interactively, we could use snakeviz to analyze further details:
#%snakeviz qutip.mesolve(L, rho0, tlist)
Next, we look at the Newton propagator operating on high-level qutip objects:
#%snakeviz propagate(L, rho0, tlist, zero_qutip, norm_qutip, inner_qutip, tol=1e-8)
stats = %prun -q -r propagate(L, rho0, tlist, zero_qutip, norm_qutip, inner_qutip, tol=1e-8);
stats.sort_stats('cumtime').print_stats(10);
stats.sort_stats('tottime').print_stats(10);
Lastly, we can look at the more efficient propagation using a cythonized application of the QuTiP objects:
#%snakeviz propagate(apply_cythonized_L, rho0_data, tlist, zero_vectorized, norm_vectorized, inner_vectorized, tol=1e-8)
stats = %prun -q -r propagate(apply_cythonized_L, rho0_data, tlist, zero_vectorized, norm_vectorized, inner_vectorized, tol=1e-8);
stats.sort_stats('cumtime').print_stats(10);
stats.sort_stats('tottime').print_stats(10);
We now see that the (Python-level) arnoldi and step routines of Newton come out as the bottleneck, as the application of the Liouvillian has become efficient (running at C speed).
Note that the Newton implementation, in particular the _extend_leja function, has been sped up through the use of numba. Without that, _extend_leja would dominate.
Since the Newton propagator is ultimately still limited by being implemented in Python, it is fairer to measure the runtime in terms of the average number of applications of the Liouvillian per time step. This rests on the assumption that in an efficient implementation (and for large Hilbert spaces), applying the Liouvillian is the dominating factor.
The number of applications depends on the chosen precision and on the length of the time step: longer time steps tend to be more efficient, as only then are we in the regime where the fast convergence of the Newton series kicks in.
We construct a dummy Qobj that counts its own applications to a state:
class CountingQobj(qutip.Qobj):
    def __init__(self, *args, **kwargs):
        self.counter = 0
        super().__init__(*args, **kwargs)

    def __mul__(self, other):
        if isinstance(other, qutip.Qobj):
            self.counter += 1
        return super().__mul__(other)
def count_applications_newton(nt, tol=1e-8):
    tlist = np.linspace(0, 10, nt)
    L_count = CountingQobj(L)
    propagate(L_count, rho0, tlist, zero_qutip, norm_qutip, inner_qutip, tol=tol)
    return L_count.counter / len(tlist)
count_applications_newton(nt=10)
count_applications_newton(nt=100)
To compare this to the average number of applications in mesolve, we use a trimmed-down version of the mesolve routine:
import scipy
from qutip.solver import Options
from qutip.superoperator import mat2vec
from qutip.cy.spmatfuncs import cy_ode_rhs
from qutip.mesolve import _generic_ode_solve
from qutip.ui.progressbar import BaseProgressBar
def mesolve(L, rho0, tlist):
    opt = Options()

    def func(t, rho, data, ind, ptr):
        func.counter += 1
        return cy_ode_rhs(t, rho, data, ind, ptr)

    func.counter = 0
    r = scipy.integrate.ode(func)
    r.set_f_params(L.data.data, L.data.indices, L.data.indptr)
    r.set_integrator('zvode', method=opt.method, order=opt.order,
                     atol=opt.atol, rtol=opt.rtol, nsteps=opt.nsteps,
                     first_step=opt.first_step, min_step=opt.min_step,
                     max_step=opt.max_step)
    initial_vector = mat2vec(rho0.full()).ravel('F')
    r.set_initial_value(initial_vector, tlist[0])
    dt = tlist[1] - tlist[0]
    for step in range(len(tlist) - 1):
        r.integrate(r.t + dt)
    return func.counter
def count_applications_mesolve(nt):
tlist = np.linspace(0, 10, nt)
return mesolve(L, rho0, tlist) / len(tlist)
count_applications_mesolve(10)
count_applications_mesolve(100)