Benchmark between Python and Julia

This small Jupyter notebook shows a simple benchmark comparing various implementations in Python and one in Julia of a specific numerical algorithm, the Romberg integration method.

For Python:

  • a recursive implementation,
  • a dynamic programming implementation,
  • also using Pypy instead,
  • (maybe a Numba version of the dynamic programming version)
    • (maybe a Cython version too)

For Julia:

  • a dynamic programming implementation will be enough.

The Romberg method

For mathematical explanations, see the Wikipedia page

We will use scipy.integrate.quad function to compare the result of our manual implementations.

In [1]:
from scipy.integrate import quad

Let try it with this function $f(x)$ on $[a,b]=[1993,2015]$:

$$ f(x) := \frac{12x+1}{1+\cos(x)^2} $$
In [2]:
import math

f = lambda x: (12*x+1)/(1+math.cos(x)**2)
a, b = 1993, 2017
In [3]:
quad(f, a, b)
Out[3]:
(409937.57869797916, 8.482168070998719e-05)

The first value is the numerical value of the integral $\int_{a}^{b} f(x) \mathrm{d}x$ and the second value is an estimate of the numerical error.

$0.4\%$ is not much, alright.

In [4]:
def romberg_rec(f, xmin, xmax, n=8, m=None):
    if m is None:  # not m was considering 0 as None
        m = n
    assert n >= m
    if n == 0 and m == 0:
        return ((xmax - xmin) / 2.0) * (f(xmin) + f(xmax))
    elif m == 0:
        h = (xmax - xmin) / float(2**n)
        N = (2**(n - 1)) + 1
        term = math.fsum(f(xmin + ((2 * k) - 1) * h) for k in range(1, N))
        return (term * h) + (0.5) * romberg_rec(f, xmin, xmax, n - 1, 0)
    else:
        return (1.0 / ((4**m) - 1)) * ((4**m) * romberg_rec(f, xmin, xmax, n, m - 1) - romberg_rec(f, xmin, xmax, n - 1, m - 1))
In [5]:
romberg_rec(f, a, b, n=0)  # really not accurate!
romberg_rec(f, a, b, n=1)  # alreay pretty good!
romberg_rec(f, a, b, n=2)
romberg_rec(f, a, b, n=3)
romberg_rec(f, a, b, n=8)  # Almost the exact value.
romberg_rec(f, a, b, n=10)  # Almost the exact value.
romberg_rec(f, a, b, n=12)  # Almost the exact value.
Out[5]:
404122.6272072803
Out[5]:
372300.32065714896
Out[5]:
373621.9973380798
Out[5]:
373650.4784348298
Out[5]:
409937.6105242601
Out[5]:
409937.57869815244
Out[5]:
409937.57869797904

It converges quite quickly to the "true" value as given by scipy.integrate.quad.

It is not hard to make this function non-recursive, by storing the intermediate results.

In [6]:
def romberg(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = {(0, 0): 0.5 * (xmax - xmin) * (f(xmax) + f(xmin))}

    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / float(2**i)
        xsamples = [xmin + ((2 * k - 1) * h_i) for k in range(1, 1 + 2**(i - 1))]
        r[(i, 0)] = (0.5 * r[(i - 1, 0)]) + h_i * math.fsum(f(x) for x in xsamples)

    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            try:
                r[(i, j)] = (((4**j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / float((4**j) - 1)
            except:
                raise ValueError("romberg() with n = {}, m = {} and i = {}, j = {} was an error.".format(n, m, i, j))

    return r[(n, m)]
In [7]:
romberg(f, a, b, n=0)  # really not accurate!
romberg(f, a, b, n=1)  # alreay pretty good!
romberg(f, a, b, n=2)
romberg(f, a, b, n=3)
romberg(f, a, b, n=8)  # Almost the exact value.
romberg(f, a, b, n=10)  # Almost the exact value.
romberg(f, a, b, n=12)  # Almost the exact value.
Out[7]:
404122.6272072803
Out[7]:
372300.32065714896
Out[7]:
373621.99733807985
Out[7]:
373650.47843482986
Out[7]:
409937.61052426015
Out[7]:
409937.5786981525
Out[7]:
409937.5786979791

It converges quite quickly to the "true" value as given by scipy.integrate.quad.


The Romberg method, better dynamic programming version in Python

Instead of using a dictionary, which gets filled up dynamically (and so, slowly), let us use a numpy arrays, as we already know the size of the array we need ($n+1 \times m+1$).

Note that only half of the array is used, so we could try to use sparse matrices maybe, for triangular matrices? From what I know, it is not worth it, both in term of memory information (if the sparsity measure is $\simeq 1/2$, you don't win anything from LIL or other sparse matrices representation... We could use numpy.tri but this uses a dense array so nope.

In [8]:
import numpy as np

def romberg_better(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = np.zeros((n+1, m+1))
    r[0, 0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i, 0] = (0.5 * r[i - 1, 0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )

    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = (((4.**j) * r[i, j - 1]) - r[i - 1, j - 1]) / ((4.**j) - 1.)

    return r[n, m]
In [9]:
romberg_better(f, a, b, n=0)  # really not accurate!
romberg_better(f, a, b, n=1)  # alreay pretty good!
romberg_better(f, a, b, n=2)
romberg_better(f, a, b, n=3)
romberg_better(f, a, b, n=8)  # Almost the exact value.
romberg_better(f, a, b, n=10)  # Almost the exact value.
romberg_better(f, a, b, n=12)  # Almost the exact value.
Out[9]:
404122.62720728031
Out[9]:
372300.32065714896
Out[9]:
373621.99733807985
Out[9]:
373650.47843482986
Out[9]:
409937.61052426015
Out[9]:
409937.5786981525
Out[9]:
409937.5786979791

It converges quite quickly to the "true" value as given by scipy.integrate.quad.


First benchmark

In [10]:
%timeit quad(f, a, b)
390 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [11]:
%timeit romberg_rec(f, a, b, n=10)
52.8 ms ± 1.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [12]:
%timeit romberg(f, a, b, n=10)
819 µs ± 19.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [13]:
%timeit romberg_better(f, a, b, n=10)
844 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

We already see that the recursive version is much slower than the dynamic programming one!

But there is not much difference between the one using dictionary (romberg()) and the one using a numpy array of a known size (romberg_better()).


Using Pypy for speedup

In [14]:
%%time

import numpy as np
import math
import random

f = lambda x: (12*x+1)/(1+math.cos(x)**2)

# Same code
def romberg(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = np.zeros((n+1, m+1))
    r[0, 0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i, 0] = (0.5 * r[i - 1, 0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )

    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = (((4.**j) * r[i, j - 1]) - r[i - 1, j - 1]) / ((4.**j) - 1.)

    return r[n, m]

for _ in range(100000):
    a = random.randint(-2000, 2000)
    b = a + random.randint(0, 100)
    romberg(f, a, b)
CPU times: user 24.1 s, sys: 0 ns, total: 24.1 s
Wall time: 24.1 s

And now the same code executed by an external Pypy interpreter (Python 2.7.13 and PyPy 5.8.0 with GCC 5.4.0)

In [15]:
%%time
%%pypy

import math
import random

f = lambda x: (12*x+1)/(1+math.cos(x)**2)

# Same code
def romberg_pypy(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = [[0 for _ in range(n+1)] for _ in range(m+1)]
    r[0][0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i][0] = (0.5 * r[i - 1][0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )

    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i][j] = (((4.**j) * r[i][j - 1]) - r[i - 1][j - 1]) / ((4.**j) - 1.)

    return r[n][m]

for _ in range(100000):
    a = random.randint(-2000, 2000)
    b = a + random.randint(0, 100)
    romberg_pypy(f, a, b)
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 7.09 s

This version uses the improved memoization trick (no dictionary), but uses nested lists and not numpy arrays, I didn't bother to install numpy on my Pypy installation (even thought it should be possible).


Numba version for Python

In [16]:
from numba import jit
In [17]:
@jit
def romberg_numba(f, xmin, xmax, n=8):
    assert xmin <= xmax
    m = n
    # First value:
    r = {(0, 0): 0.5 * (xmax - xmin) * (f(xmax) + f(xmin))}

    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / float(2**i)
        xsamples = [xmin + ((2 * k - 1) * h_i) for k in range(1, 1 + 2**(i - 1))]
        r[(i, 0)] = (0.5 * r[(i - 1, 0)]) + h_i * math.fsum(f(x) for x in xsamples)

    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            try:
                r[(i, j)] = (((4**j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / float((4**j) - 1)
            except:
                raise ValueError("romberg() with n = {}, m = {} and i = {}, j = {} was an error.".format(n, m, i, j))

    return r[(n, m)]
In [18]:
romberg_numba(f, a, b, n=8)  # Almost the exact value.
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-18-b4426dd668a3> in <module>()
----> 1 romberg_numba(f, a, b, n=8)  # Almost the exact value.

/usr/local/lib/python3.5/dist-packages/numba/dispatcher.py in _compile_for_args(self, *args, **kws)
    284                 argtypes.append(self.typeof_pyval(a))
    285         try:
--> 286             return self.compile(tuple(argtypes))
    287         except errors.TypingError as e:
    288             # Intercept typing error that may be due to an argument

/usr/local/lib/python3.5/dist-packages/numba/dispatcher.py in compile(self, sig)
    530 
    531             self._cache_misses[sig] += 1
--> 532             cres = self._compiler.compile(args, return_type)
    533             self.add_overload(cres)
    534             self._cache.save_overload(sig, cres)

/usr/local/lib/python3.5/dist-packages/numba/dispatcher.py in compile(self, args, return_type)
     79                                       impl,
     80                                       args=args, return_type=return_type,
---> 81                                       flags=flags, locals=self.locals)
     82         # Check typing error if object mode is used
     83         if cres.typing_error is not None and not flags.enable_pyobject:

/usr/local/lib/python3.5/dist-packages/numba/compiler.py in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library)
    691     pipeline = Pipeline(typingctx, targetctx, library,
    692                         args, return_type, flags, locals)
--> 693     return pipeline.compile_extra(func)
    694 
    695 

/usr/local/lib/python3.5/dist-packages/numba/compiler.py in compile_extra(self, func)
    348         self.lifted = ()
    349         self.lifted_from = None
--> 350         return self._compile_bytecode()
    351 
    352     def compile_ir(self, func_ir, lifted=(), lifted_from=None):

/usr/local/lib/python3.5/dist-packages/numba/compiler.py in _compile_bytecode(self)
    656         """
    657         assert self.func_ir is None
--> 658         return self._compile_core()
    659 
    660     def _compile_ir(self):

/usr/local/lib/python3.5/dist-packages/numba/compiler.py in _compile_core(self)
    643 
    644         pm.finalize()
--> 645         res = pm.run(self.status)
    646         if res is not None:
    647             # Early pipeline completion

/usr/local/lib/python3.5/dist-packages/numba/compiler.py in run(self, status)
    234                     # No more fallback pipelines?
    235                     if is_final_pipeline:
--> 236                         raise patched_exception
    237                     # Go to next fallback pipeline
    238                     else:

/usr/local/lib/python3.5/dist-packages/numba/compiler.py in run(self, status)
    226                 try:
    227                     event(stage_name)
--> 228                     stage()
    229                 except _EarlyPipelineCompletion as e:
    230                     return e.result

/usr/local/lib/python3.5/dist-packages/numba/compiler.py in stage_analyze_bytecode(self)
    362         Analyze bytecode and translating to Numba IR
    363         """
--> 364         func_ir = translate_stage(self.func_id, self.bc)
    365         self._set_and_check_ir(func_ir)
    366 

/usr/local/lib/python3.5/dist-packages/numba/compiler.py in translate_stage(func_id, bytecode)
    754 def translate_stage(func_id, bytecode):
    755     interp = interpreter.Interpreter(func_id)
--> 756     return interp.interpret(bytecode)
    757 
    758 

/usr/local/lib/python3.5/dist-packages/numba/interpreter.py in interpret(self, bytecode)
     89         # Control flow analysis
     90         self.cfa = controlflow.ControlFlowAnalysis(bytecode)
---> 91         self.cfa.run()
     92         if config.DUMP_CFG:
     93             self.cfa.dump()

/usr/local/lib/python3.5/dist-packages/numba/controlflow.py in run(self)
    513                 fn(inst)
    514             else:
--> 515                 assert not inst.is_jump, inst
    516 
    517         # Close all blocks

AssertionError: Failed at object (analyzing bytecode)
SETUP_EXCEPT(arg=82, lineno=17)

It fails! Almost as always when trying Numba, it fails cryptically, too bad. I don't want to spend time debugging this.


Naive Julia version

Thanks to this page for a nice and short introduction to Julia.

In [19]:
%%script julia

function f(x)
    (12*x + 1) / (1 + cos(x)^2)
end

a = 1993
b = 2017

function romberg_julia(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = Dict()
    r[(0, 0)] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[(i, 0)] = (r[(i - 1, 0)] / 2.) + (h_i * sum_f_x)
    end

    # All the other values:
    for j in 1 : m
        for i in j : n
            r[(i, j)] = (((4^j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / (4^j - 1.)
        end
    end

    r[(n, m)]
end


println(romberg_julia(f, a, b, n=0))  # really not accurate!
println(romberg_julia(f, a, b, n=1))  # alreay pretty good!
println(romberg_julia(f, a, b, n=2))
println(romberg_julia(f, a, b, n=3))
println(romberg_julia(f, a, b, n=8))  # Almost the exact value.
println(romberg_julia(f, a, b, n=10))  # Almost the exact value.
println(romberg_julia(f, a, b, n=12))  # Almost the exact value.
f (generic function with 1 method)
1993
2017
romberg_julia (generic function with 1 method)
404122.6272072803
372300.32065714896
373621.99733807985
373650.47843482986
409937.6105242601
409937.57869815256
409937.5786979804

It seems to work well, like the Python implementation. We get the same numerical result:

In [20]:
f = lambda x: (12*x+1)/(1+math.cos(x)**2)
a, b = 1993, 2017

quad(f, a, b)
romberg(f, a, b, n=12)
Out[20]:
(409937.57869797916, 8.482168070998719e-05)
Out[20]:
409937.5786979791

Let try a less naive version using a fixed-size array instead of a dictionary. (as we did before for the Python version)

In [21]:
%%script julia

function f(x)
    (12*x + 1) / (1 + cos(x)^2)
end

a = 1993
b = 2017

function romberg_julia_better(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = zeros((n+1, m+1))  # https://docs.julialang.org/en/latest/stdlib/arrays/#Base.zeros
    r[1, 1] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[i + 1, 1] = (r[i, 1] / 2.) + (h_i * sum_f_x)
    end

    # All the other values:
    for j in 1 : m
        for i in j : n
            r[i + 1, j + 1] = (((4.^j) * r[i + 1, j]) - r[i, j]) / (4.^j - 1.)
        end
    end

    r[n + 1, m + 1]
end


println(romberg_julia_better(f, a, b, n=0))  # really not accurate!
println(romberg_julia_better(f, a, b, n=1))  # alreay pretty good!
println(romberg_julia_better(f, a, b, n=2))
println(romberg_julia_better(f, a, b, n=3))
println(romberg_julia_better(f, a, b, n=8))  # Almost the exact value.
println(romberg_julia_better(f, a, b, n=10))  # Almost the exact value.
println(romberg_julia_better(f, a, b, n=12))  # Almost the exact value.
f (generic function with 1 method)
1993
2017
romberg_julia_better (generic function with 1 method)
404122.6272072803
372300.32065714896
373621.99733807985
373650.47843482986
409937.6105242601
409937.57869815256
409937.5786979804

Benchmark between Python, Pypy and Julia

First with Python:

In [22]:
%%time

import numpy as np
import math
import random

f = lambda x: (12*x+1)/(1+math.cos(x)**2)

# Same code
def romberg(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = np.zeros((n+1, m+1))
    r[0, 0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i, 0] = (0.5 * r[i - 1, 0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )

    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = (((4.**j) * r[i, j - 1]) - r[i - 1, j - 1]) / ((4.**j) - 1.)

    return r[n, m]

for _ in range(100000):
    a = random.randint(-2000, 2000)
    b = a + random.randint(0, 100)
    romberg(f, a, b)
CPU times: user 25.6 s, sys: 0 ns, total: 25.6 s
Wall time: 25.6 s

And now the same code executed by an external Pypy interpreter (Python 2.7.13 and PyPy 5.8.0 with GCC 5.4.0)

In [23]:
%%time
%%pypy

import math
import random

f = lambda x: (12*x+1)/(1+math.cos(x)**2)

# Same code
def romberg_pypy(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = [[0 for _ in range(n+1)] for _ in range(m+1)]
    r[0][0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i][0] = (0.5 * r[i - 1][0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )

    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i][j] = (((4.**j) * r[i][j - 1]) - r[i - 1][j - 1]) / ((4.**j) - 1.)

    return r[n][m]

for _ in range(100000):
    a = random.randint(-2000, 2000)
    b = a + random.randint(0, 100)
    romberg_pypy(f, a, b)
CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 7.12 s

And finally with Julia:

In [24]:
%%time
%%script julia

function f(x)
    (12*x + 1) / (1 + cos(x)^2)
end

function romberg_julia(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = Dict()
    r[(0, 0)] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[(i, 0)] = (r[(i - 1, 0)] / 2.) + (h_i * sum_f_x)
    end

    # All the other values:
    for j in 1 : m
        for i in j : n
            r[(i, j)] = (((4^j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / (4^j - 1.)
        end
    end

    r[(n, m)]
end

for _ in 1:100000
    a = rand(-2000:2000)
    b = a + rand(0:100)
    romberg_julia(f, a, b)
end
f (generic function with 1 method)
romberg_julia (generic function with 1 method)
CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 8.1 s

On this first test, it doesn't look faster than Pypy... But what if we use the improved version, with an array instead of dictionary?

In [25]:
%%time
%%script julia

function f(x)
    (12*x + 1) / (1 + cos(x)^2)
end

function romberg_julia_better(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = zeros((n+1, m+1))  # https://docs.julialang.org/en/latest/stdlib/arrays/#Base.zeros
    r[1, 1] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[i + 1, 1] = (r[i, 1] / 2.) + (h_i * sum_f_x)
    end

    # All the other values:
    for j in 1 : m
        for i in j : n
            r[i + 1, j + 1] = (((4.^j) * r[i + 1, j]) - r[i, j]) / (4.^j - 1.)
        end
    end

    r[n + 1, m + 1]
end

for _ in 1:100000
    a = rand(-2000:2000)
    b = a + rand(0:100)
    romberg_julia_better(f, a, b)
end
f (generic function with 1 method)
romberg_julia_better (generic function with 1 method)
CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 2.7 s

Oh, this time it finally seems faster. Really faster? Yes, about 3 to 4 time faster than Pypy.

Remark also that this last cells compared by using the magic %%pypy and %%script julia, so they both need a warm-up time (opening the pipe, the sub-process, initializing the JIT compiler etc). But it is fair to compare Pypy to Julia this way.


Second benchmark

Let try the same numerical algorithm but with a different integrand function.

$$\int_{0}^{1} \exp(-x^2) \mathrm{d}x \approx 0.842700792949715$$

First with Python:

In [26]:
%%time

import numpy as np
import math
import random

f = lambda x: (2.0 / math.sqrt(math.pi)) * math.exp(-x**2)

# Same code
def romberg(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = np.zeros((n+1, m+1))
    r[0, 0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i, 0] = (0.5 * r[i - 1, 0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )

    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = (((4.**j) * r[i, j - 1]) - r[i - 1, j - 1]) / ((4.**j) - 1.)

    return r[n, m]

for _ in range(100000):
    a = 0
    b = 1
    romberg(f, a, b)
print(romberg(f, a, b))
0.84270079295
CPU times: user 20.7 s, sys: 8 ms, total: 20.8 s
Wall time: 20.8 s

And now the same code executed by an external Pypy interpreter (Python 2.7.13 and PyPy 5.8.0 with GCC 5.4.0)

In [27]:
%%time
%%pypy

import math
import random

f = lambda x: (2.0 / math.sqrt(math.pi)) * math.exp(-x**2)

# Same code
def romberg_pypy(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = [[0 for _ in range(n+1)] for _ in range(m+1)]
    r[0][0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i][0] = (0.5 * r[i - 1][0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )

    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i][j] = (((4.**j) * r[i][j - 1]) - r[i - 1][j - 1]) / ((4.**j) - 1.)

    return r[n][m]

for _ in range(100000):
    a = 0
    b = 1
    romberg_pypy(f, a, b)
print(romberg_pypy(f, a, b))
0.84270079295
CPU times: user 8 ms, sys: 0 ns, total: 8 ms
Wall time: 6.35 s

And finally with Julia:

In [28]:
%%time
%%script julia

function f(x)
    (2.0 / sqrt(pi)) * exp(-x^2)
end

function romberg_julia(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = Dict()
    r[(0, 0)] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[(i, 0)] = (r[(i - 1, 0)] / 2.) + (h_i * sum_f_x)
    end

    # All the other values:
    for j in 1 : m
        for i in j : n
            r[(i, j)] = (((4^j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / (4^j - 1.)
        end
    end

    r[(n, m)]
end

for _ in 1:100000
    a = 0
    b = 1
    romberg_julia(f, a, b)
end
println(romberg_julia(f, 0, 1))
f (generic function with 1 method)
romberg_julia (generic function with 1 method)
0.8427007929497148
CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 7.19 s

Still not faster than Pypy... So what is the goal of Julia?

In [29]:
%%time
%%script julia

function f(x)
    (2.0 / sqrt(pi)) * exp(-x^2)
end


function romberg_julia_better(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = zeros((n+1, m+1))  # https://docs.julialang.org/en/latest/stdlib/arrays/#Base.zeros
    r[1, 1] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.

    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[i + 1, 1] = (r[i, 1] / 2.) + (h_i * sum_f_x)
    end

    # All the other values:
    for j in 1 : m
        for i in j : n
            r[i + 1, j + 1] = (((4.^j) * r[i + 1, j]) - r[i, j]) / (4.^j - 1.)
        end
    end

    r[n + 1, m + 1]
end

for _ in 1:100000
    a = 0
    b = 1
    romberg_julia_better(f, a, b)
end
println(romberg_julia_better(f, 0, 1))
f (generic function with 1 method)
romberg_julia_better (generic function with 1 method)
0.8427007929497148
CPU times: user 8 ms, sys: 0 ns, total: 8 ms
Wall time: 2.42 s

This is also faster than Pypy, but not that much...


Conclusion

$\implies$ On this (baby) example of a real world numerical algorithms, tested on thousands of random inputs or on thousands time the same input, the speed-up is in favor of Julia, but it doesn't seem impressive enough to make me want to use it (at least for now).

If I have to use 1-based indexing and a slightly different language, just to maybe gain a speed-up of 2 to 3 (compared to Pypy) or even a 10x speed-up compare to naive Python, why bother?

Remark

Of course, this was a baby benchmark, on a small algorithm, and probably wrongly implemented in both Python and Julia.

But still, I am surprised to see that the naive Julia version was slower than the naive Python version executed with Pypy... For the less naive version (without dictionary), the Julia version was about 2 to 3 times faster than the Python version with Pypy.