This demo assumes you are familiar with the basics of running an Intrepydd program, covered in the "Hello, world!" demo, as well as with the principle of type specialization. We start from a pure-Python baseline that sums the elements of a 3-D array:
def sum_elements(xs):
    '''Sums all elements in a 3-D array `xs`.'''
    assert len(xs.shape) == 3
    s = 0.0
    for i in range(xs.shape[0]):
        for j in range(xs.shape[1]):
            for k in range(xs.shape[2]):
                s += xs[i, j, k]
    return s
Here is a small tester for this function. We'll reuse it later to test an Intrepydd version.
def test_code(fun=sum_elements):
    from numpy import arange
    xs = arange(10000000).reshape(1000, 100, 100).astype('double')
    return fun(xs)
test_code()
To help find bottlenecks, you can use any of Python's standard tools for timing or profiling. Here, we show an example using line_profiler, which gives you a line-by-line breakdown of where time is spent. Its use in Jupyter requires a magic command to load the module on first use:
%load_ext line_profiler
Once loaded, you can then use the %lprun magic to invoke the profiler on any code statement. For example, let's apply it to the tester function (test_code()) again. The additional -f <fun> argument tells the profiler to only consider the statements, line by line, in the body of the specified function <fun>. (You can supply the -f argument multiple times to record these data for different function bodies, as shown in the second example below.)
%lprun -f test_code test_code()
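To profile more than one function body at once, supply -f for each one. For instance, the following sketch records line-by-line timings for both the tester and the pure-Python sum_elements defined above:
%lprun -f test_code -f sum_elements test_code()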
Let's apply the technique of type specialization to speed up this code. It requires minimal changes in this case: adding types to the function's arguments and return value, and replacing NumPy attribute references with their corresponding Intrepydd functions (e.g., xs.shape[0] with shape(xs, 0)):
%%writefile demo3.pydd
# demo3.pydd
def sum_elements(xs: Array(double, 3)) -> double:
    '''Sums all elements in a 3-D array `xs`. (Intrepydd version)'''
    s = 0.0
    for i in range(shape(xs, 0)):
        for j in range(shape(xs, 1)):
            for k in range(shape(xs, 2)):
                s += xs[i, j, k]
    return s
# eof
Then, compile with Intrepydd:
!pyddc demo3.pydd
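If the build succeeds, you should end up with a compiled module that is importable as demo3. A quick way to look for the generated files (assuming a Unix-like shell; the exact filenames depend on your platform and Python version) is:
!ls demo3*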
And finally, let's load this new module and re-run the tester:
import demo3
test_code(demo3.sum_elements)
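As an optional sanity check (a minimal sketch, assuming NumPy is installed and demo3 was imported as above), you can assert that the two versions agree numerically:
from numpy import isclose
assert isclose(test_code(), test_code(demo3.sum_elements))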
If everything went well, you should see the same numerical output as with the original version. Now let's see if we're any faster:
%lprun -f test_code test_code(demo3.sum_elements)
Other timers. Of course, since you are operating in Jupyter, you can use any timing or profiling tool at your disposal. For instance, here is how we can use the built-in %timeit magic to programmatically measure the time of the two versions and report the speedup:
baseline_time = %timeit -o test_code()
intrepydd_time = %timeit -o test_code(demo3.sum_elements)
print("Speedup: ~ {:.1f}x".format(baseline_time.best / intrepydd_time.average))
A key first step toward higher performance is type specialization. In Intrepydd, the primary way to do that is to annotate the signatures of your function definitions with argument and return types.
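To make the idea concrete one more time, here is a hypothetical, stand-alone sketch of an annotated signature, reusing only the Array, double, and shape constructs already seen in demo3.pydd (the function name and file are made up for illustration):
# scale_sum.pydd -- hypothetical example
def scale_sum(xs: Array(double, 1), c: double) -> double:
    '''Returns c times the sum of a 1-D array `xs`.'''
    s = 0.0
    for i in range(shape(xs, 0)):
        s += xs[i]
    return c * s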