Tasks for those who "feel like a pro":

TASK 1

Write code that enumerates the items of a list, i.e. labels each item with an integer so that equal items share a label:

  • items are not ordered
  • items are not unique
  • don't use loops
  • try to be as short as possible (not considering import statements)

Example:

Input

items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']

Output

#something like:
[0, 1, 2, 0, 2, 1]

TASK 2

For each element in a list [0, 1, 2, ..., N-1], build all possible pairs with the other elements of that list.

  • exclude "self-pairing" (e.g. 0-0, 1-1, 2-2)
  • don't use loops
  • try to be as short as possible (not considering import statements)

Example:

Input:

[0, 1, 2, 3] or just 4

Output (first row: first element of each pair; second row: second element):

0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3

1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2

Learning Resources

Online

Learning by doing!

Reading (in the future)

  • Al Sweigart, "Automate the Boring Stuff with Python", https://automatetheboringstuff.com
  • Mark Lutz, "Python Pocket Reference" (250 pages)
  • Mark Lutz, "Learning Python" (1600 pages!)

Programming in python

Writing code

Some anti-patterns

Python basics

Verify your python version by running

python --version

This notebook is written in Python 2.

Basic types

variables

a = b = 3

c, d = 4, 5

c, d = d, c
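A quick check of what these assignments produce. The right-hand side is built as a tuple before any name is rebound, which is why `c, d = d, c` swaps the values without a temporary variable. The `x`/`y` lines are an illustrative addition showing a caveat of chained assignment.

```python
# Chained assignment: both names refer to the same object
a = b = 3

# Tuple packing/unpacking: the tuple (4, 5) is built first, then unpacked
c, d = 4, 5

# Swap: the tuple (d, c) == (5, 4) is created before c and d are rebound
c, d = d, c
print((a, b, c, d))  # (3, 3, 5, 4)

# Caveat: with a mutable object, chained assignment shares that single object
x = y = []
x.append(1)
print(y)  # [1]
```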

strings

In [458]:
greeting = 'Hello'
guest = "John"
my_string = 'Hello "John"'
named_greeting = 'Hello, {name}'.format(name=guest)

named_greeting2 = '{}, {}'.format(greeting, guest)

print named_greeting
print named_greeting2
Hello, John
Hello, John
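The `format` calls above only scratch the surface of the format mini-language. A few more field forms that are often handy (standard `str.format` syntax; the values are illustrative):

```python
guest = 'John'

# Positional fields can be referenced by index and reused
print('{0}, {1}! Bye, {1}.'.format('Hello', guest))  # Hello, John! Bye, John.

# Format specs follow the colon: alignment/width, precision, thousands separator
print('[{:>6}]'.format(guest))     # [  John]
print('{:.3f}'.format(3.14159))    # 3.142
print('{:,}'.format(1234567))      # 1,234,567
```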

data containers

  • list
  • tuple
  • set
  • dictionary

lists

In [459]:
fruit_list = ['apple', 'orange', 'peach', 'mango', 'bananas', 'pineapple']

name_length = [len(fruit) for fruit in fruit_list]
print name_length
[5, 6, 5, 5, 7, 9]
In [460]:
name_with_p = [fruit for fruit in fruit_list if fruit[0]=='p']  #even better: fruit.startswith('p')
In [461]:
numbered_fruits = []
In [462]:
for i, fruit in enumerate(fruit_list):
    numbered_fruits.append('{}.{}'.format(i, fruit))
    
numbered_fruits
Out[462]:
['0.apple', '1.orange', '2.peach', '3.mango', '4.bananas', '5.pineapple']
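The explicit loop with `append` can be folded into a list comprehension, and `enumerate` takes a `start` argument if the numbering should not begin at zero. A small sketch (the name `numbered_from_one` is illustrative):

```python
fruit_list = ['apple', 'orange', 'peach', 'mango', 'bananas', 'pineapple']

# start=1 shifts the counter; the comprehension replaces the loop + append
numbered_from_one = ['{}.{}'.format(i, fruit)
                     for i, fruit in enumerate(fruit_list, start=1)]
print(numbered_from_one[:2])  # ['1.apple', '2.orange']
```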

Indexing starts with zero.

General slicing rule (mind the brackets): [start:stop:step], where the stop index itself is excluded.

In [463]:
numbered_fruits[0] = None
In [464]:
numbered_fruits[1:4]
Out[464]:
['1.orange', '2.peach', '3.mango']
In [465]:
numbered_fruits[1:-1:2]
Out[465]:
['1.orange', '3.mango']
In [466]:
numbered_fruits[::-1]
Out[466]:
['5.pineapple', '4.bananas', '3.mango', '2.peach', '1.orange', None]
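The same slicing rules on a plain list of letters, so each result is easy to verify by eye. A slice can also be stored as a `slice` object and reused; the list `letters` is just an example.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[1:4])     # ['b', 'c', 'd']       stop index 4 is excluded
print(letters[1:-1:2])  # ['b', 'd']            -1 means "last", also excluded
print(letters[::-1])    # reversed copy

# A reusable slice object: every other element
every_other = slice(None, None, 2)
print(letters[every_other])  # ['a', 'c', 'e']
```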

tuples

immutable type!

In [467]:
p_fruits = (name_with_p[1], name_with_p[0])
p_fruits[1] = 'mango'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-467-a967184828ef> in <module>()
      1 p_fruits = (name_with_p[1], name_with_p[0])
----> 2 p_fruits[1] = 'mango'

TypeError: 'tuple' object does not support item assignment
In [468]:
single_number_tuple = 3,
single_number_tuple
Out[468]:
(3,)
In [469]:
single_number_tuple + (2,) + (1, 0)
Out[469]:
(3, 2, 1, 0)
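"Immutable" applies to the tuple itself, i.e. which objects it holds; a mutable object stored inside it can still change. A short illustration with made-up values:

```python
# The tuple cannot be reassigned item by item, but the list it holds is mutable
pair = ('fruits', ['apple'])
pair[1].append('mango')
print(pair)  # ('fruits', ['apple', 'mango'])

# Tuples unpack into separate names
name, contents = pair
print(name)  # fruits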

sets

Mutable type (use frozenset if you need an immutable set). Stores only unique elements, in no particular order.

In [470]:
set([0, 1, 2, 1, 1, 1, 3])
Out[470]:
{0, 1, 2, 3}
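Sets support the usual mathematical operations through operators, and elements can be added in place. The fruit sets below are illustrative; `sorted` is used only to make the printed order deterministic.

```python
tropical = {'mango', 'pineapple', 'banana'}
red = {'apple', 'cherry', 'mango'}

print(sorted(tropical | red))  # union
print(sorted(tropical & red))  # intersection: ['mango']
print(sorted(tropical - red))  # difference: ['banana', 'pineapple']

# Sets are mutable; adding an element that is already present changes nothing
red.add('cherry')
print(len(red))  # 3
```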

dictionaries

In [471]:
fruit_list = ['apple', 'orange', 'mango', 'banana', 'pineapple']
quantities = [3, 5, 2, 3, 4]

order_fruits = {fruit: num
                for fruit, num in zip(fruit_list, quantities)}
order_fruits
Out[471]:
{'apple': 3, 'banana': 3, 'mango': 2, 'orange': 5, 'pineapple': 4}
In [472]:
order_fruits['pineapple'] = 2
order_fruits
Out[472]:
{'apple': 3, 'banana': 3, 'mango': 2, 'orange': 5, 'pineapple': 2}
In [473]:
print order_fruits.keys()
print order_fruits.values()
['orange', 'mango', 'pineapple', 'apple', 'banana']
[5, 2, 2, 3, 3]
In [474]:
for fruit, amount in order_fruits.iteritems():
    print 'Buy {num} {entity}s'.format(num=amount, entity=fruit)
Buy 5 oranges
Buy 2 mangos
Buy 2 pineapples
Buy 3 apples
Buy 3 bananas
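Two related idioms, written with `print()` so they also run under Python 3: `dict(zip(...))` builds the same mapping as the comprehension above, and `get` returns a default instead of raising `KeyError`. `items()` exists in both Python 2 and 3, whereas `iteritems()` is Python 2 only. The key `'kiwi'` is just an example of a missing key.

```python
fruit_list = ['apple', 'orange', 'mango', 'banana', 'pineapple']
quantities = [3, 5, 2, 3, 4]

# dict() accepts an iterable of (key, value) pairs
order_fruits = dict(zip(fruit_list, quantities))

# Missing key -> default value instead of KeyError
print(order_fruits.get('kiwi', 0))  # 0

# sorted() only makes the output order deterministic
for fruit, amount in sorted(order_fruits.items()):
    print('Buy {num} {entity}s'.format(num=amount, entity=fruit))
```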

Functions

general patterns

In [475]:
def my_func(var1, var2, default_var1=0, default_var2=False):
    """
    This is a generic example of a Python function.
    You can see this string by typing: my_func?
    """
    # do something with vars
    if not default_var2:
        result = var1
    elif default_var1 == 0:
        result = var1
    else:
        result = var1 + var2
    return result

A function is just another object (like almost everything in Python).

In [476]:
print 'Function {} has the following docstring:\n{}'\
        .format(my_func.__name__, my_func.__doc__)
Function my_func has the following docstring:

    This is a generic example of a Python function.
    You can see this string by typing: my_func?
    

functions as arguments

In [477]:
def function_over_function(func, *args, **kwargs):
    function_result = func(*args, **kwargs)
    return function_result
In [478]:
function_over_function(my_func, 3, 5, default_var1=1, default_var2=True)
Out[478]:
8
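`function_over_function` relies on `*args` and `**kwargs`. A minimal sketch of what they collect and how the same operators unpack the values back into a call (`show_args` is a hypothetical helper for illustration):

```python
def show_args(*args, **kwargs):
    # *args gathers extra positional arguments into a tuple,
    # **kwargs gathers extra keyword arguments into a dict
    return args, kwargs

positional, keywords = show_args(3, 5, default_var1=1)
print(positional)  # (3, 5)
print(keywords)    # {'default_var1': 1}

# * and ** in a call unpack a tuple/dict back into arguments
print(show_args(*positional, **keywords) == (positional, keywords))  # True
```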

lambda evaluation

In [479]:
function_over_function(lambda x, y, factor=10: (x+y)*factor, 1, 2, 5)
Out[479]:
15

Don't assign lambda expressions to variables (PEP 8 recommends against it). If you need a named function, define it with def:

In [480]:
my_simple_func = lambda x: x+1

vs

In [481]:
def my_simple_func(x):
    return x + 1
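One concrete reason for the rule above: a lambda always reports the generic name `<lambda>`, which makes tracebacks and introspection less informative. The names below are illustrative.

```python
add_one_lambda = lambda x: x + 1

def add_one(x):
    return x + 1

# Both behave the same when called...
print(add_one_lambda(1) == add_one(1))  # True

# ...but only the def-version carries a meaningful name
print(add_one_lambda.__name__)  # <lambda>
print(add_one.__name__)         # add_one
```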

Numpy - scientific computing

Building matrices and vectors

In [482]:
import numpy as np
In [483]:
matrix_from_list = np.array([[1, 3, 4],
                             [2, 0, 5],
                             [4, 4, 1],
                             [0, 1, 0]])

vector_from_list = np.array([2, 1, 3])

print 'The matrix is\n{matrix}\n\nthe vector is\n{vector}'\
        .format(vector=vector_from_list, matrix=matrix_from_list)
The matrix is
[[1 3 4]
 [2 0 5]
 [4 4 1]
 [0 1 0]]

the vector is
[2 1 3]

Basic manipulations

matvec

In [484]:
matrix_from_list.dot(vector_from_list)
Out[484]:
array([17, 19, 15,  1])
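Each entry of the matrix-vector product is one row of the matrix dotted with the vector. Spelling this out with an elementwise (broadcast) product summed along the rows reproduces the `dot` result; this is only a check of the definition, not a faster alternative.

```python
import numpy as np

matrix_from_list = np.array([[1, 3, 4],
                             [2, 0, 5],
                             [4, 4, 1],
                             [0, 1, 0]])
vector_from_list = np.array([2, 1, 3])

# Row i of the product: sum_j matrix[i, j] * vector[j]
manual = (matrix_from_list * vector_from_list).sum(axis=1)
print(manual)  # [17 19 15  1]
print(np.array_equal(manual, matrix_from_list.dot(vector_from_list)))  # True
```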

broadcasting

In [485]:
matrix_from_list + vector_from_list
Out[485]:
array([[3, 4, 7],
       [4, 1, 8],
       [6, 5, 4],
       [2, 2, 3]])
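Broadcasting compares shapes starting from the trailing dimension: a `(3,)` vector is treated as `(1, 3)` and repeated along the rows, while a `(4, 1)` column is repeated along the columns. A small example with an illustrative `arange` matrix:

```python
import numpy as np

matrix = np.arange(12).reshape(4, 3)

# (4, 3) + (3,) -> the vector is added to every row
row_shift = matrix + np.array([10, 20, 30])
print(row_shift[0])      # [10 21 32]

# (4, 3) + (4, 1) -> the column is added to every column
col_shift = matrix + np.array([[100], [200], [300], [400]])
print(col_shift[:, 0])   # [100 203 306 409]
```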

forcing dtype

In [486]:
single_precision_vector = np.array([1, 3, 5, 2], dtype=np.float32)
single_precision_vector.dtype
Out[486]:
dtype('float32')

converting dtypes

In [487]:
vector_from_list.dtype
Out[487]:
dtype('int32')
In [488]:
vector_from_list.astype(np.int16)
Out[488]:
array([2, 1, 3], dtype=int16)

shapes (singletons)

mind dimensionality!

In [550]:
row_vector = np.array([[1,2,3]])

print 'New vector {} has dimensionality {}'\
        .format(row_vector, row_vector.shape)

print 'The dot-product is: ', matrix_from_list.dot(row_vector)
New vector [[1 2 3]] has dimensionality (1L, 3L)
The dot-product is: 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-550-286bbd2b9667> in <module>()
      3 print 'New vector {} has dimensionality {}'        .format(row_vector, row_vector.shape)
      4 
----> 5 print 'The dot-product is: ', matrix_from_list.dot(row_vector)

ValueError: shapes (4,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
In [551]:
singleton_vector = row_vector.squeeze()
print 'Squeezed vector {} has shape {}'.format(singleton_vector, singleton_vector.shape)
 Squeezed vector [1 2 3] has shape (3L,)
In [552]:
matrix_from_list.dot(singleton_vector)
Out[552]:
array([19, 17, 15,  2])

adding new dimension

In [553]:
print singleton_vector[:, np.newaxis]
[[1]
 [2]
 [3]]
In [554]:
mat = np.arange(12)
mat.reshape(-1, 4)  # returns a reshaped array, which is discarded here
mat                 # mat itself keeps its original shape
Out[554]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
In [555]:
print singleton_vector[:, None]
[[1]
 [2]
 [3]]
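`np.newaxis` and `None` are the same object, and the position of the new axis decides whether you get a column or a row. `reshape` is an alternative, but its result has to be stored because the original array is not changed.

```python
import numpy as np

v = np.array([1, 2, 3])
print(np.newaxis is None)        # True
print(v[:, np.newaxis].shape)    # (3, 1) column
print(v[np.newaxis, :].shape)    # (1, 3) row

# reshape does not modify v in place
column = v.reshape(-1, 1)
print((column.shape, v.shape))   # ((3, 1), (3,))
```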

Indexing, slicing

In [556]:
vector12 = np.arange(12)
vector12
Out[556]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

Guess what is the output:

vector12[:3]
vector12[-1]
vector12[:-2]
vector12[3:7]
vector12[::2]
vector12[::-1]
In [557]:
matrix43 = vector12.reshape(4, 3)
matrix43
Out[557]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

Guess what is the output:

matrix43[:, 0]
matrix43[-1, :]
matrix43[::2, :]
matrix43[:3, :-1]
matrix43[3:, 1]

Unlike Matlab, numpy arrays are row-major (C order) by default, not column-major (Fortran order).
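The memory order shows up when an array is flattened: C order walks along each row first, Fortran order walks down each column. A small `(2, 3)` example:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                 #  [3 4 5]]

print(m.ravel(order='C'))   # [0 1 2 3 4 5]  row by row (numpy default)
print(m.ravel(order='F'))   # [0 3 1 4 2 5]  column by column (Matlab-like)
print((m.flags['C_CONTIGUOUS'], m.flags['F_CONTIGUOUS']))  # (True, False)
```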

View vs Copy

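Working with views avoids copying data, so it is more efficient and usually preferred.

A view is returned whenever basic slicing is used; advanced indexing (with integer or boolean arrays) always returns a copy.

More details at http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

Making a copy is simple (note that matrix43[:] would be a view, not a copy):

In [558]:
matrix43_copy = matrix43.copy()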
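`np.shares_memory` makes the view/copy distinction visible: basic slicing shares the data buffer, while `.copy()` and integer-array indexing do not. Writing through a view changes the original. The names below are illustrative.

```python
import numpy as np

matrix43 = np.arange(12).reshape(4, 3)

sliced = matrix43[:]        # basic slicing -> view
copied = matrix43.copy()    # explicit copy
fancy = matrix43[[0, 2]]    # advanced (integer-array) indexing -> copy

print(np.shares_memory(matrix43, sliced))   # True
print(np.shares_memory(matrix43, copied))   # False
print(np.shares_memory(matrix43, fancy))    # False

sliced[0, 0] = -1
print(matrix43[0, 0])   # -1: the original changed through the view
```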

Reshaping

In [559]:
matrix_to_reshape = np.random.randint(10, 99, size=(6, 4))
matrix_to_reshape
Out[559]:
array([[34, 93, 79, 92],
       [39, 80, 92, 78],
       [91, 67, 78, 73],
       [90, 78, 51, 66],
       [86, 29, 60, 30],
       [88, 58, 10, 35]])
In [560]:
reshaped_matrix = matrix_to_reshape.reshape(8, 3)
reshaped_matrix
Out[560]:
array([[34, 93, 79],
       [92, 39, 80],
       [92, 78, 91],
       [67, 78, 73],
       [90, 78, 51],
       [66, 86, 29],
       [60, 30, 88],
       [58, 10, 35]])

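reshape returns a view whenever it can (as here, because the array is contiguous), so changing the result also changes the original! When a view is impossible (e.g. for a transposed array), reshape returns a copy.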

In [561]:
reshaped_matrix[-1, 0] = 1
In [562]:
np.set_printoptions(formatter={'all':lambda x: '_{}_'.format(x) if x < 10 else str(x)})
In [563]:
matrix_to_reshape[:]
Out[563]:
array([[34, 93, 79, 92],
       [39, 80, 92, 78],
       [91, 67, 78, 73],
       [90, 78, 51, 66],
       [86, 29, 60, 30],
       [88, _1_, 10, 35]])
In [564]:
np.set_printoptions()
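Whether `reshape` gives a view depends on the memory layout. On a contiguous array it does; on a transposed array, flattening in C order requires reordering the elements, so a copy is made. A small check with illustrative arrays:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = a.reshape(3, 2)              # contiguous input -> view
print(np.shares_memory(a, b))    # True

t = a.T                          # transpose: a view with Fortran-ordered layout
flat = t.reshape(-1)             # C-order flattening must reorder -> copy
print(np.shares_memory(a, flat)) # False
print(flat)                      # [0 3 1 4 2 5]
```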

Boolean indexing

In [565]:
idx = matrix43 > 4
matrix43[idx]
Out[565]:
array([ 5,  6,  7,  8,  9, 10, 11])
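Boolean masks also work on the left-hand side of an assignment, masks combine with `&` and `|` (each condition needs its own parentheses), and `np.where` selects elementwise between two alternatives. Illustrative values:

```python
import numpy as np

matrix43 = np.arange(12).reshape(4, 3)

# In-place assignment through a mask (on a copy, to keep matrix43 intact)
clipped = matrix43.copy()
clipped[clipped > 4] = 4
print(clipped.max())   # 4

# Combined conditions
both = matrix43[(matrix43 > 2) & (matrix43 < 6)]
print(both)            # [3 4 5]

# Elementwise choice between two values
parity = np.where(matrix43 % 2 == 0, 'even', 'odd')
print(parity[0])       # ['even' 'odd' 'even']
```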

Useful numpy functions

eye, ones, zeros, diag

Example: build a tridiagonal matrix with -2's on the main diagonal and 1's on the sub- and superdiagonals.

Is this code valid?

In [566]:
def three_diagonal(N):
    A = np.zeros((N, N), dtype=int)  # plain int: the np.int alias was removed in NumPy 1.24
    for i in range(N):
        A[i, i] = -2
        if i > 0:
            A[i, i-1] = 1
        if i < N-1:
            A[i, i+1] = 1
    return A

print three_diagonal(5)
[[-2  1  0  0  0]
 [ 1 -2  1  0  0]
 [ 0  1 -2  1  0]
 [ 0  0  1 -2  1]
 [ 0  0  0  1 -2]]
In [567]:
def numpy_three_diagonal(N):
    main_diagonal = -2 * np.eye(N)
    
    subdiag_value = np.ones(N-1,)
    lower_subdiag = np.diag(subdiag_value, k=-1)
    upper_subdiag = np.diag(subdiag_value, k=1)
    
    result = main_diagonal + lower_subdiag + upper_subdiag
    return result.astype(int)

numpy_three_diagonal(5)
Out[567]:
array([[-2,  1,  0,  0,  0],
       [ 1, -2,  1,  0,  0],
       [ 0,  1, -2,  1,  0],
       [ 0,  0,  1, -2,  1],
       [ 0,  0,  0,  1, -2]])

reducers: sum, mean, max, min, all, any

In [568]:
A = numpy_three_diagonal(5)
A[0, -1] = 5
A[-1, 0] = 3

print A
print A.sum()
print A.min()
print A.max(axis=0)
print A.sum(axis=0)
print A.mean(axis=1)
print (A > 4).any(axis=1)
[[-2  1  0  0  5]
 [ 1 -2  1  0  0]
 [ 0  1 -2  1  0]
 [ 0  0  1 -2  1]
 [ 3  0  0  1 -2]]
6
-2
[3 1 1 1 5]
[2 0 0 0 4]
[ 0.8  0.   0.   0.   0.4]
[ True False False False False]
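A way to remember the `axis` argument: it names the axis that disappears. `axis=0` collapses the rows (one result per column), `axis=1` collapses the columns. With `keepdims=True` the reduced axis stays as a singleton, so the result broadcasts back against the original, e.g. to centre each row. Illustrative values:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.sum(axis=0))   # one sum per column: 5, 7, 9
print(A.sum(axis=1))   # one sum per row: 6, 15

# keepdims keeps shape (2, 1) so the subtraction broadcasts row-wise
row_means = A.mean(axis=1, keepdims=True)
centred = A - row_means
print(centred)         # each row minus its own mean
```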

numpy math functions

In [569]:
print np.pi
3.14159265359
In [570]:
args = np.arange(0, 2.5*np.pi, 0.5*np.pi)
In [571]:
print np.sin(args)
[  0.00000000e+00   1.00000000e+00   1.22464680e-16  -1.00000000e+00
  -2.44929360e-16]
In [572]:
print np.round(np.sin(args), decimals=2)
[ 0.  1.  0. -1.  0.]

managing output

In [573]:
'{}, {:.1%}, {:e}, {:.2f}, {:.0f}'.format(*np.sin(args))
Out[573]:
'0.0, 100.0%, 1.224647e-16, -1.00, -0'
In [574]:
np.set_printoptions(formatter={'all':lambda x: '{:.2f}'.format(x)})
print np.sin(args)
np.set_printoptions()
[0.00 1.00 0.00 -1.00 -0.00]

Meshes

linspace, meshgrid

Let's evaluate the function $$ f(x, y) = \sin(x+y) $$ on a mesh.

In [575]:
linear_index = np.linspace(0, np.pi, 10, endpoint=True)
mesh_x, mesh_y = np.meshgrid(linear_index, linear_index)

values_3D = np.sin(mesh_x + mesh_y)
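On a tiny input it is easy to see what `meshgrid` returns: with the default `indexing='xy'`, rows follow `y` and columns follow `x`. `linspace` includes both endpoints by default. The small `x` and `y` arrays are illustrative.

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([10, 20])

mesh_x, mesh_y = np.meshgrid(x, y)
print(mesh_x.shape)       # (2, 3): len(y) rows, len(x) columns
print(mesh_x.tolist())    # [[0, 1, 2], [0, 1, 2]]
print(mesh_y.tolist())    # [[10, 10, 10], [20, 20, 20]]

# 5 points from 0 to 1 inclusive: 0, 0.25, 0.5, 0.75, 1
grid = np.linspace(0, 1, 5)
```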
In [576]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline

fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(111, projection='3d')  # fig.gca(projection='3d') is rejected by recent Matplotlib

ax.plot_wireframe(mesh_x, mesh_y, values_3D)
ax.view_init(azim=-45, elev=30)

plt.title('The plot of $f(x, y) = sin(x+y)$')
[Figure: 3D wireframe plot titled "The plot of f(x, y) = sin(x+y)"]

Scipy - scientific computing 2

Building sparse matrix

In [577]:
import scipy.sparse as sp
In [578]:
def scipy_three_diagonal(N):
    main_diagonal = -2 * np.ones(N, )
    subdiag_values = np.ones(N-1,)
    
    diagonals = [main_diagonal, subdiag_values, subdiag_values]
    # Another option: use sp.eye(N) and add subdiagonals
    offsets = [0, 1, -1]
    
    result = sp.diags(diagonals, offsets, shape=(N, N), format='coo')
    return result

my_sparse_matrix = scipy_three_diagonal(5)

How does scipy represent a sparse matrix?

In [579]:
my_sparse_matrix
Out[579]:
<5x5 sparse matrix of type '<type 'numpy.float64'>'
	with 13 stored elements in COOrdinate format>

A sparse matrix stores only its non-zero elements (and their indices):

In [580]:
print my_sparse_matrix
  (0, 0)	-2.0
  (1, 1)	-2.0
  (2, 2)	-2.0
  (3, 3)	-2.0
  (4, 4)	-2.0
  (0, 1)	1.0
  (1, 2)	1.0
  (2, 3)	1.0
  (3, 4)	1.0
  (1, 0)	1.0
  (2, 1)	1.0
  (3, 2)	1.0
  (4, 3)	1.0
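The COO ("coordinate") format is just three parallel arrays: row indices, column indices and values. A pure-Python sketch of turning such triplets back into a dense matrix; `coo_to_dense` is a hypothetical helper, not scipy's implementation. Summing duplicate (row, col) entries matches how scipy treats duplicates in COO data.

```python
def coo_to_dense(rows, cols, values, shape):
    # Hypothetical helper: dense list-of-lists from COO triplets
    n_rows, n_cols = shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, values):
        dense[r][c] += v   # duplicate coordinates are summed
    return dense

# The 13 triplets printed above: main diagonal, upper, then lower subdiagonal
n = 5
rows = list(range(n)) + list(range(n - 1)) + list(range(1, n))
cols = list(range(n)) + list(range(1, n)) + list(range(n - 1))
values = [-2.0] * n + [1.0] * (2 * (n - 1))

dense = coo_to_dense(rows, cols, values, (n, n))
print(dense[0])   # [-2.0, 1.0, 0.0, 0.0, 0.0]
```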

Restoring full matrix

In [581]:
my_sparse_matrix.toarray()
Out[581]:
array([[-2.,  1.,  0.,  0.,  0.],
       [ 1., -2.,  1.,  0.,  0.],
       [ 0.,  1., -2.,  1.,  0.],
       [ 0.,  0.,  1., -2.,  1.],
       [ 0.,  0.,  0.,  1., -2.]])
In [582]:
my_sparse_matrix.A  # shorthand for toarray(); deprecated in recent SciPy releases
Out[582]:
array([[-2.,  1.,  0.,  0.,  0.],
       [ 1., -2.,  1.,  0.,  0.],
       [ 0.,  1., -2.,  1.,  0.],
       [ 0.,  0.,  1., -2.,  1.],
       [ 0.,  0.,  0.,  1., -2.]])
In [583]:
from scipy.linalg import toeplitz, hankel
In [584]:
hankel(xrange(4), [-1, -2, -3, -4])
Out[584]:
array([[ 0,  1,  2,  3],
       [ 1,  2,  3, -2],
       [ 2,  3, -2, -3],
       [ 3, -2, -3, -4]])
In [585]:
toeplitz(xrange(4))
Out[585]:
array([[0, 1, 2, 3],
       [1, 0, 1, 2],
       [2, 1, 0, 1],
       [3, 2, 1, 0]])
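The defining property: a Toeplitz matrix is constant along each diagonal (entry depends only on i - j), a Hankel matrix along each anti-diagonal (entry depends only on i + j). A pure-Python sketch with hypothetical helpers `toeplitz_list` and `hankel_list` (not scipy's code), where `c` is the first column and `r` the first row; it reproduces the two outputs above.

```python
def toeplitz_list(c, r=None):
    # t[i][j] depends only on i - j; without r the result is symmetric
    r = c if r is None else r
    return [[c[i - j] if i >= j else r[j - i] for j in range(len(r))]
            for i in range(len(c))]

def hankel_list(c, r):
    # h[i][j] depends only on i + j; r[0] is ignored (c[-1] wins)
    n = len(c)
    return [[c[i + j] if i + j < n else r[i + j - n + 1] for j in range(len(r))]
            for i in range(n)]

print(hankel_list([0, 1, 2, 3], [-1, -2, -3, -4])[1])  # [1, 2, 3, -2]
print(toeplitz_list([0, 1, 2, 3])[1])                  # [1, 0, 1, 2]
```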

Timing - measuring performance

Simplest way to measure time

In [586]:
N = 1000
%timeit three_diagonal(N)
%timeit numpy_three_diagonal(N)
%timeit scipy_three_diagonal(N)
1000 loops, best of 3: 1.53 ms per loop
10 loops, best of 3: 20.6 ms per loop
1000 loops, best of 3: 272 µs per loop

You can also use the %%timeit cell magic to measure the run time of the whole cell:

In [587]:
%%timeit
N = 1000
calc = three_diagonal(N)
calc = scipy_three_diagonal(N)
del calc
100 loops, best of 3: 2.17 ms per loop

Storing timings in a separate variable

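Avoid using time.time() or time.clock() directly, as their behaviour differs between platforms (and time.clock() was removed in Python 3.8); default_timer makes the best platform-dependent choice for you. Note that it measures wall-clock time, i.e. time spent on other processes is counted too, so a single measurement is noisy.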
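In Python 3.3 and later, `timeit.default_timer` is simply `time.perf_counter`. A minimal measurement pattern, with an arbitrary stand-in workload:

```python
import time
from timeit import default_timer as timer

print(timer is time.perf_counter)   # True on Python 3.3+

start = timer()
total = sum(range(100000))          # stand-in workload
elapsed = timer() - start
print('summing took {:.6f} s'.format(elapsed))
```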

In [588]:
from timeit import default_timer as timer
In [589]:
dims = [300, 1000, 3000, 10000]
bench_names = ['loop', 'numpy', 'scipy']
timings = {bench:[] for bench in bench_names}

for n in dims:
    start_time = timer()
    calc = three_diagonal(n)
    time_delta = timer() - start_time
    timings['loop'].append(time_delta)
    
    start_time = timer()
    calc = numpy_three_diagonal(n)
    time_delta = timer() - start_time
    timings['numpy'].append(time_delta)
    
    start_time = timer()
    calc = scipy_three_diagonal(n)
    time_delta = timer() - start_time
    timings['scipy'].append(time_delta)

Let's make the code less redundant

In [590]:
dims = [300, 1000, 3000, 10000]
bench_names = ['loop', 'numpy', 'scipy']
timings = {bench_name: [] for bench_name in bench_names}

def timing_machine(func, *args, **kwargs):
    start_time = timer()
    result = func(*args, **kwargs)
    time_delta = timer() - start_time
    return time_delta

for n in dims:
    timings['loop'].append(timing_machine(three_diagonal, n))
    timings['numpy'].append(timing_machine(numpy_three_diagonal, n))
    timings['scipy'].append(timing_machine(scipy_three_diagonal, n))
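A variation on `timing_machine`: a context manager that times whatever runs inside a `with` block, which also works for code that is not wrapped in a function. `timed` is a hypothetical helper, and the list-building line is only a stand-in for `three_diagonal(n)`.

```python
from contextlib import contextmanager
from timeit import default_timer as timer

@contextmanager
def timed(storage):
    # Hypothetical helper: append the elapsed seconds of the with-block to storage
    start = timer()
    try:
        yield
    finally:
        storage.append(timer() - start)

loop_timings = []
for n in [300, 1000]:
    with timed(loop_timings):
        [[0] * n for _ in range(n)]   # stand-in for three_diagonal(n)
print(len(loop_timings))   # 2
```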

timeit with -o parameter

In [612]:
timeit_result = %timeit -q -r 5 -o three_diagonal(10)
print 'Best of {} runs: {:.8f}s'.format(timeit_result.repeat,
                                        timeit_result.best)
Best of 5 runs: 0.00000565s
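Outside IPython, the standard `timeit.repeat` gives a similar "best of" figure: it returns the total time of each run, so the minimum divided by the number of calls per run is the best time per call. `three_diagonal_stub` is a hypothetical stand-in workload.

```python
import timeit

def three_diagonal_stub(n):
    # Hypothetical stand-in for the function being benchmarked
    return [[0] * n for _ in range(n)]

# 5 runs of 1000 calls each; each entry is the total time of one run
run_totals = timeit.repeat(lambda: three_diagonal_stub(10), repeat=5, number=1000)
best_per_call = min(run_totals) / 1000
print('Best of {} runs: {:.8f}s'.format(len(run_totals), best_per_call))
```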

Our new benchmark procedure

In [592]:
dims = [300, 1000, 3000, 10000]
bench_names = ['loop', 'numpy', 'scipy']
bench_funcs = [three_diagonal, numpy_three_diagonal, scipy_three_diagonal]
timings_best = {bench_name: [] for bench_name in bench_names}

for bench_name, bench_func in zip(bench_names, bench_funcs):
    print '\nMeasuring {}'.format(bench_func.__name__)
    for n in dims:
        print n,
        time_result = %timeit -q -o bench_func(n)
        timings_best[bench_name].append(time_result.best)
Measuring three_diagonal
300 1000 3000 10000 
Measuring numpy_three_diagonal
300 1000 3000 10000 
Measuring scipy_three_diagonal
300 1000 3000 10000

Matplotlib - plotting in python

Configuring matplotlib

In [593]:
import matplotlib.pyplot as plt
%matplotlib inline 

%matplotlib inline ensures all graphs are plotted inside your notebook

Global controls

In [594]:
# plt.rcParams.update({'axes.labelsize': 'large'})
plt.rcParams.update({'font.size': 14})

Combined plot

In [595]:
plt.figure(figsize=(10,8))

for bench_name, values in timings_best.iteritems():
    plt.semilogy(dims, values, label=bench_name)
    
plt.legend(loc='best')
plt.title('Benchmarking results with best of timeit', y=1.03)
plt.xlabel('Matrix dimension size')
plt.ylabel('Time, s')
[Figure: semilog plot "Benchmarking results with best of timeit"; x-axis: Matrix dimension size, y-axis: Time, s]
In [596]:
plt.figure(figsize=(10,8))

for bench_name, values in timings.iteritems():
    plt.semilogy(dims, values, label=bench_name)
    
plt.legend(loc='best')
plt.title('Benchmarking results with default_timer', y=1.03)
plt.xlabel('Matrix dimension size')
plt.ylabel('Time, s')
[Figure: semilog plot "Benchmarking results with default_timer"; x-axis: Matrix dimension size, y-axis: Time, s]

Think about why:

  • "loop" was faster than "numpy"
  • "scipy" is almost constant
  • results for default_timer and "best of timeit" are different

You might want to read the docs:

Remark: starting from Python 3.3, it's recommended to use time.perf_counter() and time.process_time() https://docs.python.org/3/library/time.html#time.perf_counter

Also note that for advanced benchmarking it's better to use profiling tools.
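The difference between the two recommended clocks (Python 3.3+ only): `perf_counter` measures elapsed wall-clock time, while `process_time` counts only CPU time used by the current process, so time spent sleeping or waiting is not included. The `sleep` call is an illustrative stand-in for waiting.

```python
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.1)   # waiting, not computing

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
print('wall: {:.3f} s, cpu: {:.3f} s'.format(wall, cpu))
```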

Combined plot "one-liner"

Use plt.plot? to get detailed info on function usage.

Task: given lists of x-values, y-values and plot format strings, plot all three graphs in one line.

Hint: use list comprehensions

In [597]:
k = len(timings_best)
iter_xyf = [item for sublist in zip([dims]*k,
                                    timings_best.values(),
                                    list('rgb'))\
                                for item in sublist]

plt.figure(figsize=(10, 8))
plt.semilogy(*iter_xyf)

plt.legend(timings_best.keys(), loc=2, frameon=False)
plt.title('Benchmarking results - "one-liner"', y=1.03)
plt.xlabel('Matrix dimension size')
plt.ylabel('Time, s')
[Figure: semilog plot "Benchmarking results - "one-liner""; x-axis: Matrix dimension size, y-axis: Time, s]
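The one-liner works because plotting functions accept repeated `x, y, fmt` groups, so the task reduces to interleaving three lists into one flat argument list. The flattening on toy data (the numbers are made up), plus the equivalent `itertools.chain.from_iterable`:

```python
from itertools import chain

dims = [300, 1000]
y_values = [[0.1, 0.2], [0.3, 0.4]]
formats = ['r', 'g']

# zip groups (x, y, fmt) per curve; the nested comprehension flattens the groups
grouped = zip([dims] * len(y_values), y_values, formats)
flat = [item for group in grouped for item in group]
print(flat)   # [[300, 1000], [0.1, 0.2], 'r', [300, 1000], [0.3, 0.4], 'g']

# Same flattening with itertools
flat2 = list(chain.from_iterable(zip([dims] * len(y_values), y_values, formats)))
print(flat2 == flat)  # True
```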

An even simpler way, which also gives you fine-grained control over the plot objects:

In [598]:
plt.figure(figsize=(10, 8))

figs = [plt.semilogy(dims, values, label=bench_name)
        for bench_name, values in timings.iteritems()]

# each semilogy call returns a list with one Line2D object per plotted curve
line0, = figs[0]
line0.set_dashes([5, 10, 20, 10, 5, 10])

line1, = figs[1]
line1.set_marker('s')
line1.set_markerfacecolor('r')

line2, = figs[2]
line2.set_linewidth(6)
line2.set_alpha(0.3)
line2.set_color('m')

Plot formatting

matplotlib has a number of different options for styling your plot

In [599]:
all_markers = [
'.', # point
',', # pixel
'o', # circle
'v', # triangle down
'^', # triangle up
'<', # triangle_left
'>', # triangle_right
'1', # tri_down
'2', # tri_up
'3', # tri_left
'4', # tri_right
'8', # octagon
's', # square
'p', # pentagon
'*', # star
'h', # hexagon1
'H', # hexagon2
'+', # plus
'x', # x
'D', # diamond
'd', # thin_diamond
'|', # vline
]

all_linestyles = [
'-',  # solid line style
'--', # dashed line style
'-.', # dash-dot line style
':',  # dotted line style
'None', # no line
]

all_colors = [
'b', # blue
'g', # green
'r', # red
'c', # cyan
'm', # magenta
'y', # yellow
'k', # black
'w', # white
]

Subplots

Iterating over subplots

In [622]:
n = len(timings)
experiment_names = list(timings.keys())

fig, axes = plt.subplots(1, n, sharey=True, figsize=(16, 4))

colors = np.random.choice(list('rgbcmyk'), n, replace=False)
markers = np.random.choice(all_markers, n, replace=False)
lines = np.random.choice(all_linestyles, n, replace=False)

for ax_num, ax in enumerate(axes):
    key = experiment_names[ax_num]
    ax.semilogy(dims, timings[key], label=key,
                color=colors[ax_num],
                marker=markers[ax_num],
                markersize=8,
                linestyle=lines[ax_num],
                lw=3)
    ax.set_xlabel('matrix dimension')
    ax.set_title(key)

axes[0].set_ylabel('Time, s')
plt.suptitle('Benchmarking results', fontsize=16, y=1.03)
[Figure: three side-by-side subplots, one per benchmark, under the title "Benchmarking results"]

Manual control of subplots

In [601]:
plt.figure()
plt.subplot(211)
plt.plot([1,2,3])

plt.subplot(212)
plt.plot([2,5,4])
[Figure: two stacked line plots]

Task: create a figure with 2 rows and 2 columns of subplots and leave the bottom-left quarter empty. The scipy and numpy benchmarks should go into the top row.

Other topics

function wrappers and decorators

installing packages

importing modules

IPython magic

qtconsole

environment

extensions

profiles (deprecated in jupyter)

profiling

debugging

cython, numba

openmp

OOP

python 2 vs python 3

plotting in python - palettes and colormaps, styles

pandas (presenting results)

numpy strides, contiguousness, vectorize function, broadcasting, saving output

magic functions (applied to line and to code cell)

jupyter configuration

Solutions

Task 1

In [602]:
items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']

method 1

In [603]:
from collections import defaultdict

item_ids = defaultdict(lambda: len(item_ids))
map(item_ids.__getitem__, items)
Out[603]:
[0, 1, 2, 0, 2, 1]

method 2

In [604]:
import pandas as pd

pd.DataFrame({'items': items}).groupby('items', sort=False).grouper.group_info[0]
Out[604]:
array([0, 1, 2, 0, 2, 1], dtype=int64)

method 3 (labels follow the sorted order of the unique items, not the order of first appearance)

In [605]:
import numpy as np

np.unique(items, return_inverse=True)[1]
Out[605]:
array([2, 0, 1, 2, 1, 0])

method 4

In [606]:
last = 0
counts = {}
result = []
for item in items:
    try:
        count = counts[item]
    except KeyError:
        counts[item] = count = last
        last += 1
    result.append(count)

result
Out[606]:
[0, 1, 2, 0, 2, 1]
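Another stdlib-only variant in the spirit of method 1 (the names `ids` and `labels` are illustrative): `dict.setdefault` stores the next free label only for unseen items and always returns the stored label. In pandas, the public function `pd.factorize` likewise returns codes in order of first appearance.

```python
items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']

ids = {}
# len(ids) is the next free label; setdefault keeps existing labels unchanged
labels = [ids.setdefault(item, len(ids)) for item in items]
print(labels)   # [0, 1, 2, 0, 2, 1]
```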

Task 2

In [607]:
N = 1000
In [608]:
from itertools import permutations

%timeit list(permutations(xrange(N), 2))
10 loops, best of 3: 78.6 ms per loop
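On the small example from the task (N = 4), transposing the pairs produced by `permutations` gives exactly the two expected output rows:

```python
from itertools import permutations

first, second = zip(*permutations(range(4), 2))
print(first)    # (0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3)
print(second)   # (1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2)
```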

Hankel matrix: $a_{ij} = a_{i-1, j+1}$

In [609]:
import numpy as np
from scipy.linalg import hankel

def pairs_idx(n):
    return np.vstack((np.repeat(xrange(n), n-1), hankel(xrange(1, n), xrange(-1, n-1)).ravel()))
In [610]:
%timeit pairs_idx(N)
100 loops, best of 3: 17.6 ms per loop
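A numpy-only alternative without scipy (not timed here, so no claim about speed): the pairs with i != j are exactly the off-diagonal positions of an n x n matrix, and `np.nonzero` returns their row and column indices in row-major order. `pairs_idx_mask` is a hypothetical name.

```python
import numpy as np

def pairs_idx_mask(n):
    # True everywhere except the diagonal; nonzero gives (rows, cols) in C order
    return np.vstack(np.nonzero(~np.eye(n, dtype=bool)))

print(pairs_idx_mask(4))
# [[0 0 0 1 1 1 2 2 2 3 3 3]
#  [1 2 3 0 2 3 0 1 3 0 1 2]]
```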