CRÜCIAL PŸTHON Week 5: Outers

In [11]:
%pylab inline

# We'll be using some pretty-printing later on
from pprint import pprint
Populating the interactive namespace from numpy and matplotlib
In [12]:
from IPython.core.display import Image 
Image(url='http://labrosa.ee.columbia.edu/crucialpython/logo.png', width=600) 
Out[12]:

Cast ye out, tight loops!

High-level programs sometimes have a computational bottleneck in a tight loop, or even nested loops. This comes up frequently when comparing two sets of objects, say, to build a similarity matrix or to do collision detection.

A common way to eliminate these bottlenecks is to implement the critical portion in a low-level language, such as C, so that the Python interpreter doesn't have to work as hard. Today we'll look at NumPy's outer functionality, which, when used correctly, can replace many tight-loop patterns that occur in practice.

Say we have two lists of numbers a and b, and we wish to find all the pairs a[i], b[j] such that a[i] is within some distance r of b[j]. That is, find all $i$ and $j$ such that

$$ |a_i - b_j| \leq r $$

This can be done trivially by nested iteration, as seen below.

In [13]:
# Let's make a function to generate some example data
def make_point_data(n, m):
    a = np.arange(-n/2.0, 1 + n / 2.0)  # n + 1 evenly spaced points from -n/2 to n/2
    b = np.random.randn(m)

    # For convenience, let's sort b
    b = np.sort(b)

    return a, b
In [14]:
# Build a list and iterate over all pairs
def find_similar_points(a, b, r):
    '''Given two lists of numbers a and b, and a threshold r,
       returns a matrix `hits` such that each row k satisfies:
       
       |a[ hits[k, 0] ] - b[ hits[k, 1] ]| <= r
       
    '''
    
    hits = []
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if np.abs(ai - bj) <= r:
                hits.append( (i, j) )
                
    # Return indices in ndarray format
    return np.asarray(hits)
In [15]:
# Generate and display the example data
a, b = make_point_data(5, 10)
# And a threshold
r = 0.25

print 'a: ', a
print 'b: ', b
print 'r: ', r
a:  [-2.5 -1.5 -0.5  0.5  1.5  2.5]
b:  [-0.81584428  0.27568022  0.3787181   0.39691845  0.45836127  0.8646873
  1.00730788  1.44877987  1.56445537  2.01475412]
r:  0.25
In [16]:
# What'd we get?
print 'Hits: '
print find_similar_points(a, b, r)
Hits: 
[[3 1]
 [3 2]
 [3 3]
 [3 4]
 [4 7]
 [4 8]]

Example 2: interval intersection

Let's consider a slightly more involved example. Say we have a set of real line intervals $\{(s_i, t_i)\}$ where each ordered pair denotes the start and end of the interval.

One quantity of interest is the length of intersection between any two intervals:

$$ c(i, j) = |(s_i, t_i) \cap (s_j, t_j)| = \max\left(0, \min(t_i, t_j) - \max(s_i, s_j) \right) $$

Again, this can be computed by nested iteration, as seen below.
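As a quick sanity check of the formula, here's the overlap of two hand-picked intervals (the numbers are arbitrary, chosen only for illustration):

```python
def interval_overlap_length(s_i, t_i, s_j, t_j):
    # max(0, min(t_i, t_j) - max(s_i, s_j)), per the formula above
    return max(0, min(t_i, t_j) - max(s_i, s_j))

# (0, 2) and (1, 3) overlap on (1, 2), so the length is 1
print(interval_overlap_length(0.0, 2.0, 1.0, 3.0))

# (0, 1) and (2, 3) are disjoint, so the length is 0
print(interval_overlap_length(0.0, 1.0, 2.0, 3.0))
```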

In [17]:
def make_interval_data(n):
    # Generate random start times
    S = np.random.randn(n)
    
    # Sort it, for convenience
    S = np.sort(S)

    # Add a random positive length to each one
    T = S + np.abs(np.random.randn(n))
    
    return S, T
In [9]:
def find_interval_overlap(S, T):
    '''S and T are vectors that encode intervals (S[i], T[i])
    
    Returns an n-by-n matrix C such that C[i, j] is the length of the overlap between intervals i and j.
    '''
    
    # Get the shape
    n = S.shape[0]
    
    # Build an output matrix.
    C = np.zeros( (n, n) )
    
    for i in range(n):
        for j in range(n):
            C[i, j] = max(0, min(T[i], T[j]) - max(S[i], S[j]))
            
    return C
In [10]:
# Let's make up some test data
S, T = make_interval_data(5)

# What does it look like?
pprint( zip(S, T) )
In [40]:
print find_interval_overlap(S, T)
[[ 1.53571796  0.26408445  0.99869268  0.          0.        ]
 [ 0.26408445  0.26408445  0.20020356  0.          0.        ]
 [ 0.99869268  0.20020356  1.95900701  0.46699172  0.48550766]
 [ 0.          0.          0.46699172  0.46699172  0.29865544]
 [ 0.          0.          0.48550766  0.29865544  0.53138621]]

In both of the examples above, we're doing an all-pairs comparison between two sets of objects.

While conceptually simple, the nested iteration approach leads to somewhat cumbersome code, which can also be computationally inefficient.

Fortunately, NumPy provides some tools to help us simplify and vectorize this kind of pairwise operation. The secret ingredient is called outer, and is a property of NumPy's universal function (ufunc) abstraction.

Quoth the manual,

Universal functions (ufunc)

A universal function (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion, supporting array broadcasting, type casting, and several other standard features. That is, a ufunc is a “vectorized” wrapper for a function that takes a fixed number of scalar inputs and produces a fixed number of scalar outputs.

A ufunc f(a, b) implements the outer method, which returns a matrix X[i, j] = f(a[i], b[j]). The benefit of this approach is that ufuncs are typically implemented in C, and are often much more efficient in practice than an equivalent loop written in Python.

Some examples of potentially interesting ufuncs include:

  • add, subtract
  • multiply, divide
  • less, less_equal
  • bitwise_and, bitwise_or
  • etc.

A list of available ufuncs can be found in the numpy documentation.
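To make outer concrete before we apply it, here's a minimal example on two tiny arrays (the values are chosen arbitrarily):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, 2.5])

# subtract.outer builds the matrix D[i, j] = a[i] - b[j]
D = np.subtract.outer(a, b)
# D == [[ 0.5, -1.5],
#       [ 1.5, -0.5],
#       [ 2.5,  0.5]]

# Comparison ufuncs work the same way: L[i, j] = (a[i] <= b[j])
L = np.less_equal.outer(a, b)
# L == [[False,  True],
#       [False,  True],
#       [False, False]]
```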


In [41]:
# Here's how to write our similar_points function using the subtraction ufunc
def fast_similar_points(a, b, r):
    
    # subtract.outer builds a matrix of a[i] - b[j]
    # abs computes the absolute value
    difference = np.abs(np.subtract.outer(a, b))
    
    # <= r does a pointwise comparison against the threshold
    # argwhere finds all indices (i, j) such that hits[i, j] == True
    return np.argwhere(difference <= r)
In [42]:
# Make some more test data: a with 6 points, b with 10
a, b = make_point_data(5, 10)
In [43]:
print a
print b
print r
[-2.5 -1.5 -0.5  0.5  1.5  2.5]
[-1.96563268 -1.68353098 -0.77402719 -0.27807597 -0.16635048  0.08998779
  0.12748518  0.79521268  0.98956769  1.98828443]
0.25
In [44]:
# For comparison purposes, our previous implementation
print find_similar_points(a, b, r)
[[1 1]
 [2 3]]
In [45]:
# And the new one
print fast_similar_points(a, b, r)
[[1 1]
 [2 3]]
In [46]:
# Timing benchmark
%timeit find_similar_points(a, b, r)
1000 loops, best of 3: 162 µs per loop
In [47]:
%timeit fast_similar_points(a, b, r)
10000 loops, best of 3: 28.9 µs per loop
In [48]:
# How does it scale? Let's increase the size of our data
a, b = make_point_data(500, 1000)
In [49]:
%timeit find_similar_points(a, b, r)
1 loops, best of 3: 993 ms per loop
In [50]:
%timeit fast_similar_points(a, b, r)
100 loops, best of 3: 4.96 ms per loop
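As an aside, subtract.outer is equivalent to broadcasting a against a new axis, and some codebases prefer that spelling. A minimal sketch (the arrays and threshold here are made up):

```python
import numpy as np

a = np.array([-1.0, 0.0, 1.0])
b = np.array([0.25, 0.75])
r = 0.5

# a[:, np.newaxis] has shape (3, 1); subtracting b (shape (2,))
# broadcasts to the same (3, 2) matrix as np.subtract.outer(a, b)
difference = np.abs(a[:, np.newaxis] - b)

hits = np.argwhere(difference <= r)
# hits == [[1, 0], [2, 1]]
```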
In [51]:
# Similarly, our interval-overlap calculator can be simplified by using outers to compute the intermediate steps
def fast_interval_overlap(S, T):
    
    # For each pair of intervals, find the later start: max(S[i], S[j])
    max_start = np.maximum.outer(S, S)
    
    # And the earlier end: min(T[i], T[j])
    min_end   = np.minimum.outer(T, T)
    
    # Subtract min_end[i, j] - max_start[i, j]
    # If the result is negative, then the intersection is 0
    return np.maximum(0, min_end - max_start)
In [52]:
S, T = make_interval_data(5)
pprint( zip(S, T) )
[(-1.047725458139019, -0.33908813387328129),
 (-0.40815662861575197, 0.41324316199846467),
 (-0.25504141148370885, 1.0775303283016173),
 (0.98266375287621432, 2.8308856990815423),
 (1.06482027877006, 2.1899451748162431)]
In [53]:
print fast_interval_overlap(S, T)
[[ 0.70863732  0.06906849  0.          0.          0.        ]
 [ 0.06906849  0.82139979  0.66828457  0.          0.        ]
 [ 0.          0.66828457  1.33257174  0.09486658  0.01271005]
 [ 0.          0.          0.09486658  1.84822195  1.1251249 ]
 [ 0.          0.          0.01271005  1.1251249   1.1251249 ]]
In [54]:
%timeit find_interval_overlap(S, T)
10000 loops, best of 3: 61.3 µs per loop
In [55]:
%timeit fast_interval_overlap(S, T)
100000 loops, best of 3: 14.4 µs per loop
In [56]:
# Scaling up?
S, T = make_interval_data(1000)
In [57]:
%timeit find_interval_overlap(S, T)
1 loops, best of 3: 2.32 s per loop
In [58]:
%timeit fast_interval_overlap(S, T)
10 loops, best of 3: 21.6 ms per loop

Note:

Outers are great for relatively small comparison sets and simple operations. However, they require $\Theta(nm)$ storage for an $n$-by-$m$ comparison, which can quickly eat up memory if you're not careful.

For many applications, though, the speedup due to vectorization is worth the increased memory complexity.
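When memory does become a problem, one common workaround is to process a in fixed-size blocks, so that only a block-by-$m$ slice of the difference matrix is alive at any time. A minimal sketch, where the function name and the default block size of 256 are arbitrary choices:

```python
import numpy as np

def chunked_similar_points(a, b, r, block_size=256):
    '''Same result as fast_similar_points, but peak memory is
       bounded by (block_size * len(b)) instead of (len(a) * len(b)).'''
    hits = []
    for start in range(0, len(a), block_size):
        block = a[start:start + block_size]
        difference = np.abs(np.subtract.outer(block, b))
        # argwhere indices are relative to the block; shift rows back
        idx = np.argwhere(difference <= r)
        idx[:, 0] += start
        hits.append(idx)
    return np.concatenate(hits)
```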