## Lesson 6 - Fancy indexing and index tricks¶

NumPy offers more indexing facilities than regular Python sequences. In addition to indexing by integers and slices, as we saw before, arrays can be indexed by arrays of integers and arrays of booleans.

In [1]:
import numpy as np


## Indexing with Arrays of Indices¶

Create an array of the first 12 square numbers:

In [2]:
a = np.arange(12)**2

In [3]:
a

Out[3]:
array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121])

Create an array of indices:

In [4]:
i = np.array([1, 1, 3, 8, 5])

In [5]:
i

Out[5]:
array([1, 1, 3, 8, 5])

Now the cool part - index the elements of a at the positions i:

In [6]:
a[i]

Out[6]:
array([ 1,  1,  9, 64, 25])

Let's try a different array of (bi-dimensional) indices:

In [7]:
j = np.array([[3, 4], [9, 7]])

In [8]:
a[j]

Out[8]:
array([[ 9, 16],
[81, 49]])

When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a. The following example shows this behavior by converting an image of labels into a color image using a palette.

In [9]:
palette = np.array(
[[0, 0, 0], # black
[255, 0, 0], # red
[0, 255, 0], # green
[0, 0, 255], # blue
[255, 255, 255]] # white
)


Create an array of palette colors. e.g. 0: black, 1: red, etc.

In [10]:
image = np.array(
[[0, 1, 2, 0 ],
[0, 3, 4, 0 ]]
)

In [11]:
palette[image]

Out[11]:
array([[[  0,   0,   0],
[255,   0,   0],
[  0, 255,   0],
[  0,   0,   0]],

[[  0,   0,   0],
[  0,   0, 255],
[255, 255, 255],
[  0,   0,   0]]])

We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same shape.

In [12]:
a = np.arange(12).reshape(3,4)

In [13]:
a

Out[13]:
array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11]])
In [14]:
# indices for the first dim (row) of a
i = np.array(
[[0,1],
[1,2]]
)

In [15]:
# indices for the second dim (column) of a
j = np.array(
[[2,1],
[3,1]]
)

In [16]:
# i and j must have equal shape
a[i,j]

Out[16]:
array([[2, 5],
[7, 9]])

To elaborate more on the a[i,j] part:

We are trying to create a 2 by 2 matrix from matrix a.

$$\begin{bmatrix} a[(i_1, j_1)], a[(i_1, j_2)] \\ a[(i_2, j_1)], a[(i_2, j_2)] \\ \end{bmatrix}$$

This becomes:

$$\begin{bmatrix} a[(0, 2)], a[(1, 1)] \\ a[(1, 3)], a[(2, 1)] \\ \end{bmatrix}$$

Which get resolves to:

$$\begin{bmatrix} 2, 5 \\ 7, 9 \\ \end{bmatrix}$$

Let's try something else. Use row defined by i, and pick 2nd column of a only.

In [17]:
a[i, 2]

Out[17]:
array([[ 2,  6],
[ 6, 10]])

Now try something even more interesting.

In [18]:
# recall what a is
a

Out[18]:
array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11]])
In [19]:
# recall what j is
j

Out[19]:
array([[2, 1],
[3, 1]])
In [20]:
# Use all 3 rows of a, and pick columns defined by j
a[:,j]

Out[20]:
array([[[ 2,  1],
[ 3,  1]],

[[ 6,  5],
[ 7,  5]],

[[10,  9],
[11,  9]]])

Naturally, we can put i and j in a sequence (say a list) and then do the indexing with the list.

In [21]:
# recall what a is
a

Out[21]:
array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11]])
In [22]:
# recall what i is
i

Out[22]:
array([[0, 1],
[1, 2]])
In [23]:
# recall what j is
j

Out[23]:
array([[2, 1],
[3, 1]])
In [24]:
l = a[i, j]

In [25]:
l

Out[25]:
array([[2, 5],
[7, 9]])

However, we can not do this by putting i and j into an array, because this array will be interpreted as indexing the first dimension of a.

In [26]:
s = np.array([i, j])

In [27]:
s

Out[27]:
array([[[0, 1],
[1, 2]],

[[2, 1],
[3, 1]]])
In [28]:
# This will be wrong
# a[s]


The tuple function will work though.

In [29]:
tuple(s)

Out[29]:
(array([[0, 1],
[1, 2]]), array([[2, 1],
[3, 1]]))

i.e. a two dimensional tuple. The first element is i (row), and second element is j (column).

In [30]:
a[tuple(s)]

Out[30]:
array([[2, 5],
[7, 9]])

Cool hey?

Another common use of indexing with arrays is the search of the maximum value of time-dependent series :

In [31]:
# time scale  (from 20 to 145 inclusive, 5 increments)
time = np.linspace(20, 145, 5)

In [32]:
time

Out[32]:
array([  20.  ,   51.25,   82.5 ,  113.75,  145.  ])
In [33]:
# just to visualize what this is
np.sin(np.arange(20))

Out[33]:
array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
-0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849,
-0.54402111, -0.99999021, -0.53657292,  0.42016704,  0.99060736,
0.65028784, -0.28790332, -0.96139749, -0.75098725,  0.14987721])
In [34]:
# 4 time-dependent series
data = np.sin(np.arange(20)).reshape(5,4)

In [35]:
data

Out[35]:
array([[ 0.        ,  0.84147098,  0.90929743,  0.14112001],
[-0.7568025 , -0.95892427, -0.2794155 ,  0.6569866 ],
[ 0.98935825,  0.41211849, -0.54402111, -0.99999021],
[-0.53657292,  0.42016704,  0.99060736,  0.65028784],
[-0.28790332, -0.96139749, -0.75098725,  0.14987721]])
In [36]:
# index of the maxima for each series (column.
ind = data.argmax(axis=0)

In [37]:
ind

Out[37]:
array([2, 0, 3, 1], dtype=int64)
In [38]:
# times corresponding to the maxima
time_max = time[ind]

In [39]:
time_max

Out[39]:
array([  82.5 ,   20.  ,  113.75,   51.25])
In [40]:
# => data[ind[0],0], data[ind[1],1]...
data_max = data[ind, xrange(data.shape[1])]

In [41]:
data_max

Out[41]:
array([ 0.98935825,  0.84147098,  0.99060736,  0.6569866 ])

Note the use of xrange(). Seems for be mroe preferable than range() according to this StackOverflow forum. Here is an answer by Brian I like alot:

range(n) creates a list containing all the integers 0 ... n-1. This is a problem if you do range(1000000), because you'll end up with a 4MB+ list. xrange() deals with this by returning an object that pretends to be a list, but just works out the number needed from the index asked for, and returns that.

In [42]:
all(data_max == data.max(axis=0))

Out[42]:
True
In [43]:
data.max(axis=0)

Out[43]:
array([ 0.98935825,  0.84147098,  0.99060736,  0.6569866 ])

For simplicity, the data.max() will come in handy.

You can also use indexing with arrays as a target to assign to:

In [44]:
a = np.arange(5)

In [45]:
a

Out[45]:
array([0, 1, 2, 3, 4])
In [46]:
a[[1,3,4]] = 0

In [47]:
a

Out[47]:
array([0, 0, 2, 0, 0])

However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:

In [48]:
a = np.arange(5)

In [49]:
a

Out[49]:
array([0, 1, 2, 3, 4])
In [50]:
a[[0,0,2]]=[1,2,3]

In [51]:
a

Out[51]:
array([2, 1, 3, 3, 4])

This is reasonable enough, but watch out if you want to use Python's += construct, as it may not do what you expect:

In [52]:
a = np.arange(5)

In [53]:
a

Out[53]:
array([0, 1, 2, 3, 4])
In [54]:
a[[0,0,2]]+=1

In [55]:
a

Out[55]:
array([1, 1, 3, 3, 4])

Even though 0 occurs twice in the list of indices, the 0th element is only incremented once. This is because Python requires "a+=1" to be equivalent to "a=a+1".

## Indexing with Boolean Arrays¶

When we index arrays with arrays of (integer) indices we are providing the list of indices to pick.

With boolean indices the approach is different; we explicitly choose which items in the array we want and which ones we don't.

The most natural way one can think of for boolean indexing is to use boolean arrays that have the same shape as the original array:

In [56]:
a = np.arange(12).reshape(3,4)

In [57]:
a

Out[57]:
array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11]])
In [58]:
# b is a boolean with a's shape
b = a > 4

In [59]:
b

Out[59]:
array([[False, False, False, False],
[False,  True,  True,  True],
[ True,  True,  True,  True]], dtype=bool)
In [60]:
# 1d array with the selected elements
a[b]

Out[60]:
array([ 5,  6,  7,  8,  9, 10, 11])

This property can be very useful in assignments:

In [61]:
# All elements of 'a' higher than 4 become 0
a[b] = 0

In [62]:
a[b] = 0

In [63]:
a

Out[63]:
array([[0, 1, 2, 3],
[4, 0, 0, 0],
[0, 0, 0, 0]])

You can look at the Mandelbrot set example to see how to use boolean indexing to generate an image of the Mandelbrot set.

The second way of indexing with booleans is more similar to integer indexing; for each dimension of the array we give a 1D boolean array selecting the slices we want.

In [64]:
a = np.arange(12).reshape(3,4)

In [65]:
a

Out[65]:
array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11]])
In [66]:
# first dimension selection  (row)
b1 = np.array([False,True,True])

In [67]:
# second dimension selection  (column)
b2 = np.array([True,False,True,False])

In [68]:
# selecting rows
a[b1,:]

Out[68]:
array([[ 4,  5,  6,  7],
[ 8,  9, 10, 11]])
In [69]:
a[b1]   # same thing

Out[69]:
array([[ 4,  5,  6,  7],
[ 8,  9, 10, 11]])
In [70]:
# selecting columns
a[:,b2]

Out[70]:
array([[ 0,  2],
[ 4,  6],
[ 8, 10]])
In [71]:
a[...,b2]   # same thing

Out[71]:
array([[ 0,  2],
[ 4,  6],
[ 8, 10]])

(This example below, I do not quite understand?)

In [72]:
# a weird thing to do
a[b1,b2]

Out[72]:
array([ 4, 10])
In [73]:
s = tuple([b1,b2])

In [74]:
s

Out[74]:
(array([False,  True,  True], dtype=bool),
array([ True, False,  True, False], dtype=bool))
In [75]:
a[s]

Out[75]:
array([ 4, 10])

## The ix_() function¶

The ix_ function can be used to combine different vectors so as to obtain the result for each n-uplet.

For example, if you want to compute all the a + b*c for all the triplets taken from each of the vectors a, b and c:

In [76]:
a = np.array([2,3,4,5])
b = np.array([8,5,4])
c = np.array([5,4,6,8,3])

In [77]:
ax,bx,cx = np.ix_(a,b,c)

In [78]:
ax

Out[78]:
array([[[2]],

[[3]],

[[4]],

[[5]]])
In [79]:
bx

Out[79]:
array([[[8],
[5],
[4]]])
In [80]:
cx

Out[80]:
array([[[5, 4, 6, 8, 3]]])
In [81]:
ax.shape, bx.shape, cx.shape

Out[81]:
((4L, 1L, 1L), (1L, 3L, 1L), (1L, 1L, 5L))
In [82]:
result = ax + bx*cx

In [83]:
result

Out[83]:
array([[[42, 34, 50, 66, 26],
[27, 22, 32, 42, 17],
[22, 18, 26, 34, 14]],

[[43, 35, 51, 67, 27],
[28, 23, 33, 43, 18],
[23, 19, 27, 35, 15]],

[[44, 36, 52, 68, 28],
[29, 24, 34, 44, 19],
[24, 20, 28, 36, 16]],

[[45, 37, 53, 69, 29],
[30, 25, 35, 45, 20],
[25, 21, 29, 37, 17]]])
In [84]:
result[3,2,4]

Out[84]:
17
In [85]:
a[3]+b[2]*c[4]

Out[85]:
17

You could also implement the reduce as follows:

In [86]:
def ufunc_reduce(ufct, *vectors):
vs = np.ix_(*vectors)
r = ufct.identity
for v in vs:
r = ufct(r,v)
return r

In [87]:
ufunc_reduce(np.add,a,b,c)

Out[87]:
array([[[15, 14, 16, 18, 13],
[12, 11, 13, 15, 10],
[11, 10, 12, 14,  9]],

[[16, 15, 17, 19, 14],
[13, 12, 14, 16, 11],
[12, 11, 13, 15, 10]],

[[17, 16, 18, 20, 15],
[14, 13, 15, 17, 12],
[13, 12, 14, 16, 11]],

[[18, 17, 19, 21, 16],
[15, 14, 16, 18, 13],
[14, 13, 15, 17, 12]]])

The advantage of this version of reduce compared to the normal ufunc.reduce is that it makes use of the Broadcasting Rules in order to avoid creating an argument array the size of the output times the num

New Concepts!

See [numpy.identity] (http://docs.scipy.org/doc/numpy/reference/generated/numpy.identity.html) to see an example of creating an identify array.

The use of the *arg and **keywordargs is explained in this StackOverflow forum.

The function repr() is a Python built in function. Explained here. i.e. Return a string containing a printable representation of an object.

In [88]:
def printlist(*args):
print repr(args)

In [89]:
printlist(1, 2, 3, 4, 5) # or as many more arguments as I'd like

(1, 2, 3, 4, 5)

In [90]:
def printdict(**kwargs):
print repr(kwargs)

In [91]:
printdict(john=10, jill=12, david=15)

{'john': 10, 'jill': 12, 'david': 15}

In [92]:
def printlistdict(self, *args, **kwargs):
print repr(args)
print repr(kwargs)

In [93]:
printlistdict(1, 2, 3, 4, 5, john=10, jill=12, david=15)

(2, 3, 4, 5)
{'john': 10, 'jill': 12, 'david': 15}


## Indexing with Strings¶

Numpy provides powerful capabilities to create arrays of structs or records. These arrays permit one to manipulate the data by the structs or by fields of the struct. A simple example will show what is meant.:

In [94]:
x = np.zeros((2,),dtype=('i4,f4,a10'))

In [95]:
x

Out[95]:
array([(0, 0.0, ''), (0, 0.0, '')],
dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', 'S10')])
In [96]:
x[:] = [(1, 2., 'Hello'), (2, 3., "World")]

In [97]:
x

Out[97]:
array([(1, 2.0, 'Hello'), (2, 3.0, 'World')],
dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', 'S10')])

Here we have created a one-dimensional array of length 2. Each element of this array is a record that contains three items, a 32-bit integer, a 32-bit float, and a string of length 10 or less. If we index this array at the second position we get the second record:

In [98]:
x[1]

Out[98]:
(2, 3.0, 'World')

Conveniently, one can access any field of the array by indexing using the string that names that field. In this case the fields have received the default names ‘f0’, ‘f1’ and ‘f2’.

In [99]:
y = x['f1']

In [100]:
y

Out[100]:
array([ 2.,  3.], dtype=float32)
In [101]:
y[:] = 2*y

In [102]:
x

Out[102]:
array([(1, 4.0, 'Hello'), (2, 6.0, 'World')],
dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', 'S10')])
In [103]:
y

Out[103]:
array([ 4.,  6.], dtype=float32)

See documentation for more on Structured arrays (aka "Record arrays").

## Conclusion¶

We have now completed Lesson 6 - Fancy Indexing and Index Tricks. A pretty tricky topic. Probably require revisiting later on / further studying.