NumPy offers more indexing facilities than regular Python sequences.
In addition to indexing by integers and slices, as we saw before, arrays can be indexed by arrays of integers and arrays of booleans.
import numpy as np
a = np.arange(12) ** 2 # the first 12 square numbers
a
array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121])
i = np.array([1, 1, 3, 8, 5]) # an array of indices
a[i] # the elements of a at the positions i
array([ 1, 1, 9, 64, 25])
j = np.array([[3, 4], [9, 7]]) # a bidimensional array of indices
a[j] # the same shape as j, (2, 2)
array([[ 9, 16], [81, 49]])
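Integer-array indexing always returns a new array (a copy), not a view, and np.take performs the same selection. A minimal sketch comparing the two, reusing the values above:

```python
import numpy as np

a = np.arange(12) ** 2           # the first 12 square numbers
j = np.array([[3, 4], [9, 7]])   # 2-D array of indices

picked = a[j]                    # fancy indexing: result has the shape of j
taken = np.take(a, j)            # the same selection via np.take

print(np.array_equal(picked, taken))  # True

# The result is a copy: writing to it leaves `a` unchanged.
picked[0, 0] = -1
print(a[3])                      # still 9
```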
When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a.
The following example shows this behavior by converting an image of labels into a color image using a palette.
palette = np.array([
[0, 0, 0], # black
[255, 0, 0], # red
[0, 255, 0], # green
[0, 0, 255], # blue
[255,255,255] # white
])
image = np.array([
[0, 1, 2, 0], # each value corresponds to a color in the palette
[0, 3, 4, 0]
])
palette[image] # the (2, 4, 3) color image
array([[[ 0, 0, 0], [255, 0, 0], [ 0, 255, 0], [ 0, 0, 0]], [[ 0, 0, 0], [ 0, 0, 255], [255, 255, 255], [ 0, 0, 0]]])
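The result has shape image.shape + palette.shape[1:]: each label is replaced by the whole row it selects along the palette's first axis. A sketch that checks this against an explicit per-pixel loop (the loop is only for illustration; the indexing version is the one to use):

```python
import numpy as np

palette = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0],
                    [0, 0, 255], [255, 255, 255]])
image = np.array([[0, 1, 2, 0], [0, 3, 4, 0]])

fancy = palette[image]                       # shape (2, 4, 3)

# Equivalent explicit loop: look up each label's RGB row one pixel at a time.
looped = np.empty(image.shape + (3,), dtype=palette.dtype)
for r in range(image.shape[0]):
    for c in range(image.shape[1]):
        looped[r, c] = palette[image[r, c]]

print(fancy.shape)                           # (2, 4, 3)
print(np.array_equal(fancy, looped))         # True
```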
We can also give indices for more than one dimension.
The arrays of indices for each dimension must have the same shape (more precisely, they must be broadcastable to a common shape; a[i, 2] below relies on this).
a = np.arange(12).reshape(3, 4)
a
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]])
i = np.array( [
[0, 1], # indices for the first dim of a
[1, 2]
])
j = np.array([
[2, 1], # indices for the second dim
[3, 3]
])
a[i] # shape (2, 2, 4)
array([[[ 0, 1, 2, 3], [ 4, 5, 6, 7]], [[ 4, 5, 6, 7], [ 8, 9, 10, 11]]])
a[i, j] # i and j must have equal shape
array([[ 2, 5], [ 7, 11]])
a[i, 2]
array([[ 2, 6], [ 6, 10]])
a[:, j] # shape (3, 2, 2)
array([[[ 2, 1], [ 3, 3]], [[ 6, 5], [ 7, 7]], [[10, 9], [11, 11]]])
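Index arrays are broadcast against each other before the lookup. A minimal sketch using a column vector and a row vector of indices to select a rectangular block instead of scattered elements:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

rows = np.array([[0], [2]])   # shape (2, 1)
cols = np.array([1, 3])       # shape (2,)

# rows and cols broadcast to a common (2, 2) shape, so this picks
# a[0, 1], a[0, 3], a[2, 1], a[2, 3]: a 2x2 block.
block = a[rows, cols]
print(block)
# [[ 1  3]
#  [ 9 11]]

# A scalar index broadcasts the same way, which is why a[i, 2] works.
i = np.array([[0, 1], [1, 2]])
print(np.array_equal(a[i, 2], a[i, np.full(i.shape, 2)]))  # True
```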
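In Python, a[i, j] is exactly the same as a[(i, j)], so we can put i and j in a tuple and then do the indexing with that tuple. (Older NumPy versions also accepted a list here; since NumPy 1.23 a list such as [i, j] is instead treated as a single index array, like s below.)
l = (i, j)
a[l] # equivalent to a[i, j]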
array([[ 2, 5], [ 7, 11]])
However, we cannot do this by putting i and j into an array, because this array will be interpreted as indexing the first dimension of a.
s = np.array([i, j])
s
array([[[0, 1], [1, 2]], [[2, 1], [3, 3]]])
a[s] # not what we want
IndexError: index 3 is out of bounds for axis 0 with size 3
a[tuple(s)] # same as a[i,j]
array([[ 2, 5], [ 7, 11]])
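Because a tuple of index arrays is exactly what multi-axis indexing expects, functions that return such tuples plug straight in. For example, np.unravel_index converts flat positions into one index array per axis. A short sketch:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

flat_pos = np.argmax(a)                     # position in the flattened array
idx = np.unravel_index(flat_pos, a.shape)   # tuple with one index per axis
print(a[idx])                               # 11, the same as a[2, 3]

# It also works with arrays of flat positions; the tuple is used directly.
flat = np.array([1, 6, 11])
print(a[np.unravel_index(flat, a.shape)])   # [ 1  6 11]
```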
Another common use of indexing with arrays is searching for the maximum value of time-dependent series:
time = np.linspace(20, 145, 5) # time scale
time
array([ 20. , 51.25, 82.5 , 113.75, 145. ])
data = np.sin(np.arange(20)).reshape(5, 4) # 4 time-dependent series
data
array([[ 0. , 0.84147098, 0.90929743, 0.14112001], [-0.7568025 , -0.95892427, -0.2794155 , 0.6569866 ], [ 0.98935825, 0.41211849, -0.54402111, -0.99999021], [-0.53657292, 0.42016704, 0.99060736, 0.65028784], [-0.28790332, -0.96139749, -0.75098725, 0.14987721]])
ind = data.argmax(axis=0) # index of the maxima for each series
ind
array([2, 0, 3, 1], dtype=int64)
time_max = time[ind] # times corresponding to the maxima
time_max
array([ 82.5 , 20. , 113.75, 51.25])
data_max = data[ind, range(data.shape[1])] # => data[ind[0],0], data[ind[1],1]...
data_max
array([ 0.98935825, 0.84147098, 0.99060736, 0.6569866 ])
np.all(data_max == data.max(axis=0))
True
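np.take_along_axis expresses the same "index of the maximum, then the value at that index" pattern without building the column index by hand. A sketch recomputing data and ind as above:

```python
import numpy as np

data = np.sin(np.arange(20)).reshape(5, 4)  # 4 time-dependent series
ind = data.argmax(axis=0)                   # row of the maximum in each column

# take_along_axis pairs each column with its own row index, replacing the
# explicit range(data.shape[1]). ind needs a leading length-1 axis so that
# it has the same number of dimensions as data.
data_max = np.take_along_axis(data, ind[np.newaxis, :], axis=0)[0]

print(np.array_equal(data_max, data.max(axis=0)))  # True
```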
You can also use indexing with arrays as a target to assign to:
a = np.arange(5)
a
array([0, 1, 2, 3, 4])
a[[1, 3, 4]] = 0
a
array([0, 0, 2, 0, 0])
However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:
a = np.arange(5)
a[[0, 0, 2]] = [1, 2, 3] # a[0] is assigned 1, then 2; the last value, 2, remains
a
array([2, 1, 3, 3, 4])
This is reasonable enough, but watch out if you want to use Python's += construct, as it may not do what you expect:
a = np.arange(5)
a
array([0, 1, 2, 3, 4])
a[[0, 0, 2]] += 1
a # a[0] does not become 2
array([1, 1, 3, 3, 4])
Even though 0 occurs twice in the list of indices, the 0th element is only incremented once.
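This is because Python requires a += 1 to be equivalent to a = a + 1. Here, a[[0, 0, 2]] += 1 is executed as a[[0, 0, 2]] = a[[0, 0, 2]] + 1: the right-hand side is computed once from the original values ([0, 0, 2] + 1 = [1, 1, 3]), and then the value 1 is simply written to a[0] twice.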
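If you do want every repeated index to be applied, ufuncs provide an unbuffered at method. A sketch contrasting += with np.add.at on the same indices, plus np.bincount for the common case of counting occurrences:

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1           # buffered: index 0 is incremented only once
print(a)                    # [1 1 3 3 4]

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)  # unbuffered: each occurrence of an index is applied
print(b)                    # [2 1 3 3 4]

# np.bincount gives the same per-index counts when you only need totals.
counts = np.bincount([0, 0, 2], minlength=5)
print(np.arange(5) + counts)  # [2 1 3 3 4]
```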
When we index arrays with arrays of (integer) indices, we are providing the list of indices to pick.
With boolean indices the approach is different: we explicitly choose which items in the array we want and which ones we don't.
The most natural way one can think of for boolean indexing is to use boolean arrays that have the same shape as the original array:
a = np.arange(12).reshape(3, 4)
a
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]])
b = a > 4
b # b is a boolean array with a's shape
array([[False, False, False, False], [False, True, True, True], [ True, True, True, True]], dtype=bool)
a[b] # 1d array with the selected elements
array([ 5, 6, 7, 8, 9, 10, 11])
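Masks built from several conditions are combined elementwise with &, | and ~; Python's and/or raise an error on multi-element arrays. A short sketch, which also shows that a mask selects the same elements as the integer positions of its True entries:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Parenthesize each comparison: & binds more tightly than > and ==.
mask = (a > 2) & (a % 2 == 0)
print(a[mask])                                       # [ 4  6  8 10]

# np.nonzero returns one index array per axis, usable as a tuple index.
print(np.array_equal(a[mask], a[np.nonzero(mask)]))  # True
```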
This property can be very useful in assignments:
a[b] = 0 # All elements of 'a' higher than 4 become 0
a
array([[0, 1, 2, 3], [4, 0, 0, 0], [0, 0, 0, 0]])
a = np.arange(12).reshape(3, 4)
b = a > 4
~b
array([[ True, True, True, True], [ True, False, False, False], [False, False, False, False]], dtype=bool)
a[~b] = 0 # elements less than or equal to 4 become 0
a
array([[ 0, 0, 0, 0], [ 0, 5, 6, 7], [ 8, 9, 10, 11]])
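a[b] = 0 modifies a in place. When the original must be kept, np.where produces the same values as a new array; a sketch:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# np.where(cond, x, y) takes x where cond is True and y elsewhere,
# building a new array instead of modifying `a`.
zeroed = np.where(a > 4, 0, a)
print(zeroed)
# [[0 1 2 3]
#  [4 0 0 0]
#  [0 0 0 0]]
print(a[2, 3])                    # 11: the original is unchanged
```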
You can look at the following example to see how to use boolean indexing to generate an image of the Mandelbrot set:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
def mandelbrot(h, w, maxit=20):
    """Returns an image of the Mandelbrot fractal of size (h, w)."""
    (y, x) = np.ogrid[1:0:h * 1j, 0:1:w * 1j]  # y runs from 1 down to 0, x from 0 to 1
    c = x + y * 1j
    z = c
    print('z.shape', z.shape)
    divtime = np.zeros(z.shape, dtype=int)  # h x w array of zeros
    for i in range(maxit):
        z = z ** 2 + c  # z_{n+1} = z_{n}^2 + c, z^2 = (x+yj)^2 = x^2 - y^2 + 2xyj
        diverge = (z * np.conj(z)).real > 2 ** 2  # who is diverging: True where |z|^2 > 4
        div_now = diverge & (divtime == 0)  # who is diverging now; divtime == 0 marks points not yet recorded
        divtime[div_now] = i + 1  # note when: store the current iteration count for those points
        z[diverge] = 2  # avoid diverging too much (keeps values from overflowing)
    return divtime
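Writing $c = c_x + c_y j$, these are the squared magnitudes that the divergence test compares with $2^2$.
When $x = 0$ (and assuming also $c_x = 0$):
$$ z^2 + c = (x + yj)^2 + c = x^2 - y^2 + 2xyj + c = -y^2 + c $$
$$ |z^2 + c|^2 = (c_x - y^2)^2 + c_y^2 = y^4 + c_y^2 $$
When $y = 0$ (and assuming also $c_y = 0$):
$$ z^2 + c = (x + yj)^2 + c = x^2 - y^2 + 2xyj + c = x^2 + c $$
$$ |z^2 + c|^2 = (x^2 + c_x)^2 + c_y^2 = x^4 + 2c_x x^2 + c_x^2 $$
plt.imshow(mandelbrot(500, 500, maxit=20))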
z.shape (500, 500)
The second way of indexing with booleans is more similar to integer indexing: for each dimension of the array, we give a 1D boolean array selecting the slices we want.
a = np.arange(12).reshape(3, 4)
a
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]])
b1 = np.array([False, True, True]) # first dim selection
b2 = np.array([True, False, True, False]) # second dim selection
a[b1, :] # selecting rows
array([[ 4, 5, 6, 7], [ 8, 9, 10, 11]])
a[b1] # same thing, shape (2, 4)
array([[ 4, 5, 6, 7], [ 8, 9, 10, 11]])
a[:, b2] # selecting columns, shape (3, 2)
array([[ 0, 2], [ 4, 6], [ 8, 10]])
a[b1, b2] # a weird thing to do
array([ 4, 10])
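The "weird" result comes from the two masks being turned into integer index arrays and then paired elementwise, so it only works when their True counts are compatible (here 2 and 2). A sketch making this explicit, with np.ix_ giving the rectangular rows-by-columns selection that is usually intended:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])         # rows 1 and 2
b2 = np.array([True, False, True, False])  # columns 0 and 2

# Pairs the True positions elementwise: a[1, 0] and a[2, 2].
print(a[b1, b2])                                 # [ 4 10]
print(a[np.nonzero(b1)[0], np.nonzero(b2)[0]])   # [ 4 10]

# np.ix_ accepts boolean masks and selects the full submatrix instead.
print(a[np.ix_(b1, b2)])
# [[ 4  6]
#  [ 8 10]]
```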
Note that the length of the 1D boolean array must coincide with the length of the dimension (or axis) you want to slice.
In the previous example, b1 has length 3 (the number of rows in a), and b2 (of length 4) is suitable to index the 2nd axis (columns) of a.
The ix_ function can be used to combine different vectors so as to obtain the result for each n-tuple.
For example, if you want to compute a + b * c for all the triplets taken from each of the vectors a, b and c:
a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])
a + b * c
ValueError: operands could not be broadcast together with shapes (3,) (5,)
(ax, bx, cx) = np.ix_(a, b, c)
ax
array([[[2]], [[3]], [[4]], [[5]]])
ax.shape # 4 * 1 * 1
(4, 1, 1)
bx
array([[[8], [5], [4]]])
bx.shape # 1 * 3 * 1
(1, 3, 1)
cx # 1 * 1 * 5
array([[[5, 4, 6, 8, 3]]])
cx.shape
(1, 1, 5)
bx * cx # outer product, shape (1, 3, 5)
array([[[40, 32, 48, 64, 24], [25, 20, 30, 40, 15], [20, 16, 24, 32, 12]]])
result = ax + bx * cx # 4 * 3 * 5
result
array([[[42, 34, 50, 66, 26], [27, 22, 32, 42, 17], [22, 18, 26, 34, 14]], [[43, 35, 51, 67, 27], [28, 23, 33, 43, 18], [23, 19, 27, 35, 15]], [[44, 36, 52, 68, 28], [29, 24, 34, 44, 19], [24, 20, 28, 36, 16]], [[45, 37, 53, 69, 29], [30, 25, 35, 45, 20], [25, 21, 29, 37, 17]]])
result.shape
(4, 3, 5)
result[3, 2, 4]
17
a[3] + b[2] * c[4]
17
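np.ix_ does no arithmetic itself; it only reshapes each 1-D input onto its own axis so that broadcasting forms every combination. A sketch checking this against manual np.newaxis reshaping and a brute-force loop over all triplets:

```python
import numpy as np

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

ax, bx, cx = np.ix_(a, b, c)

# Each vector gets its own axis, with length 1 on the others.
print(np.array_equal(ax, a[:, np.newaxis, np.newaxis]))  # True
print(np.array_equal(cx, c[np.newaxis, np.newaxis, :]))  # True

result = ax + bx * cx

# Brute-force check over every (i, j, k) triplet.
ok = all(result[i, j, k] == a[i] + b[j] * c[k]
         for i in range(len(a)) for j in range(len(b)) for k in range(len(c)))
print(ok)  # True
```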
You could also implement this kind of reduction over all combinations as follows:
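def ufunc_reduce(ufct, *vectors):
    vs = np.ix_(*vectors)  # reshape each vector onto its own axis
    r = ufct.identity      # e.g. 0 for np.add, 1 for np.multiply
    for v in vs:
        print('r', r)      # debug output: current partial result
        print('v', v)      # debug output: next reshaped vector
        r = ufct(r, v)     # broadcasting adds one full-length axis per step
    return r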
and then use it as:
a, b, c
(array([2, 3, 4, 5]), array([8, 5, 4]), array([5, 4, 6, 8, 3]))
ufunc_reduce(np.add, a, b, c)
r 0 v [[[2]] [[3]] [[4]] [[5]]] r [[[2]] [[3]] [[4]] [[5]]] v [[[8] [5] [4]]] r [[[10] [ 7] [ 6]] [[11] [ 8] [ 7]] [[12] [ 9] [ 8]] [[13] [10] [ 9]]] v [[[5 4 6 8 3]]]
array([[[15, 14, 16, 18, 13], [12, 11, 13, 15, 10], [11, 10, 12, 14, 9]], [[16, 15, 17, 19, 14], [13, 12, 14, 16, 11], [12, 11, 13, 15, 10]], [[17, 16, 18, 20, 15], [14, 13, 15, 17, 12], [13, 12, 14, 16, 11]], [[18, 17, 19, 21, 16], [15, 14, 16, 18, 13], [14, 13, 15, 17, 12]]])
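For a chain of vectors, the ufunc outer method computes the same table. A sketch for the np.add case, also showing that the built-in sum over the np.ix_ outputs gives the same result (sum starts from 0, the identity of addition):

```python
import numpy as np

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

# np.add.outer(x, y)[i, j] == x[i] + y[j]; nesting it covers all triplets.
outer_sum = np.add.outer(np.add.outer(a, b), c)
print(outer_sum.shape)     # (4, 3, 5)
print(outer_sum[0, 0, 0])  # 15 = 2 + 8 + 5

# The broadcasting version built from np.ix_ gives the identical array.
print(np.array_equal(outer_sum, sum(np.ix_(a, b, c))))  # True
```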
The advantage of this version of reduce compared to the normal ufunc.reduce is that it makes use of the broadcasting rules in order to avoid creating an argument array the size of the output times the number of vectors.
For indexing with strings, see RecordArrays.