Numpy Basic Concepts

In [1]:
import addutils.toc ; addutils.toc.js(ipy_notebook=True)
Out[1]:
In [2]:
from addutils import css_notebook
css_notebook()
Out[2]:

1 What is Numpy ?

NumPy is the fundamental package for scientific computing with Python. It is:

  • a powerful Python extension for N-dimensional array
  • a tool for integrating C/C++ and Fortran code
  • designed for scientific computation: linear algebra and Signal Analysis

If you are a MATLAB® user we recommend to read Numpy for MATLAB Users and Benefit of Open Source Python versus commercial packages. For an idea of the Open Source Approach to science, we suggest the Science Code Manifesto

1.1 Documentation and reference:

Lets start by checking the Numpy version used in this Notebook:

In [3]:
import numpy as np
print ('numpy version: ', np.__version__)
numpy version:  1.14.0

2 Array Creation

NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type. In Numpy dimensions are called axes. The number of axes is called rank. The most important attributes of an ndarray object are:

  • ndarray.ndim - the number of axes (dimensions) of the array.
  • ndarray.shape - the dimensions of the array. For a matrix with n rows and m columns, shape will be (n,m).
  • ndarray.size - the total number of elements of the array.
  • ndarray.dtype - numpy.int32, numpy.int16, and numpy.float64 are some examples.
  • ndarray.itemsize - the size in bytes of elements of the array. For example, elements of type float64 has itemsize 8 (=64/8)
In [4]:
a = np.array([[0,1,2,3], [4,5,6,7], [8,9,10,11]])
rows, cols = np.shape(a)
print ('Rows:{0:03d} ; Cols:{0:03d}'.format(rows, cols))
Rows:003 ; Cols:003

Try by yourself the following commands (type or paste the commands in the cell below):

a.ndim                  # Number of dimensions
print a.dtype.name      # Type of data
a.itemsize              # Size in bytes of elements
a.size                  # Number of elements in the array
In [5]:
a.ndim
Out[5]:
2

The type of the array can be specified at creation time:

In [6]:
b = np.array([[2,3], [6,7]], dtype=np.complex64)
print (b)
[[2.+0.j 3.+0.j]
 [6.+0.j 7.+0.j]]

2.1 Array creation functions

Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content.

The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty creates an array whose initial content is random and depends on the state of the memory. By default, the dtype of the created array is float64.
Try by yourself the following commands:

zeros((3,4))
ones((3,4))
empty((2,3))
eye(3)
diag(np.arange(5))
np.tile(np.array([[6, 7], [8, 9]]), (2, 2))
In [7]:
np.zeros((3,4))
Out[7]:
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

zeros_like, ones_like and empty_like can be used to create arrays of the same type of a given one

In [8]:
np.zeros_like(b)
Out[8]:
array([[0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j]], dtype=complex64)

2.2 Sequences and reshaping

Arrays can be created with linspace, logspace (returning evenly spaced numbers, linear or logarithmic) or arange and then shaped in matrix form. mgrid is like the equivaled "meshgrid" in MATLAB.

In [9]:
np.logspace(1,5,3)
Out[9]:
array([1.e+01, 1.e+03, 1.e+05])
In [10]:
x = np.arange(4).reshape(2,2)
In [11]:
# Use List comprehention to create a matrix
c = np.array([[10*j+i for i in range(3)] for j in range(4)])
print (c)
[[ 0  1  2]
 [10 11 12]
 [20 21 22]
 [30 31 32]]

Use 'newaxis' to add a dimension (as for turning a row vector in a column vector):

In [12]:
d = np.linspace(0, 12, 5)
print (d)
print (d[:, np.newaxis])       # make into a column vector
[ 0.  3.  6.  9. 12.]
[[ 0.]
 [ 3.]
 [ 6.]
 [ 9.]
 [12.]]
In [13]:
X, Y = np.mgrid[0:5, 0:5] # similar to meshgrid in MATLAB
X
Out[13]:
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])
In [14]:
Y
Out[14]:
array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])

2.3 Sparse Matrices

We can create and manipulate sparse matrices as follows:

In [15]:
from scipy import sparse
X = np.random.random((5, 6)) # Create an array with many zeros
X[X < 0.85] = 0
print (X)
X_csr = sparse.csr_matrix(X) # turn X into a csr (Compressed-Sparse-Row) matrix
print (X_csr)
[[0.         0.98504109 0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.        ]
 [0.9162166  0.98541899 0.         0.         0.         0.92056172]
 [0.         0.         0.         0.         0.         0.        ]
 [0.96641771 0.         0.         0.85129445 0.         0.        ]]
  (0, 1)	0.9850410857596115
  (2, 0)	0.9162165982459893
  (2, 1)	0.9854189872420578
  (2, 5)	0.9205617206831753
  (4, 0)	0.9664177063354042
  (4, 3)	0.8512944469813816
In [16]:
print (X_csr.toarray())       # convert back to a dense array
[[0.         0.98504109 0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.        ]
 [0.9162166  0.98541899 0.         0.         0.         0.92056172]
 [0.         0.         0.         0.         0.         0.        ]
 [0.96641771 0.         0.         0.85129445 0.         0.        ]]

There are several other sparse formats that can be useful for various problems:

  • CSC (compressed sparse column)
  • BSR (block sparse row)
  • COO (coordinate)
  • DIA (diagonal)
  • DOK (dictionary of keys)

The scipy.sparse submodule also has a lot of functions for sparse matrices including linear algebra, sparse solvers, graph algorithms, and much more.

2.4 Random Numbers

In [17]:
np.random.rand(4,5) # uniform random numbers in [0,1]
Out[17]:
array([[0.00951438, 0.67024968, 0.31266326, 0.24734842, 0.88910001],
       [0.13621409, 0.07298403, 0.40775148, 0.3479568 , 0.77568373],
       [0.0965207 , 0.45848056, 0.92937254, 0.36836279, 0.08428623],
       [0.80448692, 0.62520354, 0.82029522, 0.90013625, 0.61436391]])
In [18]:
np.random.randn(4,5) # standard normal distributed random numbers
Out[18]:
array([[ 0.96330277, -1.03009588, -0.33519208, -0.57506637,  1.58926967],
       [ 0.03670511, -0.0458585 ,  0.44499843, -1.26472212,  0.68910083],
       [ 1.16036821, -0.26354943,  0.64565866,  1.48429238, -1.94999384],
       [-0.41986821,  1.70296555,  0.39815986, -0.07464821, -0.1471213 ]])

2.5 Casting

Forced casts:

In [19]:
a = np.array([1.7, 1.2, 1.6])
b = a.astype(int)           # <-- truncates to integer
b
Out[19]:
array([1, 1, 1])

Rounding:

In [20]:
a = np.array([1.2, 1.5, 1.6, 2.5, 3.5, 4.5])
b = np.around(a)
print (b)                     # still floating-point
c = np.around(a).astype(int)
print (c)
[1. 2. 2. 2. 4. 4.]
[1 2 2 2 4 4]

3 Basic Visualization with Bokeh

In [21]:
import bokeh.plotting as bk
bk.output_notebook()
Loading BokehJS ...

3.1 D Plotting

In [22]:
x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)
fig = bk.figure()
fig.line(x, y)
fig.circle(x, y, color='green')
bk.show(fig)

Somethimes, if the code is not relevant for the topic we are developing, we can put the code in a separate file and run it from the Notebook:

In [23]:
from utilities import plot_01
bk.show(plot_01())

At any time if you want to see the code, you can load it and run it directly from the cell in the Notebook:

In [24]:
# %load utilities/plot_utilities.py
import numpy as np
import bokeh.plotting as bk
bk.output_notebook()
    
def plot_01():
    image = np.random.randn(120, 120)
    fig = bk.figure()
    fig.image([image], x=[0], y=[0], dw=[1], dh=[1], palette='OrRd9')
    return fig

if __name__ == '__main__':
    fig = plot_01()
    bk.show(fig)
Loading BokehJS ...

4 Basic Linear Algebra

In [25]:
# Transpose
print (x.T)
[0.         0.15789474 0.31578947 0.47368421 0.63157895 0.78947368
 0.94736842 1.10526316 1.26315789 1.42105263 1.57894737 1.73684211
 1.89473684 2.05263158 2.21052632 2.36842105 2.52631579 2.68421053
 2.84210526 3.        ]

Explore the available commands for numpy arrays (press x.+TAB) Try by yourself:

x.min()
x.max()
x.mean()
x.cumsum()
In [26]:
x.min()
Out[26]:
0.0
In [27]:
print (x*5)         # Scalar expansion
print (x+3)
[ 0.          0.78947368  1.57894737  2.36842105  3.15789474  3.94736842
  4.73684211  5.52631579  6.31578947  7.10526316  7.89473684  8.68421053
  9.47368421 10.26315789 11.05263158 11.84210526 12.63157895 13.42105263
 14.21052632 15.        ]
[3.         3.15789474 3.31578947 3.47368421 3.63157895 3.78947368
 3.94736842 4.10526316 4.26315789 4.42105263 4.57894737 4.73684211
 4.89473684 5.05263158 5.21052632 5.36842105 5.52631579 5.68421053
 5.84210526 6.        ]
In [28]:
print (x*x.T)       # Elementwise product
print (np.dot(x,x.T))  # Dot (matrix) product
[0.         0.02493075 0.09972299 0.22437673 0.39889197 0.6232687
 0.89750693 1.22160665 1.59556787 2.01939058 2.49307479 3.0166205
 3.5900277  4.2132964  4.88642659 5.60941828 6.38227147 7.20498615
 8.07756233 9.        ]
61.578947368421055

4.1 Determinant of a square matrix

The scipy.linalg.det() function computes the determinant of a square matrix:

In [29]:
from scipy import linalg
arr = np.array([[1, 2],
               [3, 4]])
linalg.det(arr)
Out[29]:
-2.0

4.2 Inverse of a square matrix

The scipy.linalg.inv() function computes the inverse of a square matrix:

In [30]:
print (linalg.inv(arr))
[[-2.   1. ]
 [ 1.5 -0.5]]

4.3 Advanced Linear Algebra

In Scipy many advanced operations are available (check the Scipy Reference), for example singular-value decomposition (SVD):

In [31]:
arr = np.arange(9).reshape((3, 3)) + np.diag([1, 0, 1])
uarr, spec, vharr = linalg.svd(arr)

The resulting array spectrum is:

In [32]:
spec
Out[32]:
array([14.88982544,  0.45294236,  0.29654967])

5 Slicing -Indexing for MATLAB® Users-

For MATLAB® users: in Python, like many other languages, indexing start from zero and not from one like MATLAB.

Remember: slices (indexed subarrays) are references to memory in the original array, this means that if you modify a slice, you modify the original array. In other words a slice is a pointer to the original array.

In [33]:
b = np.arange(8).reshape(2,4)
print (b)
[[0 1 2 3]
 [4 5 6 7]]

5.1 Indexing single elements

Try by yourself:

print b[0,0]
print b[-1,-1]   # Last element
print b[:,1]     # column number 1 (second column)
In [34]:
# Indexing single elements

Figure 01

5.2 Indexing by rows and columns

In [35]:
# With reference to Figure 01:
a = np.array([[10*j+i for i in range(6)] for j in range(6)])

Try by yourself:

print a[0,3:5]     # Orange
print a[4:,4:]     # Blue
print a[:, 2]      # Red
print a[2::2, ::2] # Green
In [36]:
#Indexing multiple elements

To replicate an array use 'copy':

In [37]:
c = np.array(a, copy=True)

6 File Input / Output

Numpy has special functions for:

  • Load/Save text files: numpy.loadtxt()/numpy.savetxt()
  • Clever loading of text/csv files: numpy.genfromtxt()/numpy.recfromcsv()
  • Fast and efficient, but numpy-specific, binary format: numpy.save()/numpy.load()

In particular Numpy can load and save native MATLAB® files:

In [38]:
from scipy import io as spio
spio.savemat('temp/test.mat', {'c': c}, oned_as='row') # savemat expects a dictionary
data = spio.loadmat('temp/test.mat')
data['c']
Out[38]:
array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

Visit www.add-for.com for more tutorials and updates.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.