#!/usr/bin/env python
# coding: utf-8

# # The Basics of NumPy Arrays

# Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas ([Part 3](03.00-Introduction-to-Pandas.ipynb)) are built around the NumPy array.
# This chapter will present several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays.
# While the types of operations shown here may seem a bit dry and pedantic, they comprise the building blocks of many other examples used throughout the book.
# Get to know them well!
#
# We'll cover a few categories of basic array manipulations here:
#
# - *Attributes of arrays*: Determining the size, shape, memory consumption, and data types of arrays
# - *Indexing of arrays*: Getting and setting the values of individual array elements
# - *Slicing of arrays*: Getting and setting smaller subarrays within a larger array
# - *Reshaping of arrays*: Changing the shape of a given array
# - *Joining and splitting of arrays*: Combining multiple arrays into one, and splitting one array into many

# ## NumPy Array Attributes

# First let's discuss some useful array attributes.
# We'll start by defining random arrays of one, two, and three dimensions.
# We'll use NumPy's random number generator, which we will *seed* with a set value in order to ensure that the same random arrays are generated each time this code is run:

# In[1]:


import numpy as np
rng = np.random.default_rng(seed=1701)  # seed for reproducibility

x1 = rng.integers(10, size=6)  # one-dimensional array
x2 = rng.integers(10, size=(3, 4))  # two-dimensional array
x3 = rng.integers(10, size=(3, 4, 5))  # three-dimensional array


# Each array has attributes including `ndim` (the number of dimensions), `shape` (the size of each dimension), `size` (the total size of the array), and `dtype` (the type of each element):

# In[2]:


print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
print("dtype:   ", x3.dtype)


# For more discussion of data types, see [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb).

# ## Array Indexing: Accessing Single Elements

# If you are familiar with Python's standard list indexing, indexing in NumPy will feel quite familiar.
# In a one-dimensional array, the $i^{th}$ value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:

# In[3]:


x1


# In[4]:


x1[0]


# In[5]:


x1[4]


# To index from the end of the array, you can use negative indices:

# In[6]:


x1[-1]


# In[7]:


x1[-2]


# In a multidimensional array, items can be accessed using a comma-separated `(row, column)` tuple:

# In[8]:


x2


# In[9]:


x2[0, 0]


# In[10]:


x2[2, 0]


# In[11]:


x2[2, -1]


# Values can also be modified using any of the preceding index notation:

# In[12]:


x2[0, 0] = 12
x2


# Keep in mind that, unlike Python lists, NumPy arrays have a fixed type.
# This means, for example, that if you attempt to insert a floating-point value into an integer array, the value will be silently truncated. Don't be caught unaware by this behavior!

# In[13]:


x1[0] = 3.14159  # this will be truncated!
x1


# ## Array Slicing: Accessing Subarrays

# Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (`:`) character.
# The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array `x`, use this:
# ``` python
# x[start:stop:step]
# ```
# If any of these are unspecified, they default to the values `start=0`, `stop=<size of dimension>`, `step=1`.
# Let's look at some examples of accessing subarrays in one dimension and in multiple dimensions.

# ### One-Dimensional Subarrays
#
# Here are some examples of accessing elements in one-dimensional subarrays:

# In[14]:


x1


# In[15]:


x1[:3]  # first three elements


# In[16]:


x1[3:]  # elements after index 3


# In[17]:


x1[1:4]  # middle subarray


# In[18]:


x1[::2]  # every second element


# In[19]:


x1[1::2]  # every second element, starting at index 1


# A potentially confusing case is when the `step` value is negative.
# In this case, the defaults for `start` and `stop` are swapped.
# This becomes a convenient way to reverse an array:

# In[20]:


x1[::-1]  # all elements, reversed


# In[21]:


x1[4::-2]  # every second element from index 4, reversed


# ### Multidimensional Subarrays
#
# Multidimensional slices work in the same way, with multiple slices separated by commas.
# For example:

# In[22]:


x2


# In[23]:


x2[:2, :3]  # first two rows & three columns


# In[24]:


x2[:3, ::2]  # three rows, every second column


# In[25]:


x2[::-1, ::-1]  # all rows & columns, reversed


# #### Accessing array rows and columns
#
# One commonly needed routine is accessing single rows or columns of an array.
# This can be done by combining indexing and slicing, using an empty slice marked by a single colon (`:`):

# In[26]:


x2[:, 0]  # first column of x2


# In[27]:


x2[0, :]  # first row of x2


# In the case of row access, the empty slice can be omitted for a more compact syntax:

# In[28]:


x2[0]  # equivalent to x2[0, :]


# ### Subarrays as No-Copy Views
#
# Unlike Python list slices, NumPy array slices are returned as *views* rather than *copies* of the array data.
# Consider our two-dimensional array from before:

# In[29]:


print(x2)


# Let's extract a $2 \times 2$ subarray from this:

# In[30]:


x2_sub = x2[:2, :2]
print(x2_sub)


# Now if we modify this subarray, we'll see that the original array is changed! Observe:

# In[31]:


x2_sub[0, 0] = 99
print(x2_sub)


# In[32]:


print(x2)


# Some users may find this surprising, but it can be advantageous: for example, when working with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.

# ### Creating Copies of Arrays
#
# Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the `copy` method:

# In[33]:


x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)


# If we now modify this subarray, the original array is not touched:

# In[34]:


x2_sub_copy[0, 0] = 42
print(x2_sub_copy)


# In[35]:


print(x2)


# ## Reshaping of Arrays
#
# Another useful type of operation is reshaping of arrays, which can be done with the `reshape` method.
# For example, if you want to put the numbers 1 through 9 in a $3 \times 3$ grid, you can do the following:

# In[36]:


grid = np.arange(1, 10).reshape(3, 3)
print(grid)


# Note that for this to work, the size of the initial array must match the size of the reshaped array, and in most cases the `reshape` method will return a no-copy view of the initial array.
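
# The two conditions in the note above can be checked directly. The following is a minimal sketch, not part of the original example set: the names `flat`, `grid_view`, and `transposed_flat` are illustrative, and `np.shares_memory` is used here only as a convenient way to test whether two arrays share a data buffer.

```python
import numpy as np

flat = np.arange(1, 10)          # nine elements
grid_view = flat.reshape(3, 3)   # 3 * 3 == 9, so the sizes match

# For a contiguous array like this one, reshape returns a view
is_view = np.shares_memory(flat, grid_view)

# Writing through the view therefore changes the initial array
grid_view[0, 0] = 100
first_value = flat[0]

# A size mismatch (9 elements into 2 * 5 slots) raises a ValueError
try:
    flat.reshape(2, 5)
    size_error = False
except ValueError:
    size_error = True

# One case where reshape has to copy: flattening a transposed
# (non-contiguous) array in row-major order
transposed_flat = grid_view.T.reshape(-1)
is_copy = not np.shares_memory(grid_view, transposed_flat)

print(is_view, first_value, size_error, is_copy)
```

# When in doubt about whether a reshaped array is a view or a copy, a check like this is cheaper than debugging an unexpected write to shared data.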
# A common reshaping operation is converting a one-dimensional array into a two-dimensional row or column matrix:

# In[37]:


x = np.array([1, 2, 3])
x.reshape((1, 3))  # row vector via reshape


# In[38]:


x.reshape((3, 1))  # column vector via reshape


# A convenient shorthand for this is to use `np.newaxis` in the slicing syntax:

# In[39]:


x[np.newaxis, :]  # row vector via newaxis


# In[40]:


x[:, np.newaxis]  # column vector via newaxis


# This is a pattern that we will utilize often throughout the remainder of the book.

# ## Array Concatenation and Splitting
#
# All of the preceding routines worked on single arrays. NumPy also provides tools to combine multiple arrays into one, and to conversely split a single array into multiple arrays.

# ### Concatenation of Arrays
#
# Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines `np.concatenate`, `np.vstack`, and `np.hstack`.
# `np.concatenate` takes a tuple or list of arrays as its first argument, as you can see here:

# In[41]:


x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])


# You can also concatenate more than two arrays at once:

# In[42]:


z = np.array([99, 99, 99])
print(np.concatenate([x, y, z]))


# And it can be used for two-dimensional arrays:

# In[43]:


grid = np.array([[1, 2, 3],
                 [4, 5, 6]])


# In[44]:


# concatenate along the first axis
np.concatenate([grid, grid])


# In[45]:


# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)


# For working with arrays of mixed dimensions, it can be clearer to use the `np.vstack` (vertical stack) and `np.hstack` (horizontal stack) functions:

# In[46]:


# vertically stack the arrays
np.vstack([x, grid])


# In[47]:


# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])


# Similarly, for higher-dimensional arrays, `np.dstack` will stack arrays along the third axis.
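
# As a small illustration of `np.dstack`, here is a sketch that stacks two $2 \times 3$ arrays depth-wise. The arrays `a` and `b` are made up for this example; the comparison with `np.concatenate` along `axis=2` is included only to show which axis is involved.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = a * 10

# Stack along the third axis: two (2, 3) arrays become one (2, 3, 2) array
stacked = np.dstack([a, b])
print(stacked.shape)

# Each position [i, j] now holds the pair (a[i, j], b[i, j])
pair = stacked[0, 0]
print(pair)

# Equivalent formulation: add a trailing axis, then concatenate along it
manual = np.concatenate([a[:, :, np.newaxis], b[:, :, np.newaxis]], axis=2)
same = np.array_equal(stacked, manual)
print(same)
```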
# ### Splitting of Arrays
#
# The opposite of concatenation is splitting, which is implemented by the functions `np.split`, `np.hsplit`, and `np.vsplit`. For each of these, we can pass a list of indices giving the split points:

# In[48]:


x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)


# Notice that *N* split points lead to *N* + 1 subarrays.
# The related functions `np.hsplit` and `np.vsplit` are similar:

# In[49]:


grid = np.arange(16).reshape((4, 4))
grid


# In[50]:


upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)


# In[51]:


left, right = np.hsplit(grid, [2])
print(left)
print(right)


# Similarly, for higher-dimensional arrays, `np.dsplit` will split arrays along the third axis.
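
# To make the `np.dsplit` case concrete, here is a minimal sketch using a made-up $2 \times 2 \times 4$ array named `cube`. The second call passes an integer instead of a list of split points; this asks for that many equal sections and is a variation not shown in the examples above.

```python
import numpy as np

cube = np.arange(16).reshape((2, 2, 4))

# One split point along the third axis gives two subarrays
front, back = np.dsplit(cube, [1])
print(front.shape, back.shape)

# Stacking the pieces along the third axis recovers the original array
rejoined = np.dstack([front, back])
round_trip = np.array_equal(rejoined, cube)
print(round_trip)

# Passing an integer requests that many equal-sized sections
halves = np.dsplit(cube, 2)
print([h.shape for h in halves])
```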