A Quick Guide to Numpy

At the top of your Python code, load Numpy like this:

```python
import numpy as np
```

Now you can use numpy throughout your file via the `np` alias (for example, `np.array(...)`).

````{tip}
1. Jupyter Lab's "Help" menu has a link to the Numpy documentation.
2. You can access an element of a numpy array just like a list:

    ```py
    x = np.arange(1, 5, 1)
    x[1]  # -> 2
    ```

    If the array is a matrix, `x[row, col]` works.

3. Whirlwind has a more comprehensive dive into splitting, slicing, and other numpy operations.
````
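Here is a short sketch of the indexing tip. It uses a small 1-D array and a 2-D "matrix" built with `reshape`, and the variable names are only for illustration:

```python
import numpy as np

x = np.arange(1, 5, 1)          # array([1, 2, 3, 4]); the end value 5 is excluded
first_pick = x[1]               # 2, because indexing starts at 0 just like a list
last_two = x[-2:]               # array([3, 4]); list-style slicing works too

m = np.arange(6).reshape(2, 3)  # [[0, 1, 2],
                                #  [3, 4, 5]]
single = m[1, 2]                # row 1, column 2 -> 5
first_col = m[:, 0]             # every row, column 0 -> array([0, 3])
```

The `m[row, col]` form is the matrix version of list indexing. A `:` in either position means "everything along this axis".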

Common functions:

```{tip}
Copy this table into your cookbook notes folder.
```
| Function | Description |
| --- | --- |
| `np.array([user defined list, or lists of lists])` | creates an array or matrix |
| `np.ones(how many)` and `np.ones([rows,cols])` | same, but all elements are 1 |
| `np.zeros(how many)` and `np.zeros([rows,cols])` | same, but all elements are 0 |
| `np.arange(start,end,stepsize)` | creates an array; note that (for a positive step) the array will not include any elements >= `end` |
| `np.linspace(from,to,# of elements)` | creates an array of evenly spaced elements covering the range specified, including both endpoints |
| `np.eye(#)` | creates an identity matrix of size # |
| `np.concatenate([x, y])` | combines arrays `x` and `y` |
| `np.nan` | a NaN ("not a number") value, e.g. a missing element in a data table. We will definitely use this in pandas. |
| `np.ceil(#)`, `np.floor(#)` | if # = 3.4, ceil will return 4 and floor will return 3 (as floats: 4.0 and 3.0) |
| `np.max(x)`, `np.min(x)`, `np.average(x)`, `np.median(x)` | many statistical operations work as you would expect |
| `np.reshape(x,[rows,cols])` | rearranges the elements of `x` into the given shape (the total number of elements must stay the same) |
| `np.random.<dist>` | draws random numbers from many distributions; use tab autocompletion to see all the options (type `np.random.` and then hit TAB) |
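To see the table in action, here is a sketch that calls most of these functions on small, arbitrary inputs. The comments show what each call returns. Random draws are left out on purpose because they need a seed first.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # a 2x2 matrix from a list of lists
ones = np.ones([2, 3])           # 2x3 matrix of 1.0
zeros = np.zeros(3)              # array([0., 0., 0.])
steps = np.arange(0, 10, 3)      # array([0, 3, 6, 9]); 10 itself is excluded
grid = np.linspace(0, 1, 5)      # array([0., 0.25, 0.5, 0.75, 1.]); both ends included
ident = np.eye(3)                # 3x3 identity matrix
joined = np.concatenate([steps, np.array([12, 15])])  # array([0, 3, 6, 9, 12, 15])
up, down = np.ceil(3.4), np.floor(3.4)                # 4.0 and 3.0 (floats, not ints)
stats = np.max(steps), np.min(steps), np.average(steps), np.median(steps)  # 9, 0, 4.5, 4.5
reshaped = np.reshape(np.arange(6), [2, 3])           # 6 elements -> 2 rows x 3 cols

missing = np.array([1.0, np.nan, 3.0])
has_gap = np.isnan(missing)      # array([False, True, False]); note np.nan == np.nan is False
```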

YOU MUST NEVER EVER EVER EVER EVER DRAW RANDOM NUMBERS WITHOUT SETTING A SEED!!!

A warning about "random" numbers

```{warning}
Let me repeat that: **YOU MUST NEVER EVER EVER EVER EVER DRAW RANDOM NUMBERS WITHOUT SETTING A SEED!!!**

If you don't, your code will produce different outputs every single time you run it, and other people will get different answers too!

And the whole point of code is that its results can be reproduced.
```
```python
import numpy as np
np.set_printoptions(precision=2) # just to control # of decimal places shown

np.random.seed(0) # this is how you set a seed
print("original random draw:    ",np.random.rand(4))
print("now it's different:      ",np.random.rand(4))
print("now it's different:      ",np.random.rand(4))
np.random.seed(0)
print("now it's the same again: ",np.random.rand(4))
```

```
original random draw:     [0.55 0.72 0.6  0.54]
now it's different:       [0.42 0.65 0.44 0.89]
now it's different:       [0.96 0.38 0.79 0.53]
now it's the same again:  [0.55 0.72 0.6  0.54]
```
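The cell above uses the legacy `np.random.seed` interface. For new code, NumPy's documentation recommends the newer Generator API. In that API you pass the seed to `np.random.default_rng` and draw from the returned generator object. Below is a minimal sketch of the same experiment in that style. The variable names are illustrative. The numbers will not match the legacy ones because the two interfaces use different underlying algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed once and keep the generator object
first = rng.random(4)            # four uniform draws on [0, 1)
second = rng.random(4)           # different, because the generator has moved on

rng = np.random.default_rng(0)   # re-create it with the same seed...
again = rng.random(4)            # ...and the first draw is reproduced exactly
```

Each generator carries its own state, so you can pass `rng` into your functions instead of relying on one global seed.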

Using Numpy within Pandas

Because pandas is built on top of numpy, most of these numpy functions also work on pandas objects (Series and DataFrames).
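Here is a small sketch of this, assuming pandas is installed and imported as `pd`. The Series values are made up. A numpy function like `np.ceil` returns a pandas object with the index intact, and `np.nan` is how a missing value shows up:

```python
import numpy as np
import pandas as pd

prices = pd.Series([3.4, 1.2, 5.9], index=["a", "b", "c"])
rounded_up = np.ceil(prices)     # still a Series with index a/b/c: 4.0, 2.0, 6.0
top = np.max(prices)             # 5.9

with_gap = pd.Series([1.0, np.nan, 3.0])  # np.nan marks the missing entry
n_missing = with_gap.isna().sum()         # 1
```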

Numpy 🤝 Pandas

The dark side of vectors and numpy

  1. You can't vectorize every operation :(
  2. Numpy is a great solution for the issue of speed, but not for the issue of memory.
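To make the first point concrete, here is a hypothetical contrast. The logistic-map recurrence is only an illustration and does not come from the course material. Elementwise work vectorizes into one call. A recurrence where each step needs the previous step's result is usually written as a plain loop instead.

```python
import numpy as np

# Independent elementwise work vectorizes cleanly: one call, no Python loop.
x = np.arange(5)
squared = x ** 2                 # array([0, 1, 4, 9, 16])

# Hypothetical recurrence: x[t+1] = r * x[t] * (1 - x[t]).
# Each value depends on the one before it, so there is no single elementwise
# numpy call for it; an explicit loop is the straightforward option.
r = 3.5
path = np.empty(5)
path[0] = 0.5
for t in range(1, len(path)):
    path[t] = r * path[t - 1] * (1 - path[t - 1])
```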

Numpy can be prohibitive, memory-wise: when you run an array operation, Numpy creates the entire array (and any intermediate arrays) in memory before doing the work. A vector of length 1,000,000,000,000 is huge. Stored as 64-bit floats, it needs about 8 terabytes of memory. By contrast, you can execute `for i in range(1_000_000_000_000): pass` without running out of memory (although it would take a very long time to finish). Python never creates that vector; it just iterates over the numbers one at a time. This works because `range(...)` is a lazy sequence that computes each number on demand (much like a generator) rather than a list stored in memory.
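Here is a quick way to check the memory point. An array's `.nbytes` attribute reports how much memory its data uses. `sys.getsizeof` on a `range` stays tiny no matter how long the range is, because the numbers are computed on demand. The array here is kept small on purpose, since the full 10^12-element version would need about 8 TB.

```python
import sys
import numpy as np

arr = np.zeros(1_000_000)        # one million float64 values, all stored up front
array_bytes = arr.nbytes         # 8_000_000 bytes (8 bytes per float64)

big_range = range(1_000_000_000_000)    # no trillion-element list is ever built
range_bytes = sys.getsizeof(big_range)  # a few dozen bytes on CPython
last = big_range[-1]             # 999_999_999_999, computed from start/stop/step
```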