A brief introduction to Numpy

Numpy is the fundamental library for scientific computing in Python. It provides list-like objects that behave like arrays, matrices, and data tables, which is how scientists typically expect data to behave. Numpy also provides linear algebra, Fourier transforms, random number generation, and tools for integrating C/C++ and Fortran code.

If you primarily want to work with tables of data, Pandas, which depends on Numpy, is probably the module that you want to start with.

Numpy Array Basics

Creating a Numpy array

In [1]:

import numpy as np

example_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
example_array

Out[1]:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
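Every array carries a few attributes describing it, and Numpy has shortcuts for building common arrays. The sketch below inspects example_array and rebuilds the same values with np.arange and reshape. The names zeros and counting are illustrative only.

```python
import numpy as np

example_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Every array records its dimensions and element type
print(example_array.shape)       # (3, 3): 3 rows, 3 columns
print(example_array.ndim)        # 2
print(example_array.dtype.kind)  # i (a signed integer type; the exact width depends on the platform)

# Other common ways to build arrays
zeros = np.zeros((2, 3))                    # 2x3 array filled with 0.0
counting = np.arange(1, 10).reshape(3, 3)   # 1..9 laid out as 3 rows of 3
print(np.array_equal(counting, example_array))  # True
```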

Indexing an array

In [2]:

example_array[1, 1]

Out[2]:

5

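Indices start at 0 and are written as [row, column], so [1, 1] is the second row and second column. A short sketch of a few related indexing forms:

```python
import numpy as np

example_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Indices start at 0 and are given as [row, column]
centre = example_array[1, 1]          # 5: second row, second column
# Negative indices count back from the end
bottom_right = example_array[-1, -1]  # 9
# Chained indexing reaches the same element, but first builds the row example_array[1]
same_centre = example_array[1][1]     # 5
```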
Slicing an array

In [3]:

example_array[:, 0]

Out[3]:

array([1, 4, 7])

In [4]:

example_array[1, :]

Out[4]:

array([4, 5, 6])

In [5]:

example_array[1:3, 1:3]

Out[5]:

array([[5, 6],
       [8, 9]])
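Two details of slicing are easy to miss: the stop index is excluded, and a basic slice is a view onto the original data rather than a copy. This sketch shows both; the variable names are illustrative.

```python
import numpy as np

example_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# start:stop includes start but excludes stop, so 1:3 selects indices 1 and 2
block = example_array[1:3, 1:3]

# A basic slice shares memory with the original array
block[0, 0] = 50
print(example_array[1, 1])   # 50: the original changed too

# Use .copy() when you need independent data
independent = example_array[1:3, 1:3].copy()
independent[0, 0] = -1
print(example_array[1, 1])   # still 50
```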

Subsetting an array

In [6]:

array1 = np.array([1, 1, 1, 2, 2, 2, 1])
array2 = np.array([1, 2, 3, 4, 5, 6, 7])
array2[array1==1]

Out[6]:

array([1, 2, 3, 7])

In [7]:

array3 = np.array(['a', 'a', 'a', 'b', 'b', 'b', 'b'])
array2[(array1==1) & (array3=='a')]

Out[7]:

array([1, 2, 3])
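Each comparison above produces a boolean array (a "mask"), and masks can be combined with & (and), | (or) and ~ (not). The parentheses are required because & and | bind more tightly than == in Python. A sketch using the same three arrays:

```python
import numpy as np

array1 = np.array([1, 1, 1, 2, 2, 2, 1])
array2 = np.array([1, 2, 3, 4, 5, 6, 7])
array3 = np.array(['a', 'a', 'a', 'b', 'b', 'b', 'b'])

# A comparison returns a boolean mask of the same length
mask = array1 == 1
print(mask)  # [ True  True  True False False False  True]

# | keeps elements matching either condition, ~ inverts a mask
either = array2[(array1 == 2) | (array3 == 'a')]      # [1 2 3 4 5 6]
neither = array2[~((array1 == 2) | (array3 == 'a'))]  # [7]
```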

Math

Arrays

Math on arrays is vectorized: each operation is applied element by element without writing a loop, which is how most scientists expect it to behave.

In [8]:

array1 = np.array([1, 1, 1, 2, 2, 2, 1])
array2 = np.array([1, 2, 3, 4, 5, 6, 7])

array1 * 2 + 1

Out[8]:

array([3, 3, 3, 5, 5, 5, 3])

In [9]:

array1 * array2

Out[9]:

array([ 1,  2,  3,  8, 10, 12,  7])
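To make "vectorized" concrete, the sketch below checks that array1 * array2 matches the equivalent Python loop, and shows an element-wise function (np.sqrt) alongside two aggregate methods.

```python
import numpy as np

array1 = np.array([1, 1, 1, 2, 2, 2, 1])
array2 = np.array([1, 2, 3, 4, 5, 6, 7])

# Element-by-element result computed without a Python loop...
vectorized = array1 * array2
# ...matches the explicit loop it replaces
looped = np.array([a * b for a, b in zip(array1, array2)])

# Functions like np.sqrt also work element by element,
# while aggregate methods reduce an array to a single value
roots = np.sqrt(array2)
total = array2.sum()     # 28
average = array2.mean()  # 4.0
```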

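Linear algebra (matrices)

Linear algebra can be done using a different data structure called a matrix, for which * means matrix multiplication rather than element-by-element multiplication. Note that the current Numpy documentation no longer recommends np.matrix, even for linear algebra, and suggests regular arrays (using the @ operator or np.dot for matrix products) instead.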

In [10]:

matrix1 = np.matrix([[1, 2, 3], [4, 5, 6]])
matrix2 = np.matrix([1, 2, 3])
matrix1 * matrix2.transpose()

Out[10]:

matrix([[14],
        [32]])
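The same product can be computed with regular arrays. This sketch assumes Python 3.5 or later for the @ operator; np.dot is the older equivalent. The names array_a and vector_b are illustrative and chosen so they don't overwrite array1 and array2 from earlier cells.

```python
import numpy as np

array_a = np.array([[1, 2, 3], [4, 5, 6]])
vector_b = np.array([1, 2, 3])

# On plain arrays, * is element-wise (here vector_b is broadcast across each row)
elementwise = array_a * vector_b          # [[1, 4, 9], [4, 10, 18]]

# @ is matrix multiplication; np.dot gives the same result
product = array_a @ vector_b              # array([14, 32]), a 1-D result
same = np.dot(array_a, vector_b)

# Reshaping vector_b into a 3x1 column reproduces the matrix example's shape
column = array_a @ vector_b.reshape(3, 1) # [[14], [32]]
```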

Importing and Exporting Data

The Numpy function genfromtxt is a powerful way to import text data. It can handle different delimiters, skip header rows, control the type of the imported data, assign names to columns, and much more. For the full list of features, see the documentation or run help(np.genfromtxt) from the Python shell (after importing the module, of course).
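A sketch of a few of these options on a small in-memory table. The semicolon-separated contents and column names are hypothetical; io.StringIO stands in for a file so the example is self-contained.

```python
from io import StringIO

import numpy as np

# Hypothetical file contents: a header row, then one row with a missing value
text = "x;y;z\n1;2;3\n2;;6\n"

# delimiter sets the separator, skip_header drops the header line,
# usecols keeps only the listed columns
first_and_last = np.genfromtxt(StringIO(text), delimiter=';',
                               skip_header=1, usecols=(0, 2))   # [[1. 3.] [2. 6.]]

# Empty fields count as missing: nan by default, or filling_values if given
default_fill = np.genfromtxt(StringIO(text), delimiter=';', skip_header=1)
filled = np.genfromtxt(StringIO(text), delimiter=';', skip_header=1,
                       filling_values=0)                        # [[1. 2. 3.] [2. 0. 6.]]
```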

Basic Import and Export

Import

Basic imports using Numpy treat all data as floats (the default dtype of genfromtxt is float). For a basic import we'll typically want to skip the header row, since it's generally not made up of numbers.

In [11]:

data = np.genfromtxt('./data/examp_data.txt', delimiter=',', skip_header=1)
data

Out[11]:

array([[ 1. ,  2. ,  3. ],
       [ 2. ,  2.4,  6. ],
       [ 3. ,  1.9,  8. ]])
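To see why skipping the header matters, this sketch reads a small in-memory table twice. The column names a, b, c are hypothetical, since the header of examp_data.txt isn't shown. Without skip_header, the text that can't be parsed as a float becomes nan.

```python
from io import StringIO

import numpy as np

# In-memory stand-in for a small comma-separated file with a header row
csv_text = "a,b,c\n1,2,3\n2,2.4,6\n"

with_header = np.genfromtxt(StringIO(csv_text), delimiter=',')
print(with_header[0])        # [nan nan nan]: the header row could not be parsed

without_header = np.genfromtxt(StringIO(csv_text), delimiter=',', skip_header=1)
print(without_header.dtype)  # float64, even for whole numbers like 1 and 3
```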

Export

In [12]:

np.savetxt('./data/examp_output.txt', data, delimiter=',')
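By default savetxt writes every value in scientific notation (fmt='%.18e') and adds no header. This sketch uses fmt and header to produce a more readable file, writing to a temporary directory rather than ./data, then reads it back. The header names a,b,c are hypothetical.

```python
import os
import tempfile

import numpy as np

data = np.array([[1, 2, 3], [2, 2.4, 6], [3, 1.9, 8]])

# Hypothetical output location in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'examp_output.txt')

# fmt='%g' drops trailing zeros; comments='' stops savetxt prefixing the header with '# '
np.savetxt(path, data, delimiter=',', fmt='%g', header='a,b,c', comments='')

# Reading the file back recovers the same values
reloaded = np.genfromtxt(path, delimiter=',', skip_header=1)
```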

Importing Data Tables

Lots of scientific data comes in the form of tables, with one row per observation and one column per thing observed. Often the columns have different types (including text). The best way to work with this type of data in Numpy is a Structured Array.

Import

To do this we let Numpy detect the data type of each column automatically by passing the optional argument dtype=None. We can also use an existing header row as the column names with the optional argument names=True. The output below shows text read as byte strings (dtype '|S2'); under Python 3, also passing encoding='utf-8' loads text columns as ordinary strings, so that comparisons with strings such as data['species'] == 'DM' behave as intended.

In [13]:

data = np.genfromtxt('./data/examp_data_species_mass.txt', dtype=None, names=True, delimiter=',')
data

Out[13]:

array([(1, 'DS', 125), (1, 'DM', 70), (2, 'DM', 55), (1, 'CB', 40),
       (2, 'DS', 110), (1, 'CB', 45)], 
      dtype=[('site', '<i8'), ('species', '|S2'), ('mass', '<i8')])
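A self-contained version of this import, with the table held in memory and reconstructed from the output above (the header site,species,mass comes from the field names shown). It assumes Python 3 and passes encoding='utf-8' so the species column holds str values. Note that a structured array is one-dimensional: one record per row.

```python
from io import StringIO

import numpy as np

# In-memory copy of the table, reconstructed from the output above
table = StringIO("""site,species,mass
1,DS,125
1,DM,70
2,DM,55
1,CB,40
2,DS,110
1,CB,45
""")

# encoding='utf-8' gives the text column a Unicode string dtype instead of bytes
data = np.genfromtxt(table, dtype=None, names=True, delimiter=',', encoding='utf-8')

print(data.dtype.names)  # ('site', 'species', 'mass')
print(data.shape)        # (6,): one record per row, not a 2-D grid
first_row = data[0]      # a single record holding all three fields
```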

Export

One straightforward way to export a structured array is to treat it as a list of rows and write it out with Python's csv module, using a function like this one.

In [14]:

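import csv

def export_to_csv(data, filename):
    """Write each record of a structured array as one CSV row."""
    # Python 3: open in text mode with newline='', as the csv module recommends
    with open(filename, 'w', newline='') as outputfile:
        datawriter = csv.writer(outputfile)
        datawriter.writerow(data.dtype.names)  # header row from the field names
        datawriter.writerows(data.tolist())    # one tuple per record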
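As an alternative to the csv module, np.savetxt can also write a structured array directly when given one format per field. A sketch under these assumptions: the array is built by hand from the first three rows of the table, the file name is hypothetical, and text is stored as str (dtype 'U2'), not bytes.

```python
import os
import tempfile

import numpy as np

# A small structured array built directly (values taken from the table above)
data = np.array(
    [(1, 'DS', 125), (1, 'DM', 70), (2, 'DM', 55)],
    dtype=[('site', 'i8'), ('species', 'U2'), ('mass', 'i8')],
)

path = os.path.join(tempfile.mkdtemp(), 'species_mass_out.csv')  # hypothetical file name

# One format per field; the header reuses the field names
np.savetxt(path, data, fmt=['%d', '%s', '%d'], delimiter=',',
           header=','.join(data.dtype.names), comments='')

with open(path) as f:
    lines = f.read().splitlines()
# lines == ['site,species,mass', '1,DS,125', '1,DM,70', '2,DM,55']
```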

Structured Arrays

Importing data into a Structured Array makes many common tasks with scientific data simple, such as selecting columns by name and subsetting one column based on the values in others.

Selecting columns by name

In [15]:

data = np.genfromtxt('./data/examp_data_species_mass.txt', dtype=None, names=True, delimiter=',')
print(data)
data['species']

[(1, 'DS', 125) (1, 'DM', 70) (2, 'DM', 55) (1, 'CB', 40) (2, 'DS', 110)
 (1, 'CB', 45)]

Out[15]:

array(['DS', 'DM', 'DM', 'CB', 'DS', 'CB'], 
      dtype='|S2')
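Fields and records can also be accessed in other ways, and whole records can be sorted by a named field with the order argument of np.sort. A sketch on a hand-built copy of the table (text stored as str, an assumption for Python 3):

```python
import numpy as np

data = np.array(
    [(1, 'DS', 125), (1, 'DM', 70), (2, 'DM', 55),
     (1, 'CB', 40), (2, 'DS', 110), (1, 'CB', 45)],
    dtype=[('site', 'i8'), ('species', 'U2'), ('mass', 'i8')],
)

masses = data['mass']               # one field as an ordinary array
first_record = data[0]              # one row, with all fields
first_species = data[0]['species']  # 'DS'

# Sort whole records by a field; order= takes a field name
by_mass = np.sort(data, order='mass')
print(by_mass['species'])  # ['CB' 'CB' 'DM' 'DM' 'DS' 'DS']
```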

Subset columns based on the values in other columns

In [16]:

data['mass'][data['species'] == 'DM']

Out[16]:

array([70, 55])

In [17]:

data['mass'][(data['species'] == 'DM') & (data['site'] == 1)]

Out[17]:

array([70])
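The same masking idea extends to simple group summaries: np.unique lists the distinct species, and a mask per species selects its masses. This is a plain-Numpy sketch on a hand-built copy of the table; Pandas' groupby is the more usual tool for larger tables.

```python
import numpy as np

data = np.array(
    [(1, 'DS', 125), (1, 'DM', 70), (2, 'DM', 55),
     (1, 'CB', 40), (2, 'DS', 110), (1, 'CB', 45)],
    dtype=[('site', 'i8'), ('species', 'U2'), ('mass', 'i8')],
)

# Mean mass for each species, one boolean mask per group
mean_mass = {
    str(species): float(data['mass'][data['species'] == species].mean())
    for species in np.unique(data['species'])
}
print(mean_mass)  # {'CB': 42.5, 'DM': 62.5, 'DS': 117.5}

# Number of records per species
species, counts = np.unique(data['species'], return_counts=True)
```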

Random number generation

Random uniform (0 to 1)

In [18]:

np.random.rand(3, 5)

Out[18]:

array([[ 0.03414585,  0.83900235,  0.93206285,  0.06820967,  0.70145045],
       [ 0.552352  ,  0.76730225,  0.06316622,  0.71285231,  0.81976971],
       [ 0.39709379,  0.71772434,  0.21598482,  0.96412023,  0.69841293]])
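The values above change on every run. For reproducible results, newer Numpy versions (1.17 and later, an assumption about your installation) provide a Generator created with np.random.default_rng. The seed 42 is arbitrary.

```python
import numpy as np

# A Generator seeded with a fixed (arbitrary) value gives reproducible draws
rng = np.random.default_rng(42)
uniform = rng.random((3, 5))        # same shape as np.random.rand(3, 5), values in [0, 1)

# Re-seeding reproduces exactly the same numbers
again = np.random.default_rng(42).random((3, 5))

# Uniform values on another interval, here [5, 10)
scaled = rng.uniform(5, 10, size=(3, 5))
```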

Random normal (mean=0, stdev=1)

In [19]:

np.random.randn(4, 2)

Out[19]:

array([[ 0.04802043, -0.89025722],
       [ 0.46246887,  1.11994326],
       [-0.95655129,  0.76707094],
       [-1.61019706, -0.21933367]])
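For a normal distribution with a different mean and standard deviation, either scale standard normal draws or pass loc (mean) and scale (standard deviation) directly. This sketch uses the Generator API (Numpy 1.17+); the mean of 170 and standard deviation of 10 are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# loc is the mean and scale the standard deviation
heights = rng.normal(loc=170, scale=10, size=10000)

# Equivalent approach: shift and stretch standard normal draws
manual = 170 + 10 * rng.standard_normal(10000)

# With many draws, the sample statistics land close to the parameters
print(heights.mean(), heights.std())
```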

Random integers

In [20]:

low = 10
high = 20  # exclusive: the values range from 10 to 19
np.random.randint(low, high, [10, 2])

Out[20]:

array([[13, 10],
       [18, 10],
       [10, 12],
       [10, 16],
       [17, 12],
       [17, 13],
       [19, 16],
       [11, 17],
       [11, 16],
       [16, 18]])
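The Generator equivalent of randint is integers, which also excludes the upper bound by default. Passing endpoint=True makes it inclusive. A sketch assuming Numpy 1.17 or later, with an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(1)
low, high = 10, 20

# Like np.random.randint, integers() excludes high by default...
draws = rng.integers(low, high, size=(10, 2))
# ...while endpoint=True makes the upper bound inclusive
inclusive = rng.integers(low, high, size=1000, endpoint=True)
```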