In [3]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


# Introduction to NumPy Arrays¶

J.R. Johansson and P.D. Nation

## Introduction¶

Until now we have been using lists as a way of storing multiple elements together. However, when doing numerical computations, lists are not very good. For example, what if I wanted to add one to a list of numbers? We couldn't write

In [5]:
a=[1,2,3]
a=a+1

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-d80cb8f6be3a> in <module>()
1 a=[1,2,3]
----> 2 a=a+1

TypeError: can only concatenate list (not "int") to list

Instead we would have to do

In [6]:
for k in range(3):
a[k]=a[k]+1


Working with lists would quickly become very complicated if we wanted to do numerical operations on many elements at the same time, or if, for example, we want to be able to construct vectors and matrices in our programs. All of these features, and more, come with using NumPy arrays as our preferred data structure.

## NumPy Arrays¶

When dealing with numerical data in Python, nearly 100% of the time one uses arrays from the NumPy module to store and manipulate data. NumPy arrays are very similar to Python lists, but are actually arrays in c-code that allow for very fast multi-dimensional numerical, vector, matrix, and linear algebra operations. Using arrays with slicing, and vectorization leads to very fast Python code, and can replace many of the for-loops that you would have use if you coded a problem using lists. As a general rule, minimizing the number of for-loops maximizes the performance of your code. To start using arrays, we can start with a simple list and use it as an argument to the array function

In [7]:
from numpy import *
a=array([1,2,3,4,5,6])
print(a)

[1 2 3 4 5 6]


We have now created our first array of integers. Notice how, when using print, the array looks the same as a list, however it is very much a different data structure. We can also create an array of floats, complex numbers or even strings

In [8]:
a=array([2.0,4.0,8.0,16.0])
b=array([0,1+0j,1+1j,2-2j])
c=array(['a','b','c','d'])
print(a)
print(b)
print(c)

[  2.   4.   8.  16.]
[ 0.+0.j  1.+0.j  1.+1.j  2.-2.j]
['a' 'b' 'c' 'd']


In general there are three different ways of creating arrays in Python:

• First create a list and then call the array function using the list as an input argument.

• Use NumPy functions that are designed to create arrays: zeros, ones, arange, linspace.

• Import data into Python from file.

### Arrays from Lists¶

We have already seen how to create arrays with simple lists, but now lets look at how to create more complicated lists that we can turn into arrays. A short way of creating a list, say from 0 to 9 is as follows:

In [9]:
output=[n for n in range(10)]
print(output)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


This code is doing the exact same thing as the longer expression

In [10]:
output=[]
for n in range(10):
output.append(n)
print(output)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


We could turn this into an array quite easy

In [11]:
array(output)

Out[11]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Or, we can save even more space and create the list inside of the array function:

In [12]:
array([n for n in range(10)])

Out[12]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

This can also be used to create more complicated arrays

In [13]:
array([2.0*k**0.563 for k in range(0,10,2)])

Out[13]:
array([ 0.        ,  2.95467613,  4.36505551,  5.48440035,  6.44866265])

NumPy has several extremely important array creation functions that will make you life much easier. For example, creating arrays of all zeros or ones is trivial.

In [14]:
zeros(5)

Out[14]:
array([ 0.,  0.,  0.,  0.,  0.])
In [15]:
ones(10)

Out[15]:
array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

However, the most useful functions are arange which generates evenly spaced values within a given interval in a similar way that the range function did, and linspace that makes a linear array of points from a starting to an ending value.

In [16]:
arange(5)

Out[16]:
array([0, 1, 2, 3, 4])
In [17]:
arange(0,10,2)

Out[17]:
array([0, 2, 4, 6, 8])
In [18]:
linspace(0,10,20) #makes an array of 20 points linearly spaced from 0 to 10

Out[18]:
array([  0.        ,   0.52631579,   1.05263158,   1.57894737,
2.10526316,   2.63157895,   3.15789474,   3.68421053,
4.21052632,   4.73684211,   5.26315789,   5.78947368,
6.31578947,   6.84210526,   7.36842105,   7.89473684,
8.42105263,   8.94736842,   9.47368421,  10.        ])
In [19]:
linspace(-5,5,15) #15 points in range from -5 to 5

Out[19]:
array([-5.        , -4.28571429, -3.57142857, -2.85714286, -2.14285714,
-1.42857143, -0.71428571,  0.        ,  0.71428571,  1.42857143,
2.14285714,  2.85714286,  3.57142857,  4.28571429,  5.        ])

## Differences Between Arrays and Lists¶

Having played with arrays a bit, it is now time to explain the main differences between NumPy arrays and Python lists.

Python lists are very general and can hold any combination of data types. However, NumPy arrays can only hold one type of data (integers, floats, strings, complex). If we try to combine different types of data, then the array function will upcast the data in the array such that it all has the same type

In [20]:
array([1,2,3.14]) # [int,int,float] -> [float,float,float]

Out[20]:
array([ 1.  ,  2.  ,  3.14])

Upcasting between integers and floats does not cause too much trouble, but mixing strings and numbers in an array can create problems

In [21]:
array([1.0,1+1j,'hello']) #array data is upcast to strings

Out[21]:
array(['1.0', '(1+1j)', 'hello'],
dtype='<U64')

If we want, we can manually change the type of the data inside the array using the dtype ("data type") keyword argument. Frequently used dtypes are: int, float, complex, bool, str, object, etc. For example, to convert a list of integers to floats we can write

In [22]:
array([1,2,3,4,5],dtype=float)

Out[22]:
array([ 1.,  2.,  3.,  4.,  5.])
In [23]:
arange(2,10,2,dtype=complex)

Out[23]:
array([ 2.+0.j,  4.+0.j,  6.+0.j,  8.+0.j])
In [24]:
array([k for k in range(10)],dtype=str)

Out[24]:
array(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
dtype='<U1')

Unlike Python lists, we can not remove or add elements to an array once it has been created. Therefore, we must know the size of the array before creating it.

Because arrays hold only one type of data, mathematical functions such as multiplication and addition of arrays can be implemented in at the c-code level. This means that these kinds of operations are very fast and memory efficient. The mathematical operations on arrays are performed elementwise, which means that each element gets acted on in the same way. This is an example of vectorization. For example:

In [25]:
a=array([1,2,3,4])
5.0*a #This gets upcasted because 5.0 is a float

Out[25]:
array([  5.,  10.,  15.,  20.])
In [26]:
5*a**2-4

Out[26]:
array([ 1, 16, 41, 76])

Recall that none of these operations worked on Python lists.

## Using NumPy Functions on Arrays¶

Remember that NumPy has a large builtin collection of mathematical functions. When using NumPy arrays as our data structure, these functions become even more powerful as we can apply the same function elementwise over the entire array very quickly. Again, this is called vectorization and can speed up your code by many times.

In [27]:
x=linspace(-pi,pi,10)
sin(x)

Out[27]:
array([ -1.22464680e-16,  -6.42787610e-01,  -9.84807753e-01,
-8.66025404e-01,  -3.42020143e-01,   3.42020143e-01,
8.66025404e-01,   9.84807753e-01,   6.42787610e-01,
1.22464680e-16])
In [28]:
x=array([x**2 for x in range(4)])
sqrt(x)

Out[28]:
array([ 0.,  1.,  2.,  3.])
In [29]:
x=array([2*n+1 for n in range(10)])
sum(x) #sums up all elements in the array

Out[29]:
100

## Boolean Operations on Arrays¶

Like other mathematical functions, we can also use conditional statements on arrays to check whether each individual element satisfies a given expression. For example, to find the location of array elements that are less than zero we could do

In [30]:
a=array([0,-1,2,-3,4])
print(a<0)

[False  True False  True False]


The result in another array of boolean (True/False) values indicating whether a given element is less than zero. Or, for example, we can find all of the odd numbers in an array.

In [31]:
a=arange(10)
print((mod(a,2)!=0))

[False  True False  True False  True False  True False  True]


## Slicing NumPy Arrays¶

Just like lists, arrays can be sliced to get certain elements of the array, or to modify certain elements of the array. For example, lets try to get every third element from a given array

In [32]:
a=arange(20)
a[3::3]

Out[32]:
array([ 3,  6,  9, 12, 15, 18])

Now lets set each of these elements equal to -1.

In [33]:
a[3::3]=-1
print(a)

[ 0  1  2 -1  4  5 -1  7  8 -1 10 11 -1 13 14 -1 16 17 -1 19]


We can also slice the array so that it returns the original array in reverse

In [34]:
a=arange(10)
a[::-1]

Out[34]:
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

Finally, what if we want to get only those elements in the array that satisfy a certain conditional statement? Recall that conditional statements on an array return another array of boolean values. We can use this boolean array as an index to pick out only those elements where the boolean value is True.

In [35]:
a=linspace(-10,10,20)
print(a[a<=-5])

[-10.          -8.94736842  -7.89473684  -6.84210526  -5.78947368]


We must be careful though. Checking for multiple conditionals is not allowed

In [36]:
print(a[-8<a<=-5])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-36-76eb4cd96a9b> in <module>()
----> 1 print(a[-8<a<=-5])

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The reason for this error is because the computer does not know how to take an array of many True/False values and return just a single value.

## Example: Rewriting Sieve of Eratosthenes¶

Here we will replace most of the for-loops used when writing the Sieve of Eratosthenes using lists will arrays. This will make the code much easier to read and actually much faster for computing large prime numbers. The main part of the original code is:

In [42]:
N=20
# generate a list from 2->N
numbers = []
for i in range(2,N+1):                 # This can be replaced by array
numbers.append(i)
# Run Seive of Eratosthenes algorithm marking nodes with -1
for j in range(N-1):
if numbers[j]!=-1:
p=numbers[j]
for k in range(j+p,N-1,p):     # This can be replaced by array
numbers[k]=-1
# Collect all elements not -1 (these are the primes)
primes = []
for i in range(N-1):                   # This can be replaced by array
if numbers[i]!=-1:
primes.append(numbers[i])
print(primes)

[2, 3, 5, 7, 11, 13, 17, 19]


Using arrays instead of lists simplifies the code:

In [41]:
N=20
# generate a list from 2->N
numbers=arange(2,N+1)                 # replaced for-loop with call to arange
# Run Seive of Eratosthenes algorithm
# by marking nodes with -1
for j in range(N-1):
if numbers[j]!=-1:
p=numbers[j]
numbers[j+p:N-1:p]=-1         # replaced for-loop by slicing array
# Collect all elements not -1 (these are the primes)
primes=numbers[numbers!=-1]           # Used conditional statement to get elements !=-1
print(primes)

[ 2  3  5  7 11 13 17 19]


# End of Tutorial

from IPython.core.display import HTML